Re: std.unittests for (final?) review [Update]

2011-01-11 Thread Justin Johansson

On 11/01/11 03:09, Jonathan M Davis wrote:

On Monday 10 January 2011 05:40:39 Justin Johansson wrote:

On 10/01/11 23:29, Jonathan M Davis wrote:

   From the sounds of it, if this code gets voted in, it'll be going into
std.exception.

- Jonathan M Davis


May it be asked by what authority you can say that (ie. as said above)?


Andrei's posts in this thread. Assuming that the code passes the vote (the
deadline for which he set as February 7th), he thinks that std.exception is the
best place for it rather than it being in its own module.


Oh okay.  Thanks for that; I missed Andrei's post earlier in this thread 
but found it now.  Good luck with the voting.  -- Justin


Re: either

2011-01-11 Thread Justin Johansson

On 10/01/11 05:42, Andrei Alexandrescu wrote:

I wrote a simple helper, in spirit with some recent discussions:

// either
import std.typecons : Tuple, tuple;

struct Either(Ts...)
{
    Tuple!Ts data_;

    // True if e equals any one of the stored values.
    bool opEquals(E)(E e)
    {
        foreach (i, T; Ts)
        {
            if (data_[i] == e) return true;
        }
        return false;
    }
}

auto either(Ts...)(Ts args)
{
    return Either!Ts(tuple(args));
}

unittest
{
    assert(1 == either(1, 2, 3));
    assert(4 != either(1, 2, 3));
    assert("abac" != either("aasd", "s"));
    assert("abac" == either("aasd", "abac", "s"));
}

Turns out this is very useful in a variety of algorithms. I just don't
know where in std this helper belongs! Any ideas?


Despite that it may be very useful as you say, personally I think it is 
a fundamental no-no to overload the meaning of == in any manner that 
does not preserve the generally accepted semantics of equality which 
include the notions of reflexivity, symmetry and transitivity**.


**See http://en.wikipedia.org/wiki/Equality_%28mathematics%29

The symmetric and transitive properties of the equality relation imply 
that if (a == c) is true and if (b == c) is true then (a == b) is also true.


In this case the semantics of the overloaded == operator have the 
expressions 1 == either(1, 2, 3) and 2 == either(1, 2, 3) both 
evaluating to true and by implication/expectation (1 == 2).


Clearly though, (1 == 2) evaluates to false in terms of the commonly 
accepted meaning of equality.


Just my 2 cents. I wonder if there is some other way of achieving the 
desired functionality of your helper without overloading == and thereby 
violating the commonly held semantics of equality.
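
One possible answer to that question, offered only as a sketch and not as anything proposed in the thread: keep == untouched and express the membership test as an ordinary predicate. The name isOneOf is hypothetical. (Current Phobos also ships std.algorithm.comparison.among, which serves a similar purpose.)

```d
import std.stdio : writeln;

/// Returns true if `value` compares equal to at least one candidate.
/// `isOneOf` is a hypothetical name chosen for this sketch.
bool isOneOf(T, Ts...)(T value, Ts candidates)
{
    // The loop over the variadic arguments is unrolled at compile time.
    foreach (candidate; candidates)
    {
        if (value == candidate) return true;
    }
    return false;
}

void main()
{
    assert(isOneOf(1, 1, 2, 3));
    assert(!isOneOf(4, 1, 2, 3));
    assert(isOneOf("abac", "aasd", "abac", "s"));
    assert(!isOneOf("abac", "aasd", "s"));
    writeln("all membership checks passed");
}
```

Because the test is spelled as a function call, 1 == 2 keeps its usual meaning and nobody reading the call site expects symmetry or transitivity from it.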

Cheers,
Justin Johansson


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Michel Fortin
On 2011-01-10 22:57:36 -0500, Andrei Alexandrescu 
seewebsiteforem...@erdani.org said:


I've been thinking on how to better deal with Unicode strings. 
Currently strings are formally bidirectional ranges with a 
surreptitious random access interface. The random access interface 
accesses the support of the string, which is understood to hold data in 
a variable-encoded format. For as long as the programmer understands 
this relationship, code for string manipulation can be written with 
relative ease. However, there is still room for writing wrong code that 
looks legit.


Sometimes the best way to tackle a hairy reality is to invite it to the 
negotiation table and offer it promotion to first-class abstraction 
status. Along that vein I was thinking of defining a new range: 
VLERange, i.e. Variable Length Encoding Range. Such a range would have 
the power somewhere in between bidirectional and random access.


The primitives offered would include empty, access to front and back, 
popFront and popBack (just like BidirectionalRange), and in addition 
properties typical of random access ranges: indexing, slicing, and 
length. Note that the result of the indexing operator is not the same 
as the element type of the range, as it only represents the unit of 
encoding.


Seems like a good idea to define things formally.


In addition to these (and connecting the two), a VLERange would offer 
two additional primitives:


1. size_t stepSize(size_t offset) gives the length of the step needed 
to skip to the next element.


2. size_t backstepSize(size_t offset) gives the size of the _backward_ 
step that goes to the previous element.


I like the idea, but I'm not sure about this interface. What's the 
result of stepSize if your range must create two elements from one 
underlying unit? Perhaps in those cases the element type could be an 
array (to return more than one element from one iteration).


For instance, say we have a conversion range taking a Unicode string 
and converting it to ISO Latin 1. The best (lossy) conversion for "œ" 
is "oe" (one character to two characters). In this case 'front' could 
simply return "oe" (two characters) in one iteration, with stepSize 
being the size of the "œ" code point. In the same conversion process, 
encountering "e" followed by a combining "´" would return the pre-combined 
character "é" (two characters to one character).



In both cases, offset is assumed to be at the beginning of a logical 
element of the range.
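
For the UTF-8 case, the two proposed primitives can be sketched on top of std.utf.stride and std.utf.strideBack. This is only an illustration under the assumption that the range's support is a char array; the free-function signatures below (taking the string as an extra first argument) are made up for the sketch and are not part of the proposal.

```d
import std.utf : stride, strideBack;

/// Number of code units to skip from `offset` to reach the next code point.
/// `offset` must sit at the start of a code point.
size_t stepSize(const(char)[] s, size_t offset)
{
    return stride(s, offset);
}

/// Number of code units to step back from `offset` to reach the start of
/// the previous code point. `offset` must sit at a code point boundary.
size_t backstepSize(const(char)[] s, size_t offset)
{
    return strideBack(s, offset);
}

void main()
{
    // 'a' (1 unit), U+00E9 (2 units), U+20AC (3 units): 6 code units total.
    string s = "a\u00E9\u20AC";
    assert(s.length == 6);

    assert(stepSize(s, 0) == 1);
    assert(stepSize(s, 1) == 2);
    assert(stepSize(s, 3) == 3);

    assert(backstepSize(s, 6) == 3);
    assert(backstepSize(s, 3) == 2);
    assert(backstepSize(s, 1) == 1);
}
```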


I suspect that a lot of functions in std.string can be written without 
Unicode-specific knowledge just by relying on such an interface. 
Moreover, algorithms can be generalized to other structures that use 
variable-length encoding, such as those used in data compression. (In 
that case, the support would be a bit array and the encoded type would 
be ubyte.)


Applicability to other problems seems like a valuable benefit.



Writing to such ranges is not addressed by this design. Ideas are welcome.


Writing, as in assigning to 'front'? That's not really possible with 
variable-length units as it'd need to shift everything in case of a 
length difference. Or maybe you meant writing as in having an output 
range for variable-length elements... I'm not sure.



Adding VLERange would legitimize strings and would clarify their 
handling, at the cost of adding one additional concept that needs to be 
minded. Is the trade-off worthwhile?


In my opinion it's not a trade-off at all, it's a formalization of how 
strings are handled which is better in every regard than a special 
case. I welcome this move very much.



--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: About std.container.RedBlackTree

2011-01-11 Thread Steven Schveighoffer
On Mon, 10 Jan 2011 18:14:31 -0500, bearophile bearophileh...@lycos.com  
wrote:


I've had to use a search tree, so RedBlackTree was the right data  
structure. It seems to do what I need, so thank you for this useful data  
structure. Some of the things I write here are questions or things that  
show my ignorance about this implementation.


-

Please add some usage examples to this page; this is important and greatly  
reduces the number of experiments people need to do to use this tree:

http://www.digitalmars.com/d/2.0/phobos/std_container.html#


I will do this, when I have some free time.  If you want to submit some  
examples, I would gladly include them.




-

This doesn't seem to work, it gives a forward reference error:

import std.container: RedBlackTree;
RedBlackTree!int t;
void main() {
t = RedBlackTree!int(1);
}


Grrr... I had issues with forward references (you can see from this  
comment:  
http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/container.d#L4071),  
I thought by reordering the functions I had fixed it, but apparently, it  
resurfaces under certain conditions.


Please vote for that bug.   
http://d.puremagic.com/issues/show_bug.cgi?id=2810


I don't really know what to do about fixing it.  Most likely any 'fix' I  
try will result in some other situation not compiling.  I probably should  
just avoid using auto, not being able to declare a red black tree as a  
global variable is a huge limitation.



-

I need to create an empty tree and add items to it later (where I  
declare a fixed-sized array of trees I don't know the items to add). How  
do you do it?

This code doesn't work:


import std.container: RedBlackTree;
void main() {
auto t = RedBlackTree!int();
t.insert(1);
}


RedBlackTree must be initialized with a constructor.  Otherwise, your root  
node is null.  I chose this path instead of checking for null on every  
function.


I realize the mistake -- you cannot create an empty tree, because you  
cannot have a default constructor.


I have another function that I use to help create trees during unit tests  
because IFTI can be weird.  I will make this function public and always  
present, then you can create an empty tree like this:


auto t = RedBlackTree!int.create();

If Andrei decides eventually that containers should be classes, then this  
problem goes away.
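
For readers on a later Phobos: there RedBlackTree is a class in std.container.rbtree, so an empty tree can be created with new and filled afterwards. The sketch below assumes such a release; the create() function mentioned above is not used.

```d
import std.algorithm.comparison : equal;
import std.container.rbtree : RedBlackTree;

void main()
{
    // Create an empty tree first, add the items later.
    auto t = new RedBlackTree!int;
    assert(t.empty);

    t.insert(3);
    t.insert(1);
    t.insert(2);

    // Slicing the tree yields its elements in sorted order.
    assert(equal(t[], [1, 2, 3]));
}
```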


Please bugzillize this



-

Is the tree insert() method documented in the HTML docs?


I thought this would do it, but apparently it doesn't:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/container.d#L4457

I will try to make those docs show up.


-

A tree is a kind of set, so instead of insert() I'd like a name like  
add().

(But maybe this is not standard in D).


The function names must be consistent across containers, because the point  
is that complexity and semantic requirements are attached to the function  
name.  The function names were decided long ago by Andrei, and I don't  
think insert is a bad name (I believe std::set and std::map in C++ STL  
uses insert).



-

In theory a helper redBlackTree() function would allow writing just this,  
with no need to write types:


redBlackTree(1, 2, 3)


Yes, this should be done.  Please make a bugzilla report.  In fact, this  
can extend to all std.container types.



-

I have tried to use printTree(), but I have failed. I don't know what to  
give to it and the docs don't say that it requires -unittest


If it's a private debug function then there's no need to give it a ddoc  
comment.


It is a private debug function, only enabled when version = doRBChecks is  
enabled.  When developing the red black node, the red-black algorithms to  
fix the tree are very complex and error prone to write.  This function  
basically printed the tree layout in a horribly ugly fashion when the  
red-black properties were not preserved.  It helped me find bugs, but is  
mostly no longer needed unless I try some more optimizations.


Please ignore the function.  I will make sure the comment is not ddoc'd.



-

I've seen that the tree doesn't seem to contain a length. Using  
walkLength is an option, but a possible idea is to replace:

struct RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false)

With:
struct RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false,
bool withLength = false)


The reason for this is to keep it a reference-type (pImpl style), but I  
realize that I can easily fix this (I can just make the root node contain  
a length field).  Please file a bugzilla to add length.
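
Both requests (a type-inferring helper and a length) can be tried out on a later Phobos, where std.container.rbtree provides redBlackTree() and a length property. The following is a minimal sketch assuming such a release, not the code as it stood at the time of this thread.

```d
import std.algorithm.comparison : equal;
import std.container.rbtree : redBlackTree;

void main()
{
    // Element type is inferred as int; no template arguments needed.
    auto t = redBlackTree(3, 1, 2);
    assert(t.length == 3);

    // With the default allowDuplicates = false, a repeated value is ignored.
    t.insert(2);
    assert(t.length == 3);
    assert(equal(t[], [1, 2, 3]));

    // Passing true as the first template argument allows duplicates.
    auto d = redBlackTree!true(1, 1, 2);
    assert(d.length == 3);
}
```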



-

If you need to add many value nodes quickly to the tree a memory pool  
may speed up mass allocation. This has some disadvantages too.


I have done this in dcollections, and it helps immensely in node-based  
containers.  It is not 

Re: filling an array of structures

2011-01-11 Thread Steven Schveighoffer
On Tue, 11 Jan 2011 00:39:55 -0500, Brad  
brad.lanam.comp_nos...@nospam_gmail.com wrote:



Given an array of structures that you need to populate.
Also assume the structure is quite large and has many
elements to fill in.

S s[];
while (something) {
  s.length += 1;
  auto sp = s[$-1];   // method 1
  sp.a = 1;
  ...
  with (s[$-1]) {   // method 2
    a = 1;
  }
  ...
  foreach (ref sp; s[$-1..$]) {  // method 3
    sp.a = 1;
  }
}

I don't mind 'with' statements, but they have a readability and
maintenance problem if their scope is large.  The reader would have
to be aware of the context of the structure and the local variables,
whereas 'sp.a' is self documenting.

method 3 is fine, and provides me with a reference to s[$-1],
but I'd really like to have:
   auto sp = ref s[$-1];  // possible method 4
where sp is a reference, but no pointer arithmetic can be done on it.

Another alternative would be runtime aliases.
   alias s[$-1] as sp;
Or
   sp = with (s[$-1]); // I don't much like this syntax...

In the meantime, I'll go with method 1.



What about:

S sp;
sp.a = 1;
s ~= sp;

Or if you have a constructor for S, or a is the only member in it:

s ~= S(1);
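
Another option, not mentioned in the thread and given here only as a sketch: take a pointer to the new last element. D dereferences a struct pointer automatically on member access, so sp.a reads the same as with a reference. Note that method 1 in the question copies the struct, so writes through that copy never reach the array. The struct S and its fields below are made up for the example.

```d
// Hypothetical structure standing in for the large one in the question.
struct S
{
    int a;
    int b;
}

void main()
{
    S[] s;
    foreach (i; 0 .. 3)
    {
        s.length += 1;
        auto sp = &s[$ - 1];   // pointer to the element just added
        sp.a = i;              // auto-dereferenced member access
        sp.b = i * 10;
    }
    assert(s.length == 3);
    assert(s[2].a == 2 && s[2].b == 20);

    // Pitfall of method 1: this is a copy, the array element is unchanged.
    auto copy = s[$ - 1];
    copy.a = 99;
    assert(s[$ - 1].a == 2);
}
```

The pointer must not be kept across a later change of s.length, since growing the array may move its elements.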

-Steve


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Steven Schveighoffer
On Mon, 10 Jan 2011 22:57:36 -0500, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:


I've been thinking on how to better deal with Unicode strings. Currently  
strings are formally bidirectional ranges with a surreptitious random  
access interface. The random access interface accesses the support of  
the string, which is understood to hold data in a variable-encoded  
format. For as long as the programmer understands this relationship,  
code for string manipulation can be written with relative ease. However,  
there is still room for writing wrong code that looks legit.


Sometimes the best way to tackle a hairy reality is to invite it to the  
negotiation table and offer it promotion to first-class abstraction  
status. Along that vein I was thinking of defining a new range:  
VLERange, i.e. Variable Length Encoding Range. Such a range would have  
the power somewhere in between bidirectional and random access.


The primitives offered would include empty, access to front and back,  
popFront and popBack (just like BidirectionalRange), and in addition  
properties typical of random access ranges: indexing, slicing, and  
length. Note that the result of the indexing operator is not the same as  
the element type of the range, as it only represents the unit of  
encoding.


In addition to these (and connecting the two), a VLERange would offer  
two additional primitives:


1. size_t stepSize(size_t offset) gives the length of the step needed to  
skip to the next element.


2. size_t backstepSize(size_t offset) gives the size of the _backward_  
step that goes to the previous element.


In both cases, offset is assumed to be at the beginning of a logical  
element of the range.


I suspect that a lot of functions in std.string can be written without  
Unicode-specific knowledge just by relying on such an interface.  
Moreover, algorithms can be generalized to other structures that use  
variable-length encoding, such as those used in data compression. (In  
that case, the support would be a bit array and the encoded type would  
be ubyte.)


Writing to such ranges is not addressed by this design. Ideas are  
welcome.


Adding VLERange would legitimize strings and would clarify their  
handling, at the cost of adding one additional concept that needs to be  
minded. Is the trade-off worthwhile?


While this makes it possible to write algorithms that only accept  
VLERanges, I don't think it solves the major problem with strings -- they  
are treated as arrays by the compiler.


I'd also rather see an indexing operation return the element type, and  
have a separate function to get the encoding unit.  This makes more sense  
for generic code IMO.


I noticed you never commented on my proposed string type...

That reminds me, I should update with suggested changes and re-post it.

-Steve


Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
Hi Andrei,

It looks nice. Just a small comment: in many of your comments you use words that
not all of us might know. For instance: "sans". I happen to know it because I
studied French, but otherwise I wouldn't know that. I just showed that phrase to a
colleague here in Argentina and he didn't understand it. He thought it maybe meant
"since". Maybe "sans" and "in lieu" are memes there in the USA, but not
everywhere. So please, stick with English. :-)


Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
Oh, one more thing: can the names be consistent?

inpattern
countChars
expandtabs
chompPrefix
toupper
toupperInPlace ??

If this can't be done for backwards compatibility maybe you can make aliases for the
previous ones.

Also:

stripl
stripr
strip

Strips *l*eading and *t*railing whitespaces...

It took me some time to notice that it was strip*r* (for right), but the comment
says trailing, and I never think of "remove right space", always "remove
trailing spaces" (like in the comment!). So why not name that function stript?


Re: eliminate junk from std.string?

2011-01-11 Thread Max Samukha

On 01/11/2011 04:34 PM, Ary Borenszweig wrote:

Oh, one more thing: can the names be consistent?

inpattern
countChars
expandtabs
chompPrefix
toupper
toupperInPlace ??

If this can't be done for backwards compatibility maybe you can make alias for 
the
previous ones.

Also:

stripl
stripr
strip

Strips *l*eading and *t*railing whitespaces...


stripLeft, stripRight

Anyway, the necessity for super-cryptic abbreviated names doesn't exist 
any more. Maybe, they are justified for very frequently used stuff but 
stripl/stripr is not the case.
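
As a small illustration of both ideas (descriptive names plus aliases for those who want short ones): later Phobos releases do provide std.string.stripLeft and std.string.stripRight. The short alias names sl and sr below are hypothetical and exist only in this sketch.

```d
import std.string : strip, stripLeft, stripRight;

// Hypothetical short names, for anyone who prefers the terse spelling.
alias sl = stripLeft;
alias sr = stripRight;

void main()
{
    assert(stripLeft("  abc  ") == "abc  ");
    assert(stripRight("  abc  ") == "  abc");
    assert(strip("  abc  ") == "abc");

    // The aliases forward to the same functions.
    assert(sl("  abc  ") == "abc  ");
    assert(sr("  abc  ") == "  abc");
}
```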




It took me some time to notice that it was strip*r* (for right), but the comment
says trailing, and I never think of remove right space, always remove
trailing spaces (like in the comment!). So why not name that function stript?




Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
Yes, what I meant was that the names are stripl and stripr, yet the descriptions of
those functions say "strip leading" and "strip trailing"... at least put "strip left"
and "strip right" in the description so it matches the names.


Re: eliminate junk from std.string?

2011-01-11 Thread Max Samukha

On 01/11/2011 05:36 PM, Ary Borenszweig wrote:

Yes, what I meant was that the names are stripl and stripr yet the description 
of
those functions are strip leading and strip trailing... at least put strip left
and strip right on the description so it matches the names.


Sorry for misunderstanding.

I don't think that the description needs to match the names literally. 
However, I would avoid "trailing" and "leading", because in RTL 
environments they can have the opposite meaning.


Re: eliminate junk from std.string?

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 6:29 AM, Ary Borenszweig wrote:

Hi Andrei,

It looks nice. Just a small comment: in many of your comments you use words that
not all of us might know. For instance: sans. I happen to know it because I
studied French, but otherwise I wouldn't know that. I just showed that phrase 
to a
colleague here in Argentina and he didn't understand it. He thought it maybe 
meant
since. Maybe sans and in lieu are memes there in the USA, but not
everywhere. So please, stick with English. :-)


Okay. I think sans is Walter's...

Andrei


Re: eliminate junk from std.string?

2011-01-11 Thread Steven Schveighoffer
On Tue, 11 Jan 2011 11:39:11 -0500, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:



On 1/11/11 6:29 AM, Ary Borenszweig wrote:

Hi Andrei,

It looks nice. Just a small comment: in many of your comments you use  
words that
not all of us might know. For instance: sans. I happen to know it  
because I
studied French, but otherwise I wouldn't know that. I just showed that  
phrase to a
colleague here in Argentina and he didn't understand it. He thought it  
maybe meant

since. Maybe sans and in lieu are memes there in the USA, but not
everywhere. So please, stick with English. :-)


Okay. I think sans is Walter's...


sans is in the English dictionary:

http://www.merriam-webster.com/dictionary/sans

According to that reference, Shakespeare used it :)  Don't think you can  
get more English than that...


BTW, it would be impossible to phrase everything so that everyone, whatever  
their specific dialect of English, would understand it, so I don't think  
there's much sense in worrying about it.


That being said, using 'without' instead of 'sans' is probably fine.

-Steve


Re: eliminate junk from std.string?

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 6:34 AM, Ary Borenszweig wrote:

Oh, one more thing: can the names be consistent?

inpattern
countChars
expandtabs
chompPrefix
toupper
toupperInPlace ??

If this can't be done for backwards compatibility maybe you can make alias for 
the
previous ones.


The names are for compatibility with... other languages :o|.


Also:

stripl
stripr
strip

Strips *l*eading and *t*railing whitespaces...

It took me some time to notice that it was strip*r* (for right), but the comment
says trailing, and I never think of remove right space, always remove
trailing spaces (like in the comment!). So why not name that function stript?


Same thing. These names are imported from other languages.


Andrei


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 5:30 AM, Steven Schveighoffer wrote:

While this makes it possible to write algorithms that only accept
VLERanges, I don't think it solves the major problem with strings --
they are treated as arrays by the compiler.


Except when they're not - foreach with dchar...
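
The split Andrei alludes to can be seen in a few lines. This is just a sketch: length and indexing see UTF-8 code units, while foreach with a dchar loop variable decodes code points.

```d
void main()
{
    // 'a' followed by U+00E9, which takes two UTF-8 code units.
    string s = "a\u00E9";

    // Array view: three code units.
    assert(s.length == 3);

    size_t units;
    foreach (char c; s) ++units;
    assert(units == 3);

    // foreach with dchar decodes on the fly: two code points.
    size_t points;
    foreach (dchar c; s) ++points;
    assert(points == 2);
}
```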


I'd also rather see an indexing operation return the element type, and
have a separate function to get the encoding unit. This makes more sense
for generic code IMO.


But that's neither here nor there. That would return the logical element 
at a physical position. I am very doubtful that much generic code could 
work without knowing they are in fact dealing with a variable-length 
encoding.



I noticed you never commented on my proposed string type...

That reminds me, I should update with suggested changes and re-post it.


To be frank, I think it didn't mark a visible improvement. It solved 
some problems and brought others. There was disagreement over the 
offered primitives and their semantics.


That being said, it's good you are doing this work. In the best case, 
you could bring a compelling abstraction to the table. In the worst, 
you'll become as happy about D's strings as I am :o).



Andrei



Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message 
news:igi18o$e5...@digitalmars.com...
 On 1/11/11 6:34 AM, Ary Borenszweig wrote:
 Oh, one more thing: can the names be consistent?

 inpattern
 countChars
 expandtabs
 chompPrefix
 toupper
 toupperInPlace ??

 If this can't be done for backwards compatibility maybe you can make 
 alias for the
 previous ones.

 The names are for compatibility with... other languages :o|.


Would that other language be Walterish or C?

If C, it's not like using the wrong case will suddenly change the semantics 
of the function. And if the worry is other non-phobos functions that might 
have the old C-style name (but different semantics), then Ary's suggestion 
of compatibly-named aliases would take care of that.




Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
Steven Schveighoffer schvei...@yahoo.com wrote in message 
news:op.vo5kspmfeav...@steve-laptop...
 On Tue, 11 Jan 2011 11:39:11 -0500, Andrei Alexandrescu 
 seewebsiteforem...@erdani.org wrote:

 On 1/11/11 6:29 AM, Ary Borenszweig wrote:
 Hi Andrei,

 It looks nice. Just a small comment: in many of your comments you use 
 words that
 not all of us might know. For instance: sans. I happen to know it 
 because I
 studied French, but otherwise I wouldn't know that. I just showed that 
 phrase to a
 colleague here in Argentina and he didn't understand it. He thought it 
 maybe meant
 since. Maybe sans and in lieu are memes there in the USA, but not
 everywhere. So please, stick with English. :-)

 Okay. I think sans is Walter's...

 sans is in the english dictionary:

 http://www.merriam-webster.com/dictionary/sans

 According to that reference, Shakespeare used it :)  Don't think you can 
 get more English than that...


Thoust words are true.

Seriously though, I'm pretty sure a lot of native English speakers don't 
know "sans" either, unless they're familiar with font-related terminology. 
"In lieu of" is widely-known though, at least in the US.




Re: eliminate junk from std.string?

2011-01-11 Thread Daniel Gibson

Am 11.01.2011 19:07, schrieb Nick Sabalausky:

Thoust words are true.

Seriously though, I'm pretty sure a lot of native english speakers don't
know sans either, unless they're familiar with font-related terminology.
In lieu of is widely-known though, at least in the US.




I'm neither representative nor a native speaker (I'm German) and I knew "sans", 
but didn't know "in lieu of".


Re: About std.container.RedBlackTree

2011-01-11 Thread spir

On 01/11/2011 02:22 PM, Steven Schveighoffer wrote:

A tree is a kind of set, so instead of insert() I'd like a name like
add().
(But maybe this is not standard in D).


The function names must be consistent across containers, because the
point is that complexity and semantic requirements are attached to the
function name.  The function names were decided long ago by Andrei, and
I don't think insert is a bad name (I believe std::set and std::map in
C++ STL uses insert).


I have thought about this naming issue, precisely, for a while.
"add" is bad because of its connotation with addition. D does not use '+' 
as operator for putting new elements in a container: this is a very 
sensible choice imo.
"insert" is bad because of its in-between connotation: it does not fit when 
putting an element at the end of a seq, even less for unordered containers.
"put" instead seems to me the right term, obvious and general enough: 
one puts a new element in there. This can nicely adapt to very diverse 
container types such as sequences including stacks (no explicit index 
-- put at end), sets/AAs, trees,...


Denis
_
vita es estrany
spir.wikidot.com



Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread spir

On 01/11/2011 05:36 PM, Andrei Alexandrescu wrote:

On 1/11/11 4:41 AM, Michel Fortin wrote:

On 2011-01-10 22:57:36 -0500, Andrei Alexandrescu
seewebsiteforem...@erdani.org said:

In addition to these (and connecting the two), a VLERange would offer
two additional primitives:

1. size_t stepSize(size_t offset) gives the length of the step needed
to skip to the next element.

2. size_t backstepSize(size_t offset) gives the size of the _backward_
step that goes to the previous element.


I like the idea, but I'm not sure about this interface. What's the
result of stepSize if your range must create two elements from one
underlying unit? Perhaps in those cases the element type could be an
array (to return more than one element from one iteration).

For instance, say we have a conversion range taking a Unicode string and
converting it to ISO Latin 1. The best (lossy) conversion for œ is
oe (one character to two characters), in this case 'front' could
simply return oe (two characters) in one iteration, with stepSize
being the size of the œ code point. In the same conversion process,
encountering e followed by a combining ´ would return pre-combined
character é (two characters to one character).


In the design as I thought of it, the effective length of one logical
element is one or more representation units. My understanding is that
you are referring to a fractional number of representation units for one
logical element.


I think Michel is right. If I understand correctly, VLERange addresses 
the low-level and rather simple issue of each codepoint being encoded 
as a variable number of code units. Right?
If yes, then what is the advantage of VLERange? D already has 
string/wstring/dstring, allowing one to work with the most advantageous 
encoding for the given source data, with dstring abstracting away 
low-level encoding issues.


The main (and massively ignored) issue when manipulating unicode text is 
rather that, unlike with legacy character sets, one codepoint does *not* 
represent a character in the common sense. In character sets like latin-1:

* each code represents a character, in the common sense (eg à)
* each character representation has the same size (1 or 2 bytes)
* each character has a single representation (à -- always 0xe0)
All of this is wrong with unicode. And these are complicated and 
high-level issues, that appear _after_ decoding, on codepoint sequences.


If VLERange is helpful in dealing with those problems, then I don't 
understand your presentation, sorry. Do you for instance mean such a 
range would, under the hood, group together codes belonging to the same 
character (thus making indexing meaningful), and/or normalise (decomp & 
order) (thus allowing to comp/find/count correctly)?
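
The gap between code points and characters described above can be made concrete with the std.uni module found in later Phobos releases. The sketch assumes such a release (normalize and byGrapheme did not exist when this thread was written) and shows one character with two code point representations.

```d
import std.range : walkLength;
import std.uni;

void main()
{
    string precomposed = "\u00E9";   // one code point: e with acute
    string decomposed  = "e\u0301";  // 'e' plus a combining acute accent

    // Same character for the reader, different code point sequences.
    assert(precomposed != decomposed);
    assert(precomposed.walkLength == 1);
    assert(decomposed.walkLength == 2);

    // Grouping code points into graphemes gives one character in both cases.
    assert(decomposed.byGrapheme.walkLength == 1);
    assert(precomposed.byGrapheme.walkLength == 1);

    // Normalizing to NFC makes the two forms compare equal.
    assert(normalize!NFC(decomposed) == precomposed);
}
```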



denis
_
vita es estrany
spir.wikidot.com



Re: eliminate junk from std.string?

2011-01-11 Thread spir

On 01/11/2011 04:11 PM, Max Samukha wrote:

Anyway, the necessity for super-cryptic abbreviated names doesn't exist
any more. Maybe, they are justified for very frequently used stuff but
stripl/stripr is not the case.


+++
Standard names should all be as obvious as possible. Then, everyone is 
free to alias stripLeft & stripRight to sl & sr ;-) But standard lib 
should be super clear code; show the right example of what clarity means 
--not the opposite!
And I ask again: what to do with all inherited junk breaking naming 
rules like uint, size_t, malloc...?


Denis
_
vita es estrany
spir.wikidot.com



Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread spir

On 01/11/2011 02:30 PM, Steven Schveighoffer wrote:

On Mon, 10 Jan 2011 22:57:36 -0500, Andrei Alexandrescu
seewebsiteforem...@erdani.org wrote:


I've been thinking on how to better deal with Unicode strings.
Currently strings are formally bidirectional ranges with a
surreptitious random access interface. The random access interface
accesses the support of the string, which is understood to hold data
in a variable-encoded format. For as long as the programmer
understands this relationship, code for string manipulation can be
written with relative ease. However, there is still room for writing
wrong code that looks legit.

Sometimes the best way to tackle a hairy reality is to invite it to
the negotiation table and offer it promotion to first-class
abstraction status. Along that vein I was thinking of defining a new
range: VLERange, i.e. Variable Length Encoding Range. Such a range
would have the power somewhere in between bidirectional and random
access.

The primitives offered would include empty, access to front and back,
popFront and popBack (just like BidirectionalRange), and in addition
properties typical of random access ranges: indexing, slicing, and
length. Note that the result of the indexing operator is not the same
as the element type of the range, as it only represents the unit of
encoding.

In addition to these (and connecting the two), a VLERange would offer
two additional primitives:

1. size_t stepSize(size_t offset) gives the length of the step needed
to skip to the next element.

2. size_t backstepSize(size_t offset) gives the size of the _backward_
step that goes to the previous element.

In both cases, offset is assumed to be at the beginning of a logical
element of the range.

I suspect that a lot of functions in std.string can be written without
Unicode-specific knowledge just by relying on such an interface.
Moreover, algorithms can be generalized to other structures that use
variable-length encoding, such as those used in data compression. (In
that case, the support would be a bit array and the encoded type would
be ubyte.)

Writing to such ranges is not addressed by this design. Ideas are
welcome.

Adding VLERange would legitimize strings and would clarify their
handling, at the cost of adding one additional concept that needs to
be minded. Is the trade-off worthwhile?


While this makes it possible to write algorithms that only accept
VLERanges, I don't think it solves the major problem with strings --
they are treated as arrays by the compiler.

I'd also rather see an indexing operation return the element type, and
have a separate function to get the encoding unit. This makes more sense
for generic code IMO.

I noticed you never commented on my proposed string type...

That reminds me, I should update with suggested changes and re-post it.


People interested in solving the general problem with Unicode strings 
may have a look at https://bitbucket.org/denispir/denispir-d. All 
constructive feedback welcome.
(This will be asked for review in a short while. The main / client 
interface module is Text.d. A (long) presentation of the issues, 
reasons, solution can be found in the text called U missing level of 
abstraction)


Denis
_
vita es estrany
spir.wikidot.com



Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
Daniel Gibson metalcae...@gmail.com wrote in message 
news:igi6n5$27p...@digitalmars.com...
 Am 11.01.2011 19:07, schrieb Nick Sabalausky:
 Thoust words are true.

 Seriously though, I'm pretty sure a lot of native english speakers don't
 know sans either, unless they're familiar with font-related 
 terminology.
 In lieu of is widely-known though, at least in the US.



 I'm neither representative nor a native speaker (I'm german) and I knew 
 sans, but didn't know In lieu of.

I guess that just goes to show, we should all just switch to Esperanto ;)




Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
Max Samukha spam...@d-coding.com wrote in message 
news:ighvca$ap...@digitalmars.com...
 On 01/11/2011 05:36 PM, Ary Borenszweig wrote:
 Yes, what I meant was that the names are stripl and stripr yet the 
 description of
 those functions are strip leading and strip trailing... at least put 
 strip left
 and strip right on the description so it matches the names.

 Sorry for misunderstanding.

 I don't think that the description needs to match the names literally. 
 However, I would avoid trailing and leading, because in RTL 
 environments they can have the opposite meaning.

I would have thought RTL languages got stored as RTL. If so, then leading 
and trailing would be correct and left/right would be wrong (unless 
the internal behavior of stripl and stripr takes language-direction into 
account, which would surprise me).




Re: eliminate junk from std.string?

2011-01-11 Thread spir

On 01/11/2011 07:14 PM, Nick Sabalausky wrote:

Daniel Gibson metalcae...@gmail.com wrote in message
news:igi6n5$27p...@digitalmars.com...

Am 11.01.2011 19:07, schrieb Nick Sabalausky:

Thoust words are true.

Seriously though, I'm pretty sure a lot of native english speakers don't
know sans either, unless they're familiar with font-related
terminology.
In lieu of is widely-known though, at least in the US.




I'm neither representative nor a native speaker (I'm german) and I knew
sans, but didn't know In lieu of.


I guess that just goes to show, we should all just switch to Esperanto ;)


No, esperanto is just a heap of language-design errors!


Denis
_
vita es estrany
spir.wikidot.com



Re: eliminate junk from std.string?

2011-01-11 Thread Justin Johansson

On 12/01/11 05:07, Nick Sabalausky wrote:

Steven Schveighoffer schvei...@yahoo.com wrote in message
news:op.vo5kspmfeav...@steve-laptop...

On Tue, 11 Jan 2011 11:39:11 -0500, Andrei Alexandrescu
seewebsiteforem...@erdani.org  wrote:


On 1/11/11 6:29 AM, Ary Borenszweig wrote:

Hi Andrei,

It looks nice. Just a small comment: in many of your comments you use
words that
not all of us might know. For instance: sans. I happen to know it
because I
studied French, but otherwise I wouldn't know that. I just showed that
phrase to a
colleague here in Argentina and he didn't understand it. He thought it
maybe meant
since. Maybe sans and in lieu are memes there in the USA, but not
everywhere. So please, stick with English. :-)


Okay. I think sans is Walter's...


sans is in the english dictionary:

http://www.merriam-webster.com/dictionary/sans

According to that reference, Shakespeare used it :)  Don't think you can
get more English than that...



Thoust words are true.


As an aside you might find some amusement in The Shakespeare 
Programming Language


http://shakespearelang.sourceforge.net/report/shakespeare/


Re: eliminate junk from std.string?

2011-01-11 Thread spir

On 01/11/2011 07:01 PM, Nick Sabalausky wrote:

The names are for compatibility with... other languages :o|.


Would that other language be Walterish or C?

If C, it's not like using the wrong case will suddenly change the semantics
of the function. And if the worry is other non-phobos functions that might
have the old C-style name (but different semantics), then Ary's suggestion
of compatibly-named aliases would take care of that.


Agreed, Ary's suggestion makes much sense.
Anyway, when shall we finally get rid of half-a-century-old naming issues? 
In the XXIInd century?



Denis
_
vita es estrany
spir.wikidot.com



Re: eliminate junk from std.string?

2011-01-11 Thread Walter Bright

Nick Sabalausky wrote:
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message 
news:igi18o$e5...@digitalmars.com...

On 1/11/11 6:34 AM, Ary Borenszweig wrote:

Oh, one more thing: can the names be consistent?

inpattern
countChars
expandtabs
chompPrefix
toupper
toupperInPlace ??

If this can't be done for backwards compatibility maybe you can make 
alias for the

previous ones.

The names are for compatibility with... other languages :o|.



Would that other language be Walterish or C?


The names generally come from Python, Ruby and Javascript.


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 9:09 AM, spir wrote:

On 01/11/2011 05:36 PM, Andrei Alexandrescu wrote:

On 1/11/11 4:41 AM, Michel Fortin wrote:

On 2011-01-10 22:57:36 -0500, Andrei Alexandrescu
seewebsiteforem...@erdani.org said:

In addition to these (and connecting the two), a VLERange would offer
two additional primitives:

1. size_t stepSize(size_t offset) gives the length of the step needed
to skip to the next element.

2. size_t backstepSize(size_t offset) gives the size of the _backward_
step that goes to the previous element.


I like the idea, but I'm not sure about this interface. What's the
result of stepSize if your range must create two elements from one
underlying unit? Perhaps in those cases the element type could be an
array (to return more than one element from one iteration).

For instance, say we have a conversion range taking a Unicode string and
converting it to ISO Latin 1. The best (lossy) conversion for œ is
oe (one character to two characters), in this case 'front' could
simply return oe (two characters) in one iteration, with stepSize
being the size of the œ code point. In the same conversion process,
encountering e followed by a combining ´ would return pre-combined
character é (two characters to one character).


In the design as I thought of it, the effective length of one logical
element is one or more representation units. My understanding is that
you are referring to a fractional number of representation units for one
logical element.


I think Michel is right. If I understand correctly, VLERange addresses
the low-level and rather simple issue of each codepoint being encoded
as a variable number of code units. Right?
If yes, then what is the advantage of VLERange? D already has
string/wstring/dstring, allowing to work with the most advantageous
encoding according to given source data, and dstring abstracting from
low-level encoding issues.


It's not about the data, it's about algorithms. Currently there are 
algorithms that ostensibly work for bidirectional ranges, but internally 
cheat by detecting that the input is actually a string, and use that 
knowledge for better implementations.


The benefit of VLERange would be that it legitimizes those algorithms. 
I wouldn't be surprised if an entire class of algorithms would in fact 
require VLERange (e.g. many of those that we commonly consider today 
string algorithms).



The main (and massively ignored) issue when manipulating unicode text is
rather that, unlike with legacy character sets, one codepoint does *not*
represent a character in the common sense. In character sets like latin-1:
* each code represents a character, in the common sense (eg à)
* each character representation has the same size (1 or 2 bytes)
* each character has a single representation (à -- always 0xe0)
All of this is wrong with unicode. And these are complicated and
high-level issues, that appear _after_ decoding, on codepoint sequences.
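
The point about one character having several representations can be checked directly. The following is a small sketch using std.uni from current Phobos (normalize and byGrapheme postdate this thread, so treat them as an assumption relative to the Phobos being discussed): the precomposed and decomposed forms of à compare unequal as code point sequences until normalized, and only grapheme iteration sees a single character.

```d
import std.range : walkLength;
import std.uni : byGrapheme, NFC, normalize;

unittest
{
    string precomposed = "\u00E0";   // à as a single code point
    string decomposed  = "a\u0300";  // 'a' followed by a combining grave accent

    // Same user-perceived character, different code point sequences.
    assert(precomposed != decomposed);
    assert(normalize!NFC(decomposed) == precomposed);

    // Code point iteration counts two elements; grapheme iteration counts one.
    assert(decomposed.walkLength == 2);
    assert(decomposed.byGrapheme.walkLength == 1);
}
```
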

If VLERange is helpful in dealing with those problems, then I don't
understand your presentation, sorry. Do you for instance mean such a
range would, under the hood, group together codes belonging to the same
character (thus making indexing meaningful), and/or normalise (decomp 
order) (thus allowing to comp/find/count correctly).?


VLERange would offer automatic decoding in front, back, popFront, and 
popBack - just like BidirectionalRange does right now. It would also 
offer access to the representational support by means of indexing - also 
like char[] et al already do now. The difference is that VLERange being 
a formal concept, algorithms can specialize on it instead of (a) 
specializing for UTF strings or (b) specializing for BidirectionalRange 
and then manually detecting isSomeString inside. Conversely, when 
defining an algorithm you can specify VLERange as a requirement. 
Boyer-Moore is a perfect example - it doesn't work on bidirectional 
ranges, but it does work on VLERange. I suspect there are many like it.
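
The split described here (decoded elements through front/back, raw code units through indexing) is how narrow strings behave through the std.range primitives in current Phobos. A short sketch, assuming auto-decoding is in effect:

```d
import std.range.primitives : back, ElementType, front, popFront;

unittest
{
    string s = "\u00E9t\u00E9";   // "été": 3 code points, 5 UTF-8 code units

    // The range element type is the decoded code point...
    static assert(is(ElementType!string == dchar));
    // ...while indexing yields the unit of encoding.
    static assert(is(typeof(s[0]) == immutable(char)));

    assert(s.length == 5);        // length counts code units
    assert(s.front == '\u00E9');  // front and back decode
    assert(s.back == '\u00E9');
    assert(s[0] == 0xC3);         // first code unit of 'é' in UTF-8

    s.popFront();                 // skips a whole code point (two code units)
    assert(s.length == 3);
    assert(s.front == 't');
}
```
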


Of course, it would help a lot if we figured other remarkable VLERanges. 
Here are a few that come to mind:


* Multibyte encodings other than UTF. Currently we have no special 
support for those beyond e.g. forward or bidirectional ranges.


* Huffman, RLE, LZ encoded buffers (and many other compressed formats)

* Vocabulary-based translation systems, e.g. associate each word with a 
number.


* Others...?

Some of these are forward-only (don't allow bidirectional access). Once 
we have a number of examples, it would be great to figure a number of 
remarkable algorithms operating on them.



Andrei


Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
spir denis.s...@gmail.com wrote in message 
news:mailman.550.1294771968.4748.digitalmar...@puremagic.com...
 On 01/11/2011 07:14 PM, Nick Sabalausky wrote:
 Daniel Gibsonmetalcae...@gmail.com  wrote in message
 news:igi6n5$27p...@digitalmars.com...
 Am 11.01.2011 19:07, schrieb Nick Sabalausky:
 Thoust words are true.

 Seriously though, I'm pretty sure a lot of native english speakers 
 don't
 know sans either, unless they're familiar with font-related
 terminology.
 In lieu of is widely-known though, at least in the US.



 I'm neither representative nor a native speaker (I'm german) and I knew
 sans, but didn't know In lieu of.

 I guess that just goes to show, we should all just switch to Esperanto ;)

 No, esperanto is just a heap of language-design errors!


And that differs from English, how? ;)




Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Michel Fortin
On 2011-01-11 11:36:54 -0500, Andrei Alexandrescu 
seewebsiteforem...@erdani.org said:



On 1/11/11 4:41 AM, Michel Fortin wrote:

For instance, say we have a conversion range taking a Unicode string and
converting it to ISO Latin 1. The best (lossy) conversion for œ is
oe (one character to two characters), in this case 'front' could
simply return oe (two characters) in one iteration, with stepSize
being the size of the œ code point. In the same conversion process,
encountering e followed by a combining ´ would return pre-combined
character é (two characters to one character).


In the design as I thought of it, the effective length of one logical 
element is one or more representation units. My understanding is that 
you are referring to a fractional number of representation units for 
one logical element.


Your understanding is correct.

I think both cases (one becomes many & many becomes one) are important 
and must be supported. Your proposal only deals with the 
many-becomes-one case.


I proposed returning arrays so we can deal with the one-becomes-many 
case (œ becoming oe). Another idea would be to introduce 
substeps. When checking for the next character, in addition to 
determining its step length you could also determine the number of 
substeps in it. œ would have two substeps, o and e, and when 
there is no longer any substep you move to the next step.
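
To make the one-becomes-many case concrete, here is a rough sketch of the "return an array per step" idea using ordinary Phobos ranges. The function toLatin1ish is a hypothetical stand-in for a real conversion table; it only handles œ and passes everything else through.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner, map;
import std.conv : to;

// Hypothetical lossy mapping: one input code point may yield several
// output characters. Only 'œ' (U+0153) is handled here.
string toLatin1ish(dchar c)
{
    return c == '\u0153' ? "oe" : c.to!string;
}

unittest
{
    // Each step of the range yields a small array of output characters.
    auto steps = "c\u0153ur".map!toLatin1ish;
    assert(equal(steps, ["c", "oe", "u", "r"]));

    // Flattening the steps gives the converted text.
    assert(equal(steps.joiner, "coeur"));
}
```

Flattening with joiner plays the role of the substeps: the consumer sees one character at a time while the producer advances one input code point per step.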


All this said, I think this should stay an implementation detail as 
this would allow a variety of strategies. Also, keeping this an 
implementation detail means that your proposed 'stepSize' and 
'backstepSize' need to be an implementation detail too (because they 
won't make sense for the one-to-many case). So they can't really be 
part of a standard VLE interface.


As far as I know, all we really need to expose to algorithms is whether 
a range has elements of variable length, because this has an impact on 
your indexing capabilities. The rest seems unnecessary to me, or am I 
missing some use cases?


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
Why care where they come from? Why not make them intuitive? Say, like, Always
camel case?


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Steven Schveighoffer
On Tue, 11 Jan 2011 11:54:08 -0500, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:



On 1/11/11 5:30 AM, Steven Schveighoffer wrote:

While this makes it possible to write algorithms that only accept
VLERanges, I don't think it solves the major problem with strings --
they are treated as arrays by the compiler.


Except when they're not - foreach with dchar...


This solitary difference is a very thin argument -- foreach(d;  
byDchar(str)) would be just as good without requiring compiler help.
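
For comparison, a small sketch of the three iteration styles under discussion. std.utf.byDchar exists in current Phobos; at the time of this thread it was only a suggestion, so its use here is an assumption about a later library.

```d
import std.utf : byDchar;

unittest
{
    string s = "a\u00E9";   // 'a' plus 'é' (two UTF-8 code units)
    size_t units, viaForeach, viaRange;

    foreach (char c; s) ++units;         // iterates code units
    foreach (dchar d; s) ++viaForeach;   // compiler decodes on the fly
    foreach (d; s.byDchar) ++viaRange;   // library range decodes instead

    assert(units == 3);
    assert(viaForeach == 2);
    assert(viaRange == 2);
}
```
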





I'd also rather see an indexing operation return the element type, and
have a separate function to get the encoding unit. This makes more sense
for generic code IMO.


But that's neither here nor there. That would return the logical element  
at a physical position. I am very doubtful that much generic code could  
work without knowing they are in fact dealing with a variable-length  
encoding.


It depends on the function, and the way the indexing is implemented.


I noticed you never commented on my proposed string type...

That reminds me, I should update with suggested changes and re-post it.


To be frank, I think it didn't mark a visible improvement. It solved  
some problems and brought others. There was disagreement over the  
offered primitives and their semantics.


It is supposed to be simple, and provide the expected interface, without  
causing any undue performance degradation.  That is, I should be able to  
do all the things with a replacement string type that I can with a char  
array today, as efficiently as I can today, except I should have to work  
to get at the code-units.  The huge benefit is that I can say I'm dealing  
with this as an array when I know it's safe


The disagreement will never be fully solved, as there is just as much  
disagreement about the current state of affairs ;)  e.g. should foreach  
default to using dchar?


That being said, it's good you are doing this work. In the best case,  
you could bring a compelling abstraction to the table. In the worst,  
you'll become as happy about D's strings as I am :o).


I don't think I'll ever be 'happy' with the way strings sit in phobos  
currently.  I typically deal in ASCII (i.e. code units), and phobos works  
very hard to prevent that.


-Steve


Re: eliminate junk from std.string?

2011-01-11 Thread Walter Bright

Nick Sabalausky wrote:
Andrej Mitrovic andrej.mitrov...@gmail.com wrote in message 
news:mailman.543.1294713068.4748.digitalmar...@puremagic.com...

Speaking of regex.. I see there are two enums in std.regex, email and
url, which are regular expressions. Why not collect more of these
common regexes? And we could pack them up in a struct to avoid
polluting the local namespace. I think this might encourage the use of
std.regex, since the average Joe wouldn't have to reach for the regex
book whenever he's processing strings. E.g.:

// patterns.number would be std.regex.patterns.number
foreach(m; match("10abc20def30", regex(patterns.number)))
{
    writefln("%s[%s]%s", m.pre, m.hit, m.post);
}

Just a passing thought..
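
A sketch of how such a pattern collection might look with today's std.regex. The Patterns struct and its number member are hypothetical, as is the hits helper; only regex, matchAll and the hit property come from the library.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex : matchAll, regex;

// Hypothetical namespace for canned patterns; not part of std.regex.
struct Patterns
{
    enum number = `[0-9]+`;
}

// Collects every substring of `input` that matches `pattern`.
string[] hits(string input, string pattern)
{
    return matchAll(input, regex(pattern)).map!(m => m.hit).array;
}

unittest
{
    assert(hits("10abc20def30", Patterns.number) == ["10", "20", "30"]);
}
```
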


I think that's a great idea.


I agree.


Re: either

2011-01-11 Thread KennyTM~

On Jan 11, 11 17:10, Justin Johansson wrote:

On 10/01/11 05:42, Andrei Alexandrescu wrote:

I wrote a simple helper, in spirit with some recent discussions:

// either
struct Either(Ts...)
{
    Tuple!Ts data_;
    bool opEquals(E)(E e)
    {
        foreach (i, T; Ts)
        {
            if (data_[i] == e) return true;
        }
        return false;
    }
}

auto either(Ts...)(Ts args)
{
    return Either!Ts(tuple(args));
}

unittest
{
    assert(1 == either(1, 2, 3));
    assert(4 != either(1, 2, 3));
    assert("abac" != either("aasd", "s"));
    assert("abac" == either("aasd", "abac", "s"));
}

Turns out this is very useful in a variety of algorithms. I just don't
know where in std this helper belongs! Any ideas?


Despite that it may be very useful as you say, personally I think it is
a fundamental no-no to overload the meaning of == in any manner that
does not preserve the generally accepted semantics of equality which
include the notions of reflexivity, symmetry and transitivity**.

**See http://en.wikipedia.org/wiki/Equality_%28mathematics%29

The symmetric and transitive properties of the equality relation imply
that if (a == c) is true and if (b == c) is true then (a == b) is also
true.

In this case the semantics of the overloaded == operator have the
expressions 1 == either(1, 2, 3) and 2 == either(1, 2, 3) both
evaluating to true and by implication/expectation (1 == 2).

Clearly though, (1 == 2) evaluates to false in terms of the commonly
accepted meaning of equality.

Just my 2 cents and I wonder if there some other way of achieving the
desired functionality of your helper without resorting to overloading
== and the consequential violation of the commonly held semantics of
equality.

Cheers,
Justin Johansson


We could use in instead of ==

if (1 in oneOf(1, 2, 3)) { ... }
if (4 !in oneOf(1, 2, 3)) { ... }
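
A possible shape for this, as a sketch only: OneOf and oneOf are hypothetical names, and the membership test is done by overloading `in` through opBinaryRight, which leaves == with its usual meaning.

```d
import std.typecons : Tuple, tuple;

struct OneOf(Ts...)
{
    Tuple!Ts data;

    // `e in oneOf(...)` is rewritten to `oneOf(...).opBinaryRight!"in"(e)`,
    // and `e !in x` to `!(e in x)`.
    bool opBinaryRight(string op, E)(E e)
        if (op == "in")
    {
        foreach (i, T; Ts)
        {
            // Skip stored values that cannot be compared with E at all.
            static if (is(typeof(data[i] == e) : bool))
            {
                if (data[i] == e)
                    return true;
            }
        }
        return false;
    }
}

auto oneOf(Ts...)(Ts args)
{
    return OneOf!Ts(tuple(args));
}

unittest
{
    assert(1 in oneOf(1, 2, 3));
    assert(4 !in oneOf(1, 2, 3));
    assert("abac" !in oneOf("aasd", "s"));
    assert("abac" in oneOf("aasd", "abac", "s"));
}
```
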


Re: eliminate junk from std.string?

2011-01-11 Thread Walter Bright

Adam Ruppe wrote:

I don't know about bearophile, but I used a lot of the functions
you are talking about removing in my HTML -> Plain Text conversion
function used for emails and other similar environments. squeeze the
whitespace, align text, wrap for the target, etc.


As has been pointed out, a lot of these seemingly odd functions come from 
Python/Ruby/Javascript. Users of those languages will be familiar with them, and 
they've proven themselves handy in those languages.


Let's not be cavalier about dumping them just because they aren't familiar to C 
programmers.


Re: eliminate junk from std.string?

2011-01-11 Thread Walter Bright

Ary Borenszweig wrote:

Why care where they come from? Why not make them intuitive? Say, like, Always
camel case?


Because people are used to those names due to their wide use. It's the same 
reason that we still use Qwerty keyboards.


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Walter Bright

retard wrote:
Ubuntu has a menu entry for restricted drivers. It provides support for 
both ATI/AMD (Radeon 8500 or better, appeared in 1998 or 1999!) and 
NVIDIA cards (Geforce 256 or better, appeared in 1999!) and I think it 
automatically suggests (a pop-up window) correct drivers in the latest 
releases right after the first install.


Intel chips are automatically supported by the open source drivers. VIA 
and S3 may or may not work out of the box. I'm just a bit curious to know 
what GPU you have? If it's some ancient VLB (vesa local bus) or ISA card, 
I can donate $15 for buying one that uses AGP or PCI Express.


Ubuntu doesn't support all video formats out of the box, but the media 
players and browsers automatically suggest loading missing drivers. At 
least in the 3 or 4 latest releases. Maybe the problem isn't the encoder, 
it might be the Linux incompatible web site.


My mobo is an ASUS M2A-VM. No graphics cards, or any other cards plugged into 
it. It's hardly weird or wacky or old (it was new at the time I bought it to 
install Ubuntu).


My display is 1920 x 1200. That just seems to cause grief for Ubuntu. Windows 
has no issues at all with it.




Or you could download the latest version from meld's website and
compile it yourself.

Yeah, I could spend an afternoon doing that.


Another one of these jokes? Probably one of the best compiler authors in 
the whole world uses a whole afternoon doing something (compiling a 
program)


On the other hand, I regularly get emails from people with 10 years of coding 
experience who are flummoxed by a symbol not defined message from the linker. :-)


that total Linux noobs do in less than 30 minutes with the help 
of Google search.


Yeah, I've spent a lot of time googling for solutions to problems with Linux. 
You know what? I get pages of results from support forums - every solution is 
different and comes with statements like seems to work, doesn't work for me, 
etc. The advice is clearly from people who do not know what they are doing, and 
randomly stab at things, and these are the first page of google results.


Re: eliminate junk from std.string?

2011-01-11 Thread Daniel Gibson

Am 11.01.2011 20:42, schrieb Walter Bright:

Ary Borenszweig wrote:

Why care where they come from? Why not make them intuitive? Say, like, Always
camel case?


Because people are used to those names due to their wide use. It's the same
reason that we still use Qwerty keyboards.


And C++ :-P


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Walter Bright

retard wrote:

One thing came to my mind. Unless you're using Ubuntu 8.04 LTS,


I'm using 8.10, and I've noticed that no more updates are coming.

your 
Ubuntu version isn't supported anymore. They might have already removed 
the package repositories for unsupported versions and that might indeed 
lead to problems with graphics and video players as you said.


What annoyed the heck out of me was the earlier (7.xx) version of Ubuntu *did* 
work.

The support for desktop 8.04 and 9.10 is also nearing its end (April this 
year). I'd recommend backing up your /home and installing 10.04 LTS or 
10.10 instead.


Yeah, I know I'll be forced to upgrade soon. One thing that'll make it easier is 
I abandoned using Ubuntu for multimedia. For example, to play Pandora I now just 
plug my ipod into my stereo <g>. I just stopped using youtube on Ubuntu, as I 
got tired of the video randomly going black, freezing, etc.


Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
Agreed. So what's wrong with improving things and leaving old things as aliases?


Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
Welcome to D. Do you program in C, Javascript, Python or Ruby? Cool! Then you
will feel at home.

That phrase currently ends like this:

You don't? Oh, sorry, you will have to learn that some names are all lowercase,
some not.

But it could end like this:

You don't? Don't worry. D has the convention of writing all function names 
with X
convention, but we keep some aliases for things that we want to keep backwards
compatibility for.
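
As a sketch of what the alias approach could look like: current Phobos spells these functions stripLeft and stripRight, so the old names can be kept alive with two alias declarations. The aliases below are illustrative and declared in user code, not something Phobos provides.

```d
import std.string : stripLeft, stripRight;

// Old-style names kept as plain aliases of the camel-case functions.
// A real library would probably mark them `deprecated` as well.
alias stripl = stripLeft;
alias stripr = stripRight;

unittest
{
    assert(stripl("  abc ") == "abc ");
    assert(stripr("  abc ") == "  abc");
    assert(stripl("  abc ") == stripLeft("  abc "));
}
```
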


Re: either

2011-01-11 Thread Justin Johansson

On 12/01/11 06:28, KennyTM~ wrote:

On Jan 11, 11 17:10, Justin Johansson wrote:

On 10/01/11 05:42, Andrei Alexandrescu wrote:

I wrote a simple helper, in spirit with some recent discussions:
unittest
{
    assert(1 == either(1, 2, 3));
    assert(4 != either(1, 2, 3));
    assert("abac" != either("aasd", "s"));
    assert("abac" == either("aasd", "abac", "s"));
}


Just my 2 cents and I wonder if there some other way of achieving the
desired functionality of your helper without resorting to overloading
== and the consequential violation of the commonly held semantics of
equality.


We could use in instead of ==

if (1 in oneOf(1, 2, 3)) { ... }
if (4 !in oneOf(1, 2, 3)) { ... }


Nice suggestion.

At the end of the day though it basically boils down to having either a 
binary operator** or a function for it.


(** preferably excluding == and other undesirable operator overloads 
of course).


Re: eliminate junk from std.string?

2011-01-11 Thread Max Samukha

On 01/11/2011 09:42 PM, Walter Bright wrote:

Ary Borenszweig wrote:

Why care where they come from? Why not make them intuitive? Say, like,
Always
camel case?


Because people are used to those names due to their wide use. It's the
same reason that we still use Qwerty keyboards.


We should be careful in assuming what people are used to. Compare:

D/Python/Lisp/... - strip

.NET/Delphi/Java/Qt/Haskell/... - Trim/trim/trimmed

stripl/stripr are TrimStart/TrimEnd in .NET




Re: eliminate junk from std.string?

2011-01-11 Thread Max Samukha

On 01/11/2011 08:18 PM, Nick Sabalausky wrote:

Max Samukha spam...@d-coding.com wrote in message
news:ighvca$ap...@digitalmars.com...

On 01/11/2011 05:36 PM, Ary Borenszweig wrote:

Yes, what I meant was that the names are stripl and stripr yet the
description of
those functions are strip leading and strip trailing... at least put
strip left
and strip right on the description so it matches the names.


Sorry for misunderstanding.

I don't think that the description needs to match the names literally.
However, I would avoid trailing and leading, because in RTL
environments they can have the opposite meaning.


I would have thought RTL languages got stored as RTL. If so, then leading
and trailing would be correct and left/right would be wrong (unless
the internal behavior of stripl and stripr takes language-direction into
account, which would surprise me).




AFAIK, there is no universal standard on storing RTL text. There are 
recommendations to prefer logical order over visual order because visual 
order is extremely inflexible. I am not an expert in this field and have 
to shut up.


Re: std.unittests for (final?) review [Update]

2011-01-11 Thread Tomek Sowiński
Jonathan M Davis napisał:

 On Monday, January 10, 2011 13:48:50 Tomek Sowiński wrote:
  Jonathan M Davis napisał:
   I followed Andrei's suggestion and merged most of the functions into a
   highly flexible assertPred. I also renamed the functions as suggested
   and attempted to fully document everything with fully functional
   examples instead of examples using types or functions which don't
   actually exist.
  
  Did you zip the right file? I still see things like nameFunc and
  assertPlease.
 
 ??? Those are supposed to be there. All examples are tested in the unit tests 
 exactly as they are.

I just thought instead of examples using types or functions which don't 
actually exist meant well-known Phobos functions would be used.

  On the whole the examples are too long. It's just daunting I can't see docs
  for *one* function without scrolling. Please give them a solid hair-cut --
  max 10 lines with a median of 5. The descriptions are also watered down by
  over-explanatory writing.
 
 Perhaps. If I cut down on the examples though, the usage wouldn't be as 
 clear. 
 The idea was to be thorough. Andrei wanted better examples, so I gave better 
 examples.

Not sure if longer means better.

 However, it is a bit of a balancing act, and I may have put too many 
 in. It's debatable. Nick's suggestion of a main description before each 
 individual overload would help with that.

I agree. Perhaps a synopsis for the whole module like in std.variant would help 
too.

   So, now there's just assertThrown, assertNotThrown, collectExceptionMsg,
   and assertPred (though there are eight different overloads of
   assertPred). So, review away.
  
  Some suggestions:
  
  assertPred:
  Try putting expected in front; uniform call syntax can then set it apart
  from the operands: assertPred!"%"(7, 5, 2); // old
  2.assertPred!"%"(7, 5); // new
 
 I really don't see any value to this.
 
 1. You can't do that with assert, and assertPred is essentially supposed to 
 be a 
 fancy assert.
 
 2. A number of assertPred overloads don't even have an expected, so it would 
 be 
 inconsistent.
 
 3. People already are annoyed enough that the operator doesn't end up between 
 the arguments. Putting the result on the left-hand side of the operator like 
 that would make it that much more confusing.

OK, I understand.

  assertNotThrown: chain the original exception with AssertError as its
  cause? Oh, this one badly needs a real-life example.
 
 I suppose that chaining it would be a good idea. I didn't think of that. But 
 if 
 you want examples, it's used in the unit tests in this very module, and I 
 used 
 it heavily in std.datetime.

I meant a real-life example in documentation. People may often ask themselves 
how is it different than !assertThrown()?.

  assertThrown: I'd rather see generified collectException (call it
  collectThrown?). assertThrown may stay as a convenience wrapper, though.
 
 ??? I don't get what you're trying for here. assertThrown isn't trying to 
 collect exceptions at all. It's testing whether the given exception was 
 thrown 
 like it's supposed to be for the given function call. If it was, then the 
 assertion succeeded. If it wasn't, then an AssertError is thrown. Just like 
 assert.

I mean now collectException doesn't have a parametrized catch block like 
assertThrown does. If it did, the latter could come down to:

void assertThrown(T : Throwable = Exception, F)
    (lazy F funcToCall, string msg = null,
     string file = __FILE__, size_t line = __LINE__)
{
    T e = collectThrown!T(funcToCall);
    if (e is null)
        throw new AssertError(...);
}

Shortening assertThrown's implementation is a bonus, main gain is better 
collectThrown().
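
A self-contained sketch of that layering, under the assumption that collectThrown would simply parametrize the catch block. Both functions are written from scratch for illustration (they shadow nothing, since std.exception is not imported), and the failure message wording is made up.

```d
import core.exception : AssertError;

// Hypothetical generalized collectException: evaluates the lazy expression
// and returns the caught T, or null if nothing of type T was thrown.
T collectThrown(T : Throwable = Exception, F)(lazy F expr)
{
    try
    {
        expr();
    }
    catch (T t)
    {
        return t;
    }
    return null;
}

// assertThrown reduced to a thin wrapper over collectThrown.
void assertThrown(T : Throwable = Exception, F)
    (lazy F expr, string msg = null,
     string file = __FILE__, size_t line = __LINE__)
{
    if (collectThrown!T(expr) is null)
        throw new AssertError("assertThrown failed: no " ~ T.stringof
            ~ " was thrown" ~ (msg.length ? ": " ~ msg : "."), file, line);
}

// Helper used only by the example below.
int alwaysThrows()
{
    throw new Exception("boom");
}

unittest
{
    assert(collectThrown(alwaysThrows()) !is null);
    assert(collectThrown(1 + 1) is null);
    assertThrown(alwaysThrows());       // passes silently

    bool failed;
    try
        assertThrown(1 + 1);            // nothing thrown, so this must fail
    catch (AssertError)
        failed = true;
    assert(failed);
}
```
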

[there's more down]

  Looking at the code I'm seeing the same cancerous coding style std.datetime
  suffered from (to a lesser extent, I admit).
  
  For instance, this routine:
  
  if(result != expected)
  {
  if(msg.empty)
  {
  throw new AssertError(format(`assertPred!%s failed: [%s] %s
  [%s]: actual [%s], expected [%s].`, op,
   lhs,
   op,
   rhs,
   result,
   expected),
 file,
 line);
  }
  else
  {
  throw new AssertError(format(`assertPred!%s failed: [%s] %s
  [%s]: actual [%s], expected [%s]: %s`, op,
   lhs,
   op,
   rhs,
   result,
   expected,
   msg),
file,
line);

Re: eliminate junk from std.string?

2011-01-11 Thread Walter Bright

Ary Borenszweig wrote:

Agreed. So what's wrong with improving things and leaving old things as aliases?


Clutter.

One of the risks with Phobos development is it becoming a river miles wide, and 
only an inch deep. In other words, endless gobs of shallow, trite functions, 
with very little depth. (Aliases are as shallow as they get!)


As a general rule, I don't want functionality in Phobos that takes more time for 
a user to find/read/understand the documentation on than to reimplement it 
himself. Those things give the illusion of comprehensiveness, but are just 
useless wankery.


Do we really want a 1000 page reference manual on Phobos, but no database 
interface? No network interface? No D lexer? No disassembler? No superfast XML 
parser? No best-of-breed regex implementation? No CGI support? No HTML parsing? 
No sound support? No jpg reading?


I worry that by endlessly bikeshedding over the perfect spelling of some name, we 
miss the whole show.


I'd like to see more meat. For example, Don has recently added gamma functions 
to the math library. These are hard to implement correctly, and are perfect for 
inclusion.


Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
Walter Bright newshou...@digitalmars.com wrote in message 
news:igib2q$12g...@digitalmars.com...
 Adam Ruppe wrote:
 I don't know about bearophile, but I used a lot of the functions
 you are talking about removing in my HTML -> Plain Text conversion
 function used for emails and other similar environments. squeeze the
 whitespace, align text, wrap for the target, etc.

 As has been pointed out, a lot of these seemingly odd functions come from 
 Python/Ruby/Javascript. Users of those languages will be familiar with 
 them, and they've proven themselves handy in those languages.

 Let's not be cavalier about dumping them just because they aren't familiar 
 to C programmers.

I agree with this reasoning for having them. However, I don't think it means 
we shouldn't D-ify or Phobos-ify them, at least as far as capitalization 
conventions.




Re: eliminate junk from std.string?

2011-01-11 Thread Walter Bright

Nick Sabalausky wrote:
I agree with this reasoning for having them. However, I don't think it means 
we shouldn't D-ify or Phobos-ify them, at least as far as capitalization 
conventions.


I also object to rather pointlessly annoying people wanting to move their code 
from D1 to D2 by renaming everything. Endlessly renaming things searching for 
the perfect name gives the illusion of progress, whereas time would be better 
spent on improving the documentation, unittests, performance, etc.


Naming of things isn't nearly as critical an issue in D as it is in, say, C, 
because of the excellent antihijacking support in D's module system.



Some name changes have turned out to be a big win, like invariant => 
immutable. But I don't think that implies open season for wholesale renaming 
of swaths of functions.


Re: eliminate junk from std.string?

2011-01-11 Thread Walter Bright

Ary Borenszweig wrote:

Agreed. So what's wrong with improving things and leaving old things as aliases?


I want to add that having multiple names for the same thing doesn't really do 
anyone any good.


Re: filling an array of structures

2011-01-11 Thread Ali Çehreli

Brad wrote:
 Given an array of structures that you need to populate.
 Also assume the structure is quite large and has many
 elements to fill in.

 S s[];
 while (something) {
   s.length += 1;
   auto sp = s[$-1];   // method 1
   sp.a = 1;
   ...
   with (s[$-1]) {   // method 2
 a = 1;
   }
   ...
   foreach (ref sp; s[$-1..$]) {  // method 3
 sp.a = 1;
   }
 }

 I don't mind 'with' statements, but they have a readability and
 maintenance problem if their scope is large.  The reader would have
 to be aware of the context of the structure and the local variables,
 whereas 'sp.a' is self documenting.

 method 3 is fine, and provides me with a reference to s[$-1],
 but I'd really like to have:
auto sp = ref s[$-1];  // possible method 4
 where sp is a reference, but no pointer arithmetic can be done on it.

 Another alternative would be runtime aliases.
alias s[$-1] as sp;
 Or
sp = with (s[$-1]); // I don't much like this syntax...

 In the meantime, I'll go with method 1.

   -- Brad

I've been using a method in C++, which involves

boost::shared_ptr
boost::enable_from_shared
boost::list_of

That was useful when objects had both some required and some optional 
properties. Anyway... If polymorphism is not needed something similar 
can be achieved very simply in D:


S[] esses = [ S(42), S(100).optional(3) ];

The whole code:

import std.stdio;
import std.string;

struct S
{
    int must_have_;
    int optional_;

    this(int must_have)
    {
        must_have_ = must_have;
    }

    // Returns this by reference so that optional setters can be chained.
    ref S optional(int optional_arg)
    {
        optional_ = optional_arg;
        return this;
    }

    string toString() const
    {
        return format("%s.%s", must_have_, optional_);
    }
}

void main()
{
    S[] esses = [ S(42), S(100).optional(3) ];
    writeln(esses);
}

Ali
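
A side note on "method 1" from the question: S is a struct, so `auto sp = s[$-1];` copies the last element, and assignments through sp never reach the array. Taking the element's address is one way around that. The following is only a minimal sketch; the two-field S and the helper name appendFilled are made up for illustration, and a pointer does permit pointer arithmetic, so this is not quite the "method 4" asked for.

```d
struct S
{
    int a;
    int b;
}

/// Appends a default-initialized S and fills it in place through a pointer.
void appendFilled(ref S[] s, int a, int b)
{
    s.length += 1;
    auto sp = &s[$ - 1]; // S*: refers to the element itself, not a copy
    sp.a = a;            // D dereferences pointers automatically on member access
    sp.b = b;
}

unittest
{
    S[] s;
    appendFilled(s, 1, 2);
    assert(s.length == 1 && s[0].a == 1 && s[0].b == 2);

    // Method 1 from the question copies the element, so the array is unchanged.
    auto copy = s[$ - 1];
    copy.a = 99;
    assert(s[0].a == 1);
}
```

The pointer must not be kept across the next `s.length += 1`, because growing the array may reallocate it.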


Re: eliminate junk from std.string?

2011-01-11 Thread Adam D. Ruppe
On Tue, Jan 11, 2011 at 12:43:28PM -0800, Walter Bright wrote:
 Naming of things isn't nearly as critical an issue in D as it is in, say, 
 C, because of the excellent antihijacking support in D's module system.

And the spell checker will quickly point out messed up capitalization at
compile time anyway.



Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
Walter Bright newshou...@digitalmars.com wrote in message 
news:igibu6$154...@digitalmars.com...
 Ary Borenszweig wrote:
 Why care where they come from? Why not make them intuitive? Say, like, 
 Always
 camel case?

 Because people are used to those names due to their wide use. It's the 
 same reason that we still use Qwerty keyboards.

Then why switch languages at all?

When you move to a different language you expect that language is going to 
have its own set of conventions. And even more than that, you also expect it 
to at least be internally-consistent, not a grab-bag of different styles. 
Are they really supposed to remember "Oh, oh, this func comes from this 
language, so it's capitalized this way, and that one comes from that 
language so it's capitalized that way..."?

Not only that, but D has far, far bigger, more significant differences from 
Ruby/Python/JS/etc than the capitalization of a few functions. If people are 
going to come over and get used to *those* changes, then using toLower 
instead of tolower is going to be a downright triviality for them. Your cart 
is before your horse.




Re: eliminate junk from std.string?

2011-01-11 Thread Nick Sabalausky
Walter Bright newshou...@digitalmars.com wrote in message 
news:igifgt$1cu...@digitalmars.com...
 Nick Sabalausky wrote:
 I agree with this reasoning for having them. However, I don't think it 
 means we shouldn't D-ify or Phobos-ify them, at least as far as 
 capitalization conventions.

 I also object to rather pointlessly annoying people wanting to move their 
 code from D1 to D2 by renaming everything. Endlessly renaming things 
 searching for the perfect name gives the illusion of progress, whereas 
 time would be better spent on improving the documentation, unittests, 
 performance, etc.

 Naming of things isn't nearly as critical an issue in D as it is in, say, 
 C, because of the excellent antihijacking support in D's module system.


 Some name changes have turned out to be a big win, like invariant => 
 immutable. But I don't think that implies open season for wholesale 
 renaming of swaths of functions.

We're not asking for free-for-all bikeshedding, we're asking to get rid of 
the free-for-all naming-convention-carnival in the std lib. Just basic 
sensible consistency, that's all.

And breaking compatibility with D1 for the sake of progress is the whole 
point of D2.




Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Andrej Mitrovic
On 1/11/11, Walter Bright newshou...@digitalmars.com wrote:
 Yeah, I've spent a lot of time googling for solutions to problems with
 Linux. You know what? I get pages of results from support forums - every
 solution is different and comes with statements like "seems to work",
 "doesn't work for me", etc. The advice is clearly from people who do not
 know what they are doing, and randomly stab at things, and these are the
 first page of google results.


That's my biggest problem with Linux. Having technical problems is not
the issue, finding the right solution in the sea of forum posts is the
problem. When I have a problem with something breaking down on
Windows, most of the time a single google search reveals the solution
in one of the very first results (it's either on an MSDN page or one
of the more popular forums).

This probably has to do with the fact that regular users have either
XP or Vista/7 installed. So there's really not much searching you have
to do. Once someone posts a solution, that's the end of the story
(more often than not).

I remember a few years ago I got a copy of Ubuntu, and I wanted to
disable antialiased fonts (they looked really bad on the screen). So I
simply disabled antialiased fonts in one of the display property
panels, and thought that would be the end of the story. But guess
what? Firefox and other applications don't want to follow the OS
settings, and they will override your settings and render websites
with antialiased fonts. So now I had to search for half an hour to find
a solution. I finally find a guide where the instructions are to edit
the etc/fonts.conf file. So I do that. But antialiased fonts were still
active. So I spend another 30 minutes looking for more information.
Then I run into another website where the instructions are to delete a
couple of fonts from the system. OK. I run the command in the
terminal, I reset the system, but then on boot x-org crashes. So now
I'm left with a blinking cursor on a black background, with no
knowledge whatsoever of how to fix x-org or reset its settings.
Instinctively I run "help" and I get back a list of 100 commands, but
I can only read the last 20 and I've no idea how to scroll up to read
more.

So, hours wasted and a broken Linux system all because I wanted to
disable antialiased fonts. But that's just one example. I have plenty
more. GRUB failing to install properly, GRUB failing to detect all of
my windows installations, and then there's that wubi which *does
not* work. Of course there are numerous guides on how to fix wubi as
well but those fail too. Bleh. I like open-source, Linux - the kernel
might be awesome for all I know, but the distributions plain-simple
*suck*.


Re: eliminate junk from std.string?

2011-01-11 Thread Jerry Quinn
Andrei Alexandrescu Wrote:

 On 1/9/11 4:51 PM, Andrei Alexandrescu wrote:
  There's a lot of junk in std.string that should be gone. I'm trying to
  motivate myself to port some functions to different string widths and...
  it's not worth it.
 
  What functions do you think we should remove from std.string? Let's make
  a string and then send them the way of the dino.
 
 
  Thanks,
 
  Andrei
 
 I have uploaded a preview of the changed APIs here:
 
 http://d-programming-language.org/cutting-edge/phobos/std_string.html

Unclear if iswhite() refers to ASCII whitespace or Unicode.  If Unicode, which 
version of the standard?
Same comment for icmp().  Also, in the Unicode standard, case folding can 
depend on the specific language.

There is room for ascii-only functions, but unless a D version of ICU is going 
to be done separately, it would be nice to have full unicode-aware functions 
available.

You've got chop() marked as deprecated.  Is popBack() going to make sense as 
something that removes a variable number of chars from a string in the CR-LF 
case?  That might be a bit too magical.

Rather than zfill, what about modifying ljustify, rjustify, and center to take 
an optional fill character?

One set of functions I'd like to see are startsWith() and endsWith().  I find 
them frequently useful in Java and an irritating lack in the C++ standard 
library.

Jerry











Re: eliminate junk from std.string?

2011-01-11 Thread Jerry Quinn
Jerry Quinn Wrote:

 One set of functions I'd like to see are startsWith() and endsWith().  I find 
 them frequently useful in Java and an irritating lack in the C++ standard 
 library.

Just adding that these functions are useful because they're more efficient than 
doing a find and checking that the match is in the first position.

Jerry



Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Daniel Gibson

Am 11.01.2011 22:36, schrieb Walter Bright:

Andrej Mitrovic wrote:

That's my biggest problem with Linux. Having technical problems is not
the issue, finding the right solution in the sea of forum posts is the
problem.


The worst ones begin with "you might try this..." or "I think this might work,
but YMMV..." How do these wind up being the top ranked results by google? Who
embeds links to that stuff?

My experience with Windows is, like yours, the opposite. The top ranked result
will be correct and to the point. No weasel wording.


Those results are often in big forums like ubuntuforums.org that get a lot of 
links etc, so even if one thread doesn't have many incoming links, it may still 
get a top ranking.


Also my blog entries (hosted at wordpress.com) get on the google frontpage when 
looking for the specific topic, even though my blog is mostly unknown, has 2-20 
visitors per day and almost no incoming links. Google's algorithms often do seem 
like voodoo ;)


Also: Many problems (and their correct solutions) heavily depend on your system. 
What desktop environment is used, what additional stuff (dbus, hal, ...) is 
used, what are the versions of this stuff (and X.org), what distribution is 
used, ...
There may be different default configurations shipped depending on what 
distribution (and what version of that distribution) you use, ...

So there often is no single correct answer that will work for everyone.

Still, in my experience those HOWTOs often work (it may help to look at multiple 
HOWTOs and compare them if you're not sure, if it applies to your system) or at 
least push you in the right direction.


Cheers,
- Daniel


Re: eliminate junk from std.string?

2011-01-11 Thread Dmitry Olshansky

On 12.01.2011 0:47, Jerry Quinn wrote:

Jerry Quinn Wrote:


One set of functions I'd like to see are startsWith() and endsWith().  I find 
them frequently useful in Java and an irritating lack in the C++ standard 
library.

Just adding that these functions are useful because they're more efficient than 
doing a find and checking that the match is in the first position.

Jerry

Those are present in std.algorithm and seem to work just fine. What's 
wrong with them?
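
For reference, a small sketch of the std.algorithm versions, next to the find-based workaround Jerry describes. The helper startsWithViaFind is hypothetical and exists only to show why the dedicated functions are cheaper: find has to scan the whole haystack before it can report a mismatch, while startsWith stops at the first differing element.

```d
import std.algorithm : endsWith, find, startsWith;

/// Prefix test via a full search (illustration only): on a mismatch, find
/// scans the entire haystack before giving up.
bool startsWithViaFind(string haystack, string needle)
{
    auto rest = find(haystack, needle);
    // A match at position 0 means find returned the whole haystack.
    return rest.length == haystack.length && haystack.length >= needle.length;
}

unittest
{
    assert("std.string".startsWith("std."));
    assert("std.string".endsWith(".string"));
    assert(!"std.string".startsWith("string"));

    assert(startsWithViaFind("std.string", "std."));
    assert(!startsWithViaFind("std.string", "string"));
    assert(!startsWithViaFind("", "std."));
}
```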


--
Dmitry Olshansky



Re: std.unittests for (final?) review [Update]

2011-01-11 Thread Jonathan M Davis
On Tuesday, January 11, 2011 12:25:53 Tomek Sowiński wrote:
 Jonathan M Davis napisał:
  On Monday, January 10, 2011 13:48:50 Tomek Sowiński wrote:
   Jonathan M Davis napisał:
I followed Andrei's suggestion and merged most of the functions into
a highly flexible assertPred. I also renamed the functions as
suggested and attempted to fully document everything with fully
functional examples instead of examples using types or functions
which don't actually exist.
   
   Did you zip the right file? I still see things like nameFunc and
   assertPlease.
  
  ??? Those are supposed to be there. All examples are tested in the unit
  tests exactly as they are.
 
 I just thought "instead of examples using types or functions which don't
 actually exist" meant well-known Phobos functions would be used.

Well, that would be better, but at least when it comes to types, that doesn't
work. Not only is Phobos generally lacking in types, but some of the examples
which show what a typical error message from the functions would look like
require incorrectly implemented types. I might be able to use existing
functions for the examples using functions though.

   assertThrown: I'd rather see generified collectException (call it
   collectThrown?). assertThrown may stay as a convenience wrapper,
   though.
  
  ??? I don't get what you're trying for here. assertThrown isn't trying to
  collect exceptions at all. It's testing whether the given exception was
  thrown like it's supposed to be for the given function call. If it was,
  then the assertion succeeded. If it wasn't, then an AssertError is
  thrown. Just like assert.
 
 I mean now collectException doesn't have a parametrized catch block like
 assertThrown does. If it did, the latter could come down to:
 
 void assertThrown(T : Throwable = Exception, F)
     (lazy F funcToCall, string msg = null,
      string file = __FILE__, size_t line = __LINE__) {
     T e = collectThrown!T(funcToCall);
     if (e is null)
         throw new AssertError(...);
 }
 
 Shortening assertThrown's implementation is a bonus, main gain is better
 collectThrown().
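
The suggestion above could be sketched roughly as follows. This is not the code under review: collectThrown is the hypothetical name proposed in the post, the assertThrown here is a stand-in written for this sketch, and the error message wording is invented.

```d
import core.exception : AssertError;

/// Hypothetical helper: evaluates the expression and returns the caught T,
/// or null if nothing of type T was thrown.
T collectThrown(T : Throwable = Exception, E)(lazy E expression)
{
    try
        expression();
    catch (T t)
        return t;
    return null;
}

/// Stand-in for the reviewed assertThrown, reduced to a thin wrapper.
void assertThrown(T : Throwable = Exception, E)(lazy E expression,
    string msg = null, string file = __FILE__, size_t line = __LINE__)
{
    if (collectThrown!T(expression) is null)
        throw new AssertError("assertThrown failed: no " ~ T.stringof
            ~ " was thrown" ~ (msg.length ? ": " ~ msg : "."), file, line);
}

// Two small functions used only by the checks below.
int alwaysThrows() { throw new Exception("boom"); }
int neverThrows() { return 42; }

unittest
{
    assert(collectThrown(alwaysThrows()) !is null);
    assert(collectThrown(neverThrows()) is null);
    assertThrown(alwaysThrows());
    // A missing exception is reported as an AssertError.
    assert(collectThrown!AssertError(assertThrown(neverThrows())) !is null);
}
```

Exceptions of other types are not caught by collectThrown and simply propagate to the caller.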
 
 [there's more down]
 
   Looking at the code I'm seeing the same cancerous coding style
   std.datetime suffered from (to a lesser extent, I admit).
   
   For instance, this routine:
   if(result != expected)
   {
       if(msg.empty)
       {
           throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s].`,
                                        op, lhs, op, rhs, result, expected),
                                 file, line);
       }
       else
       {
           throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s]: %s`,
                                        op, lhs, op, rhs, result, expected, msg),
                                 file, line);
       }
   }
   
   can be easily compressed to:
   
   enforce(result==expected, new AssertError(
       format("[%s] %s [%s] failed: actual [%s], expected [%s]" ~
              (msg.empty ? "." : ": %s"),
              op, lhs, op, rhs, result, expected, msg), file, line));
  
  I really have no problem with them being separate as they are. I think
  that I end up writing them that way because I see them as two separate
  code paths. It wouldn't necessarily be a bad idea to combine them, but I
  really don't think that it's a big deal.
  
   Another example:
   
   {
       bool thrown = false;
       try
           assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message");
       catch(AssertError)
           thrown = true;

       assert(thrown);
   }

   can be:

   try {
       assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message");
       assert(false);
   } catch(AssertError) { /*OK*/ }
   
   and you don't have to introduce a new scope every time.
  
  Doesn't work actually - at least not in the general case (for this
  particular test, it's arguably okay). It doesn't take into account the
  case where an exception other than AssertError is 

Re: eliminate junk from std.string?

2011-01-11 Thread Jonathan M Davis
On Tuesday, January 11, 2011 12:44:57 Nick Sabalausky wrote:
 Walter Bright newshou...@digitalmars.com wrote in message
 news:igibu6$154...@digitalmars.com...
 
  Ary Borenszweig wrote:
  Why care where they come from? Why not make them intuitive? Say, like,
  Always
  camel case?
  
  Because people are used to those names due to their wide use. It's the
  same reason that we still use Qwerty keyboards.
 
 Then why switch languages at all?
 
 When you move to a different language you expect that language is going to
 have its own set of conventions. And even more than that, you also expect
 it to at least be internally-consistent, not a grab-bag of different
 styles. Are they really supposed to remember "Oh, oh, this func comes from
 this language, so it's capitalized this way, and that one comes from that
 language so it's capitalized that way..."?
 
 Not only that, but D has far, far bigger, more significant differences from
 Ruby/Python/JS/etc than the capitalization of a few functions. If people
 are going to come over and get used to *those* changes, then using toLower
 instead of tolower is going to be a downright triviality for them. Your
 cart is before your horse.

I agree. Having the functions named similarly so that they're quickly
recognized is good - if a function has a particular name in a variety of
languages, why not give it essentially the same name in D? But I don't see
why it must be _exactly_ the same name. It should at least use the same
casing as the rest of Phobos. Unless you're directly porting code, the fact
that it's toLower instead of tolower really shouldn't be an issue. It's a
new language, a new library, you're going to have to learn how it works
anyway. The function names don't need to be _exactly_ the same as other
languages. It does look bad when functions in Phobos don't follow the same
naming conventions as the rest of it, and it makes it much harder to
remember exactly how they're named.

So, I'm all for picking names which are essentially the same as functions with 
the same functionality in other languages, but I think that insisting that the 
casing of the names match the casing of the functions from other languages when 
it doesn't match how functions are normally cased in Phobos is definitely a bad 
idea. Not to mention, I don't think that I've ever heard anyone complain that 
the casing on a function in Phobos didn't match the casing of a function with 
essentially the same name in another language, but complaints definitely pop up 
about how some of the std.string functions don't use the same casing as the 
rest of Phobos.

I vote for consistency. Using essentially the same names for functions as is 
used in other languages is great. Insisting on the same casing for the function 
names strikes me as inconsistent and undesirable. I find that it increases the 
burden of remembering function names rather than reducing it.

- Jonathan M Davis


Re: eliminate junk from std.string?

2011-01-11 Thread Jonathan M Davis
On Tuesday, January 11, 2011 11:12:44 Nick Sabalausky wrote:
 spir denis.s...@gmail.com wrote in message
 news:mailman.550.1294771968.4748.digitalmar...@puremagic.com...
 
  On 01/11/2011 07:14 PM, Nick Sabalausky wrote:
  Daniel Gibsonmetalcae...@gmail.com  wrote in message
  news:igi6n5$27p...@digitalmars.com...
  
  Am 11.01.2011 19:07, schrieb Nick Sabalausky:
  Thoust words are true.
  
  Seriously though, I'm pretty sure a lot of native english speakers
  don't know "sans" either, unless they're familiar with font-related
  terminology.
  "In lieu of" is widely-known though, at least in the US.
  
  I'm neither representative nor a native speaker (I'm german) and I knew
  "sans", but didn't know "In lieu of".
  
  I guess that just goes to show, we should all just switch to Esperanto
  ;)
  
  No, esperanto is just a heap of language-design errors!
 
 And that differs from English, how? ;)

English wasn't designed.

- Jonathan M Davis


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Andrej Mitrovic
Google does seem to take into account whatever information it has on
you, which might explain why your own blog is a top result for you.

If I log out of Google and delete my preferences, searching for "D"
won't find anything about the D language in the top results. But if I
log in and search "D" again, the D website will be the top result.


@templated()

2011-01-11 Thread bearophile
(I am busy, I am late with some answers, I am sorry, I will catch up)

This paper is "Minimizing Dependencies within Generic Classes for Faster and 
Smaller Programs", by Dan Tsafrir, Bjarne Stroustrup and others:
http://www2.research.att.com/~bs/SCARY.pdf

The article shows problems of C++/D template bloat, and a way to avoid some of 
it. It talks a bit about D too, in two points. Near the end it shows an idea 
for C++-like languages, Figure 21, page 18:

template<typename X, typename Y, typename Z> struct C {
    void f1() utilizes X,Z {
        // only allowed to use X or Z, not Y
    }

    void f2() {
        // for backward compatibility, this is
        // equivalent to: void f2() utilizes X,Y,Z
    }

    class Inner_t utilizes Y {
        // only allowed to use Y, not X nor Z
    };
};


I have adapted it to a possible syntax for D:

struct C(X, Y, Z) {
    // only allowed to use X or Z, not Y
    @templated(X,Z) void f1() {
    }

    // for backward compatibility, this is
    // equivalent to: @templated(X,Y,Z) void f2()
    void f2() {
    }

    // only allowed to use Y, not X nor Z
    @templated(Y) static class Inner {
    }
}


The purpose of @templated() is to help the compiler avoid some template bloat. 
Here the class Inner is allowed to use just the Y template argument of C, this 
means that if you instantiate C in two ways like this:

C!(int, int, float)
C!(float, int, double)

The Y doesn't change, so the compiler instantiates the code of Inner only once. 
If you try to use X or Z in Inner it will not compile.

A sufficiently smart compiler is able to remove duplicated functions with no 
need of @templated(), in practice an annotation may help reduce compiler work 
or compilation time, to produce smaller code. It also helps document a bit of 
the semantics of the code, an enforced documentation.
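
For comparison, the same sharing can be obtained by hand in existing D, without any new annotation, by moving the Y-only code out of C and aliasing it back in. The sketch below assumes the made-up name InnerImpl; it is the manual refactoring that @templated(Y) would make unnecessary, not an implementation of the proposal.

```d
/// Depends only on Y, so it is declared outside C and instantiated once per Y.
class InnerImpl(Y)
{
    Y value;
}

struct C(X, Y, Z)
{
    alias Inner = InnerImpl!Y; // the same type for every X and Z

    X x;
    Z z;
    Inner inner;
}

unittest
{
    // Both instantiations of C share a single instantiation of the inner class.
    static assert(is(C!(int, int, float).Inner == C!(float, int, double).Inner));
    // A different Y yields a different inner type.
    static assert(!is(C!(int, int, float).Inner == C!(int, long, float).Inner));
}
```

The cost is that nothing stops the hoisted class from drifting away from C, which is the documentation benefit the annotation is meant to keep.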

Bye,
bearophile


Re: @templated()

2011-01-11 Thread Andrej Mitrovic
I think that hardcoding instructions in user code for how the compiler
should do its optimizations is a bad idea. But that's just me!


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 I've been thinking on how to better deal with Unicode strings. Currently 
 strings are formally bidirectional ranges with a surreptitious random 
 access interface. The random access interface accesses the support of 
 the string, which is understood to hold data in a variable-encoded 
 format. For as long as the programmer understands this relationship, 
 code for string manipulation can be written with relative ease. However, 
 there is still room for writing wrong code that looks legit.
 
 Sometimes the best way to tackle a hairy reality is to invite it to the 
 negotiation table and offer it promotion to first-class abstraction 
 status. Along that vein I was thinking of defining a new range: 
 VLERange, i.e. Variable Length Encoding Range. Such a range would have 
 the power somewhere in between bidirectional and random access.
 
 The primitives offered would include empty, access to front and back, 
 popFront and popBack (just like BidirectionalRange), and in addition 
 properties typical of random access ranges: indexing, slicing, and 
 length.

For some compressions implementing *back is troublesome if not impossible...

 Note that the result of the indexing operator is not the same as 
 the element type of the range, as it only represents the unit of encoding.

It's worth mentioning explicitly -- a VLERange is dually typed. That's
important for searching. Statically check if original and encoded match; if
so, perform a fast search directly on the encoded elements. I think an
important feature of a VLERange should be dropping itself down to an
encoded-typed range, so that front and back return raw data.

Dual typing will also affect foreach -- in the general case you'd want to
choose whether to decode or not by typing the element.

I can't stop thinking that VLERange is a two-piece bikini making a bare 
random-access range safe to look at, and that you can take off when partners 
have confidence, not a limited random-access probing facility to span the void 
between front and back.

 In addition to these (and connecting the two), a VLERange would offer 
 two additional primitives:
 
 1. size_t stepSize(size_t offset) gives the length of the step needed to 
 skip to the next element.
 
 2. size_t backstepSize(size_t offset) gives the size of the _backward_ 
 step that goes to the previous element.
 
 In both cases, offset is assumed to be at the beginning of a logical 
 element of the range.

So when I move the spinner in an iPod, I get catapulted in position with the 
raw data opIndex and from there I try to work my way to the next frame to start 
playback. Sounds promising.
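
For the UTF-8 case, the two proposed primitives map fairly directly onto functions that std.utf already provides. A minimal sketch, written as free functions over a char array rather than as members of a range; stepSize and backstepSize are the names from the proposal, not existing Phobos symbols.

```d
import std.utf : stride, strideBack;

/// Code units occupied by the code point that starts at `offset`.
size_t stepSize(const(char)[] s, size_t offset)
{
    return stride(s, offset);
}

/// Code units occupied by the code point that ends just before `offset`.
size_t backstepSize(const(char)[] s, size_t offset)
{
    return strideBack(s, offset);
}

unittest
{
    string s = "a\u0153b"; // U+0153 takes two UTF-8 code units
    assert(s.length == 4);
    assert(stepSize(s, 0) == 1);     // 'a'
    assert(stepSize(s, 1) == 2);     // U+0153
    assert(backstepSize(s, 3) == 2); // stepping back over U+0153
    assert(backstepSize(s, 4) == 1); // stepping back over 'b'
}
```

As in the proposal, both functions assume that offset sits on an element boundary.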

 I suspect that a lot of functions in std.string can be written without 
 Unicode-specific knowledge just by relying on such an interface. 
 Moreover, algorithms can be generalized to other structures that use 
 variable-length encoding, such as those used in data compression. (In 
 that case, the support would be a bit array and the encoded type would 
 be ubyte.)

I agree, acknowledging encoding/compression as a general direction will bring 
substantial benefits.

 Writing to such ranges is not addressed by this design. Ideas are welcome.

Yeah, we can address outputting later, that's fair.

 Adding VLERange would legitimize strings and would clarify their 
 handling, at the cost of adding one additional concept that needs to be 
 minded. Is the trade-off worthwhile?

Well, the only way to find out is try it. My advice: VLERanges originated as a 
solution to the string problem, so start with a non-string incarnation. Having 
at least two (one, we know, is string) plugs that fit the same socket will spur 
confidence in the abstraction. 

-- 
Tomek



Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 11:13 AM, Michel Fortin wrote:

On 2011-01-11 11:36:54 -0500, Andrei Alexandrescu
seewebsiteforem...@erdani.org said:


On 1/11/11 4:41 AM, Michel Fortin wrote:

For instance, say we have a conversion range taking a Unicode string and
converting it to ISO Latin 1. The best (lossy) conversion for "œ" is
"oe" (one character to two characters), in this case 'front' could
simply return "oe" (two characters) in one iteration, with stepSize
being the size of the "œ" code point. In the same conversion process,
encountering "e" followed by a combining "´" would return pre-combined
character "é" (two characters to one character).


In the design as I thought of it, the effective length of one logical
element is one or more representation units. My understanding is that
you are referring to a fractional number of representation units for
one logical element.


Your understanding is correct.

I think both cases (one becomes many & many becomes one) are important
and must be supported. Your proposal only deals with the many-becomes-one
case.


I disagree. When I suggested this design I was worried of 
over-abstracting. Now this looks like abstracting for stuff that hasn't 
even been addressed concretely yet.


Besides, using bit as an encoding unit sounds like an acceptable 
approach for anything fractional.



I proposed returning arrays so we can deal with the one-becomes-many
case ("œ" becoming "oe"). Another idea would be to introduce substeps.
When checking for the next character, in addition to determining its
step length you could also determine the number of substeps in it. "œ"
would have two substeps, "o" and "e", and when there is no longer any
substep you move to the next step.

All this said, I think this should stay an implementation detail as this
would allow a variety of strategies. Also, keeping this an
implementation detail means that your proposed 'stepSize' and
'backstepSize' need to be an implementation detail too (because they
won't make sense for the one-to-many case). So they can't really be part
of a standard VLE interface.


If you don't have at least stepSize that tells you how large the stride 
is to get to the next element, it becomes impossible to move within the 
range using integral indexes.



As far as I know, all we really need to expose to algorithms is whether
a range has elements of variable length, because this has an impact on
your indexing capabilities. The rest seems unnecessary to me, or am I
missing some use cases?


I think you could say that you don't really need stepSize because you 
can compute it as follows:


auto r1 = r;
r1.popFront();
size_t stepSize = r.length - r1.length;

This is tenuous, inefficient, and impossible if the support range 
doesn't support length (I realize that variable-length encodings work 
over other ranges than random access, but then again this may be an 
overgeneralization).
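
Applied to a plain D string, where popFront drops one code point and length counts code units, the computation above looks like this. The function name stepSizeViaPop is made up for the sketch, and the input is assumed to be non-empty.

```d
import std.range.primitives : popFront;

/// Step size of the first element, computed the indirect way.
/// Precondition: r is not empty.
size_t stepSizeViaPop(string r)
{
    auto r1 = r;
    r1.popFront();               // drops one code point
    return r.length - r1.length; // lengths are in code units
}

unittest
{
    assert(stepSizeViaPop("\u0153uvre") == 2); // U+0153 is two code units
    assert(stepSizeViaPop("abc") == 1);
}
```

For strings the subtraction is cheap, but it still decodes the element just to learn its size, and it does not work at all when the underlying range has no length.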



Andrei


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 11:21 AM, Steven Schveighoffer wrote:

On Tue, 11 Jan 2011 11:54:08 -0500, Andrei Alexandrescu
seewebsiteforem...@erdani.org wrote:


On 1/11/11 5:30 AM, Steven Schveighoffer wrote:

While this makes it possible to write algorithms that only accept
VLERanges, I don't think it solves the major problem with strings --
they are treated as arrays by the compiler.


Except when they're not - foreach with dchar...


This solitary difference is a very thin argument -- foreach(d;
byDchar(str)) would be just as good without requiring compiler help.
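
The difference being argued over can be seen in a few lines. This sketch assumes a current Phobos, where std.utf.byDchar exists as a library range; the three counting helpers are made up for the example.

```d
import std.utf : byDchar;

/// Counts UTF-8 code units: foreach over char does not decode.
size_t countUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

/// Counts code points: foreach over dchar makes the compiler decode.
size_t countPoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

/// Same count using a library range instead of compiler support.
size_t countPointsViaRange(string s)
{
    size_t n;
    foreach (c; s.byDchar)
        ++n;
    return n;
}

unittest
{
    string s = "d\u00E9j\u00E0"; // 4 code points, 6 UTF-8 code units
    assert(countUnits(s) == 6);
    assert(countPoints(s) == 4);
    assert(countPointsViaRange(s) == 4);
}
```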




I'd also rather see an indexing operation return the element type, and
have a separate function to get the encoding unit. This makes more sense
for generic code IMO.


But that's neither here nor there. That would return the logical
element at a physical position. I am very doubtful that much generic
code could work without knowing they are in fact dealing with a
variable-length encoding.


It depends on the function, and the way the indexing is implemented.


I noticed you never commented on my proposed string type...

That reminds me, I should update with suggested changes and re-post it.


To be frank, I think it didn't mark a visible improvement. It solved
some problems and brought others. There was disagreement over the
offered primitives and their semantics.


It is supposed to be simple, and provide the expected interface, without
causing any undue performance degradation. That is, I should be able to
do all the things with a replacement string type that I can with a char
array today, as efficiently as I can today, except I should have to work
to get at the code-units. The huge benefit is that I can say I'm
dealing with this as an array when I know it's safe


Unfinished sentence? Anyway, for my money you just described what we 
have now.



The disagreement will never be fully solved, as there is just as much
disagreement about the current state of affairs ;) e.g. should foreach
default to using dchar?


I disagree about the disagreement being unsolvable. I'm not rigid; if I 
saw a terrific abstraction in your string, I'd be all for it. It just 
shuffles some issues about, and although I agree it does one thing or 
two better than char[], at the end of the day it doesn't carry its weight.



That being said, it's good you are doing this work. In the best case,
you could bring a compelling abstraction to the table. In the worst,
you'll become as happy about D's strings as I am :o).


I don't think I'll ever be 'happy' with the way strings sit in phobos
currently. I typically deal in ASCII (i.e. code units), and phobos works
very hard to prevent that.


I wonder if we could and should extend some of the functions in 
std.string to work with ubyte[]. I did add a function called 
representation() that I didn't document yet. Essentially representation 
gives you the ubyte[], ushort[], or uint[] underneath a string, with the 
same qualifiers. Whenever you want an algorithm to work on ASCII in 
earnest, you can pass representation(s) to it instead of s.
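
A minimal sketch of that idea, assuming the representation function as it is exposed by std.string in current Phobos. The helper name countSpaces is invented for the example.

```d
import std.string : representation;

/// Counts ASCII space bytes without any UTF decoding.
size_t countSpaces(string s)
{
    size_t n = 0;
    // representation gives an immutable(ubyte)[] view of the same memory.
    foreach (ubyte b; s.representation)
        if (b == ' ')
            ++n;
    return n;
}
```

Because the loop sees plain ubyte values, no decoding takes place; this is only correct when the data being searched for is ASCII.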


If you work a lot with ASCII, an AsciiString abstraction may be a better 
and more likely to be successful string type. Better yet, you could 
simply focus on AsciiChar and then define ASCII strings as arrays of 
AsciiChar.



Andrei


Re: eliminate junk from std.string?

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 11:21 AM, Ary Borenszweig wrote:

Why care where they come from? Why not make them intuitive? Say, like, Always
camel case?


If there's enough support for this, I'll do it.

Andrei


Re: eliminate junk from std.string?

2011-01-11 Thread David Nadlinger

On 1/12/11 12:00 AM, Andrei Alexandrescu wrote:

If there's enough support for this, I'll do it.

Andrei


+1 from me – sticking to names commonly used in other programming 
languages is good for ease of adoption, but also inheriting the various 
naming conventions is, in my humble opinion, just plain weird.


David


Re: @templated()

2011-01-11 Thread bearophile
I've now remembered that I have discussed this a bit in the past; I am sorry for
the partially duplicate thread:
http://www.digitalmars.com/d/archives/digitalmars/D/Few_ideas_to_reduce_template_bloat_108136.html

Bye,
bearophile


Re: eliminate junk from std.string?

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 1:47 PM, Jerry Quinn wrote:

Jerry Quinn Wrote:


One set of functions I'd like to see are startsWith() and endsWith().  I find 
them frequently useful in Java and an irritating lack in the C++ standard 
library.


Just adding that these functions are useful because they're more efficient than 
doing a find and checking that the match is in the first position.

Jerry


They're in std.algorithm.

Andrei
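
For reference, a short sketch of the std.algorithm functions mentioned above, imported from std.algorithm.searching as in current Phobos. The helper isDSource is a made-up name for illustration.

```d
import std.algorithm.searching : startsWith, endsWith;

/// True for visible file names ending in ".d".
bool isDSource(string fileName)
{
    return fileName.endsWith(".d") && !fileName.startsWith(".");
}
```

Both checks only look at one end of the string, so neither has to scan the whole input the way a find-then-compare approach would.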


Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
So what's a good use for aliases?


Re: @templated()

2011-01-11 Thread Ary Borenszweig
Can't the compiler see what is used and where?


Re: eliminate junk from std.string?

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 1:45 PM, Jerry Quinn wrote:

Andrei Alexandrescu Wrote:


On 1/9/11 4:51 PM, Andrei Alexandrescu wrote:

There's a lot of junk in std.string that should be gone. I'm trying to
motivate myself to port some functions to different string widths and...
it's not worth it.

What functions do you think we should remove from std.string? Let's make
a string and then send them the way of the dino.


Thanks,

Andrei


I have uploaded a preview of the changed APIs here:

http://d-programming-language.org/cutting-edge/phobos/std_string.html


Unclear if iswhite() refers to ASCII whitespace or Unicode.  If Unicode, which 
version of the standard?


Not sure.

enum dchar LS = '\u2028';   /// UTF line separator
enum dchar PS = '\u2029';   /// UTF paragraph separator


bool iswhite(dchar c)
{
    return c <= 0x7F
        ? indexOf(whitespace, c) != -1
        : (c == PS || c == LS);
}

Which version?


Same comment for icmp().  Also, in the Unicode standard, case folding can 
depend on the specific language.


That uses toUniLower. Not sure how that works.


There is room for ascii-only functions, but unless a D version of ICU
is going to be done separately, it would be nice to have full
unicode-aware functions available.


Yah, I'm increasingly thinking of defining an AsciiChar entity and 
perhaps a Zstring one for zero-terminated strings.



You've got chop() marked as deprecated.  Is popBack() going to make
sense as something that removes a variable number of chars from a
string in the CR-LF case?  That might be a bit too magical.


Well I found little use for chop in e.g. Perl. People either use chomp 
or want to remove the last character. I think chop is useless.



Rather than zfill, what about modifying ljustify, rjustify, and
center to take an optional fill character?


Yah, I wanted to do that but postponed because it's quite a bit of work 
with general dchars etc.



One set of functions I'd like to see are startsWith() and endsWith().  I find 
them frequently useful in Java and an irritating lack in the C++ standard 
library.


Yah, those are in std.algorithm. Ideally we'd move everything that's 
applicable beyond strings to std.algorithm.



Andrei


Re: eliminate junk from std.string?

2011-01-11 Thread Jonathan M Davis
On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote:
 So what's a good use for aliases?

Oh, there's not necessarily anything wrong with aliases. The problem is if an 
API has a lot of them. The typical place to use typedef in C++ is when you have 
long, nasty template types which you don't want to actually have to type out, 
and while auto and D's improved templates reduce the need for that sort of 
typedef, I'm sure that folks will still want to use them for that sort of thing.

Personally, I've used them for three things:

1. When there's a templated function that you want to be able to call with a set
of specific names. A prime example would be get on core.time.Duration. It
properly genericizes that functionality, but it would be annoying to
have to type duration.get!"days"(), duration.get!"hours"(), etc. all over the
place, so it aliases them to the properties days, hours, etc.

2. Deprecating a function name. For instance, let's say that we rename splitl to
splitL or SplitLeft in std.string. Having a deprecated alias to splitl would
avoid immediately breaking code.

3. In the new std.datetime, DateTimeException is an alias of
core.time.TimeException, so that you can use the same exception type throughout
the time stuff (std.datetime also publicly imports core.time) without worrying
whether it was core.time or std.datetime which threw the exception and yet still
have an exception type with the same name as the module, as is typical in a
number of Phobos modules. So, you get one exception type for all of the time
code but still follow the typical naming convention.
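
The second use in the list above (a deprecated alias as a migration path) might look roughly like this. splitLeft and its behaviour are hypothetical stand-ins, not the actual Phobos function; only the deprecated alias mechanism itself is the point.

```d
import std.string : indexOf;

/// New, consistently named function (hypothetical; not a Phobos API).
/// Returns the part of s before the first sep, or all of s if sep is absent.
string splitLeft(string s, char sep)
{
    immutable i = s.indexOf(sep);
    return i < 0 ? s : s[0 .. i];
}

/// Old name kept as a deprecated alias so that existing callers still
/// compile, with a deprecation message, until the alias is removed.
deprecated("use splitLeft instead")
alias splitl = splitLeft;
```

Code calling splitl keeps building but is flagged by the compiler, which gives users a window to migrate before the alias goes away.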

However, none of these are things that I'd do very often. alias is a tool that
can be very handy at times, and I think that it's very good that we have it, but
using it all over the place is likely ill-advised - especially if all you're
really doing with it is making it possible to call the same function with
different names.

I'd say that, on the whole, aliases should be used when they simplify code or 
when renaming functions or types, and you want a good deprecation path, but 
other than that, in general, it's probably not a good idea to use them much.

- Jonathan M Davis


Re: eliminate junk from std.string?

2011-01-11 Thread Jonathan M Davis
On Tuesday, January 11, 2011 16:07:11 Daniel Gibson wrote:
 Am 12.01.2011 00:59, schrieb Jonathan M Davis:
  On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote:
  So what's a good use for aliases?
  
  2. Deprecating a function name. For instance, let's say that we rename
  splitl to splitL or SplitLeft in std.string. Having a deprecated alias
  to splitl would avoid immediately breaking code.
 
 Isn't this exactly what Ary had in mind? :-)

No, or at least that's not the impression that I got. I understood that he meant
to have the aliases around permanently. It's just confusing and adds clutter to
do things like have both splitl and splitLeft (or splitL or whatever splitl got
renamed to) around in the long run. _That_ is what Andrei and Walter are
objecting to.

Renaming a function and having a deprecated alias to the old name for a few
releases to ease the transition would definitely be good practice. Aliasing a
function just to have another name for the same thing wouldn't be good practice.
There has to be a real benefit to having the second name. Providing a smooth
deprecation route would be a case where there's a real benefit.

- Jonathan M Davis


Re: eliminate junk from std.string?

2011-01-11 Thread Daniel Gibson

Am 12.01.2011 01:17, schrieb Jonathan M Davis:

On Tuesday, January 11, 2011 16:07:11 Daniel Gibson wrote:

Am 12.01.2011 00:59, schrieb Jonathan M Davis:

On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote:

So what's a good use for aliases?


2. Deprecating a function name. For instance, let's say that we rename
splitl to splitL or SplitLeft in std.string. Having a deprecated alias
to splitl would avoid immediately breaking code.


Isn't this exactly what Ary had in mind? :-)


No, or at least that's not the impression that I got. I understood that he meant
to have the aliases around permanently. It's just confusing and adds clutter to
do things like have both splitl and splitLeft (or splitL or whatever splitl got
renamed to) around in the long run. _That_ is what Andrei and Walter are
objecting to.

Renaming a function and having a deprecated alias to the old name for a few
releases to ease the transition would definitely be good practice. Aliasing a
function just to have another name for the same thing wouldn't be good practice.
There has to be a real benefit to having the second name. Providing a smooth
deprecation route would be a case where there's a real benefit.

- Jonathan M Davis


Ok, you're right, that is a slight difference.

Deprecating them is certainly a good idea, but I'd suggest to keep the 
deprecated aliases around for longer (until D3), so anybody porting a 
Phobos1-based application to D2/Phobos2 can use them, even if he doesn't 
do this within the next few releases.


Cheers,
- Daniel


Re: eliminate junk from std.string?

2011-01-11 Thread BlazingWhitester

On 2011-01-12 01:00:51 +0200, Andrei Alexandrescu said:


On 1/11/11 11:21 AM, Ary Borenszweig wrote:

Why care where they come from? Why not make them intuitive? Say, like, Always
camel case?


If there's enough support for this, I'll do it.

Andrei


++vote.
Uniformity in how functions are named will improve readability.



Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Jesse Phillips
Andrej Mitrovic Wrote:

 Google does seem to take into account whatever information it has on
 you, which might explain why your own blog is a top result for you.
 
 If I log out of Google and delete my preferences, searching for D
 won't find anything about the D language in the top results. But if I
 log in and search D again, the D website will be the top result.

Best place to go for ranking information on your website:

https://www.google.com/webmasters/tools/home?hl=en&pli=1

Need to show you own the site though.


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Nick Sabalausky
Walter Bright newshou...@digitalmars.com wrote in message 
news:igb5uo$26a...@digitalmars.com...
 Vladimir Panteleev wrote:
  From taking a quick look, I don't see meld's advantage over WinMerge 
 (other than being cross-platform).

 Thanks for pointing me at winmerge. I've been looking for one to work on 
 Windows.

Beyond Compare and Ultra Compare 




Re: eliminate junk from std.string?

2011-01-11 Thread Ary Borenszweig
You are right, deprecating those names and removing them in the long
run is what I think should be done.


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread spir

On 01/11/2011 08:09 PM, Andrei Alexandrescu wrote:

The main (and massively ignored) issue when manipulating unicode text is
rather that, unlike with legacy character sets, one codepoint does *not*
represent a character in the common sense. In character sets like
latin-1:
* each code represents a character, in the common sense (eg à)
* each character representation has the same size (1 or 2 bytes)
* each character has a single representation (à -- always 0xe0)
All of this is wrong with unicode. And these are complicated and
high-level issues, that appear _after_ decoding, on codepoint sequences.

If VLERange is helpful in dealing with those problems, then I don't
understand your presentation, sorry. Do you for instance mean such a
range would, under the hood, group together codes belonging to the same
character (thus making indexing meaningful), and/or normalise (decomp
order) (thus allowing to comp/find/count correctly)?


VLERange would offer automatic decoding in front, back, popFront, and
popBack - just like BidirectionalRange does right now. It would also
offer access to the representational support by means of indexing - also
like char[] et al already do now.


IIUC, for the case of text, VLERange helps abstracting from the annoying 
fact that a codepoint is encoded as a variable number of code units.

What I meant is issues like:

auto text = "a\u0302d";
writeln(text);  // â
auto range = VLERange(text);
// extracts characters correctly?
auto letter = range.front();    // "a" or "â"?
// case yes: compares correctly?
assert(range.front() == "â");   // fail or pass?

Both fail using all unicode-aware types I know of, because
1. They do not recognise that a character is represented by an arbitrary 
number of codes (code _points_).

2. They do not use normalised forms for comp, search, count, etc...
(while in unicode a given char can have several representations).
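
The two points above can be made concrete with a small sketch that assumes the std.uni module of current Phobos (byGrapheme and normalize; it is not a claim about what Phobos offered when this thread was written). The helper names graphemeCount and sameText are invented for the example.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

/// Number of user-perceived characters (grapheme clusters) in s.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

/// Compares two strings after bringing both to the same normal form (NFC).
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}
```

With these, "a\u0302d" counts as two characters although it holds three code points, and the decomposed "a\u0302" compares equal to the precomposed "\u00E2", even though a plain == on the raw strings reports them as different.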


The difference is that VLERange being
a formal concept, algorithms can specialize on it instead of (a)
specializing for UTF strings or (b) specializing for BidirectionalRange
and then manually detecting isSomeString inside. Conversely, when
defining an algorithm you can specify VLERange as a requirement.
Boyer-Moore is a perfect example - it doesn't work on bidirectional
ranges, but it does work on VLERange. I suspect there are many like it.

Of course, it would help a lot if we figured other remarkable VLERanges.


I think I see the point, and the general usefulness of such an 
abstraction. But it would certainly be more useful in other fields than 
text manipulation, because there are far more annoying issues (that, 
like in example above, simply prevent code correctness).


Denis
_
vita es estrany
spir.wikidot.com



Re: eliminate junk from std.string?

2011-01-11 Thread Jonathan M Davis
On Tuesday, January 11, 2011 16:23:13 Daniel Gibson wrote:
 Am 12.01.2011 01:17, schrieb Jonathan M Davis:
  On Tuesday, January 11, 2011 16:07:11 Daniel Gibson wrote:
  Am 12.01.2011 00:59, schrieb Jonathan M Davis:
  On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote:
  So what's a good use for aliases?
  
  2. Deprecating a function name. For instance, let's say that we rename
  splitl to splitL or SplitLeft in std.string. Having a deprecated alias
  to splitl would avoid immediately breaking code.
  
  Isn't this exactly what Ary had in mind? :-)
  
  No, or at least that's not the impression that I got. I understood that
  he meant to have the aliases around permanently. It's just confusing and
  adds clutter to do things like have both splitl and splitLeft (or splitL
  or whatever splitl got renamed to) around in the long run. _That_ is
  what Andrei and Walter are objecting to.
  
  Renaming a function and having a deprecated alias to the old name for a
  few releases to ease the transition would definitely be good practice.
  aliasing a function just to have another name for the same thing
  wouldn't be good practice. There has to be a real benefit to having the
  second name. Providing a smooth deprecation route would be a case where
  there's a real benefit.
  
  - Jonathan M Davis
 
 Ok, you're right, that is a slight difference.
 
 Deprecating them is certainly a good idea, but I'd suggest to keep the
 deprecated aliases around for longer (until D3), so anybody porting a
 Phobos1-based application to D2/Phobos2 can use them, even if he doesn't
 do this within the next few releases.

Well, leaving an alias until D3 would equate to a permanent alias in D2, which
is exactly what Walter and Andrei don't want (and I don't either). There's
already plenty in Phobos 2 that's different from Phobos 1. So, while I don't
think that we should rename stuff just to rename stuff, I also don't think that
we should keep aliases around just to make porting D1 code easier - especially
when most D1 code is probably using Tango anyway. We don't really have a policy
in place for how long deprecation should last prior to outright removal, but
until D3 is definitely too long. I would have thought that the question would be
more along the lines of whether it should be a couple of releases or more like 6
months to a year before removing deprecated functions and modules at this
point, not whether something will remain deprecated until D3.

- Jonathan M Davis


Re: eliminate junk from std.string?

2011-01-11 Thread spir

On 01/11/2011 09:11 PM, Ary Borenszweig wrote:

Welcome to D. Do you program in C, Javascript, Python or Ruby? Cool! Then you
will feel at home.

That phrase currently ends like this:

You don't? Oh, sorry, you will have to learn that some names are all lowercase,
some not.

But it could end like this:

You don't? Don't worry. D has the convention of writing all function names with
X convention, but we keep some aliases for things that we want to keep backwards
compatibility for.


Yop. And anyway those legacy names are not all the same in C,
Javascript, Python, Ruby, etc. One has to be chosen or created for D, so
why not follow a guideline for the standard D name?
(I really cannot (under)stand this general policy of sticking to wrong
design choices from the past for generations and generations --even in
brand new languages. How do improvements happen in other fields than
programming? One day or the other, one needs to throw away old (mental)
garbage.)


Denis
_
vita es estrany
spir.wikidot.com



levenshteinDistanceAndPath Source bug

2011-01-11 Thread tsukikage

Hello, there is a bug at std.algorithm source.

dsource.org's source:
4120    levenshteinDistanceAndPath(alias equals = "a == b", Range1, Range2)
4121        (Range1 s, Range2 t)
4122        if (isForwardRange!(Range1) && isForwardRange!(Range2))
4123    {
4124        Levenshtein!(Range, binaryFun!(equals)) lev;

'Range' at line 4124( 3975 at my downloaded dmd 2.051 ) should be 'Range1' ?

The windows lib binary seems ok if this source line is fixed.
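
For readers unfamiliar with the functions involved, here is a short usage sketch against current Phobos (std.algorithm.comparison). It only shows how the public functions are called, not the internals being reported; the wrapper names are invented for the example.

```d
import std.algorithm.comparison : levenshteinDistance,
                                  levenshteinDistanceAndPath;
import std.uni : toUpper;

/// Edit distance with a custom, case-insensitive equality predicate.
size_t caseInsensitiveDistance(string s, string t)
{
    return levenshteinDistance!((a, b) => toUpper(a) == toUpper(b))(s, t);
}

/// Edit distance taken from the (distance, path) tuple.
size_t distanceViaPath(string s, string t)
{
    auto result = levenshteinDistanceAndPath(s, t);
    return result[0];   // result[1] holds the sequence of edit operations
}
```

The custom predicate is the equals alias parameter that appears in the signature quoted above.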


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Steven Wawryk


Sorry if I'm jumping in here without the appropriate background, but I
don't understand why jumping through these hoops is necessary.  Please
let me know if I'm missing anything.


Many problems can be solved by another layer of indirection.  Isn't a 
string essentially a bidirectional range of code points built on top of 
a random access range of code units?  It seems to me that each 
abstraction separately already fits within the existing D range 
framework and all the difficulties arise as a consequence of trying to 
lump them into a single abstraction.


Why not choose which of these abstractions is most appropriate in a 
given situation instead of trying to shoe-horn both concepts into a 
single abstraction, and provide for easy conversion between them?  When 
character representation is the primary requirement then make it a 
bidirectional range of code points.  When storage representation and 
random access is required then make it a random access range of code units.
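
The two views described above can be sketched with what current Phobos offers: a string iterated through the range primitives yields decoded code points, while std.utf.byCodeUnit gives a random-access view of the code units. The three helper names are invented for the example.

```d
import std.range : walkLength;
import std.utf : byCodeUnit;

/// Code-point view: strings are walked one decoded dchar at a time.
size_t codePointCount(string s)
{
    return s.walkLength;
}

/// Code-unit view: random access with an O(1) length.
size_t codeUnitCount(string s)
{
    return s.byCodeUnit.length;
}

/// Indexing is only offered on the code-unit view.
char codeUnitAt(string s, size_t i)
{
    return s.byCodeUnit[i];
}
```

Conversion between the two is cheap in one direction (a code-unit view of a string is just the string's storage) and requires decoding in the other.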


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Nick Sabalausky
Daniel Gibson metalcae...@gmail.com wrote in message 
news:igijc7$27p...@digitalmars.com...
 Am 11.01.2011 22:36, schrieb Walter Bright:
 Andrej Mitrovic wrote:
 That's my biggest problem with Linux. Having technical problems is not
 the issue, finding the right solution in the sea of forum posts is the
 problem.

 The worst ones begin with you might try this... or I think this might 
 work,
 but YMMV... How do these wind up being the top ranked results by google? 
 Who
 embeds links to that stuff?

 My experience with Windows is, like yours, the opposite. The top ranked 
 result
 will be correct and to the point. No weasel wording.

 Those results are often in big forums like ubuntuforums.org that get a lot 
 of links etc, so even if one thread doesn't have many incoming links, it 
 may still get a top ranking.

 Also my blog entries (hosted at wordpress.com) get on the google frontpage 
 when looking for the specific topic, even though my blog is mostly 
 unknown, has 2-20 visitors per day and almost no incoming links.. Googles 
 algorithms often do seem like voodoo ;)

 Also: Many problems (and their correct solutions) heavily depend on your 
 system. What desktop environment is used, what additional stuff (dbus, 
 hal, ...) is used, what are the versions of this stuff (and X.org), what 
 distribution is used, ...
 There may be different default configurations shipped depending on what 
 distribution (and what version of that distribution) you use, ...
 So there often is no single correct answer that will work for anyone.

 Still, in my experience those HOWTOs often work (it may help to look at 
 multiple HOWTOs and compare them if you're not sure, if it applies to your 
 system) or at least push you in the right direction.


That's probably one of the biggest things that's always bothered me about 
linux (not that there aren't plenty of other things that bother me about 
every other OS in existence). For something that's considered so 
standards-compliant/standards-friendly (compared to, say MS), it's painfully 
*un*standardized.





D standard style [was: Re: eliminate junk from std.string?]

2011-01-11 Thread spir

On 01/12/2011 12:07 AM, Daniel Gibson wrote:

Am 12.01.2011 00:00, schrieb Andrei Alexandrescu:

On 1/11/11 11:21 AM, Ary Borenszweig wrote:

Why care where they come from? Why not make them intuitive? Say, like,
Always
camel case?


If there's enough support for this, I'll do it.

Andrei


Please do, having different naming conventions of functions within the
standard library makes it harder to remember the exact spelling of a
function and also doesn't look professional.

+1 vote for making the standard library comply with the D style guide[1]


+1 as well

But while we're at conventions, and before any change is actually done, 
we may take the opportunity to agree not only on morphology, but on 
semantics ;-)


For instance, from online doc:
string capitalize(string s);
Capitalize first character of string s[], convert rest of string 
s[] to lower case.

Then, use it:
auto s = "capital";
s.capitalize();
writeln(s); // capital
Uh?
Not only the name is misleading, but the doc as well.

For this kind of issue, some guidelines read like:
* perform an action -- action verb (eg capitalise: changes the passed 
string)

* return a result -- named after result (eg capitalised: return new string)
Sure, the func's interface also tells the reader what's actually done. 
But having name (and doc) contradict it is not very helpful. And being
forced to open the doc or even the source for every unknown bit is an
annoying obstacle.
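
To make the behaviour in question explicit, here is a minimal sketch using std.string.capitalize as it exists in Phobos: the function returns a new string and leaves its argument alone. The wrapper name capitalizedCopy is invented for the example.

```d
import std.string : capitalize;

/// Returns the capitalised string; s itself cannot change because a D
/// string is an immutable(char)[].
string capitalizedCopy(string s)
{
    return s.capitalize();
}
```

So the result has to be assigned, as in `s = s.capitalize();`, which is the "return a result" case in the guideline above rather than the "perform an action" case.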


There are probably other common issues like this. My personal evaluation 
is whether some newcomer can guess the purpose of the func, the type, 
the constant, etc...


I would also vote for:
* full words, except for rare exception used everywhere in programming 
_and_ really helpful (eg OS)

* get rid of obscure, ambiguous, or misleading namings
* when possible, use international words rather than english-only (eg 
section better than slice if everything else equal)


Finally, take the opportunity to make the doc usable, eg:
string format(...);
Format arguments into a string.
???


Denis
_
vita es estrany
spir.wikidot.com



Re: levenshteinDistanceAndPath Source bug

2011-01-11 Thread tsukikage

tsukikage wrote:

Hello, there is a bug at std.algorithm source.

dsource.org's source:
4120 levenshteinDistanceAndPath(alias equals = "a == b", Range1, Range2)
4121 (Range1 s, Range2 t)
4122 if (isForwardRange!(Range1) && isForwardRange!(Range2))
4123 {
4124 Levenshtein!(Range, binaryFun!(equals)) lev;

'Range' at line 4124( 3975 at my downloaded dmd 2.051 ) should be 
'Range1' ?


The windows lib binary seems ok if this source line is fixed.


sorry, wrong place, please ignore.


Re: levenshteinDistanceAndPath Source bug

2011-01-11 Thread Andrei Alexandrescu

On 1/11/11 5:28 PM, tsukikage wrote:

tsukikage wrote:

Hello, there is a bug at std.algorithm source.

dsource.org's source:
4120 levenshteinDistanceAndPath(alias equals = "a == b", Range1, Range2)
4121 (Range1 s, Range2 t)
4122 if (isForwardRange!(Range1) && isForwardRange!(Range2))
4123 {
4124 Levenshtein!(Range, binaryFun!(equals)) lev;

'Range' at line 4124( 3975 at my downloaded dmd 2.051 ) should be
'Range1' ?

The windows lib binary seems ok if this source line is fixed.


sorry, wrong place, please ignore.


Fixed and readded unittest:

http://www.dsource.org/projects/phobos/changeset/2315
http://www.dsource.org/projects/phobos/changeset/2316

To post bugs, you may want to go to http://d.puremagic.com/issues. What 
you post there will automatically appear in digitalmars.d.bugs (no need 
to post there).



Andrei


Re: eliminate junk from std.string?

2011-01-11 Thread spir

On 01/12/2011 02:17 AM, Daniel Gibson wrote:

Somewhere in this thread:

Am 11.01.2011 21:43, schrieb Walter Bright:
  Nick Sabalausky wrote:
  I agree with this reasoning for having them. However, I don't think it
  means we shouldn't D-ify or Phobos-ify them, at least as far as
  capitalization conventions.
 
  I also object to rather pointlessly annoying people wanting to move
  their code from D1 to D2 by renaming everything. Endlessly renaming
  things searching for the perfect name gives the illusion of progress,
  whereas time would be better spent on improving the documentation,
  unittests, performance, etc.
 

So his objection was specifically that renaming those functions could
annoy people migrating D1 code (and certainly he meant Phobos1 users,
because Tango-people either port (parts of) Tango or will have to
rewrite that anyway).
So, to accomplish that goal (not annoying those people), these aliases
should be kept for longer.

(An alternative may be to have one/some phobos1-compat modules that contain
such aliases and maybe even wrappers with old signatures for new
functions, that could be imported to ease porting of old applications.
That would have the benefit of not cluttering the regular Phobos2
modules with that legacy stuff.)


When D2 / Phobos2 stabilise, what about a semi-automatic porting tool 
(at least signaling potential issues, first of all occurrences of 
deprecated stdlib names)?


Denis
_
vita es estrany
spir.wikidot.com



Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread spir

On 01/12/2011 02:22 AM, Andrei Alexandrescu wrote:

IIUC, for the case of text, VLERange helps abstracting from the annoying
fact that a codepoint is encoded as a variable number of code units.
What I meant is issues like:

auto text = "a\u0302d";
writeln(text); // â
auto range = VLERange(text);
// extracts characters correctly?
auto letter = range.front(); // "a" or "â"?
// case yes: compares correctly?
assert(range.front() == "â"); // fail or pass?


You should try text.front right now, you might be surprised :o).


Hum, right now it incorrectly returns "a", as expected. And indeed
assert ("â" == "a\u0302");
incorrectly fails, as expected.
Both would work with legacy charsets like latin-1. This is a new issue
introduced with UCS, that requires an additional level of abstraction
(in addition to the one required by the distinction codepoint/codeunit!)


You may have a look at
https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/Text.html for
a rough implementation of a type that does the right thing, and at
https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/U%20missing%20level%20of%20abstraction
for a (far too long) explanation.
(I have tried to mention those problems a dozen times already, but for
some reason nearly everybody seems deaf to them.)



Denis
_
vita es estrany
spir.wikidot.com



Re: eliminate junk from std.string?

2011-01-11 Thread Jonathan M Davis
On Tuesday, January 11, 2011 17:17:43 Daniel Gibson wrote:
 Am 12.01.2011 01:55, schrieb Jonathan M Davis:
  On Tuesday, January 11, 2011 16:23:13 Daniel Gibson wrote:
  Deprecating them is certainly a good idea, but I'd suggest to keep the
  deprecated aliases around for longer (until D3), so anybody porting a
  Phobos1-based application to D2/Phobos2 can use them, even if he doesn't
  do this within the next few releases.
  
  Well, leaving an alias until D3 would equate to a permanent alias in D2,
  which is exactly what Walter and Andrei don't want (and I don't either).
  There's already plenty in Phobos 2 that's different from Phobos 1. So,
  while I don't think that we should rename stuff just to rename stuff, I
  also don't think that we should keep aliases around just to make porting
  D1 code easier - especially when most D1 code is probably using Tango
  anyway. We don't really have a policy in place for how long deprecation
  should last prior to outright removal, but until D3 is definitely too
  long. I would have thought that the question would be more along the
  lines of whether it should be a couple of releases or more like 6 months
  to a year before removing deprecated functions and modules at this
  point, not whether something will remain deprecated until D3.
  
  - Jonathan M Davis
 
 Somewhere in this thread:
 
 Am 11.01.2011 21:43, schrieb Walter Bright:
   Nick Sabalausky wrote:
   I agree with this reasoning for having them. However, I don't think it
   means we shouldn't D-ify or Phobos-ify them, at least as far as
   capitalization conventions.
   
   I also object to rather pointlessly annoying people wanting to move
   their code from D1 to D2 by renaming everything. Endlessly renaming
   things searching for the perfect name gives the illusion of progress,
   whereas time would be better spent on improving the documentation,
   unittests, performance, etc.
 
 So his objection was specifically that renaming those functions could
 annoy people migrating D1 code (and certainly he meant Phobos1 users,
 because Tango-people either port (parts of) Tango or will have to
 rewrite that anyway).
 So, to accomplish that goal (not annoying those people), these aliases
 should be kept for longer.
 
 (An alternative may be to have one/some phobos1-compat modules that contain
 such aliases and maybe even wrappers with old signatures for new
 functions, that could be imported to ease porting of old applications.
 That would have the benefit of not cluttering the regular Phobos2
 modules with that legacy stuff.)
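
To make the compat-module idea above concrete, here is a minimal sketch in D. It assumes the old Phobos 1 spellings `tolower`/`toupper` map onto the current `toLower`/`toUpper`; the shim's module name and its layout are hypothetical, and everything is shown in one file only so that it compiles on its own.

```d
// Sketch of a hypothetical Phobos 1 compatibility shim. In a real library this
// would live in its own module (for example under a made-up name such as
// "phobos1compat"), so the regular modules stay free of legacy names.
import std.stdio : writeln;
import std.uni : toLower, toUpper;

// Old spellings kept as deprecated aliases for the current names. Code using
// them still compiles, but the compiler reports a deprecation message.
deprecated("use toLower instead") alias tolower = toLower;
deprecated("use toUpper instead") alias toupper = toUpper;

void main()
{
    writeln(toLower("Hello, D1"));  // current name
    writeln(tolower("Hello, D1"));  // legacy name, resolved through the alias
    writeln(toupper("Hello, D1"));  // legacy name, resolved through the alias
}
```

Removing such an alias later is a one-line change, which is why the open question in this thread is how long to keep it rather than how to provide it.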

Well, I didn't say that Walter wasn't concerned about it. I just don't see the 
point. Phobos has changed enough from D1 to D2 that, even for D1 Phobos users (of
which I get the impression there are relatively few), there's probably
already plenty of stuff which is going to break for anyone porting over. I do
think that keeping a deprecated alias around longer for a function which has 
been around longer makes sense, and the Phobos 1 functions have been around 
longer than anything else. So, deprecating a function that was added 2 releases 
ago probably shouldn't require a deprecated alias for as long as deprecating a 
function that was in Phobos 1 would, but there's still a limit to how long it 
makes sense.

And given that your average D1 user uses Tango rather than Phobos, it makes that
much less sense to keep aliases to Phobos 1 functions around for a long time.

So, no, we shouldn't get rid of the deprecated alias for a Phobos 1 function
after only a release or two, but I don't think that it makes sense to keep it 
around for a year or two either.

- Jonathan M Davis


Re: eliminate junk from std.string?

2011-01-11 Thread Daniel Gibson

Am 12.01.2011 03:10, schrieb Jonathan M Davis:

On Tuesday, January 11, 2011 17:17:43 Daniel Gibson wrote:

Am 12.01.2011 01:55, schrieb Jonathan M Davis:

On Tuesday, January 11, 2011 16:23:13 Daniel Gibson wrote:

Deprecating them is certainly a good idea, but I'd suggest to keep the
deprecated aliases around for longer (until D3), so anybody porting a
Phobos1-based application to D2/Phobos2 can use them, even if he doesn't
do this within the next few releases.


Well, leaving an alias until D3 would equate to a permanent alias in D2,
which is exactly what Walter and Andrei don't want (and I don't either).
There's already plenty in Phobos 2 that's different from Phobos 1. So,
while I don't think that we should rename stuff just to rename stuff, I
also don't think that we should keep aliases around just to make porting
D1 code easier - especially when most D1 code is probably using Tango
anyway. We don't really have a policy in place for how long deprecation
should last prior to outright removal, but until D3 is definitely too
long. I would have thought that the question would be more along the
lines of whether it should be a couple of releases or more like 6 months
to a year before removing deprecated functions and modules at this
point, not whether something will remain deprecated until D3.

- Jonathan M Davis


Somewhere in this thread:

Am 11.01.2011 21:43, schrieb Walter Bright:
Nick Sabalausky wrote:
I agree with this reasoning for having them. However, I don't think it
means we shouldn't D-ify or Phobos-ify them, at least as far as
capitalization conventions.
  
I also object to rather pointlessly annoying people wanting to move
their code from D1 to D2 by renaming everything. Endlessly renaming
things searching for the perfect name gives the illusion of progress,
whereas time would be better spent on improving the documentation,
unittests, performance, etc.

So his objection was specifically that renaming those functions could
annoy people migrating D1 code (and certainly he meant Phobos1 users,
because Tango-people either port (parts of) Tango or will have to
rewrite that anyway).
So, to accomplish that goal (not annoying those people), these aliases
should be kept for longer.

(An alternative may be to have one/some phobos1-compat modules that contain
such aliases and maybe even wrappers with old signatures for new
functions, that could be imported to ease porting of old applications.
That would have the benefit of not cluttering the regular Phobos2
modules with that legacy stuff.)


Well, I didn't say that Walter wasn't concerned about it. I just don't see the
point. Phobos has changed enough from D1 to D2 that, even for D1 Phobos users (of
which I get the impression there are relatively few), there's probably
already plenty of stuff which is going to break for anyone porting over. I do
think that keeping a deprecated alias around longer for a function which has
been around longer makes sense, and the Phobos 1 functions have been around
longer than anything else. So, deprecating a function that was added 2 releases
ago probably shouldn't require a deprecated alias for as long as deprecating a
function that was in Phobos 1 would, but there's still a limit to how long it
makes sense.

And given that your average D1 user uses Tango rather than Phobos, it makes that
much less sense to keep aliases to Phobos 1 functions around for a long time.

So, no, we shouldn't get rid of the deprecated alias for a Phobos 1 function
after only a release or two, but I don't think that it makes sense to keep it
around for a year or two either.

- Jonathan M Davis


Hmm maybe.
I guess there will be further similar discussions (e.g. the deprecation
of std.stream once the successor is ready).
I think those aliases should at least be kept until all Phobos1 stuff 
that is to be replaced is indeed replaced.
That'd allow a decision that is at least consistent for most Phobos1 
stuff (some has already been removed/replaced, e.g. by the druntime 
modules like core.thread).


Cheers,
- Daniel


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Jean Crystof
Walter Bright Wrote:

 retard wrote:
  One thing came to my mind. Unless you're using Ubuntu 8.04 LTS,
 
 I'm using 8.10, and I've noticed that no more updates are coming.

Huh! You should seriously consider upgrading. If you are running any kind of 
services in the system or browsing the web, you're exposed to both remote and 
local attacks. I know at least one local root exploit 8.10 is vulnerable to. 
It's just plainly stupid to use a distro after the support has died. Are you 
running Windows 98 still too?

If you upgrade Ubuntu, do a clean install. Upgrading 8.10 in-place goes via
9.04 -> 9.10 -> 10.04 -> 10.10. Each step takes 1 or 2 hours. A clean install of
Ubuntu 10.10 or 11.04 (soon available) will take less than 30 minutes.

  The support for desktop 8.04 and 9.10 is also nearing its end (April this 
  year). I'd recommend backing up your /home and installing 10.04 LTS or 
  10.10 instead.
 
 Yeah, I know I'll be forced to upgrade soon.

Soon? Your system already sounds like it's broken.

 One thing that'll make it easier is
 I abandoned using Ubuntu for multimedia. For example, to play Pandora I now
 just plug my ipod into my stereo <g>. I just stopped using youtube on Ubuntu,
 as I got tired of the video randomly going black, freezing, etc.

I'm using Amarok and Spotify. Both work fine.


Re: levenshteinDistanceAndPath Source bug

2011-01-11 Thread Jesse Phillips
Andrei Alexandrescu Wrote:

 Fixed and readded unittest:
 
 http://www.dsource.org/projects/phobos/changeset/2315
 http://www.dsource.org/projects/phobos/changeset/2316
 
 To post bugs, you may want to go to http://d.puremagic.com/issues. What 
 you post there will automatically appear in digitalmars.d.bugs (no need 
 to post there).
 
 
 Andrei

In fact, bugs should not be posted to digitalmars.d.bugs directly, as it is not
there for tracking bugs and many may not follow it.


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Jean Crystof
Walter Bright Wrote:

 My mobo is an ASUS M2A-VM. No graphics cards, or any other cards plugged into 
 it. It's hardly weird or wacky or old (it was new at the time I bought it to 
 install Ubuntu).

ASUS M2A-VM has 690G chipset. Wikipedia says:
http://en.wikipedia.org/wiki/AMD_690_chipset_series#690G

AMD recently dropped support for Windows and Linux drivers made for Radeon 
X1250 graphics integrated in the 690G chipset, stating that users should use 
the open-source graphics drivers instead. The latest available AMD Linux driver 
for the 690G chipset is fglrx version 9.3, so all newer Linux distributions 
using this chipset are unsupported.

Fast forward to this day:
http://www.phoronix.com/scan.php?page=article&item=amd_driver_q111&num=2

Benchmark page says: the only available driver for your graphics gives only 
about 10-20% of the real performance. Why? ATI sucks on Linux. Don't buy ATI. 
Buy Nvidia instead:

http://geizhals.at/a466974.html

This is the 3rd latest Nvidia GPU generation. How long does support last? Ubuntu
10.10 still supports all Geforce 2+, which is 10 years old. I foretell Ubuntu
19.04 is the last one supporting this. Use Nvidia and your problems are gone.


Re: DVCS (was Re: Moving to D)

2011-01-11 Thread Andrej Mitrovic
Did you hear that, Walter? Just buy a 500$ video card so you can watch
youtube videos on Linux. Easy. :D


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Michel Fortin

On 2011-01-11 20:28:26 -0500, Steven Wawryk stev...@acres.com.au said:

Sorry if I'm jumping in here without the appropriate background, but I
don't understand why jumping through these hoops is necessary.  Please
let me know if I'm missing anything.


Many problems can be solved by another layer of indirection.  Isn't a 
string essentially a bidirectional range of code points built on top of 
a random access range of code units?


Actually, displaying a UTF-8/UTF-16 string involves a range of
glyphs layered over a range of graphemes layered over a range of code 
points layered over a range of code units. Glyphs represent the visual 
characters you can get from a font, they often map one-to-one with 
graphemes but not always (ligatures for instance). Graphemes are what 
people generally reason about when they see text (the so-called
"user-perceived characters"), they often map one-to-one with code
points but not always (combining marks for instance). Code points are a 
list of standardized codes representing various elements of a string, 
and code units basically encode the code points.


If you're writing an XML, JSON or whatever else parser you'll probably 
care about code points. If you're advancing the insertion point in a
text field or counting the number of user-perceived characters you'll
probably want to deal with graphemes. For searching a substring inside 
a string, or comparing strings you'll probably want to deal with either 
graphemes or collation elements (collation elements are layered on top 
of code points). To print a string you'll need to map graphemes to the 
glyphs from a particular font.


Reducing string operations to code point manipulations will only work
as long as all your graphemes, collation elements, or glyphs map 
one-to-one with code points.
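
A small D sketch of where the one-to-one assumption breaks, using a base letter followed by a combining mark. It relies on `std.uni.byGrapheme`, which exists in later Phobos releases than the ones discussed in this thread; the three helper function names are made up for the illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Number of UTF-8 code units (bytes) in the string.
size_t countCodeUnits(string s) { return s.length; }

// Number of code points: Phobos decodes a string to dchar when it is
// iterated as a range, so walkLength counts decoded code points.
size_t countCodePoints(string s) { return s.walkLength; }

// Number of graphemes (user-perceived characters).
size_t countGraphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, displayed as one character.
    string s = "e\u0301";

    writeln(countCodeUnits(s));   // 3
    writeln(countCodePoints(s));  // 2
    writeln(countGraphemes(s));   // 1
}
```

The three counts differ for the same string, so an algorithm has to pick the layer that matches what it means by "character".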



It seems to me that each abstraction separately already fits within the 
existing D range framework and all the difficulties arise as a 
consequence of trying to lump them into a single abstraction.


It's true that each of these abstractions can fit within the existing
range framework.



Why not choose which of these abstractions is most appropriate in a 
given situation instead of trying to shoe-horn both concepts into a 
single abstraction, and provide for easy conversion between them?  When 
character representation is the primary requirement then make it a 
bidirectional range of code points.  When storage representation and 
random access is required then make it a random access range of code 
units.
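
The two views described above can be sketched with the range traits in Phobos. This assumes `std.utf.byCodeUnit`, which was added to Phobos after this thread; the helper names `codeUnitCount` and `codePointCount` are hypothetical.

```d
import std.range.primitives : back, front, isBidirectionalRange,
    isRandomAccessRange, walkLength;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// A string used as a range decodes to code points: bidirectional, but not
// random access, because the n-th code point cannot be found in O(1).
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);

// The same data viewed through byCodeUnit is a random access range of code units.
static assert(isRandomAccessRange!(typeof("".byCodeUnit)));

// Storage view: number of code units, available in O(1).
size_t codeUnitCount(string s) { return s.byCodeUnit.length; }

// Character view: number of code points, found by walking the string.
size_t codePointCount(string s) { return s.walkLength; }

void main()
{
    string s = "h\u00E9llo";        // U+00E9 takes two UTF-8 code units

    writeln(codeUnitCount(s));      // 6
    writeln(codePointCount(s));     // 5
    writeln(cast(ubyte) s.byCodeUnit[1]);  // 195, the first byte of U+00E9
    writeln(s.front, s.back);       // first and last code points
}
```

Converting between the two views costs nothing here, since both are windows onto the same array of code units.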


I think you're right. The need for a new concept isn't that great, and 
it gets complicated really fast.



--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/


