Re: Memory leak with dynamic array

2010-04-12 Thread Brad Roberts
On Mon, 12 Apr 2010, Joseph Wakeling wrote:

> Curious question: how come with a registered email address my messages
> have to be moderated, whereas via the http interface I can post straight
> away? :-P
> 
> Best wishes,
> 
> -- Joe

All new subscribers to the lists start off moderated until a post flows 
through that isn't obvious spam.  As soon as I see that, I remove the 
moderated flag.  The list is a much bigger spam target than the news 
server is.  I deflect dozens of spam messages a day, some of which are 
clever enough to register for the list first (not daily, but probably 
monthly).

Later,
Brad



Re: Memory leak with dynamic array

2010-04-12 Thread Joseph Wakeling
Joseph Wakeling wrote:
> (Actually I'm still having some issues with this, despite using
> assumeSafeAppend, but more on that in a separate email.)

... solved; it's interesting to note that assumeSafeAppend has to be
used in _exactly_ the same scope as the append ~= itself.

e.g. with this loop, the memory blows up:


foreach(uint i;0..100) {
    x.length = 0;

    assumeSafeAppend(x);

    foreach(uint j;0..5000)
        foreach(uint k;0..1000)
            x ~= j*k;

    writefln("At iteration %u, x has %u elements.",i,x.length);
}


... while with this one it's OK:


foreach(uint i;0..100) {
    x.length = 0;

    foreach(uint j;0..5000) {
        foreach(uint k;0..1000) {
            assumeSafeAppend(x);
            x ~= j*k;
        }
    }

    writefln("At iteration %u, x has %u elements.",i,x.length);
}
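
For reference, the reset-and-refill pattern in these two loops can be written as a self-contained program. This is only a minimal sketch, assuming a current D2 compiler and runtime rather than the 2010 release discussed here; refillStaysInPlace is a hypothetical helper name, and the sizes are scaled down from the 5000 x 1000 loop above. It calls assumeSafeAppend once per outer iteration, right after emptying the slice, and reports whether the array still starts at its original block.

```d
import std.stdio;

// Hypothetical helper: refill the same block `rounds` times and report
// whether the slice still starts at the original allocation afterwards.
bool refillStaysInPlace(size_t n, size_t rounds)
{
    int[] x = new int[](n);      // one up-front allocation
    const(int)* base = x.ptr;

    foreach (r; 0 .. rounds)
    {
        x = x[0 .. 0];           // drop the contents, keep the block
        x.assumeSafeAppend();    // promise that no other slice needs the old data
        foreach (j; 0 .. n)
            x ~= cast(int) j;    // expected to reuse the existing capacity
    }
    return x.ptr is base;
}

void main()
{
    writeln("appended in place: ", refillStaysInPlace(1000, 100));
}
```

If this reports false on a given compiler version, memory is being reallocated on each round, which is the symptom described in the first loop.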


Curious question: how come with a registered email address my messages
have to be moderated, whereas via the http interface I can post straight
away? :-P

Best wishes,

-- Joe


Re: Memory leak with dynamic array

2010-04-12 Thread Joseph Wakeling
Steven Schveighoffer wrote:
> On Mon, 12 Apr 2010 12:03:38 -0400, Joseph Wakeling wrote:
> 
>> I thought dev effort was now focusing back on GDC ... ? :-P
> 
> AFAIK, gdc hasn't been actively developed for a few years.
> 
> ldc, on the other hand, has regular releases.  I think ldc may be the
> future of D compilers, but I currently use dmd since I'm using D2.

http://bitbucket.org/goshawk/gdc/wiki/Home

:-)

Either way I'm happy.  I don't have any issues with dmd, but I do want
to see a properly free D compiler that can be prepackaged in all the
Linux distros.

> Yes, you get around this by preallocating.

Sure.  But that wasn't what shocked me -- what I was amazed by was
setting up a situation where I _had_ preallocated the memory and still
seeing the memory usage explode, because D was preserving the memory
from each round of the loop.

(Actually I'm still having some issues with this, despite using
assumeSafeAppend, but more on that in a separate email.)

> It's often these types of performance discrepancies that critics point
> to (not that you are a critic), but it's the cost of having a more
> comprehensive language.  Your appetite for the sheer performance of a
> language will sour once you get bit by a few of these nasty bugs.

For sure.

> But D fosters a completely different way of thinking about solving
> problems.

I can see how the example you give would be fantastically useful.

More generally, I think this is the point -- I need to adjust my head to
writing D-ish code, just as when moving from C to C++ I needed to switch
to various new ways of doing things.

> There are many in the community that use D for numerical stuff.  It's
> definitely not as mature as it could be, but getting better.  Don is
> adding a lot of cool stuff to it, including a builtin exponent operator
> and arbitrary precision numbers.

I guessed there would be -- I knew for example that there was someone
out there working on a fairly major mathematical/numerical library for
D, but it's a while since I checked that out.

So take my earlier comment about numerical work to refer only to doing
things the way I'm used to ... ;-)

> Yes, but that's not what I meant ;)  I mean, you can write your own
> types, like the Appender (or what the appender *should* be) that
> optimize the behavior of code to meet any needs.  And it can do it with
> a much better syntax than C.  D's template system and ability to make
> user-types seem like builtins I think is unparalleled in C-like languages.

Hence much pleasure and excitement in learning D ... :-)

Don wrote:
> There are quite a lot of us here with exactly that kind of background.
> 
> Something about the array issue -- D dynamic arrays are heavily geared 
> towards algorithms which perform an initial allocation and afterwards avoid 
> memory allocation entirely.
> In D, such slicing algorithms are extremely clean, extremely fast, and memory 
> safe.
> In C++, it's much more difficult to write code in that manner. A 
> straightforward translation from C++ will generally miss the benefits of D 
> arrays, and you'll end up with slower code.

Exactly my current situation. :-P

> A kind of "Zen of D" is to use array slices as much as possible.

I will look into this more and see if this approach can help with some
of my code -- are there existing projects I could take a look at to get
some examples?

Thanks & best wishes,

-- Joe


Re: Memory leak with dynamic array

2010-04-12 Thread Don

Joseph Wakeling wrote:
> Thanks to all again for the discussion, examples and explanations. :-)
>
> [snip]
>
> My needs are in some ways quite narrow -- numerical simulations in
> interdisciplinary physics -- hence the C background, and hence the premium
> on performance.  They're also not very big programs -- simple enough for me
> to generally keep a personal overview on the memory management, even though
> with C++ that's usually all taken care of automatically (no new or delete
> statements if I can avoid it).
>
> What I'm fairly confident about is that, given not too much time, D will
> become a _far_ preferable language for that kind of development.


There are quite a lot of us here with exactly that kind of background.

Something about the array issue -- D dynamic arrays are heavily geared 
towards algorithms which perform an initial allocation and afterwards 
avoid memory allocation entirely.
In D, such slicing algorithms are extremely clean, extremely fast, and 
memory safe.
In C++, it's much more difficult to write code in that manner. A 
straightforward translation from C++ will generally miss the benefits of 
D arrays, and you'll end up with slower code.


A kind of "Zen of D" is to use array slices as much as possible.
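
To make the slicing point concrete, here is a minimal sketch of "allocate once, then work on slices". The helper name upperHalf is made up for illustration and does not come from this thread or from any library. The slice is a view into the original array, so writing through it changes the original and nothing is copied.

```d
import std.stdio;

// Hypothetical helper: return the upper half of `a` as a view.
// No elements are copied and no memory is allocated.
int[] upperHalf(int[] a)
{
    return a[$ / 2 .. $];
}

void main()
{
    int[] data = [1, 2, 3, 4, 5, 6, 7, 8];   // the only allocation
    int[] tail = upperHalf(data);

    tail[0] = 50;                             // writes through to data[4]
    assert(data[4] == 50);
    assert(tail.ptr is data.ptr + 4);         // same memory block
    writeln(data);
}
```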



Re: Memory leak with dynamic array

2010-04-12 Thread Steven Schveighoffer
On Mon, 12 Apr 2010 12:03:38 -0400, Joseph Wakeling wrote:

> I thought dev effort was now focusing back on GDC ... ? :-P


AFAIK, gdc hasn't been actively developed for a few years.

ldc, on the other hand, has regular releases.  I think ldc may be the  
future of D compilers, but I currently use dmd since I'm using D2.




> Steven Schveighoffer wrote:
>> The C++ example is reallocating memory, freeing memory it is no longer
>> using.  It also manually handles the memory management, allocating larger
>> and larger arrays in some algorithmically determined fashion (for example,
>> multiplying the length by some constant factor).  This gives it an edge in
>> performance because it does not have to do any costly lookup to determine
>> if it can append in place, plus the realloc of the memory probably is
>> cheaper than the GC realloc of D.
>
> Right.  In fact you get precisely 24 allocs/deallocs, each doubling the
> memory reserve to give a total capacity of 2^23 -- and then that memory is
> there and can be used for the rest of the 100 iterations of the outer loop.
>
> The shock for me was finding that D wasn't treating the memory like this
> but was preserving each loop's memory (as you say, for good reason).


Yes, you get around this by preallocating.


>> D does not assume you stopped caring about the memory being pointed to
>> when it had to realloc. [...] You can't do the same thing with C++
>> vectors, when they reallocate, the memory they used to own could be
>> freed.  This invalidates all pointers and iterators into the vector,
>> but the language doesn't prevent you from having such dangling pointers.
>
> I have a vague memory of trying to do something exactly like your example
> when I was working with C++ for the first time, and getting bitten on the
> arse by exactly the problem you describe.  I wish I could remember where.
> I know that I found another (and possibly better) solution to do what I
> wanted, but it would be nice to see if a D-ish solution would give me
> something good.


It's often these types of performance discrepancies that critics point to  
(not that you are a critic), but it's the cost of having a more  
comprehensive language.  Your appetite for the sheer performance of a  
language will sour once you get bit by a few of these nasty bugs.


But D fosters a completely different way of thinking about solving  
problems.  One problem with C++'s vector is it is a value type -- you must  
pass a reference in order to avoid copying an entire vector.  However, D's  
arrays are a hybrid between reference and value type.  Often, once you set  
data in a vector/array, you never change it again.  D allows ways to  
enforce this (i.e. immutable) and also allows you to pass around "slices"  
of your array with zero overhead (no copying).  It results in some  
extremely high-performance code, which wouldn't be easy, or maybe even  
possible, with C++.


Take for instance a split function.  In C++, I'd expect split(string x) to  
return a vector<string>.  However, vector<string> makes a copy of each  
part of the string it has split out.  D, however, can return references to  
the original data (slices), which consume no overhead.  The only extra  
space allocated is the array to hold the string references.  All this is  
also completely safe!


You could then even modify the original string (assuming you were not  
using immutable strings) in place!  Or append to any one of the strings in  
the array safely.
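
A hand-written splitter shows what "returning slices" means here. This is a sketch only: splitOnSpace is a hypothetical function, not a Phobos routine, and it splits on single space characters for simplicity. Each returned element is a view into the input string, and the only allocation is the array that holds those views.

```d
import std.stdio;

// Hypothetical helper: split `s` on spaces, returning slices of `s`.
string[] splitOnSpace(string s)
{
    string[] parts;
    size_t start = 0;
    foreach (i, c; s)
    {
        if (c == ' ')
        {
            if (i > start)
                parts ~= s[start .. i];   // a view, not a copy
            start = i + 1;
        }
    }
    if (start < s.length)
        parts ~= s[start .. $];
    return parts;
}

void main()
{
    string text = "slices share memory";
    auto words = splitOnSpace(text);

    assert(words == ["slices", "share", "memory"]);
    assert(words[1].ptr is text.ptr + 7);   // "share" starts at offset 7 of text
    writeln(words);
}
```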



>> This must be fixed, the appender should be blazingly fast at appending
>> (almost as fast as C++), with the drawback that the overhead is higher.
>
> Overhead = memory cost?  I'm not so bothered as long as the memory stays
> within constant, predictable bounds.  It was the memory explosion that
> scared me.  And I suspect I'd pay a small performance cost (though it
> would have to be small) for the kind of safety and flexibility the arrays
> have.


Overhead = bigger initialization cost, memory footprint.  It's not  
important if you are building a large array (which is what appender should  
be for), but the cost would add up if you had lots of little appenders  
that you didn't append much to.  The point is, the builtin array optimizes  
performance for operations besides append, but allows appending as a  
convenience.  Appender should optimize appending, sacrificing performance  
in other areas.  It all depends on your particular application whether you  
should use appender or builtin arrays (or something entirely  
different/custom).
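
As an illustration of the trade-off, here is a minimal sketch that builds an array through std.array.appender. It assumes a current Phobos, which may behave differently from the Appender being criticised in this thread; buildSquares is a hypothetical helper name. The appender object carries its own bookkeeping, which is the extra overhead mentioned above, and in return it appends without consulting the GC on every element.

```d
import std.array : appender;
import std.stdio;

// Hypothetical helper: collect the first n squares through an Appender.
int[] buildSquares(int n)
{
    auto app = appender!(int[])();
    app.reserve(n);            // one up-front reservation
    foreach (i; 0 .. n)
        app.put(i * i);        // append through the appender's own bookkeeping
    return app.data;           // the finished array
}

void main()
{
    writeln(buildSquares(5));
}
```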


>> You haven't done much with it yet.  When you start discovering how much D
>> takes care of, you will be amazed :)
>
> I know. :-)
>
> My needs are in some ways quite narrow -- numerical simulations in
> interdisciplinary physics -- hence the C background, and hence the premium
> on performance.  They're also not very big programs -- simple enough for me
> to generally keep a personal overview on the memory management, even though
> with C++ that's usually all taken care of automatically (no new or delete
> statements i

Re: Memory leak with dynamic array

2010-04-12 Thread Joseph Wakeling
Thanks to all again for the discussion, examples and explanations. :-)

One note -- I wouldn't want anyone to think I'm bashing D or complaining.
I've been interested in the language for some time and this seemed an
opportune time to start experimenting properly.  It's fun, I'm learning
a lot, and I'm genuinely touched by the amount of effort put in by
everyone on this list to teach and share examples.

I'm also fully aware that D is still growing, and that I need to be
patient in some cases ... :-)

bearophile wrote:
> D dynamic arrays are more flexible than C++ vector, they can be sliced,
> such slicing is O(1), and the slices are seen by the language just like
> other arrays. So you pay the price of some performance for such
> increased flexibility. The idea here is that the built-in data types
> must be as flexible as possible even if their performance is not so
> high, so they can be used for many different purposes.

No complaint there. :-)

> Then D standard library will have specialized data structures that are
> faster thanks to being more specialized and less flexible.

In my case -- I'm turning into 'Mr C++' again -- probably that's often
what I need.  If I look at the major benefits I found in moving from C
to C++, the first was memory management that was as automatic as I required.
For example, C++ vectors are great because they do away with having to
put in malloc/realloc/free statements and let you treat dynamic arrays
pretty much as 'just another variable'.

Within my own needs I've not yet found a case where the kind of smart GC
functionality discussed on this thread seemed necessary, but of course
I've never had it available to use before ... :-)

> In D dynamic arrays some of the performance price is also paid for the
> automatic memory management, for the GC that's not a precise GC (for
> example if your array has some empty items at the end past its true
> length, the GC must ignore them).

An idea was floating in my head about whether it is/could be possible to
turn off GC safety features in a scope where they are unnecessary --
rather like a more general version of the 'assumeSafeAppend' function...

> With LDC (once we'll have a D2 version of it) the performance of D2
> can probably be the same as the C++. DMD maybe loses a little here
> because it's not so good at inlining, or maybe because the C++ vector
> is better than this D2 code.

I thought dev effort was now focusing back on GDC ... ? :-P

I have actually not made much use of the -inline function because in
the code I wrote (maybe not best suited to inlining...), it made the
program generally run slower ...

Steven Schveighoffer wrote:
> The C++ example is reallocating memory, freeing memory it is no longer
> using.  It also manually handles the memory management, allocating larger
> and larger arrays in some algorithmically determined fashion (for example,
> multiplying the length by some constant factor).  This gives it an edge in
> performance because it does not have to do any costly lookup to determine
> if it can append in place, plus the realloc of the memory probably is
> cheaper than the GC realloc of D.

Right.  In fact you get precisely 24 allocs/deallocs, each doubling the
memory reserve to give a total capacity of 2^23 -- and then that memory is
there and can be used for the rest of the 100 iterations of the outer loop.
The shock for me was finding that D wasn't treating the memory like this
but was preserving each loop's memory (as you say, for good reason).
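
The count of 24 can be checked with a few lines of arithmetic. This is a sketch under one assumption that the thread does not state: the vector starts with a capacity of 1 and doubles on every reallocation. The inner loops append 5000 * 1000 = 5,000,000 elements, which lies between 2^22 and 2^23. allocationsToReach is a hypothetical helper name.

```d
import std.stdio;

// Hypothetical helper: count allocations under a pure doubling policy,
// assuming the very first allocation has capacity 1.
uint allocationsToReach(size_t target)
{
    size_t capacity = 1;    // assumed starting capacity
    uint allocations = 1;   // the initial allocation
    while (capacity < target)
    {
        capacity *= 2;      // each reallocation doubles the reserve
        ++allocations;
    }
    return allocations;
}

void main()
{
    // 23 doublings reach 2^23 = 8,388,608, so this prints 24.
    writeln(allocationsToReach(5000 * 1000));
}
```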

> D does not assume you stopped caring about the memory being pointed to
> when it had to realloc. [...] You can't do the same thing with C++
> vectors, when they reallocate, the memory they used to own could be
> freed.  This invalidates all pointers and iterators into the vector,
> but the language doesn't prevent you from having such dangling pointers.

I have a vague memory of trying to do something exactly like your example
when I was working with C++ for the first time, and getting bitten on the
arse by exactly the problem you describe.  I wish I could remember where.
I know that I found another (and possibly better) solution to do what I
wanted, but it would be nice to see if a D-ish solution would give me
something good.

> This must be fixed, the appender should be blazingly fast at appending
> (almost as fast as C++), with the drawback that the overhead is higher.

Overhead = memory cost?  I'm not so bothered as long as the memory stays
within constant, predictable bounds.  It was the memory explosion that
scared me.  And I suspect I'd pay a small performance cost (though it
would have to be small) for the kind of safety and flexibility the arrays
have.

> You haven't done much with it yet.  When you start discovering how much D
> takes care of, you will be amazed :)

I know. :-)

My needs are in some ways quite narrow -- numerical simulations in
interdisciplinary physics -- hence the C background, and hence the premium
on performance.  They're a

Re: Memory leak with dynamic array

2010-04-12 Thread bearophile
Something that I have forgotten, for the original poster: the D append is not 
templated code, so this produces no template bloat as in the C++ <vector>. This 
is another compromise.

---

Ellery Newcomer:

>It won't work for classes, though.<

You have to write:

class List(T) {
    static struct Node {
        T data;
        Node* next;
        this(T d, Node* p=null) {
            data = d;
            next = p;
        }
    }

    Node* lh;

    this(T[] arr) {
        foreach_reverse (el; arr)
            lh = new Node(el, lh);
    }
}

void main() {
    auto items = new List!int([1, 2, 3]);
}



>Ick, that turned into a mess fast.<

I'd like to be a computer scientist, able to create a DMeta, similar to PyMeta. 
A DMeta can be much cleaner than that "mess".
See the original OMeta:
http://tinlizzie.org/ometa/
and its Python version:
http://washort.twistedmatrix.com/
https://launchpad.net/pymeta

Bye,
bearophile


Re: Overload resolution for string

2010-04-12 Thread Steven Schveighoffer

On Mon, 12 Apr 2010 00:40:44 -0400, Ali Çehreli wrote:

> Steven Schveighoffer wrote:
>> On Sun, 11 Apr 2010 15:33:09 -0400, Ali Çehreli wrote:
>>
>>> This is a bug, right? I've been assuming that unqualified string
>>> literals were immutable char arrays, but the behavior is different
>>> between "hello" vs. "hello"c.
>>>
>>> Am I missing something?
>>
>> "hello" is typed as a string *only* if you are using it as a string.  If
>> you are using it as a wstring or a dstring, then it is typed that way.
>> You can even use it as a const(char) * and it becomes an ASCII C-style
>> string with a zero terminator!
>
> I did not know that. :)
>
> Could you please share some guidelines about choosing the type of the
> D-string to use at the interface...
>
> Should applications stick to one of these types and require callers to
> convert explicitly if needed? Should the common type be dstring? On the
> other hand, string seems to be a better choice because classes have the
> common toString member functions that return string.
>
> In my case, I have a set of classes that represent alphabet letters and
> alphabet strings. The motivation is to provide logical sorting and
> capitalization. (To me, even for the English alphabet, â should be
> sorted between a and b.)
>
> Since I want these types to be used as seamlessly as possible, I wanted
> to provide opEquals overloads for char[], wchar[], and dchar[]. Should I
> not bother doing that? In fact, I really shouldn't due to this compiler
> error.
>
> Should my classes only interface with dchar and dstring?


If you had to choose, I'd suggest choosing string.  The reason is simple:

auto s = "hello";
yourFunc(s);

Because the compiler must choose a type when declaring s, it chooses  
string.
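
The typing rules described in this exchange can be checked directly. This is a small sketch, not taken from the thread: a literal bound with auto becomes a string, while the same literal adapts when the target type is spelled out.

```d
void main()
{
    auto s = "hello";                        // no context: the compiler picks string
    static assert(is(typeof(s) == string));

    wstring w = "hello";                     // the same literal, typed as UTF-16
    dstring d = "hello";                     // ... and as UTF-32
    const(char)* p = "hello";                // ... and as a zero-terminated C string

    assert(w.length == 5 && d.length == 5);
    assert(p[5] == '\0');                    // the terminator follows the last character
}
```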


I think perhaps the compiler should use the same rules when calling an  
overloaded function with a literal.  You should file an enhancement  
request with bugzilla, see what Walter thinks.  If he doesn't like the  
idea, he usually shoots it down pretty quick :)


-Steve


Re: gdc and Make

2010-04-12 Thread Jacob Carlborg

On 4/11/10 21:38, nedbrek wrote:
> Hello,
>
> "Jacob Carlborg" wrote in message
> news:hps9bf$1q9...@digitalmars.com...
>> On 4/11/10 13:12, nedbrek wrote:
>>> Hello,
>>>
>>> "Daniel Keep" wrote in message
>>> news:hpra3o$2pp...@digitalmars.com...
>>>
>>> At some point, I'd like to make plugins for Tcl, and that means building
>>> a .dll.  From my experience that can work through gdc, but not in dmd.
>>> Maybe that has changed?
>>
>> I think these patches change things:
>> http://d.puremagic.com/issues/show_bug.cgi?id=4071
>
> Interesting, thanks.  I will have to re-evaluate things... the other problem
> I had was that the Tcl lib is in COFF format, and dmd needed OMF.  I don't
> think that has changed...
>
> Ned


There is a tool that can convert COFF to OMF: 
http://www.digitalmars.com/ctg/coff2omf.html