Re: Loop optimization

2010-05-19 Thread Joseph Wakeling
On 05/17/2010 01:15 AM, Walter Bright wrote:
 bearophile wrote:
 DMD compiler doesn't perform many optimizations,
 
 This is simply false. DMD does an excellent job with integer and pointer
 operations. It does a so-so job with floating point.

Interesting to note, relative to my earlier experience with D vs. C++ speed:
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.learn&artnum=19567

I'll have to try and put together a no-floating-point bit of code to
make a comparison.

Best wishes,

-- Joe


dmd and gdc2 speed comparison [was: Re: Newsgroups, off-topic]

2010-05-04 Thread Joseph Wakeling
On 04/23/2010 10:04 PM, Steven Schveighoffer wrote:
 My measurements were 2 minutes 2 seconds on D to 1 minute 20 seconds on
 C++, so not quite a 2x difference, but still significant.
 
 I can only attribute the difference to g++'s more mature
 optimization/inlining techniques.  I can see why you are interested in
 having gdc working to try and compare the two :)

Just a small note.  As per discussion on the D.gnu list, I installed an
alpha version of gdc2 based on the dmd 2.015 frontend and gcc 4.3.
There were difficulties in making a direct comparison -- compilation
problems which I guess are down to changes between 2.015 and 2.043 meant
I had to make some code tweaks -- but the speed of the gdc-compiled code
in terms of iterations per second did not seem much different to the
dmd-compiled code.


Re: Arrays of many different (sub)classes

2010-04-25 Thread Joseph Wakeling
Robert Clipsham wrote:
 This should do what you want:

Thanks! :-)

Is it possible to do this with an interface instead of a base class?
I'm not familiar with how the former work ...
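
To make the question concrete, something like this is what I imagine
(a sketch with a hypothetical interface name -- I don't yet know if it
compiles):

interface Updatable {
    void update();
}

class A : Updatable {
    void update() { /* ... */ }
}

class C : Updatable {
    void update() { /* ... */ }
}

void main()
{
    Updatable[] vec;
    vec ~= new A;
    vec ~= new C;
    foreach (u; vec)
        u.update();
}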

Best wishes,

-- Joe


Re: Newsgroups, off-topic

2010-04-24 Thread Joseph Wakeling
Steven Schveighoffer wrote:
 I did a little review of the code, I concur that the code is pretty
 identical, and the D version does not really do any extra allocation.  I
 found one place where you were using pow(x, 2) to square something in
 the D version but doing it with simple multiplication in the C++
 version, but that didn't account for any slowdowns when fixed.  I also
 saw some places where you write 0 to arrays several times, but removing
 those didn't help either.

Not several times superfluously?  I think I would be embarrassed if that
were true :-P

 I can only attribute the difference to g++'s more mature
 optimization/inlining techniques.  I can see why you are interested in
 having gdc working to try and compare the two :)

ldc already speeds things up somewhat (its inlining is better), but
still leaves a fairly sizeable gap.  There seem to be some fairly
sizeable performance differences between gcc and llvm:
http://www.phoronix.com/scan.php?page=article&item=apple_llvm_gcc&num=1

... which if I recall correctly was the motivation for the current gdc
team to start working on it again.

 In spite of all this, I still remain convinced that there is nothing
 inherently bad about these results, D compilers can and probably will
 get better at optimizing code.  That it doesn't beat a compiler which
 has been in production/mainstream for many more years probably shouldn't
 be surprising to any of us, even though we want D compilers to perform
 the best.

Completely agree here.  I was concerned based on early experience that I
was doing 'un-D-ish' things that were screwing performance, but now I'm
fairly confident that I can write OK D code.  From now on it will be all
pleasure as the compilers speed things up ... :-)

Thanks and best wishes,

-- Joe


Arrays of many different (sub)classes

2010-04-24 Thread Joseph Wakeling
Hello all,

Occasionally in C++ I find it useful to build an array which contains
classes of multiple different types all using the same interface -- by
constructing an array of pointers to some common base class, e.g.

class BaseClass {
    // blah, blah ...
};

class A : public BaseClass {
    // ... blah ...
};

class C : public BaseClass {
    // ... blah ...
};

int main()
{
    vector<BaseClass *> vec;
    vec.push_back(new A());
    vec.push_back(new C());
    // etc. etc.
}

(This code might be wrong; I'm just typing it to give the idea.  And in
practice, I usually do not use 'new' statements but pass pointers to
already-existing objects...:-)

Anyway, the point is that at the end of the day I have an array of
different objects with a common interface.  What's the appropriate way
to achieve the same effect in D?
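
From my reading so far I'd guess the D equivalent is simply an array of
class references -- a sketch of my guess (untested):

class BaseClass {
    // blah, blah ...
}

class A : BaseClass {
    // ... blah ...
}

class C : BaseClass {
    // ... blah ...
}

void main()
{
    BaseClass[] vec;    // class variables are already references in D
    vec ~= new A;
    vec ~= new C;
    // etc. etc.
}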

Thanks & best wishes,

-- Joe


Re: Newsgroups, off-topic

2010-04-23 Thread Joseph Wakeling
Steven Schveighoffer wrote:
 No, the problem is that it potentially makes him give away the rights
 to the dmd backend. Which I think he can't legally do, even if he
 wanted to.
 
 I don't think there is any danger of this, it would be well established
 that Walter wrote all his proprietary backend code before he viewed gcc
 source.  The danger is for future code he writes.

I can see the concern here, certainly.

 Personally, I am not too concerned about the backend performance, it's
 not critical to D at this time.  Someone, somewhere, will make this
 better, and then any code written in D magically gets faster :)  We're
 talking about decreasing the constant for the D compiler complexity, not
 decreasing the complexity.  Code built with dmd runs plenty fast for me
 (save the GC performance, maybe we can focus on that first?).

I'm looking forward to seeing gdc released for D2 -- I think it will be
interesting to compare.  From what I understand part of the motivation
for reawakening it was a comparison of performance of code generated by
llvm and gcc respectively.

Part of my original issue over speed was that I'd heard D described as
'performance already as good as C++'.  So, I was coming with
expectations about what I'd be able to achieve ... :-)


Re: Newsgroups, off-topic

2010-04-23 Thread Joseph Wakeling
Steven Schveighoffer wrote:
 As long as you discount the vast differences in allocation performance,
 the code generated should be just as good as code generated by a C++
 compiler.  Your interpretation of performance did not focus on the right
 part :)  Your test application heavily used allocation and reallocation,
 things that have nothing to do with how fast the code compiled by the
 compiler is, but are based more on the algorithms behind the
 allocation.  An equivalent C++-based GC would probably show similar
 performance (in fact, I think D's GC was based on a C++ GC).
 
 This is all taken with a grain of salt of course, the perception is
 often more important than the technical details.  This thread being a
 prime example of it.

I do see the point about allocation and reallocation -- what was
bothering me a bit was that even taking those aspects out of the code
and preallocating everything, I could write C++ code that _didn't_
preallocate and still ran (much) faster ... :-)

 How I would characterize D when talking about performance is that it is
 possible to make it as high-performing as C++, but often favors memory
 safety over performance.  As far as syntax, D wins that battle hands
 down IMO.  And syntax is way more important to me than performance,
 especially at this stage of D's life.  Performance can always be tweaked
 and improved with few changes to the source code, but syntax changes can
 force you to have to modify an entire program.

Certainly agree about syntax -- it was not quite love at first sight,
but close.  In my case performance matters a lot, safety somewhat less
-- since I'm used to taking responsibility for it myself, and my
programming is quite small-scale.

As for perception, my perception is that I like D a lot and will surely
be using it more in future... :-)


Re: Newsgroups, off-topic

2010-04-23 Thread Joseph Wakeling
Steven Schveighoffer wrote:
 I do see the point about allocation and reallocation -- what was
 bothering me a bit was that even taking those aspects out of the code
 and preallocating everything, I could write C++ code that _didn't_
 preallocate and still ran (much) faster ... :-)
 
 If you are comparing vector's push_back to D's array append after
 pre-allocating, you are still not comparing apples to apples...

No ... !  That was true in the original code I posted, but following
bearophile's kind example that part of the code was updated to a form
along the lines of,

x.length = 5_000_000;

for(uint i=0;i<100;++i) {
    size_t pos = 0;
    for(uint j=0;j<5_000;++j) {
        for(uint k=0;k<1_000;++k) {
            x[pos++] = j*k;
        }
    }
}

... which in itself indeed runs about the same speed as C++.  But that's
not the main cause of the difference in running time of the codes.

 D's standard library should have a construct that duplicates the
 performance of vector, I'm not sure if there is anything right now.  I
 thought Appender would do it, but you have said in the past it is slow. 
 This needs to be remedied.

It would be great -- but by itself it's not responsible for the timing
differences I'm observing.

Best wishes,

-- Joe


Re: Newsgroups, off-topic

2010-04-23 Thread Joseph Wakeling
Joseph Wakeling wrote:
 No ... !  That was true in the original code I posted, but following
 bearophile's kind example that part of the code was updated to a form
 along the lines of,

Just for reference, here are the two pieces of code, for side-by-side
comparison.  As far as I can tell the algorithms are identical; the
results are certainly identical; and the D code is significantly slower
(on the order of half the speed on my machine).  I don't know what the
cause is, but my guess would be something along the lines of iterating
across or writing values to arrays in D.

The C++ code relies on the Boost libraries, another reason to like D2:
you get a lot of nice functionality built-in ... :-)

Best wishes,

-- Joe
import std.stdio: printf;
import std.math: sqrt, pow;
import std.random;


struct Rating
{
	size_t user;
	size_t object;
	double value;
}


class ReputationAlgorithm
{
	this() {}
	this(ref Rating[] ratings,
	 ref double[] reputationUser,
	 ref double[] reputationObject) {}
}


class AvgWeighted : ReputationAlgorithm
{
	double[] weightSum;

	uint opCall(ref Rating[] ratings,
	ref double[] reputationUser,
	ref double[] reputationObject)
	{
		weightSum.length = reputationObject.length;
		weightSum[] = 0;

		reputationObject[] = 0;

		foreach(ref const(Rating) r; ratings) {
			reputationObject[r.object] += reputationUser[r.user]*r.value;
			weightSum[r.object] += reputationUser[r.user];
		}

		foreach(size_t o, ref double r; reputationObject)
			r /= weightSum[o];

		return 0;
	}

	this() {}

	this(ref Rating[] ratings,
	 ref double[] reputationUser,
	 ref double[] reputationObject)
	{
		opCall(ratings,reputationUser,reputationObject);
	}
}


final class AvgArithmetic : AvgWeighted
{
	uint opCall(ref Rating[] ratings,
	ref double[] reputationUser,
	ref double[] reputationObject)
	{
		reputationUser[] = 1;
		return super.opCall(ratings,reputationUser,reputationObject);
	}

	this() {}

	this(ref Rating[] ratings,
	 ref double[] reputationUser,
	 ref double[] reputationObject)
	{
		opCall(ratings,reputationUser,reputationObject);
	}
}


final class Yzlm : AvgWeighted
{
	double beta;
	double convergenceRequirement;
	double errorMin;
	size_t[] userLinks;
	double[] weightSum;
	double[] oldReputationObject;

	double objectReputationUpdate(ref Rating[] ratings,
	  ref double[] reputationUser,
	  ref double[] reputationObject)
	{
		double diff = 0;
		double[] tmp = oldReputationObject;
		oldReputationObject = reputationObject;
		reputationObject = tmp;
		reputationObject[] = 0;

		super.opCall(ratings,reputationUser,reputationObject);

		foreach(size_t object, ref const(double) r; reputationObject) {
			auto aux = r - oldReputationObject[object];
			diff += aux*aux;
		}

		return sqrt(diff);
	}

	void userReputationUpdate(ref Rating[] ratings,
	  ref double[] reputationUser,
	  ref double[] reputationObject)
	{
		reputationUser[] = 0;

		foreach(ref const(Rating) r; ratings) {
			auto aux = r.value - reputationObject[r.object];
			reputationUser[r.user] += aux*aux;
		}

		foreach(size_t user, ref double r; reputationUser) {
			if(userLinks[user]>0)
				r = pow( (r/userLinks[user]) + errorMin, -beta);
		}
	}

	uint opCall(ref Rating[] ratings,
	ref double[] reputationUser,
	ref double[] reputationObject)
	{
		userLinks.length = reputationUser.length;
		userLinks[] = 0;

		weightSum.length = reputationObject.length;
		oldReputationObject.length = reputationObject.length;

		foreach(ref const(Rating) r; ratings)
			userLinks[r.user]++;

		double diff;
		uint iterations=0;
		do {
			userReputationUpdate(ratings,
			 reputationUser,
			 reputationObject);
			diff = objectReputationUpdate(ratings,
			  reputationUser,
			  reputationObject);
			++iterations;
		} while(diff > convergenceRequirement);
		printf("Exited in %u iterations with diff = %g > %g\n",
		 iterations,diff,convergenceRequirement);
		return iterations;
	}

	this() {}

	this(double b,
	 double c,
	 double e)
	{
		beta = b;
		convergenceRequirement = c;
		errorMin = e;
		assert(beta>=0);
		assert(convergenceRequirement>0);
		assert(errorMin>=0);
	}

	this(ref Rating[] ratings,
	 ref double[] reputationUser,
	 ref double[] reputationObject,
	 double b,
	 double c,
	 double e)
	{
		this(b,c,e);

		opCall(ratings,
		   reputationUser,
		   reputationObject);
	}
}


void main()
{
	Rating[] ratings;
	double[] reputationObject, reputationUser;
	double[] objectQuality, userError;
	uint iterTotal = 0;
	Mt19937 rng;
	auto aa = new AvgArithmetic;
	auto yzlm = new Yzlm(0.8,1e-12,1e-36);

	rng.seed(1001);

	reputationObject.length = 1000;
	reputationUser.length = 1000;

	objectQuality.length = reputationObject.length

Re: Code speed and C++

2010-04-20 Thread Joseph Wakeling
bearophile wrote:
 Joseph Wakeling:
 
 I took a little time last night and wrote up a C++ version that
 
 I'll take a look at this, I was away for few days.

Thank you very much!

No rush or even necessity -- I'm just curious if it sheds any light on
D-related issues. :-)


Re: Newsgroups, off-topic

2010-04-18 Thread Joseph Wakeling
Moritz Warning wrote:
 I think many people behind these compilers lack the personal motivation
 to integrate the D2 front end.
 That may change, but it may take time.

Maybe true, but I was thinking of it from a different angle -- why the
main D2 development does not switch backends.  To be fair, the work
involved would surely detract from the main development effort on the
language and features.  I can also see the desire to have a backend that
is fully under the control of the main developers.

 - not everybody likes the road the D2 design takes

What are the concerns here ... ?  I don't currently have a strong enough
sense of the differences.

 anyway, gdc has D2 support already (but not up to date/DMD), see
 http://bitbucket.org/goshawk/gdc/overview

I've been following that D2 work for a while now, and am looking forward
to where it is going ... :-)

Best wishes,

-- Joe


Re: Code speed and C++

2010-04-17 Thread Joseph Wakeling
bearophile wrote:
 This is the middle parts (the two loops) of Yzlm.objectReputationUpdate 
 compiled with ldc, this is one of the two main hot spots of the program, then
 you can compare this asm with the asm of the same part from the C++ version:
 
 To find why the C++ code is faster you can show me the equivalent C++ code 
 that I can compile myself, or you can compile your C++ code and show me the 
 asm of the two functions that are hot spots.

I took a little time last night and wrote up a C++ version that is much
closer in design to the D code (as well as smaller, simpler, less hell
to read...).  It has libboost as a dependency for random number
generation and for the BOOST_FOREACH macro.

Following your observation with D, it looks like the C++ also benefits
from using references in the FOREACH loops for elements larger than
size_t -- this shaves about 20s off running time on my machine.  Either
way, like its uglier predecessor it's quite a bit faster than DMD or
LDC.  Unfortunately I can't get llvm-g++ running on my machine, or I
would compare performance there too.

Hope this makes an interesting point of comparison. :-)

Best wishes,

-- Joe
#include <iostream>
#include <vector>
#include <cmath>   // assumed: needed for the sqrt() and pow() calls below
#include <boost/foreach.hpp>
#include <boost/random.hpp>

using namespace std;
using namespace boost;

namespace fric
{

class Rating {
private:
	size_t _user;
	size_t _object;
	double _value;
public:
	Rating() {}

	Rating(size_t u,
	   size_t o,
	   double v)
	{
		_user = u;
		_object = o;
		_value = v;
	}

	const size_t user() { return _user; }
	const size_t object() { return _object; }
	const double value() { return _value; }
};

class ReputationAlgorithm {
	virtual size_t reputation(vector<double> &reputationUser,
	  vector<double> &reputationObject,
	  vector<Rating> &ratings)
	{
		return 0;
	}

	virtual size_t addRating(vector<double> &reputationUser,
	 vector<double> &reputationObject,
	 vector<Rating> &ratings,
	 Rating r)
	{
		return 0;
	}
};


namespace algorithm
{

class AvgWeighted : ReputationAlgorithm {
protected:
	vector<double> weightSum;
public:
	virtual size_t reputation(vector<double> &reputationUser,
	  vector<double> &reputationObject,
	  vector<Rating> &ratings)
	{
		weightSum.assign(reputationObject.size(),0.0);
		reputationObject.assign(reputationObject.size(),0.0);

		BOOST_FOREACH(const Rating &r, ratings) {
			reputationObject.at(r.object()) += reputationUser.at(r.user()) * r.value();
			weightSum.at(r.object()) += reputationUser.at(r.user());
		}

		for(size_t o=0;o!=reputationObject.size();++o)
			reputationObject.at(o) /= (weightSum.at(o)>0) ? weightSum.at(o) : 1;

		return 0;
	}
};

class AvgArithmetic : AvgWeighted {
public:
	virtual size_t reputation(vector<double> &reputationUser,
	  vector<double> &reputationObject,
	  vector<Rating> &ratings)
	{
		reputationUser.assign(reputationUser.size(),1.0);
		return AvgWeighted::reputation(reputationUser,reputationObject,ratings);
	}
};


class Yzlm : AvgWeighted {
private:
	double _beta;
	double _convergence;
	double _errorMin;

protected:
	vector<double> oldReputationObject;
	vector<size_t> userLinks;

	double objectReputationUpdate(vector<double> &reputationUser,
	  vector<double> &reputationObject,
	  vector<Rating> &ratings)
	{
		double diff = 0.0;

		oldReputationObject.assign(reputationObject.begin(), reputationObject.end());
		AvgWeighted::reputation(reputationUser, reputationObject, ratings);

		for(size_t o=0;o!=reputationObject.size();++o) {
			double aux = reputationObject.at(o) - oldReputationObject.at(o);
			diff += aux*aux;
		}

		return sqrt(diff);
	}

	void userReputationUpdate(vector<double> &reputationUser,
	  vector<double> &reputationObject,
	  vector<Rating> &ratings)
	{
		reputationUser.assign(reputationUser.size(),0.0);

		BOOST_FOREACH(const Rating &r, ratings) {
			double aux = r.value() - reputationObject.at(r.object());
			reputationUser.at(r.user()) += aux*aux;
		}

		for(size_t u=0;u!=reputationUser.size();++u) {
			if(userLinks.at(u)!=0)
				reputationUser.at(u) = pow( (reputationUser.at(u)/userLinks.at(u)) + errorMin(), -beta() );
		}
	}

public:
	void init(double b,
	  double c,
	  double e)
	{
		_beta = b;
		_convergence = c;
		_errorMin = e;
	}
	const double beta() { return _beta; }
	const double convergence() { return _convergence; }
	const double errorMin() { return _errorMin; }

	virtual size_t reputation(vector<double> &reputationUser,
	  vector<double> &reputationObject,
	  vector<Rating> &ratings)
	{
		double diff;
		size_t iterations = 0;

		userLinks.assign(reputationUser.size(),0);
		oldReputationObject.resize(reputationObject.size());

		BOOST_FOREACH(const Rating &r, ratings)
			userLinks.at(r.user())++;

		

Re: Code speed (and back to the memory leaks...)

2010-04-16 Thread Joseph Wakeling
bearophile wrote:
 You are right, sorry.

No need to apologise!  You helped me significantly improve my code,
helped me understand D a lot better and left me feeling generally very
positive about developing further in D.  I'd call that a result. :-)

 So I think it's probably just compiler difference that's to blame for speed 
 differences
 
 This can be true, but such differences are not magic, they can be found, and 
 eventually they can even become precise enhancement requests for llvm 
 developers, they are open to such requests.

I hope my questions here have been useful in this respect ...

 After all, D in general and DMD in particular is in development.
 
 DMD has a quite old back-end that is not in active development. Maybe it will 
 become 64 bit.

The DMD backend seems a generally problematic piece of code given the
proprietary restrictions that apply to it.  I have the impression that
it's being used simply because it's the backend that is known and tested
and that over the longer term there may be a switch ... ?

The one compiler-related concern I have is whether there will be an
effective free-as-in-freedom compiler for D2 in the near future.  Work
is going on GDC but it's moving slowly -- and I don't see a clear LDC
roadmap to D2 support.

 In this program I have seen that a good percentage of the speed difference 
 between ldc/dmd is caused by foreach loops. When you iterate over structs 
 longer than size_t use ref (or in D2 better ref const(...)).
 foreach (Rating r; ratings)
 ==>
 foreach (ref const(Rating) r; ratings)
 
 There is nothing algorithmically strange going on here, but to be sure you 
 can take a look at the produced asm.

Interesting ... !  I'm afraid the asm is over my head, but I will read
up and try to understand it.

 To find why the C++ code is faster you can show me the equivalent C++
 code that I can compile myself, or you can compile your C++ code and
 show me the asm of the two functions that are hot spots.

Sure -- attached.  The key functions are the ones in tref.cpp (the
smalltref class) and trefsim.cpp, and the treftest.cpp file that runs a
roughly-equivalent simulation to test.d.

It's a little embarrassing to share because it's fairly badly-designed
code -- my first serious C++ attempt -- it served its purpose some time
ago and was done with.  It started to become vaguely relevant again so I
thought I'd try and redo it with a better design, and rather than do it
in C++ (which I could do much better by now) this seemed like a nice
excuse to build my first D project ... :-)

I already made a couple of tweaks based on your improvements to the D
code, but I think g++ probably already has such optimizations during the
compile process.

I also realised that I wasn't comparing like with like -- the code was
using the RanLux random number generator, and with mt19937 it shaves
quite a bit of time off.  So C++ still has a bit of an edge right now.

There's surely a lot of other stuff that can be done to improve this
C++, but please don't put lots of effort in unless it helps with
improving D (the language and/or frontend, not my code, which is not so
important:-).

Thanks & best wishes,

-- Joe
#ifndef __TREF_HPP__
#define __TREF_HPP__

#include <vector>

typedef long unsigned int objectID;
typedef long unsigned int userID;

namespace tref
{

class rating {
public:
	objectID o;
	userID u;
	double value;
	rating(objectID _o, userID _u, double _value);
};

// class __tref
// Base class for all iterative refinement algorithms.
// Contains only minimally needed info and functions.
class __tref {
protected:
	bool verbose;
	std::vector<tref::rating> ratings;
	std::vector<double> quality;
	std::vector<double> weight;
	double beta;
	double epsilon;
	double convergence;
	virtual void addLink(objectID o, userID u);
	virtual void iterationInit();
	virtual double qualityUpdate();
	virtual void weightUpdate();
public:
	__tref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
	   double _beta, double _epsilon, double _convergence,
	   bool _verbose);
	void addRating(objectID o, userID u, double value);
	unsigned int iterationCycle();
	long unsigned int objects();
	long unsigned int users();
	long unsigned int links();
	double objectQuality(objectID o);
	double userWeight(userID u);
	void printRatings();
};

// class smalltref : public __tref
// Memory-efficient but slow algorithm.
class smalltref : public __tref {
protected:
	std::vector<double> qualityLast;
	std::vector<double> weightSum;
	std::vector<unsigned long int> userLinks;
	virtual void addLink(objectID o, userID u);
	virtual double qualityUpdate();
	virtual void weightUpdate();
public:
	smalltref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
	  double _beta, double _epsilon, double _convergence,
	  bool _verbose);
};

// class bigtref : public __tref
// Supposedly faster algorithm, but needs more memory.
// Extra allocation required may actually 

Re: Newsgroups, off-topic

2010-04-16 Thread Joseph Wakeling
Lars T. Kyllingstad wrote:
 The D2 language, which has so far been the experimental branch of D
 and as such has been a rapidly moving target, is in its final stages of
 completion.  The specification has more or less been frozen, and
 currently work is being done on bringing the DMD compiler up to spec.
 When this is done (this summer, I believe) work is supposed to start on
 a 64-bit compiler.

My concern about this is more along the lines of: when will D2 versions
of LDC and/or GDC be available?

To an extent I'm not sure I understand why, with two great backends
available, D2 development did not switch over entirely to one or the
other of these -- though I understand the fact that the DMD backend is
probably the one the Digital Mars developers are most comfortable with.


Re: Code speed (and back to the memory leaks...)

2010-04-15 Thread Joseph Wakeling
Hi Bearophile,

Thanks ever so much for the amount of effort you put into this -- it is
extremely kind and generous. :-)

I'm not seeing the significant gains that you see with DMD, but for me
too LDC doubles the speed.

My guess for the difference is that you ran less than the full 100
iterations of the main for loop.  As you can see in the code the Yzlm
class launches an iterative procedure which takes more or less
iterations to converge depending on initial conditions.

This _normally_ is relatively few, but can in some circumstances be
large (several hundred).

An effect of changing the random number generator is that it changes the
initial conditions presented to the algorithm.  For example, in the
first 10 runs of the loop, the original code performs 404 iterations
overall and your new code performs only 348.

Combine that with other savings like the removal of the appender and the
faster random number generation and you get a fairly sizeable saving --
about a 25-30% cut in running time when compiling with dmd.  But those
savings become less of an influence on running time as you increase the
number of simulations.

The _real_ measure of speed is thus not overall program running time but
running time relative to the total number of iterations.  The good news
is that while the LDC-compiled version doesn't beat g++ it comes pretty
close, with about 32 iterations per second for the D code compared to 26
for the C++. :-)

The C++ code can probably be optimised a bit further but not too much;
the algorithms in the iterative process are pretty much identical
between the two versions.  (The C++ doesn't have the 'clever' memory
swap-around that I put in the D code, but that doesn't make any
difference to D's performance anyway...)  So I think it's probably just
compiler difference that's to blame for speed differences -- which is
fine.  After all, D in general and DMD in particular is in development.

I _am_ surprised that in your profiling the random number generator took
up so much time, as if you cut out the lines,

aa(ratings, reputationUser, reputationObject);
yzlm(ratings, reputationUser, reputationObject);

... and recompile, the program screams through in no time at all --
which would lead me to think that it's the yzlm opCall and the functions
it calls that take up the majority of time.  Or is there some lazy
evaluation going on here ... ?

I don't think it's the algorithm in any case (although it might be
Phobos' implementation) as working with Tango and LDC, changing between
the Twister or Kiss random number generators doesn't make for any
meaningful performance difference.

One last thing.  While comparing DMD and LDC I noticed something odd.
Take this code, which is an LDC and DMD-compatible version of the 'array
leak' code I posted originally:

/
version(Tango) {
    import tango.stdc.stdio: printf;
} else {
    import std.stdio: printf;
}

void main()
{
    double[] x;

    for(uint i=0;i<100;++i) {
        x.length = 0;

        for(uint j=0;j<5_000;++j) {
            for(uint k=0;k<1_000;++k) {
                x ~= j*k;
            }
        }

        printf("At iteration %u, x has %u elements.\n",i,x.length);
    }
}
/

I noticed that when compiled with LDC the memory usage stays constant,
but in DMD the memory use blows up.  Is this a D1/D2 difference (with D2
having the more advanced features that require preservation of memory)
or is it a bug in one or the other of the compilers ... ?

Thanks & best wishes,

-- Joe



Tango2 and Phobos2 [was: Re: Code speed]

2010-04-15 Thread Joseph Wakeling
bearophile wrote:
 Robert Clipsham Wrote:
 
 this way if/when
 tango is ported to D2 it doesn't have to hack around this to allow
 users to use ^^.
 
 I hope Tango2 will be designed to be installed beside Phobos2, and not in 
 place of it.

I thought that was the aim ... ?  At least according to D's Wikipedia page:
http://en.wikipedia.org/wiki/D_%28programming_language%29#Division_concerning_the_standard_library


Re: Memory leak with dynamic array

2010-04-13 Thread Joseph Wakeling
Steven Schveighoffer wrote:
 This should work as you expect.  Can you produce a full code example? 
 Are you reserving space in x before-hand?

Weird; now it works.  I wonder if it's because I upgraded to 2.043 ... ?  I
understand there was a fix in there for array append.

On the other hand, if the assumeSafeAppend takes place outside the loops
entirely, blow-up occurs -- because the command is not issued after the
array resize to zero?  I've attached examples.

Thanks & best wishes,

-- Joe
import std.array;
import std.stdio;

void main()
{
	double[] x;

	assumeSafeAppend(x);
	
	foreach(uint i;0..100) {
		x.length = 0;
		
		foreach(uint j;0..5_000) {
			foreach(uint k;0..1_000) {
				x ~= j*k;
			}
		}
		
		writefln("At iteration %u, x has %u elements.",i,x.length);
	}
}
import std.array;
import std.stdio;

void main()
{
	double[] x;

	foreach(uint i;0..100) {
		x.length = 0;
	
		assumeSafeAppend(x);
		
		foreach(uint j;0..5_000) {
			foreach(uint k;0..1_000) {
				x ~= j*k;
			}
		}
		
		writefln("At iteration %u, x has %u elements.",i,x.length);
	}
}


Re: Memory leak with dynamic array

2010-04-13 Thread Joseph Wakeling
Joseph Wakeling wrote:
 On the other hand, if the assumeSafeAppend takes place outside the loops
 entirely, blow-up occurs -- because the command is not issued after the
 array resize to zero?  I've attached examples.

If I include a line

 x.reserve(5_000_000);

before the assumeSafeAppend, the memory does not explode indefinitely --
but it uses about 3x as much memory as the 'NoLeak' code even when the
latter lacks advance reservation of memory.


Re: Memory leak with dynamic array

2010-04-13 Thread Joseph Wakeling
Steven Schveighoffer wrote:
 Assuming you may ask questions about this, it's not an exact description
 of what happens.  The append code and GC are smarter about it than this,
 I just wanted to explain it in an easy-to-understand way :)  The real
 code uses algorithms to determine optimal grow sizes and can extend into
 consecutive blocks if possible.

I realised that while running some of the code that bearophile suggested
to me, because if you look at 'capacity' in a non-pre-reserved D array
that's appended to 5 million times, its capacity is only a little over 5
million -- compared to a C++ vector which is the nearest power of 2
(2^23, over 8 million).
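
A quick sketch of the check, assuming I'm reading the capacity property
correctly:

import std.stdio;

void main()
{
    double[] x;

    foreach (uint i; 0..5_000_000)
        x ~= i;

    // prints a capacity only slightly above 5 million, rather than
    // the next power of two as with a C++ vector
    writefln("length = %s, capacity = %s", x.length, x.capacity);
}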

The GC overhead means that in this case actual memory usage is a bit
higher than C++, but on a larger scale this could surely make a big
difference.

In the case of the piece of code I'm working on I don't think the
pre-reserve is really so important as far as performance goes -- the
append speed is a bit of a bugger, but the main issue is probably to do
with speed of copying and iterating across arrays.

For example, I suspect that the D array's,

x[] = 0;
y[] = z[];

is not as efficient as a C++ vector's

x.assign(x.size(),0);
y.assign(z.begin(),z.end());

I think I can find some ways round this by taking advantage of some of
the D arrays' cleverness, limiting the copying of values and instead
just swapping around which memory each array is pointing to.

As for iteration, I don't know to what degree D's foreach() across
arrays compares to for() commands in C++ -- I guess it should be pretty
close in performance, no?
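
For the record, the swap I have in mind is along these lines (a sketch,
not benchmarked):

// instead of oldRep[] = rep[] (an O(n) element copy) ...
double[] tmp = oldRep;
oldRep = rep;   // a slice is just (pointer, length), so this is O(1)
rep = tmp;
rep[] = 0;      // recycle the old buffer for the next round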

Best wishes,

-- Joe


Code speed [was: Re: Memory leak with dynamic array]

2010-04-13 Thread Joseph Wakeling
Don wrote:
 The D code compiles directly to a memset and memcpy, which are
 intrinsics. There's no way C++ could beat it.

OK.  I'm just bemused about why -- having solved the memory leak -- my D
code is running significantly slower than equivalent C++ code.

For now I'm just playing with D by doing a rewrite of some C++ code I
did for a project a while back -- to see if I can write a neater version
of the code in the new language, and learn something about D in the
process.  I may have fallen into the trap you described earlier, of just
'translating' C++ style code directly into D, but I don't think that's
the issue here.

There are a couple of algorithmic inefficiencies in there with
deliberate reason, but removing them for testing purposes doesn't
improve matters.  The array append is slow, but it's not the major part
of the running time of the program.

For the sake of clarity, I've attached the complete code.  The really
important part to look at is the file yzlm.d in the opCall() function
and class functions objectReputationUpdate() and userReputationUpdate().
This is a reputation-generating process which takes a set of ratings by
users of objects, and uses them to iteratively recalculate user
reputation and re-aggregate the ratings taking this into account.

I don't see anything particularly bad about the way these algorithms are
written but maybe they are imperfectly D-ish ... :-)

Best wishes,

-- Joe
module dregs.algorithms.yzlm;

import std.math;
import std.stdio;
public import dregs.core;

class Yzlm : ReputationAlgorithm
{
	double beta;
	double convergenceRequirement;
	double errorMin;
//	uint[] userLinks; // commented out because for now it
	  // has a constant value for all users
	double[] weightSum;
	double[] oldReputationObject;
	
	double objectReputationUpdate(ref Rating[] ratings,
	  ref double[] reputationUser,
	  ref double[] reputationObject)
	{
		double diff = 0;
		double[] temp = oldReputationObject;

		// Original version had:
		//
		//oldReputationObject[] = reputationObject[]
		//
		// This version is an attempt to save effort
		// by just switching round the memory the two
		// arrays are pointing at -- not sure if it
		// actually does what I'm expecting it to.
		// Doesn't seem to improve speed. :-(
		oldReputationObject = reputationObject;
		reputationObject = temp;
		reputationObject[] = 0;
		
		weightSum[] = 0;
		
		foreach(Rating r; ratings) {
			reputationObject[r.object] += reputationUser[r.user] * r.value;
			weightSum[r.object] += reputationUser[r.user];
		}
		
		foreach(uint object, ref double r; reputationObject) {
			r /= (weightSum[object]>0) ? weightSum[object] : 1;
			diff += pow( (r - oldReputationObject[object]), 2);
		}
		
		return sqrt(diff);
	} 
	
	void userReputationUpdate(ref Rating[] ratings,
	  ref double[] reputationUser,
	  ref double[] reputationObject)
	{
		reputationUser[] = 0;
		
		foreach(Rating r; ratings)
			reputationUser[r.user] += pow( (r.value - reputationObject[r.object]), 2);
		
		foreach(uint user, ref double r; reputationUser) {
			//if(userLinks[user]>0)
				r = pow( (r/reputationObject.length/*userLinks[user]*/) + errorMin, -beta);
		}
	}
	
	void opCall(ref Rating[] ratings,
	ref double[] reputationUser,
	ref double[] reputationObject)
	{
	//	userLinks.length = reputationUser.length;
	//	userLinks[] = 0;
		
		weightSum.length = reputationObject.length;
		oldReputationObject.length = reputationObject.length;
		
	//	foreach(Rating r; ratings)
	//		userLinks[r.user]++;
		
		double diff;
		uint iterations=0;
		do {
			userReputationUpdate(ratings,
			 reputationUser,
			 reputationObject);
			diff = objectReputationUpdate(ratings,
			  reputationUser,
			  reputationObject);
			++iterations;
		} while(diff > convergenceRequirement);
		writefln("Exited in %u iterations with diff = %g > %g",iterations,diff,convergenceRequirement);
	}
	
	this() {}
	
	this(double b,
	 double c,
	 double e)
	{
		beta = b;
		convergenceRequirement = c;
		errorMin = e;
		assert(beta>=0);
		assert(convergenceRequirement>0);
		assert(errorMin>=0);
	}
	
	this(ref Rating[] ratings,
	 ref double[] reputationUser,
	 ref double[] reputationObject,
	 double b,
	 double c,
	 double e)
	{
		this(b,c,e);
		
		opCall(ratings,
		   reputationUser,
		   reputationObject);
	}
}
module dregs.algorithms.avg;

public import dregs.core;

class AvgWeighted : ReputationAlgorithm
{
	double[] weightSum;

	void opCall(ref Rating[] ratings,
	ref double[] reputationUser,
	ref double[] reputationObject)
	{
		weightSum.length = reputationObject.length;
		weightSum[] = 0;
		
		reputationObject[] = 0;
		
		foreach(Rating r; ratings) {
			reputationObject[r.object] += reputationUser[r.user]*r.value;

Re: Memory leak with dynamic array

2010-04-12 Thread Joseph Wakeling
Thanks to all again for the discussion, examples and explanations. :-)

One note -- I wouldn't want anyone to think I'm bashing D or complaining.
I've been interested in the language for some time and this seemed an
opportune time to start experimenting properly.  It's fun, I'm learning
a lot, and I'm genuinely touched by the amount of effort put in by
everyone on this list to teach and share examples.

I'm also fully aware that D is still growing, and that I need to be
patient in some cases ... :-)

bearophile wrote:
 D dynamic arrays are more flexible than C++ vector, they can be sliced,
 such slicing is O(1), and the slices are seen by the language just like
 other arrays. So you pay the price of some performance for such
 increased flexibility. The idea here is that the built-in data types
 must be as flexible as possible even if their performance is not so
 high, so they can be used for many different purposes.

No complaint there. :-)

 Then D standard library will have specialized data structures that are
 faster thanks to being more specialized and less flexible.

In my case -- I'm turning into 'Mr C++' again -- probably that's often
what I need.  If I look at the major benefits I found in moving from C
to C++, the first was memory management that was as automatic as I required.
For example, C++ vectors are great because they do away with having to
put in malloc/realloc/free statements and let you treat dynamic arrays
pretty much as 'just another variable'.

Within my own needs I've not yet found a case where the kind of smart GC
functionality discussed on this thread seemed necessary, but of course
I've never had it available to use before ... :-)

 In D dynamic arrays some of the performance price is also paid for the
 automatic memory management, for the GC that's not a precise GC (for
 example if your array has some empty items at the end past its true
 length, the GC must ignore them).

An idea was floating in my head about whether it is/could be possible to
turn off GC safety features in a scope where they are unnecessary --
rather like a more general version of the 'assumeSafeAppend' function...
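
The closest existing knob I know of is switching the collector off around
a hot section -- not scoped safety as such, just a sketch of the idea:

import core.memory;

void hotSection()
{
    GC.disable();             // no collections while this runs
    scope(exit) GC.enable();  // re-enabled even if we throw

    // ... allocation-heavy inner loops ...
}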

 With LDC (once we'll have a D2 version of it) the performance of D2
 can probably be the same as the C++. DMD maybe loses a little here
 because it's not so good at inlining, or maybe because the C++ vector
 is better than this D2 code.

I thought dev effort was now focusing back on GDC ... ? :-P

I have actually not made much use of the -inline option because in
the code I wrote (maybe not best suited to inlining...), it made the
program generally run slower ...

Steven Schveighoffer wrote:
 The C++ example is reallocating memory, freeing memory it is no longer
 using.  It also manually handles the memory management, allocating larger
 and larger arrays in some algorithmically determined fashion (for example,
 multiplying the length by some constant factor).  This gives it an edge in
 performance because it does not have to do any costly lookup to determine
 if it can append in place, plus the realloc of the memory probably is
 cheaper than the GC realloc of D.

Right.  In fact you get precisely 24 allocs/deallocs, each doubling the
memory reserve to give a total capacity of 2^23 -- and then that memory is
there and can be used for the rest of the 100 iterations of the outer loop.
The shock for me was finding that D wasn't treating the memory like this
but was preserving each loop's memory (as you say, for good reason).

 D does not assume you stopped caring about the memory being pointed to
 when it had to realloc. [...] You can't do the same thing with C++
 vectors, when they reallocate, the memory they used to own could be
 freed.  This invalidates all pointers and iterators into the vector,
 but the language doesn't prevent you from having such dangling pointers.

I have a vague memory of trying to do something exactly like your example
when I was working with C++ for the first time, and getting bitten on the
arse by exactly the problem you describe.  I wish I could remember where.
I know that I found another (and possibly better) solution to do what I
wanted, but it would be nice to see if a D-ish solution would give me
something good.

 This must be fixed, the appender should be blazingly fast at appending
 (almost as fast as C++), with the drawback that the overhead is higher.

Overhead = memory cost?  I'm not so bothered as long as the memory stays
within constant, predictable bounds.  It was the memory explosion that
scared me.  And I suspect I'd pay a small performance cost (though it
would have to be small) for the kind of safety and flexibility the arrays
have.

 You haven't done much with it yet.  When you start discovering how much D
 takes care of, you will be amazed :)

I know. :-)

My needs are in some ways quite narrow -- numerical simulations in
interdisciplinary physics -- hence the C background, and hence the premium
on performance.  They're also not very big programs -- 

Re: Memory leak with dynamic array

2010-04-12 Thread Joseph Wakeling
Steven Schveighoffer wrote:
 On Mon, 12 Apr 2010 12:03:38 -0400, Joseph Wakeling
 joseph.wakel...@gmail.com wrote:
 
 I thought dev effort was now focusing back on GDC ... ? :-P
 
 AFAIK, gdc hasn't been actively developed for a few years.
 
 ldc, on the other hand, has regular releases.  I think ldc may be the
 future of D compilers, but I currently use dmd since I'm using D2.

http://bitbucket.org/goshawk/gdc/wiki/Home

:-)

Either way I'm happy.  I don't have any issues with dmd, but I do want
to see a properly free D compiler that can be prepackaged in all the
Linux distros.

 Yes, you get around this by preallocating.

Sure.  But that wasn't what shocked me -- what I was amazed by was
setting up a situation where I _had_ preallocated the memory and still
seeing the memory usage explode, because D was preserving the memory
from each round of the loop.

(Actually I'm still having some issues with this, despite using
assumeSafeAppend, but more on that in a separate email.)

 It's often these types of performance discrepancies that critics point
 to (not that you are a critic), but it's the cost of having a more
 comprehensive language.  Your appetite for the sheer performance of a
 language will sour once you get bit by a few of these nasty bugs.

For sure.

 But D fosters a completely different way of thinking about solving
 problems.

I can see how the example you give would be fantastically useful.

More generally, I think this is the point -- I need to adjust my head to
writing D-ish code, just as when moving from C to C++ I needed to switch
to various new ways of doing things.

 There are many in the community that use D for numerical stuff.  It's
 definitely not as mature as it could be, but getting better.  Don is
 adding a lot of cool stuff to it, including a builtin exponent operator
 and arbitrary precision numbers.

I guessed there would be -- I knew for example that there was someone
out there working on a fairly major mathematical/numerical library for
D, but it's a while since I checked that out.

So take my earlier comment about numerical work to refer only to doing
things the way I'm used to ... ;-)

 Yes, but that's not what I meant ;)  I mean, you can write your own
 types, like the Appender (or what the appender *should* be) that
 optimize the behavior of code to meet any needs.  And it can do it with
 a much better syntax than C.  D's template system and ability to make
 user-types seem like builtins I think is unparalleled in C-like languages.

Hence much pleasure and excitement in learning D ... :-)

Don wrote:
 There are quite a lot of us here with exactly that kind of background.
 
 Something about the array issue -- D dynamic arrays are heavily geared 
 towards algorithms which perform an initial allocation and afterwards avoid 
 memory allocation entirely.
 In D, such slicing algorithms are extremely clean, extremely fast, and memory 
 safe.
 In C++, it's much more difficult to write code in that manner. A 
 straightforward translation from C++ will generally miss the benefits of D 
 arrays, and you'll end up with slower code.

Exactly my current situation. :-P

 A kind of Zen of D is to use array slices as much as possible.

I will look into this more and see if this approach can help with some
of my code -- are there existing projects I could take a look at to get
some examples?
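
In the meantime, a trivial example of what I understand by slicing (if
I've got it right):

double[] data = new double[100];
double[] window = data[10 .. 20];  // O(1) view into data, no copy
window[] = 1.0;                    // writes show up in data[10..20]
assert(data[15] == 1.0);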

Thanks & best wishes,

-- Joe


Re: Memory leak with dynamic array

2010-04-12 Thread Joseph Wakeling
Joseph Wakeling wrote:
 (Actually I'm still having some issues with this, despite using
 assumeSafeAppend, but more on that in a separate email.)

... solved; it's interesting to note that assumeSafeAppend has to be
used in _exactly_ the same scope as the append ~= itself.

e.g. with this loop, the memory blows up:


foreach(uint i;0..100) {
    x.length = 0;

    assumeSafeAppend(x);

    foreach(uint j;0..5000)
        foreach(uint k;0..1000)
            x ~= j*k;

    writefln("At iteration %u, x has %u elements.",i,x.length);
}


... while with this one it's OK:


foreach(uint i;0..100) {
    x.length = 0;

    foreach(uint j;0..5000) {
        foreach(uint k;0..1000) {
            assumeSafeAppend(x);
            x ~= j*k;
        }
    }

    writefln("At iteration %u, x has %u elements.",i,x.length);
}


Curious question: how come with a registered email address my messages
have to be moderated, whereas via the http interface I can post straight
away? :-P

Best wishes,

-- Joe


Re: Memory leak with dynamic array

2010-04-11 Thread Joseph Wakeling
Thanks very much for the extended and detailed explanations.  The
assumeSafeAppend function indeed fixes the memory leakage. :-)

In reply to some comments ...

  That is to be expected.  The append function is not the most efficient
  thing.  It still must do a lot of work just to look up the valid length
  and verify appending can be done, whereas the v2 append is a single
  instruction!

 The v2 append seems two instructions:
 arr[arr_len] = j;
 arr_len++;

 And the C++ v1 too has to verify if the appending can be done, comparing
 the size with the capacity and if so assigning the item and increasing the
 size.

 I understand that the D code has to work more here, and I presume your code
 can't be improved. But those timings are a little sad anyway :-|

  A better method is to set the length, and then write the data.

 What I have done in the D v2 example.

I was very happy to see that D _does_ have a 'reserve' function for arrays,
which I had been missing compared to C++ (it's not mentioned in the array
docs).
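
For example, a sketch of how I understand reserve to behave:

double[] x;
x.reserve(5_000_000);   // allocates capacity up front; length stays 0
assert(x.length == 0 && x.capacity >= 5_000_000);

foreach (uint j; 0..5_000_000)
    x ~= j;             // should now append without reallocating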

Still, I don't think that pre-reserving the memory per se is the influencing
factor on the differences in performance.  To see why, compare these two pieces
of code -- a non-leaking D code, and the equivalent in C++:

// D example
/
import std.stdio;

void main()
{
    double[] x;

    foreach(uint i;0..100) {
        x.length = 0;

        writefln("Array capacity (1) = %u",x.capacity);
        assumeSafeAppend(x);
        writefln("Array capacity (2) = %u",x.capacity);

        foreach(uint j;0..500)
            x ~= j;

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}
/

 C++ example
/
#include <vector>
#include <iostream>

using namespace std;

int main()
{
    vector<double> x;

    for(unsigned int i=0;i!=100;++i) {
        x.resize(0);

        cout << "Vector capacity: " << x.capacity() << endl;

        for(unsigned int j=0;j!=500;++j)
            x.push_back(j);

        cout << "At iteration " << i << ", x has " << x.size()
             << " elements." << endl;
    }

    return 0;
}
/

Note that in C++ the memory is not preassigned either.  The difference
between the performance of these pieces of code is striking -- on my
machine the D example takes about 70--75 seconds to run, whereas the
C++ example speeds through it in 10s or less.

D also uses about 20% more memory than the C++ even though the C++ code
declares a higher capacity for the vector (above 8 million) than D does
for the array (a little over 5 million).

I don't know if it was naive to assume that D's dynamic arrays would be
equivalent to vectors in C++.  But it's clear that the D array appender
~= is much slower than the C++ vector push_back().

Using an Appender class and put() (from std.array) is even slower,
despite the std.array docs recommending this over ~. :-(

It's disappointing because I'm loving so much of D's syntax, but I can
write far more efficient code (with less explicit memory management)
in C++ ...


Memory leak with dynamic array

2010-04-10 Thread Joseph Wakeling
Hello all,

This one is another 'oh God it doesn't work like C++' query ... :-)

I've been having a problem with a memory leak that I've tied down to a dynamic
array that is repeatedly resized to zero and then extended within a loop.  The
following code gives a minimal example:


import std.array;
import std.stdio;

void main()
{
    real[] x;

    x.length = 500;

    foreach(uint i;0..100) {
        x.length = 0;

        foreach(uint j;0..500)
            x ~= j;

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}


If I run this on my system, the memory usage explodes within a very short
time, despite the fact that the length of the array is set early on, so the
memory should all be reserved.  The writefln() command confirms that the
actual number of elements always stays within the original bounds set.

I'm guessing this is down to concatenation ~ working differently from C++
vectors' push_back() command.  I have also tried using the Appender and its
put() command, but the same result happens.

I even tried moving the array declaration within the first foreach() loop, and
putting an explicit delete at the end, but to no avail.

Can anyone advise? :-)

Thanks & best wishes,

-- Joe


Re: Class-related queries [was: Re: 'Undefined reference' linking errors]

2010-04-09 Thread Joseph Wakeling
Thanks for the interesting and detailed explanation. :-)

 Now you see, 'new' is for dynamic memory allocation in D as well, it's
 just that for classes it is required.  You normally don't need to worry
 about 'delete', as the GC will take care of deallocation.

I guess I am worried about what could happen in the case of code like this in 
C++:

for(i=0;i<1;++i) {
    Foo f(i);
    // Do something with f ...
}

... when it reappears in D as:

foreach(uint i;0..1) {
    auto f = new Foo(i);
    // Do something with f ...
}

Of course, it's not so terrible to have to put an explicit 'delete' statement at
the end of the foreach loop if that is necessary (I am after all a C person at
heart:-).  It's also clear that GC will probably make such a loop more efficient
as it will manage the alloc/dealloc'ing of the memory more intelligently within
the system constraints.  But the concern is there ...


Re: Class-related queries [was: Re: 'Undefined reference' linking errors]

2010-04-09 Thread Joseph Wakeling
 Also try:

 foreach(uint i;0..1) {
     scope Foo f = new Foo(i);
     // Do something with f ...
 }

 That sometimes doesn't require allocations.

To be sure I understand -- is there anything wrong with writing

 scope auto f = new Foo(i)

... or, in general, using auto to infer the class being initiated?  I found 
myself
disliking the double writing of Foo, although I guess there could be a positive
side to it in ensuring that the class really is what it's meant to be.


Re: 'Undefined reference' linking errors

2010-04-08 Thread Joseph Wakeling
Thanks to everyone for the responses.

I'll respond to Bearophile's detailed comments:

 Few notes:
 - opCall() of AvgWeighted was abstract.
 - keep in mind that in D classes are CamelCase;
 - variable names are written like weightSum (but once in a while a underscore
doesn't kill).

I think it's obvious from my syntax that my background is with C; I'm not
experienced with Java, C# etc.  This may explain some of the problems I'm 
having.

Regarding opCall I was following the syntax described here:
http://www.digitalmars.com/d/2.0/operatoroverloading.html#FunctionCall

... but clearly without understanding it properly.

What I was aiming for was a bit smartarse -- to have a class which could in some
cases be treated as a function.  Each of these classes (later ones will be more
sophisticated) is meant to be a data analysis tool which takes a dataset of
user-object ratings and user and object reputation values and helps aggregate 
the
ratings and in the process update the reputation values.

The aim was that if you just wanted a once-off analysis you could use the class 
in
a throwaway fashion -- hence the use of,

   avg_weighted(..);

rather than

   avg_weighted aw(.);

The aim is that you would use the second if you were interested in employing the
analysis multiple times, and that the class will have other functions that can 
be
used for different or secondary analyses from the main one.

It's maybe not the best way to approach what I want to do, but since D is a new
language for me, I thought I would be playful with it and try and bend it around
in some interesting ways.

 - Be careful because ref arguments are tricky.

The choice is deliberate here, because the arrays passed to the constructor (or
opCall) are meant to be modified.

 - There is a line like foreach (r; reputationUser) r = 1; that can be a bug.

I guess that I should put a 'double' in front of the r, no?  In any case, I 
guess
there is a better way of setting all elements of an array equal to 1.0.
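
If I've understood the array operations right, either of these should do it:

reputationUser[] = 1.0;           // array-wise assignment

// or, with an explicit loop -- the ref is what makes the write stick:
foreach (ref double r; reputationUser)
    r = 1.0;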

 - foreach (objectID, rating; reputationObject) rating /= weightSum[objectID];
can be another bug.

... so should be uint objectID, double rating ... ?

I think it's obvious that I want the value of each element of
reputationObject to be divided by the value of the corresponding element of
weightSum -- is there a more intelligent way of doing this?  Reading Andrei
Alexandrescu's article on Dr Dobb's gave me the impression something could be 
done
using chain(), but I couldn't work out how (and probably misunderstood).
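
The most direct version I can see is a foreach over the index plus a ref
element (sketch):

foreach (size_t objectID, ref double rating; reputationObject)
    rating /= weightSum[objectID];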

 - Use better attribute names in Rating struct, when you need to comment a
variable name then it's often a wrong name.
 - To create structs you can most times use the syntax I've used in the main.
 - In methods/functions divide your code into paragraphs;
 - keep your indentations more coherent

It's nice to see the stress in D on well-written code.  Thanks for taking the 
time
to clean up mine. :-)

 - I suggest to add contracts and unittests.

As you might have guessed, I'm not a developer -- can you provide more info?
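
From what I've since read, I take it something along these lines is meant
(a hypothetical example):

double weightedAvg(double sum, double weight)
in {
    assert(weight > 0);        // contract checked on entry
}
out(result) {
    assert(result == result);  // i.e. not NaN
}
body {
    return sum / weight;
}

unittest {
    assert(weightedAvg(4.0, 2.0) == 2.0);
}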

Thanks & best wishes,

-- Joe


'Undefined reference' linking errors

2010-04-07 Thread Joseph Wakeling
Hello everyone,

A probably stupid but I-can't-find-the-solution-with-Google problem.

I'm trying to compile a small project I'm working on to learn D, called
'dregs'.  It's just 3 files for now (2 modules plus a little test program).
Unfortunately every time I try and compile, I get 'undefined reference' errors
for the classes/structs in the modules.

Here are the files:

// reputation.d
module dregs.reputation;

struct rating
{
    uint u;   // user ID
    uint o;   // object ID
    double r; // rating value
}

class reputation
{
    this()
    {
    }
    this(ref rating[] ratings,
         ref double[] reputation_user,
         ref double[] reputation_object)
    {
    }
}
// END

// avg.d
module dregs.avg;

public import dregs.reputation;

class avg_weighted : reputation
{
    double[] weight_sum;
    this(){}
    this(ref rating[] ratings,
         ref double[] reputation_user,
         ref double[] reputation_object)
    {
        weight_sum.length = reputation_object.length;
        foreach(r; ratings) {
            reputation_object[r.o] += r.r;
            weight_sum[r.o] += reputation_user[r.u];
        }
        foreach(o, r; reputation_object)
            r /= weight_sum[o];
    }
    void opCall(ref rating[] ratings,
                ref double[] reputation_user,
                ref double[] reputation_object);
}

class avg_arithmetic : avg_weighted
{
    this(ref rating[] ratings,
         ref double[] reputation_user,
         ref double[] reputation_object)
    {
        foreach(r; reputation_user)
            r = 1;
        super(ratings,reputation_user,reputation_object);
    }
    void something(ref dregs.reputation.rating[] ratings,
                   ref double[] reputation_user,
                   ref double[] reputation_object)
    {
        avg_weighted(ratings,reputation_user,reputation_object);
    }
}
// END

// test.d
import std.stdio;
import dregs.reputation;
import dregs.avg;

void main()
{
    rating[] r;
    double[] reputation_user;
    double[] reputation_object;

    reputation_user.length = 999;
    reputation_object.length = 1;

    foreach(u;0..reputation_user.length) {
        rating _r = {u,0,(u%3)};
        r ~= _r;
    }
}
// END

I'm running dmd 2.042 on Ubuntu 9.10.

   dmd -O -I../ test.d avg.d reputation.d

... produces the following errors:


test.o:(.rodata+0x98): undefined reference to
`_D5dregs3avg12avg_weighted6opCallMFKAS5dregs10reputation6ratingKAdKAdZv'
test.o:(.rodata+0xf8): undefined reference to
`_D5dregs3avg12avg_weighted6opCallMFKAS5dregs10reputation6ratingKAdKAdZv'
test.o: In function
`_D5dregs3avg14avg_arithmetic9somethingMFKAS5dregs10reputation6ratingKAdKAdZv':
reputation.d:(.text._D5dregs3avg14avg_arithmetic9somethingMFKAS5dregs10reputation6ratingKAdKAdZv+0x1b):
undefined reference to
`_D5dregs3avg12avg_weighted6opCallMFKAS5dregs10reputation6ratingKAdKAdZv'
collect2: ld returned 1 exit status
--- errorlevel 1


This surely is something very basic, but I couldn't find a reason for it in my
search of the archives ... :-(  Can anyone advise what I'm doing wrong?

Thanks & best wishes,

-- Joe