Re: Adam D. Ruppe's "D Cookbook" now available!

2014-06-10 Thread Jacob Carlborg via Digitalmars-d-announce

On 10/06/14 19:43, Adam D. Ruppe wrote:


blargh, I thought it could do more. Does it at least work to pull out
extern "C" functions from a C++ header?


Hmm, I haven't tried that. You need to specify which language to use. 
Currently DStep has its language support hard-coded, and C++ is not 
included.


It starts to get more complicated if it needs to support multiple 
languages in the same file. It should be possible, but then I think 
every declaration will need to be prefixed with "extern (C)".
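As a rough illustration of what per-declaration prefixing means in D, here is a minimal sketch. `strlen` and `abs` from the C standard library only stand in for functions that a C++ header might expose in an `extern "C"` block; this is not actual DStep output.

```d
// Each C-linkage function carries its own extern (C) prefix, as would be
// needed if C and C++ declarations were mixed in one D module.
// strlen/abs are stand-ins from the C standard library, not DStep output.
extern (C) size_t strlen(const(char)* s) nothrow @nogc;
extern (C) int abs(int x) nothrow @nogc;

void main()
{
    assert(strlen("hello") == 5);
    assert(abs(-3) == 3);
}
```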


--
/Jacob Carlborg


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce
On Monday, 9 June 2014 at 18:09:21 UTC, Joseph Rushton Wakeling 
wrote:

Hello all,


Incidentally, would it be a good idea to post a link to the blog 
post on r/programming?  Haven't done so yet, as generally I 
prefer to leave decisions about D publicity to others, but can do 
so if people would like.


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 23:48:09 UTC, bearophile wrote:
Please stop, I am not worth that, and I don't even know how 
good that generator is. So it's better for you to focus on 
more important matters of the new random module. Extra 
generators can be added later if needed.


After all the advice and help you've given me (and the rest of 
this community) over the course of years, it's really a pleasure 
to be able to offer you a small favour like this.  But of course 
it could be fun to first run things through e.g. the TestU01 
suite ...


Passing several CPU words by value for each generated value 
does not seem very efficient. But this generator is for special 
situations, so a certain performance loss could be acceptable. 
And if the compiler is able to inline the functions, the data 
transfer overhead is removed, and most of the performance loss 
is recovered (but I don't know if non-templated Phobos functions 
get inlined).


Well, I think it's worth experimenting with.  For clarity, I 
wasn't suggesting modifying the existing Xorshift code, but 
creating a separate implementation in strongly pure style, and 
seeing how that differs performance-wise from what already exists.
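A minimal sketch of the "strongly pure style" under discussion, assuming Marsaglia's xorshift128 as the underlying algorithm: the whole state is a small struct passed in by immutable value and returned as part of the result, so the function touches no global or shared mutable state. `XorshiftState`, `Step` and `xorshift128` are hypothetical names, not hap.random API.

```d
import std.stdio : writeln;

// Hypothetical four-word state for xorshift128 (Marsaglia's default seeds).
struct XorshiftState
{
    uint x = 123456789, y = 362436069, z = 521288629, w = 88675123;
}

// A generated value together with the successor state.
struct Step
{
    uint value;
    XorshiftState next;
}

// Strongly pure: the only input is an immutable value with no indirections,
// so the result depends on the argument alone.
Step xorshift128(immutable XorshiftState s) pure nothrow @safe @nogc
{
    immutable uint t = s.x ^ (s.x << 11);
    XorshiftState n;
    n.x = s.y;
    n.y = s.z;
    n.z = s.w;
    n.w = s.w ^ (s.w >> 19) ^ (t ^ (t >> 8));
    return Step(n.w, n);
}

void main()
{
    XorshiftState s;             // starts from the default seeds
    foreach (i; 0 .. 3)
    {
        immutable r = xorshift128(s);
        writeln(r.value);
        s = r.next;              // the caller threads the state explicitly
    }
}
```

Whether this ends up faster or slower than the existing mutable Xorshift is exactly the open question; it likely depends on the compiler inlining the call and eliding the struct copies.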


I guess I might also consider finally getting my head round 
monads, and relating that to RNG design ... :-)


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 23:08:33 UTC, Chris Cain wrote:
I had an opportunity to give the entire code a good once over 
read and I have a few comments.


Thanks! :-)

1. Biggest thing about the new hap.random is how much nicer it 
is to actually READ. The first few times I went through the 
current std.random, I remember basically running out of breath. 
hap.random was almost a refreshing read, in contrast. I'm 
guessing it has a lot to do with breaking it down into smaller, 
more manageable pieces. Regardless, good work on that. I 
suspect it'll make it easier to contribute to in the future.


That's great to hear, as it was a design goal.  I think there 
will probably at some point be a need to separate things further 
(e.g. std.random.generator will probably have to be split up, as 
will std.random.distribution), but always keeping the principle of 
implementing packages so that it's possible to just "import 
hap.random" (or "import hap.random.generator", or whatever).


2. Something I'd really like to see is for the seed-by-range 
functions to take the range by reference instead of by value to 
ensure that the seed values used are less likely to be used in 
another RNG inadvertently later. Basically, I envision a 
similar problem with seedRanges as we currently have with RNGs 
where we have to make sure people are careful with what they do 
with the ranges in the end. This should cover use cases where 
users do things like `blah.seed(myEntropyRange.take(3))` as 
well, so that might take some investigation to figure out how 
realistic it would be to support.


Yea, that's an interesting point.  I mean, you'd hope that 
myEntropyRange would be a reference type anyway, but every little 
helps :-)
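A minimal sketch of the by-reference seeding idea, using a hypothetical `TinyGen` struct (not hap.random's API) and `iota` as a stand-in for a real entropy source. Because `seed` takes its source by `ref`, the caller's range is advanced past the values consumed, so a second generator seeded from the same range gets different values.

```d
import std.range : iota, isInputRange;

// Hypothetical generator with a four-word state; not hap.random's API.
struct TinyGen
{
    uint[4] state;

    // Taking the source by ref advances the caller's range past the values
    // used, so they cannot silently be reused to seed another RNG.
    void seed(Range)(ref Range source)
        if (isInputRange!Range)
    {
        foreach (ref cell; state)
        {
            assert(!source.empty, "not enough seed values");
            cell = cast(uint) source.front;
            source.popFront();
        }
    }
}

void main()
{
    auto entropy = iota(1u, 100u);   // stand-in for a real entropy source
    TinyGen a, b;
    a.seed(entropy);
    b.seed(entropy);                 // b gets fresh values, not a's
    assert(a.state == [1u, 2, 3, 4]);
    assert(b.state == [5u, 6, 7, 8]);
}
```

A plain `ref` parameter does not accept an rvalue such as `myEntropyRange.take(3)`, so supporting that case without copying the underlying range is the part that would still need investigation.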


3. I'd also REALLY like to see seed support ranges/values 
giving ANY type of integer and guarantee that few bytes are 
wasted (so, if it supplies 64-bit ints and the generator's 
internal state array only accepts 32-bit ints, it should spread 
the 64-bit int across two cells in the array). I have working 
code in another language that does this, and I wouldn't mind 
porting it to D for the standard library. I think this would 
greatly simplify the seeding process in user code (since they 
wouldn't have to care what the internal representation of the 
Random state is, then).


That would be very cool.  Can you point me at your code examples?
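To make the idea concrete, here is a minimal sketch of spreading 64-bit seed values across a 32-bit state array without discarding bits. `fillState` is a hypothetical name, it only handles element types at least 32 bits wide (packing narrower types is omitted), and it is not the code Chris refers to, which does not appear in the thread.

```d
import std.range : ElementType, isInputRange;
import std.traits : isIntegral, Unsigned;

// Hypothetical helper: fill a 32-bit state array from a range of integers,
// splitting each 64-bit value into two 32-bit cells (low word first).
void fillState(Range)(uint[] state, Range source)
    if (isInputRange!Range && isIntegral!(ElementType!Range))
{
    alias U = Unsigned!(ElementType!Range);
    static assert(U.sizeof >= uint.sizeof,
        "narrower types would need packing, which this sketch omits");
    enum wordsPerValue = U.sizeof / uint.sizeof;

    size_t i = 0;
    foreach (v; source)
    {
        U bits = cast(U) v;
        foreach (w; 0 .. wordsPerValue)
        {
            if (i == state.length)
                return;                      // state is full
            state[i++] = cast(uint) bits;    // take the low 32 bits
            static if (U.sizeof > uint.sizeof)
                bits >>= 32;                 // move the next word down
        }
    }
}

void main()
{
    uint[4] s;
    fillState(s[], [0x1111_2222_3333_4444UL, 0x5555_6666_7777_8888UL]);
    assert(s == [0x3333_4444u, 0x1111_2222, 0x7777_8888, 0x5555_6666]);
}
```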

4. I'd just like to say the idea of using ranges for seeds gets 
me giddy because I could totally see a range that queries 
https://random.org for true random bits to seed with, wrapped 
by a range that zeroes out the memory on popFront. Convenient 
and safe (possibly? Needs review before I get excited, 
obviously) for crypto purposes!


The paranoiac in me feels that anything that involves getting 
random data via HTTPS is probably insecure crypto-wise :-)  
However, I think sourcing random.org is a perfect case for an 
entry in hap.random.device.  I think the best thing to do would 
probably be to offer a RandomOrgClient (which offers a very thin 
API around the random.org HTTP API) and then to wrap that in a 
range type that uses the client internally to generate random 
numbers with particular properties.
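One way to picture the "zero out the memory on popFront" wrapper is the minimal sketch below. `ZeroingRange` is a hypothetical name and it wraps an in-memory buffer rather than a random.org client. It only wipes the slot in the underlying buffer; copies of `front` held elsewhere are untouched, so it is no substitute for a proper security review.

```d
import std.range : isInputRange;

// Hypothetical wrapper: yields values from a caller-owned buffer and
// overwrites each element with zero in that buffer once it is consumed.
struct ZeroingRange
{
    private uint[] buf;

    this(uint[] buf) { this.buf = buf; }

    @property bool empty() const { return buf.length == 0; }
    @property uint front() const { return buf[0]; }

    void popFront()
    {
        buf[0] = 0;          // wipe the consumed value in the shared buffer
        buf = buf[1 .. $];
    }
}

static assert(isInputRange!ZeroingRange);

void main()
{
    uint[] data = [7, 8, 9];
    auto r = ZeroingRange(data);
    immutable first = r.front;
    r.popFront();
    assert(first == 7 && data[0] == 0 && r.front == 8);
}
```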


5. Another possible improvement would be something akin to a 
"remix" function. It should work identically to reseeding, but 
instead of setting the internal state to match the seed (as I 
see in 
https://github.com/WebDrake/hap/blob/master/source/hap/random/generator.d#L485), 
remixing should probably be XOR'd into the current state. That 
way if you have a state based on some real entropy, you can 
slowly, over time, drip in more entropy into the state.


Also a very interesting suggestion.  Is there a standard name for 
this kind of procedure?
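A minimal sketch of the operation as Chris describes it, XOR-ing fresh entropy into the existing state instead of overwriting it; `remix` is a hypothetical name. One caveat specific to xorshift-style generators: they can never leave the all-zero state, so this sketch substitutes a nonzero cell if the XOR happens to produce it.

```d
// Hypothetical remix: XOR extra entropy into the current state, wrapping
// around if there are more entropy words than state cells.
void remix(uint[] state, const(uint)[] entropy) pure nothrow @safe @nogc
{
    assert(state.length > 0);
    foreach (i, e; entropy)
        state[i % state.length] ^= e;

    // xorshift-style generators are stuck at all zeros, so avoid that state.
    bool allZero = true;
    foreach (c; state)
        if (c != 0) { allZero = false; break; }
    if (allZero)
        state[0] = 1;
}

void main()
{
    uint[4] s = [1, 2, 3, 4];
    remix(s[], [0xFFu, 0, 0, 0]);
    assert(s == [0xFEu, 2, 3, 4]);
}
```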


6. I'd like to see about supporting xorshift1024 as well 
(described here: http://xorshift.di.unimi.it/ and it's public 
domain code, so very convenient to port ... I'd do it too, of 
course, if that seems like an okay idea). This is a really 
small thing because xorshift1024 isn't really much better than 
xorshift128 (but some people might like the idea of it having 
significantly longer period).


Fantastic, I will see about implementing those.  I wasn't 
previously aware of that work, but I _was_ aware that the 
standard Xorshift generators have some statistical flaws, so it's 
great to have some alternatives.  It should be straightforward to 
implement things like XorshiftP128 or XorshiftS1024 and 
XorshiftS4096 (using P and S in place of + and *).


With these in place we might even be able to deprecate the old 
Xorshift generators.
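For reference, a minimal D sketch of xorshift1024* following the public-domain C code on the page linked above (shift constants 31, 11, 30 and multiplier 1181783497276652981). `XorshiftS1024` follows the naming proposed here but is hypothetical, and the constants should be checked against the current version of that code before use.

```d
import std.stdio : writeln;

// Hypothetical port of xorshift1024*: 16 x 64-bit words of state plus an index.
struct XorshiftS1024
{
    private ulong[16] s;
    private size_t p;

    // The seed must not be all zeros, or the generator outputs zeros forever.
    this(in ulong[16] seed) pure nothrow @safe @nogc
    {
        s = seed;
    }

    ulong next() pure nothrow @safe @nogc
    {
        immutable ulong s0 = s[p];
        p = (p + 1) & 15;
        ulong s1 = s[p];
        s1 ^= s1 << 31;
        s[p] = s1 ^ s0 ^ (s1 >> 11) ^ (s0 >> 30);
        return s[p] * 1181783497276652981UL;   // wraps modulo 2^64
    }
}

void main()
{
    ulong[16] seed = 1;             // every word set to 1
    auto g = XorshiftS1024(seed);
    foreach (i; 0 .. 3)
        writeln(g.next());
}
```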


Just for clarity, here's how I see things rolling out for the 
future:


  * First goal is to ensure the existing codebase "plays nice" with
    people's programs and that it works OK with dub, rdmd, etc. and
    doesn't have any serious architectural or other bugs.  The 1.0.0
    release will not have any new functionality compared to what is
    in place now.

  * Once it seems 

Re: DMD 2.066 Alpha

2014-06-10 Thread deadalnix via Digitalmars-d-announce

On Wednesday, 11 June 2014 at 04:17:04 UTC, Andrew Edwards wrote:

On 6/10/14, 10:01 PM, Brian Schott wrote:
Please do not tag anything until we decide if "virtual" is a 
keyword in D.


See: 
https://github.com/D-Programming-Language/dlang.org/pull/584


The branch will not be created until 30 June. I trust that this 
will be sorted out by then.


I'll be there to test and report bugs! Thanks for being the 
release lieutenant.


Re: DMD 2.066 Alpha

2014-06-10 Thread Andrew Edwards via Digitalmars-d-announce

On 6/10/14, 10:01 PM, Brian Schott wrote:

Please do not tag anything until we decide if "virtual" is a keyword in D.

See: https://github.com/D-Programming-Language/dlang.org/pull/584


The branch will not be created until 30 June. I trust that this will be 
sorted out by then.


Re: DMD 2.066 Alpha

2014-06-10 Thread Brian Schott via Digitalmars-d-announce
Please do not tag anything until we decide if "virtual" is a 
keyword in D.


See: https://github.com/D-Programming-Language/dlang.org/pull/584


DMD 2.066 Alpha

2014-06-10 Thread Andrew Edwards via Digitalmars-d-announce
It is time to begin preparations for the next release of DMD. I am aiming 
for a two-week beta period, commencing on 30 June with the branching of 
2.066 and ending on 7 July with the release of 2.066.0.


Concurrently with this release, I would like to produce a maintenance 
release for 2.065. Please identify "non-breaking" changes, ICEs, and 
regressions that are suitable for inclusion in 2.065.1.


Regards,
Andrew


Re: hap.random: a new random number library for D

2014-06-10 Thread bearophile via Digitalmars-d-announce

Joseph Rushton Wakeling:


I'll implement R250/521 for you, though.


Please stop, I am not worth that, and I don't even know how 
good that generator is. So it's better for you to focus on more 
important matters of the new random module. Extra generators can 
be added later if needed.




It'd be interesting to see if this has any speed implications.


Passing several CPU words by value for each generated value does 
not seem very efficient. But this generator is for special situations, 
so a certain performance loss could be acceptable. And if the 
compiler is able to inline the functions, the data transfer 
overhead is removed, and most of the performance loss is recovered 
(but I don't know if non-templated Phobos functions get inlined).


Bye,
bearophile


Re: K-Nearest Neighbor + pointer alignments

2014-06-10 Thread Walter Bright via Digitalmars-d-announce

On 6/10/2014 1:46 AM, bearophile wrote:

I don't like D throwing
away static information that can be used to avoid run-time crashes; this
is the opposite of what is usually called a safe language.


To be pedantic, D being a "safe" language means "memory safe", not "no seg 
faults of any sort".


Memory safety means that all memory accessed is valid memory, i.e. no memory 
corruption.


http://en.wikipedia.org/wiki/Memory_safety

Memory alignment seg faults are something else entirely.


Re: hap.random: a new random number library for D

2014-06-10 Thread Chris Cain via Digitalmars-d-announce

Hey again Joe,

I had an opportunity to give the entire code a good once over 
read and I have a few comments.


1. Biggest thing about the new hap.random is how much nicer it is 
to actually READ. The first few times I went through the current 
std.random, I remember basically running out of breath. 
hap.random was almost a refreshing read, in contrast. I'm 
guessing it has a lot to do with breaking it down into smaller, 
more manageable pieces. Regardless, good work on that. I suspect 
it'll make it easier to contribute to in the future.
2. Something I'd really like to see is for the seed-by-range 
functions to take the range by reference instead of by value to 
ensure that the seed values used are less likely to be used in 
another RNG inadvertently later. Basically, I envision a similar 
problem with seedRanges as we currently have with RNGs where we 
have to make sure people are careful with what they do with the 
ranges in the end. This should cover use cases where users do 
things like `blah.seed(myEntropyRange.take(3))` as well, so that 
might take some investigation to figure out how realistic it 
would be to support.
3. I'd also REALLY like to see seed support ranges/values giving 
ANY type of integer and guarantee that few bytes are wasted (so, 
if it supplies 64-bit ints and the generator's internal state 
array only accepts 32-bit ints, it should spread the 64-bit int 
across two cells in the array). I have working code in another 
language that does this, and I wouldn't mind porting it to D for 
the standard library. I think this would greatly simplify the 
seeding process in user code (since they wouldn't have to care 
what the internal representation of the Random state is, then).
4. I'd just like to say the idea of using ranges for seeds gets 
me giddy because I could totally see a range that queries 
https://random.org for true random bits to seed with, wrapped by 
a range that zeroes out the memory on popFront. Convenient and 
safe (possibly? Needs review before I get excited, obviously) for 
crypto purposes!
5. Another possible improvement would be something akin to a 
"remix" function. It should work identically to reseeding, but 
instead of setting the internal state to match the seed (as I see 
in 
https://github.com/WebDrake/hap/blob/master/source/hap/random/generator.d#L485), 
remixing should probably be XOR'd into the current state. That 
way if you have a state based on some real entropy, you can 
slowly, over time, drip in more entropy into the state.
6. I'd like to see about supporting xorshift1024 as well 
(described here: http://xorshift.di.unimi.it/ and it's public 
domain code, so very convenient to port ... I'd do it too, of 
course, if that seems like an okay idea). This is a really small 
thing because xorshift1024 isn't really much better than 
xorshift128 (but some people might like the idea of it having 
significantly longer period).



Why not write to the paper's author and ask about it?


Done :) ... if I get a response, I'll make sure to incorporate 
everything said.


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 21:02:54 UTC, bearophile wrote:
Sorry, the R250/521 idea and the strongly pure idea are 
unrelated to each other.


Ah, good.  That makes things simpler.  I'll implement R250/521 
for you, though.


For the strongly pure random generator we should choose a 
generator with a small internal state (let's say less than 5 
CPU words, they get passed by immutable value).


We might be able to rework the Xorshift generators in this way -- 
they all rely on a very small internal state.  It'd be 
interesting to see if this has any speed implications.


Re: hap.random: a new random number library for D

2014-06-10 Thread bearophile via Digitalmars-d-announce

Joseph Rushton Wakeling:

Forgive me if I'm missing something obvious, but as it stands I 
don't see how the R250/521 algorithm you pointed me to can be 
strongly pure.


Sorry, the R250/521 idea and the strongly pure idea are unrelated 
to each other.




but wouldn't that be a memory allocation nightmare?


For the strongly pure random generator we should choose a 
generator with a small internal state (let's say less than 5 CPU 
words, they get passed by immutable value).


Bye,
bearophile


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 11:32:54 UTC, bearophile wrote:
So can you generate random values in strongly pure 
functions with this? You can allocate the RNG class inside the 
function... If that's right, is this simple strongly pure 
random generator worth adding to std.random2?


Forgive me if I'm missing something obvious, but as it stands I 
don't see how the R250/521 algorithm you pointed me to can be 
strongly pure.  As it's defined in the link you pointed me to, 
it's accessing (and updating) global mutable state.


It would surely be possible to define it to take as input 
constant buffers, and to return constant buffers, which ought to 
allow purity -- but wouldn't that be a memory allocation 
nightmare?


Can you clarify what you're thinking of here in terms of D's 
strong purity?


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 10:37:17 UTC, Kagamin wrote:

Pass it by reference, I see no reason why MT can't be pure.


For what it's worth, the Mersenne Twister in hap.random is 
already weakly pure (.front and .popFront are both pure methods).
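To illustrate the weak/strong distinction in D terms, here is a minimal sketch with hypothetical names, using Marsaglia's 32-bit xorshift shifts (13, 17, 5) rather than the Mersenne Twister. A `pure` function that updates state through `ref` is only weakly pure, but it can still be called from a strongly pure function as long as the mutable state is local to that function.

```d
// Weakly pure: mutates its argument through ref, but touches no global state.
void step(ref uint state) pure nothrow @safe @nogc
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
}

// Strongly pure: only value parameters, so the result depends on them alone.
// The mutable state lives inside the function, so calling step() is allowed.
uint nthValue(uint seed, size_t n) pure nothrow @safe @nogc
{
    uint state = seed;
    foreach (_; 0 .. n)
        step(state);
    return state;
}

void main()
{
    assert(nthValue(1, 0) == 1);
    assert(nthValue(1, 1) == 270369);
}
```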


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-10 Thread bearophile via Digitalmars-d-announce
At about 40:42, in "Thoughts on static regex", it says "even 
compile-time printf would be awesome". There is a patch for 
__ctWrite on GitHub; it should be fixed and merged.


Bye,
bearophile


Re: Embarrassment of riches: another talk came online today

2014-06-10 Thread deadalnix via Digitalmars-d-announce
On Tuesday, 10 June 2014 at 16:30:31 UTC, Andrei Alexandrescu 
wrote:

"Leverage" - my talk at Lang.NEXT.

http://www.reddit.com/r/programming/comments/27sp6r/langnext_2014_leverage_by_andrei_alexandrescu/

https://news.ycombinator.com/newest

https://twitter.com/D_Programming/status/476400279160885248

https://www.facebook.com/dlang.org/posts/863665863647096


Andrei


I think your explanation of "taking the address of a function" is 
quite goofy, and the crowd at Lang.NEXT probably knows it. C and 
C++ are literally the only languages (along with D) that have this 
idiotic notion of the address of a function. Even the assembly 
code they compile to does not!


Re: Adam D. Ruppe's "D Cookbook" now available!

2014-06-10 Thread Adam D. Ruppe via Digitalmars-d-announce
On Tuesday, 10 June 2014 at 17:31:52 UTC, Lars T. Kyllingstad 
wrote:
Like the fact that you can @disable this() for a struct, even 
though you can't implement it.


If my memory is working properly I actually think I was the one 
who suggested that to Walter a few years ago when it was 
introduced, though odds are I stole the idea from somebody else 
first and my memory is just a bit selectively egotistical :P


But I think @disable is a really cool thing for so many reasons. 
The first two ideas I had with it were the not-null and 
ranged-integer structs. Move semantics can come from it too. So 
can selectively disabling other operator overloads: forwarding 
most things to a member but filtering out some operations.


Disabling default construction still has a few compiler bugs so 
it isn't watertight but I've found it is really nice to have.
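A minimal sketch of the not-null idea with a hypothetical `NotNull` wrapper: `@disable this();` forbids default construction, so every instance has to go through the checking constructor. In line with the caveat above, it is not watertight; for instance, the type's `.init` value still contains a null payload.

```d
// Hypothetical wrapper that cannot be default-constructed.
struct NotNull(T)
{
    private T payload;

    @disable this();          // no `NotNull!T x;` without an initializer

    this(T value)
    {
        assert(value !is null, "NotNull constructed from null");
        payload = value;
    }

    @property T get() { return payload; }
}

void main()
{
    int x = 5;
    auto p = NotNull!(int*)(&x);
    assert(*p.get == 5);

    // Declaring one without an initializer is rejected at compile time.
    static assert(!__traits(compiles, { NotNull!(int*) q; }));
}
```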


Re: Adam D. Ruppe's "D Cookbook" now available!

2014-06-10 Thread Adam D. Ruppe via Digitalmars-d-announce

On Monday, 9 June 2014 at 19:14:15 UTC, Jacob Carlborg wrote:
Adam, I noticed that you mentioned DStep in the book. By 
reading the part about integrating with C++ I got the 
impression that DStep can handle C++. Currently, that's not the 
case.


blargh, I thought it could do more. Does it at least work to pull 
out extern "C" functions from a C++ header?


Re: Adam D. Ruppe's "D Cookbook" now available!

2014-06-10 Thread Lars T. Kyllingstad via Digitalmars-d-announce

On Friday, 30 May 2014 at 11:48:57 UTC, Chris wrote:
There's _always_ something you can learn, even if you think you 
know it all.


Like the fact that you can @disable this() for a struct, even 
though you can't implement it. I didn't know that, but I have the 
perfect use case for it (and it's one which has bothered me for a 
long time). Thanks, Adam!


Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-10 Thread Dicebot via Digitalmars-d-announce
On Tuesday, 10 June 2014 at 15:37:11 UTC, Andrei Alexandrescu 
wrote:

Watch, discuss, upvote!

https://news.ycombinator.com/newest

https://twitter.com/D_Programming/status/476386465166135296

https://www.facebook.com/dlang.org/posts/863635576983458

http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/


Andrei


http://youtu.be/hkaOciiP11c


Re: Lang.NEXT panel

2014-06-10 Thread justme via Digitalmars-d-announce
On Wednesday, 4 June 2014 at 06:13:39 UTC, Andrei Alexandrescu 
wrote:
Of possible interest. 
http://www.reddit.com/r/programming/comments/278twt/panel_systems_programming_in_2014_and_beyond/


Andrei


IMHO, the coolest thing was when Rob Pike talked about the tool 
they made for automatically upgrading user source code to the 
next language version.


That should be quite easy to implement in D now, and once done, 
it would give much-needed room for breaking changes we feel should 
be made. Pike seemed to be extremely satisfied they did it.


Embarrassment of riches: another talk came online today

2014-06-10 Thread Andrei Alexandrescu via Digitalmars-d-announce

"Leverage" - my talk at Lang.NEXT.

http://www.reddit.com/r/programming/comments/27sp6r/langnext_2014_leverage_by_andrei_alexandrescu/

https://news.ycombinator.com/newest

https://twitter.com/D_Programming/status/476400279160885248

https://www.facebook.com/dlang.org/posts/863665863647096


Andrei



Re: Interview at Lang.NEXT

2014-06-10 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 6/10/14, 6:28 AM, Mattcoder wrote:

Andrei's D Talk (Day 2) is up:

http://channel9.msdn.com/Events/Lang-NEXT/Lang-NEXT-2014/D

Matheus.


Topics overlap a tad with NDC's so if you watched that you may want to 
skip over the portion between 7:41 and 15:42.


Andrei


DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

2014-06-10 Thread Andrei Alexandrescu via Digitalmars-d-announce

Watch, discuss, upvote!

https://news.ycombinator.com/newest

https://twitter.com/D_Programming/status/476386465166135296

https://www.facebook.com/dlang.org/posts/863635576983458

http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/


Andrei



Re: Interview at Lang.NEXT

2014-06-10 Thread Mattcoder via Digitalmars-d-announce

Andrei's D Talk (Day 2) is up:

http://channel9.msdn.com/Events/Lang-NEXT/Lang-NEXT-2014/D

Matheus.


Re: hap.random: a new random number library for D

2014-06-10 Thread bearophile via Digitalmars-d-announce

Joseph Rushton Wakeling:

However, I don't see any reason why one couldn't have a 
strongly pure function that purely transforms state, which 
could be wrapped by an RNG class


So can you generate random values in strongly pure functions 
with this? You can allocate the RNG class inside the function... 
If that's right, is this simple strongly pure random 
generator worth adding to std.random2?


Bye,
bearophile


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 10:21:39 UTC, bearophile wrote:
I have enjoyed using this generator (though I am not yet sure 
how good it is; I have seen that it's fast and good enough for 
some of my simpler purposes):

http://en.literateprograms.org/R250/521_%28C%29


Should be straightforward enough to implement. :-)

Is it worth having a fully pure generator that takes a constant 
state and returns the modified state? (The state should be 
small, so Mersenne Twister is not fit for this). Writing such 
generator is easy, but then how do you use it with the API of 
the functions of the random module?


The API as given is of course designed to create ranges of random 
variates, and that in turn means that you're dealing with weakly 
pure class methods.


However, I don't see any reason why one couldn't have a strongly 
pure function that purely transforms state, which could be 
wrapped by an RNG class or otherwise used as needed.


Re: hap.random: a new random number library for D

2014-06-10 Thread bearophile via Digitalmars-d-announce

Kagamin:


Pass it by reference, I see no reason why MT can't be pure.


I meant strongly pure :-)

Bye,
bearophile


Re: hap.random: a new random number library for D

2014-06-10 Thread Kagamin via Digitalmars-d-announce

Pass it by reference, I see no reason why MT can't be pure.


Re: hap.random: a new random number library for D

2014-06-10 Thread bearophile via Digitalmars-d-announce

Joseph Rushton Wakeling:


Thanks in advance for all testing and feedback.


I have enjoyed using this generator (though I am not yet sure 
how good it is; I have seen that it's fast and good enough 
for some of my simpler purposes):

http://en.literateprograms.org/R250/521_%28C%29

--

Is it worth having a fully pure generator that takes a constant 
state and returns the modified state? (The state should be small, 
so Mersenne Twister is not fit for this). Writing such generator 
is easy, but then how do you use it with the API of the functions 
of the random module?


Bye,
bearophile


Re: Three^WFour Cool Things about D by Andrei Alexandrescu at NDC 2014

2014-06-10 Thread Bastiaan Veelo via Digitalmars-d-announce

Really nice. I watched it twice.

Bastiaan.


Re: hap.random: a new random number library for D

2014-06-10 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Tuesday, 10 June 2014 at 06:53:46 UTC, Chris Cain wrote:

Awesome! I'll definitely check this out :)


Thanks, that would be great!

Would there be any chance of additional contributions, such as 
an ISAAC RNG implementation, being accepted? I wouldn't go as 
far as to guarantee it for crypto purposes, but I've been 
messing around with an implementation recently and wouldn't 
mind porting it over to D (it's based on the public domain 
implementation found on this website: 
http://burtleburtle.net/bob/rand/isaacafa.html )


Yea, it'd be great to have submissions like this.  I plan on
having a hap.random.crypto as another experimental module (i.e.
not included if you do "import hap.random", copiously labelled as
experimental until it's had a security review, etc.), so not being
able to guarantee crypto suitability straight away is not a problem.
Part of the point of hap is that it gives us a place where we
can get things wrong and correct them. ;-)

I think I'll create a 1.x.x branch for the current release
process and add a crypto module shortly in the ~master branch;
I'll ping you when that's done.

So far the numbers it puts out appear to be pretty good from my 
observations, PLUS it's really fast for a large number of 
outputs (it costs a lot up-front, however).


I also have a variation of "ISAAC+" as described by the paper 
here: http://eprint.iacr.org/2006/438.pdf


The problem I have with "ISAAC+", though, is that the paper 
incorrectly describes the original ISAAC algorithm (Algorithm 
1.1 fails to `xor a` at line 6), so it's unclear whether the 
paper actually solves a problem. Furthermore, I'd really prefer 
to keep that xor regardless (its omission may simply have been an 
oversight), so I don't want to call my version "ISAAC+", since it 
is notably different from the paper's description.


That said, it's a paper that comes up often enough in 
discussions about ISAAC that people express a desire for it.


Why not write to the paper's author and ask about it?  It may
seem like a small thing, but they'll probably be grateful for the
interest and feedback.


Re: K-Nearest Neighbor + pointer alignments

2014-06-10 Thread bearophile via Digitalmars-d-announce

Ali Cehreli:


I wonder what bearophile's response will be. ;)


Although it looks like a silly sequence of optimizations, that 
text prompts some general comments from me. Thanks to Kenji 
(https://github.com/D-Programming-Language/dmd/pull/3650 ) this 
code is now valid:


void foo(size_t N)(ref int[N] b) if (N == 4) {}
void main() {
    int[5] a;
    foo(a[1 .. $]);
}


The D type system is able to understand that if you slice away 
the first item of an array of 5 items, you produce a reference 
to an array of 4 items.


But D's static typing is not very strong (precise): D is not 
yet reaping all the benefits that static typing can give.


D throws away compile-time knowledge about pointer alignment, 
so if you write code like this without the 7 shorts of padding, 
the program crashes at run time because of misalignment:



import core.simd : short8;

// nCols, TLabel and labelIndex are defined elsewhere in the original program.
uint distance(immutable ref short[nCols - 1] p1,
              immutable ref short[nCols - 1] p2)
pure nothrow @nogc {
    alias TV = short8;
    enum size_t Vlen = TV.init.array.length;
    assert(p1.length % Vlen == 0);
    immutable v1 = cast(immutable TV*)p1.ptr;
    immutable v2 = cast(immutable TV*)p2.ptr;

    TV totV = 0;
    foreach (immutable i; 0 .. p1.length / Vlen) {
        TV d = v1[i] - v2[i];
        totV += d * d;
    }

    uint tot = 0;
    foreach (immutable t; totV.array)
        tot += t;
    return tot;
}

TLabel classify(immutable short[nCols][] trainingSet,
                immutable ref short[nCols - 1] pixels)
pure nothrow @nogc {
    auto closestDistance = uint.max;
    auto closestLabel = TLabel.max;

    foreach (immutable ref s; trainingSet) {
        immutable dist = pixels.distance(s[1 .. $]);
        if (dist < closestDistance) {
            closestDistance = dist;
            closestLabel = s[labelIndex];
        }
    }

    return closestLabel;
}



In this program there is all the information necessary to compute 
the alignment of v1 and v2 simply at compile time, and to generate 
a compile-time error if you try to perform SIMD operations using 
such unaligned pointers. I don't like D throwing away static 
information that can be used to avoid run-time crashes; this is 
the opposite of what is usually called a safe language.


D's type system is able to keep the length of arrays at 
compile time, allowing data types like ushort[N], but in a systems 
language that makes SIMD as easy to use as core.simd does, it 
would also be useful to encode the alignment in the pointer/array 
type.



So this code should not compile:

uint distance(immutable ref short[nCols - 1] p1,
              immutable ref short[nCols - 1] p2)
pure nothrow @nogc {
    alias TV = short8;
    enum size_t Vlen = TV.init.array.length;
    assert(p1.length % Vlen == 0);
    immutable v1 = cast(immutable TV*)p1.ptr;
    immutable v2 = cast(immutable TV*)p2.ptr;

    TV totV = 0;
    foreach (immutable i; 0 .. p1.length / Vlen) {
        TV d = v1[i] - v2[i];
        totV += d * d;


While this should compile:

uint distance(immutable ref align(16) short[nCols - 1] p1,
              immutable ref align(16) short[nCols - 1] p2)
pure nothrow @nogc {
    alias TV = short8;
    enum size_t Vlen = TV.init.array.length;
    assert(p1.length % Vlen == 0);
    immutable v1 = cast(immutable TV*)p1.ptr;
    immutable v2 = cast(immutable TV*)p2.ptr;

    TV totV = 0;
    foreach (immutable i; 0 .. p1.length / Vlen) {
        TV d = v1[i] - v2[i];
        totV += d * d;


And now this function that calls distance should not compile:

TLabel classify(immutable short[nCols][] trainingSet,
                immutable ref short[nCols - 1] pixels)
pure nothrow @nogc {
    auto closestDistance = uint.max;
    auto closestLabel = TLabel.max;

    foreach (immutable ref s; trainingSet) {
        immutable dist = pixels.distance(s[1 .. $]); // error


And now this should compile:

TLabel classify(immutable align(16) short[nCols][] trainingSet,
                immutable ref align(16) short[nCols - 1] pixels)
pure nothrow @nogc {
    auto closestDistance = uint.max;
    auto closestLabel = TLabel.max;

    foreach (immutable ref s; trainingSet) {
        immutable dist = pixels.distance(s[8 .. $]);



And this should compile because std.file.read returns memory 
allocated by the GC that is align(16):


align(16) immutable(short[nCols])[] readData(size_t nCols)(in string fileName) {
    return cast(typeof(return))std.file.read(fileName);
}


Where the alignment is not known at compile-time the D compiler 
could add run-time alignment asserts in debug builds, to give 
nice run-time error messages.
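A run-time check of the kind suggested can already be written by hand; below is a minimal sketch with hypothetical `isAligned` and `checkedForSimd` helpers. It also shows why the slice in `classify` is a problem: a row and the same row with its first `short` sliced off are 2 bytes apart, so they can never both be 16-byte aligned.

```d
// Hypothetical helper: true if p is a multiple of `alignment` bytes.
bool isAligned(const(void)* p, size_t alignment) pure nothrow @nogc
{
    return (cast(size_t) p) % alignment == 0;
}

// Hypothetical checked accessor: the sort of assert a compiler could insert
// in debug builds before code treats the data as 16-byte SIMD vectors.
const(short)[] checkedForSimd(const(short)[] data)
{
    assert(isAligned(data.ptr, 16), "data is not 16-byte aligned for SIMD");
    return data;
}

void main()
{
    assert(isAligned(cast(void*) 32, 16));
    assert(!isAligned(cast(void*) 34, 16));

    // A row and the same row minus its first short are 2 bytes apart,
    // so at most one of them can be 16-byte aligned.
    auto row = new short[16];
    assert(!(isAligned(row.ptr, 16) && isAligned(row[1 .. $].ptr, 16)));

    if (isAligned(row.ptr, 16))
        assert(checkedForSimd(row).length == 16);
}
```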


The simpler int[4] or int[] types are still valid and usable; 
they could be align(1). If you treat them as align(16), you are 
writing faith-based code when you use SIMD instructions.



Automatic variables (stack-allocated) too could allow alignment 
annotations (perhaps ldc2 is already supporting this syntax):


void main() {
    align(16) ubyte[60] ubs;
}


I have discussed this topic before:
http://forum.dla