Re: Performance of tables slower than built in?

2019-05-24 Thread Basilez B. via Digitalmars-d-learn

On Thursday, 23 May 2019 at 10:16:42 UTC, Alex wrote:

On Wednesday, 22 May 2019 at 08:25:58 UTC, Basile B. wrote:

On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:
I am trying to create some fast sin, sinc, and exponential 
routines to speed up some code by using tables... but it 
seems it's slower than the function itself?!?


[...]


Hi, lookup tables ARE faster, but the problem you have here, 
and I'm surprised that nobody has noticed it so far, is that YOUR 
SWITCH LEADS TO A RUNTIME STRING COMPARISON. Just 
replace it with a static if (Method == "Linear") { /*...*/ } 
else { /*...*/ }


Also take care with the types used. With DMD the implicit 
coercion between float and double can lead to extra conversions.


You'll directly see a 15% gain after refactoring the switch.


Surely not?!?! Surely the compiler can optimize that switch 
since the value passed is CT? I thought the whole point of not 
having static switch(analogous to static if) was because it 
would go ahead and optimize these cases for us... and it's just 
a switch, just a jmp table.


Try it yourself, but to be clear, note that I don't like your 
attitude, which I find disrespectful. I'm just here to help; I 
explained the big problem you have in your code, and you start 
discussing something that's not to be discussed AT ALL. Look at 
this https://d.godbolt.org/z/vtzVdp; in sinTab you'll be able to 
see


  call pure nothrow @nogc @safe int object.__switch!(immutable(char), "Linear", "Constant").__switch(scope const(immutable(char)[]))@PLT


and this even with LDC2 -O3. That's why your LUT is so slow. 
Refactor the switch with "static if"; the branch will be 
eliminated and you'll see the perf improvement.
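
To illustrate the difference (a hedged sketch in C++, since C++17's 
`if constexpr` plays the same role as D's `static if`; the enum and 
function names here are invented for illustration, not the thread's 
code):

```cpp
#include <cassert>

// Sketch: a runtime switch over a string compares Method against
// "Linear" on every call. Making the method a compile-time parameter
// and branching with `if constexpr` (like D's `static if`) removes
// the comparison entirely; the dead branch is discarded at compile time.
enum class Method { Linear, Constant };

template <Method M>
double lookup(const double* tab, double idx) {
    if constexpr (M == Method::Linear) {
        // linear interpolation between the two neighbouring entries
        int i = static_cast<int>(idx);
        double f = idx - i;
        return (1.0 - f) * tab[i] + f * tab[i + 1];
    } else {
        // constant ("nearest below"): just truncate the index
        return tab[static_cast<int>(idx)];
    }
}
```

With the method known at compile time, the generated code for each 
instantiation contains only one branch's instructions, which is what 
the `static if` refactoring achieves for the D code.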


Re: Using D's precise GC when running an app with DUB

2019-05-24 Thread Per Nordlöw via Digitalmars-d-learn

On Thursday, 23 May 2019 at 15:25:31 UTC, Per Nordlöw wrote:

You mean wise versa, right?


Never mind that comment. No "wise versa". Your answer is 
correct, rikki cattermole.


Thanks


Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad wrote:
So the LUT is 3-4 times faster even with your quarter-LUT 
overhead.


4-5 times faster actually, since I made the LUT size known at 
compiletime.


Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Thursday, 23 May 2019 at 21:47:45 UTC, Alex wrote:
Either way, sin is still twice as fast. Also, in the code the 
sinTab version is missing the writeln so it would have been 
faster... so it is not being optimized out.


Well, when I run this modified version:

https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c

on https://run.dlang.io/

then I get:

LUT:709
sin(x): 2761

So the LUT is 3-4 times faster even with your quarter-LUT 
overhead.






Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad wrote:


https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c


I made an error here:

"return s*((1 - f)*QuarterSinTab[ai&511] + 
f*QuarterSinTab[(ai+1)&511]);"


Should of course be:

return s*((1 - f)*QuarterSinTab[ai&511] + 
f*QuarterSinTab[(ai&511)+1]);


However that does not impact the performance.



Re: Blog Post #0038 - Dialogs IV - Saving a File

2019-05-24 Thread Ron Tarrant via Digitalmars-d-learn

Almost forgot...

I also redid the titles for all posts to clarify and group them 
under various themes.


Blog Post #0038 - Dialogs IV - Saving a File

2019-05-24 Thread Ron Tarrant via Digitalmars-d-learn
Today's blog post over on gtkDcoding.com is about using a GTK 
dialog for saving a file. You can find it here: 
http://gtkdcoding.com/2019/05/24/0038-file-save-dialog.html


Re: Blog Post #0038 - Dialogs IV - Saving a File

2019-05-24 Thread Radu via Digitalmars-d-learn

On Friday, 24 May 2019 at 09:52:38 UTC, Ron Tarrant wrote:
Today's blog post over on gtkDcoding.com is about using a GTK 
dialog for saving a file. You can find it here: 
http://gtkdcoding.com/2019/05/24/0038-file-save-dialog.html


Interesting posts you have.

I might not be the first one to ask for this, but including some 
screen shots when talking about UI is usually a good idea.




Is there a way to disable copying of an alias this'd member? - trying to make a NotNull type

2019-05-24 Thread aliak via Digitalmars-d-learn

Basically, I want this to fail:

auto notNull(T, Args...)(Args args) {
return NotNull!T(new T(args));
}

struct NotNull(T) {
  private T _value;
  @property ref inout(T) value() inout { return this._value; }
  alias value this;
  //disable opAssign to null as well
}

class C {}
void func(ref C t) {
  t = null;
}

auto a = notNull!C;
func(a); // i want a compile error here

Any ideas that don't involve disabling copying or making the 
property non-ref?


Full example here: https://run.dlang.io/is/ubOwkd

Thanks!



Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:

for(int i = 0; i < res; i++)
QuarterSinTab[i] = sin(PI*(i/cast(double)res)); 


Btw, I think this creates a half sine, not a quarter, so you want 
(?):


 QuarterSinTab[i] = sin(PI*(0.5*i/cast(double)res));


QuarterSinTab[$-1] = QuarterSinTab[0];


This creates a discontinuity if you create a quarter sine, in 
that case you probably wanted:


   QuarterSinTab[$-1] = sin(PI*0.5)

Otherwise you will never get 1 or -1.

But none of these affect performance.
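
The table-generation fix above can be sketched like this (C++ for 
illustration; makeQuarterSinTab is an invented name, and res+1 entries 
are used so the endpoint sin(PI*0.5) = 1 is stored explicitly, avoiding 
the discontinuity):

```cpp
#include <cmath>
#include <vector>

// Build a quarter-sine table covering 0..pi/2 *inclusive*: res+1
// entries, so the last entry is exactly sin(pi/2) = 1 and interpolation
// near the top of the quarter wave has no discontinuity.
std::vector<double> makeQuarterSinTab(int res) {
    const double PI = 3.14159265358979323846;
    std::vector<double> tab(res + 1);
    for (int i = 0; i <= res; ++i)
        tab[i] = std::sin(PI * 0.5 * i / static_cast<double>(res));
    return tab;
}
```

Without the extra inclusive entry, the table would top out just below 1 
and the lookup could never return 1 or -1, exactly as noted above.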



Re: DLS server can't install on my pc

2019-05-24 Thread Laurent Tréguier via Digitalmars-d-learn

On Wednesday, 22 May 2019 at 12:59:13 UTC, greatsam4sure wrote:
I am having some difficulty installing DLS for dlang 1.16.4, the 
Visual Studio Code plugin for Dlang, on my Windows 10 Lenovo 
laptop (Core i7). It actually installs on my Core i3 running 
Windows 10. It says this app can't install on this PC.


I would appreciate any help


I have a few questions to clarify some things:
- What is the exact edition of Windows you are using?
- What is your processor's architecture? (DLS only supports 
x86_64 processors)
- What do you get from manually running `dub fetch dls; dub run 
dls:bootstrap` in a command prompt?


Re: Is there a way to disable copying of an alias this'd member? - trying to make a NotNull type

2019-05-24 Thread Simen Kjærås via Digitalmars-d-learn

On Friday, 24 May 2019 at 10:16:50 UTC, aliak wrote:

Basically, I want this to fail:

auto notNull(T, Args...)(Args args) {
return NotNull!T(new T(args));
}

struct NotNull(T) {
  private T _value;
  @property ref inout(T) value() inout { return this._value; }
  alias value this;
  //disable opAssign to null as well
}

class C {}
void func(ref C t) {
  t = null;
}

auto a = notNull!C;
func(a); // i want a compile error here

Any ideas that don't involve disabling copying or making the 
property non-ref?


Pretty sure that can't be done. On the other hand, why is the 
property ref if you're explicitly not going to use its ref-ness? 
Alternatively, can you show me how you use its ref-ness?


And just for completeness, you are aware that alias this takes an 
overload set, so that this works?


   struct NotNull(T) {
       private T _value;
       @property inout(T) value() inout { return _value; }
       @property void value(T val) { _value = val; } // new
       alias value this;
       // disable opAssign to null as well
   }

   class C {}
   void func(ref C t) { t = null; }

   unittest {
       NotNull n;
       n = new C(); // Look ma, I'm assigning without ref!
       func(n);     // Does not compile - value() doesn't return by ref
   }

--
  Simen


Re: Is there a way to disable copying of an alias this'd member? - trying to make a NotNull type

2019-05-24 Thread Simen Kjærås via Digitalmars-d-learn

On Friday, 24 May 2019 at 10:40:01 UTC, Simen Kjærås wrote:

   NotNull n;


Typo. Should be NotNull!C n;

--
  Simen


Re: Blog Post #0038 - Dialogs IV - Saving a File

2019-05-24 Thread Ron Tarrant via Digitalmars-d-learn

On Friday, 24 May 2019 at 10:09:06 UTC, Radu wrote:


Interesting posts you have.

Thanks.

I might not be the first one to ask for this, but including 
some screen shots when talking about UI is usually a good idea.

You're right; you're not. :)

I'm still debating this idea, but I won't say I'm adamantly 
opposed to it. Research into how people learn seems to support 
both sides of the argument: should imagery be included or not?


And I suppose if I ever pull all this stuff together into a book, 
it'll be expected.


Re: Blog Post #0038 - Dialogs IV - Saving a File

2019-05-24 Thread Radu via Digitalmars-d-learn

On Friday, 24 May 2019 at 11:06:47 UTC, Ron Tarrant wrote:

I'm still debating this idea, but I won't say I'm adamantly 
opposed to it. Research into how people learn seems to support 
both sides of the argument: should imagery be included or not?


And I suppose if I ever pull all this stuff together into a 
book, it'll be expected.


For learning it is important so you can immediately have a 
reference to compare against.


But there is also the marketing effect: these posts will be more 
inviting for newcomers, especially ones with no D experience. It 
might be worth considering even for this reason, as your posts 
could be a powerful marketing tool for D.


Good luck with your book!


Re: Is there a way to disable copying of an alias this'd member? - trying to make a NotNull type

2019-05-24 Thread aliak via Digitalmars-d-learn

On Friday, 24 May 2019 at 10:40:01 UTC, Simen Kjærås wrote:

On Friday, 24 May 2019 at 10:16:50 UTC, aliak wrote:

Basically, I want this to fail:

auto notNull(T, Args...)(Args args) {
return NotNull!T(new T(args));
}

struct NotNull(T) {
  private T _value;
  @property ref inout(T) value() inout { return this._value; }
  alias value this;
  //disable opAssign to null as well
}

class C {}
void func(ref C t) {
  t = null;
}

auto a = notNull!C;
func(a); // i want a compile error here

Any ideas that don't involve disabling copying or making the 
property non-ref?


Pretty sure that can't be done. On the other hand, why is the 
property ref if you're explicitly not going to use its 
ref-ness? Alternatively, can you show me how you use its 
ref-ness?


It's ref so that you can do this, e.g.:

class C { int i; }
auto a = notNull!C;
a.i = 3;

I guess maybe there's a way to go about supporting that kinda 
thing with opDispatch. But I've tried doing that in the optional 
type [0] and it's rather painful to get right, and trickier to 
make @safe as well.




And just for completeness, you are aware that alias this takes 
an overload set, so that this works?


   struct NotNull(T) {
       private T _value;
       @property inout(T) value() inout { return _value; }
       @property void value(T val) { _value = val; } // new
       alias value this;
       // disable opAssign to null as well
   }

   class C {}
   void func(ref C t) { t = null; }

   unittest {
       NotNull n;
       n = new C(); // Look ma, I'm assigning without ref!
       func(n);     // Does not compile - value() doesn't return by ref
   }

--
  Simen


Si si. I is aware. I've disabled opAssign to T actually (I think 
I have, at least). And I only allow assigning to another NotNull, 
else there's no way to guarantee that the NotNull stays non-null.


It looks like I'm going to have to sacrifice being able to 
manipulate member variables of the type the NotNull is wrapping, 
or give up on the guarantee of the inner value not being null.


[0] https://code.dlang.org/packages/optional



Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad wrote:

On Thursday, 23 May 2019 at 21:47:45 UTC, Alex wrote:
Either way, sin is still twice as fast. Also, in the code 
the sinTab version is missing the writeln so it would have 
been faster... so it is not being optimized out.


Well, when I run this modified version:

https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c

on https://run.dlang.io/

then I get:

LUT:709
sin(x): 2761

So the LUT is 3-4 times faster even with your quarter-LUT 
overhead.


FWIW, as far as I can tell I managed to get the lookup version 
down to 104 by using bit manipulation tricks like these:


auto fastQuarterLookup(double x){
    const ulong mantissa = cast(ulong)( (x - floor(x)) * (cast(double)(1UL<<63)*2.0) );
    const double sign = cast(double)(-cast(uint)((mantissa>>63)&1));

    … etc

So it seems like a quarter-wave LUT is 27 times faster than sin…

You just have to make sure that the generated instructions fill 
the entire CPU pipeline.
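
The first step of that trick, spreading the fractional part of the 
phase over the full unsigned 64-bit range, can be sketched like this 
(C++ for illustration; the thread's D code elides the rest with 
"… etc", and this sketch does not try to reproduce it exactly):

```cpp
#include <cmath>
#include <cstdint>

// Map the fractional part of the phase x (measured in whole cycles)
// onto the full 64-bit unsigned range. The top bits of the result then
// directly encode which quadrant of the sine wave we are in, which is
// what the later sign/reversal masking relies on.
uint64_t phaseToFixedPoint(double x) {
    double frac = x - std::floor(x);  // fractional part, in [0, 1)
    // 18446744073709551616.0 == 2^64, exactly representable as a double
    return static_cast<uint64_t>(frac * 18446744073709551616.0);
}
```

For example, a phase of half a cycle maps to 2^63, so its top bit is 
set, marking the negative half of the sine wave.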





Re: Performance of tables slower than built in?

2019-05-24 Thread Alex via Digitalmars-d-learn

On Friday, 24 May 2019 at 08:13:00 UTC, Basilez B. wrote:

On Thursday, 23 May 2019 at 10:16:42 UTC, Alex wrote:

On Wednesday, 22 May 2019 at 08:25:58 UTC, Basile B. wrote:

On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:
I am trying to create some fast sin, sinc, and exponential 
routines to speed up some code by using tables... but it 
seems it's slower than the function itself?!?


[...]


Hi, lookup tables ARE faster, but the problem you have here, 
and I'm surprised that nobody has noticed it so far, is that YOUR 
SWITCH LEADS TO A RUNTIME STRING COMPARISON. Just 
replace it with a static if (Method == "Linear") { /*...*/ } 
else { /*...*/ }


Also take care with the types used. With DMD the implicit 
coercion between float and double can lead to extra conversions.


You'll directly see a 15% gain after refactoring the switch.


Surely not?!?! Surely the compiler can optimize that switch 
since the value passed is CT? I thought the whole point of not 
having static switch(analogous to static if) was because it 
would go ahead and optimize these cases for us... and it's 
just a switch, just a jmp table.


Try it yourself, but to be clear, note that I don't like your 
attitude, which I find disrespectful.


Are you an idiot or on medications? My attitude? What did you 
want me to do? Suck your cock? My attitude? Seriously? WHERE? 
WHERE? You quoted everything?


I don't like your attitude? You seem to be extremely 
oversensitive and take things out of context?


It sounds like you just don't like people questioning you in any 
way shape or form even if you are wrong.


"Surely not?!?! Surely the compiler can optimize that switch
since the value passed is CT? I thought the whole point of not
having static switch(analogous to static if) was because it
would go ahead and optimize these cases for us... and it's
just a switch, just a jmp table."

Where? I seriously do not like your attitude though! You attack 
me when I said nothing offensive to you. You have serious 
problems. Get back on or get off the lithium.


[Now, of course, you get to see what an attitude really looks 
like... maybe I've taught you a bit about perspective? I doubt 
it.]




Re: Is there a way to disable copying of an alias this'd member? - trying to make a NotNull type

2019-05-24 Thread Simen Kjærås via Digitalmars-d-learn

On Friday, 24 May 2019 at 11:40:20 UTC, aliak wrote:

It's ref so that you can do this, e.g.:

class C { int i; }
auto a = notNull!C;
a.i = 3;


That's a valid concern for a struct, but classes are already 
reference types, so you're only adding a new layer of ref-ness. 
The above will work great with a non-ref alias this.



--
  Simen


Re: DLS server can't install on my pc

2019-05-24 Thread Laurent Tréguier via Digitalmars-d-learn

On Wednesday, 22 May 2019 at 12:59:13 UTC, greatsam4sure wrote:
I am having some difficulty installing DLS for dlang 1.16.4, the 
Visual Studio Code plugin for Dlang, on my Windows 10 Lenovo 
laptop (Core i7). It actually installs on my Core i3 running 
Windows 10. It says this app can't install on this PC.


I would appreciate any help


I just launched VSCode on Windows, and Windows Defender started 
telling me that DLS v0.25.9 contained a Trojan. The releases are 
automated and produced by AppVeyor [1], so it should probably be 
a false positive. Maybe this is what happened to you ?


A new patch (v0.25.10) is out now, and my Windows Defender 
doesn't seem to consider that one a threat apparently.



[1] https://ci.appveyor.com/project/dlanguageserver/dls


Re: Performance of tables slower than built in?

2019-05-24 Thread Alex via Digitalmars-d-learn

On Friday, 24 May 2019 at 11:45:46 UTC, Ola Fosheim Grøstad wrote:
On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad 
wrote:

On Thursday, 23 May 2019 at 21:47:45 UTC, Alex wrote:
Either way, sin is still twice as fast. Also, in the code 
the sinTab version is missing the writeln so it would have 
been faster... so it is not being optimized out.


Well, when I run this modified version:

https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c

on https://run.dlang.io/

then I get:

LUT:709
sin(x): 2761

So the LUT is 3-4 times faster even with your quarter-LUT 
overhead.


FWIW, as far as I can tell I managed to get the lookup version 
down to 104 by using bit manipulation tricks like these:


auto fastQuarterLookup(double x){
    const ulong mantissa = cast(ulong)( (x - floor(x)) * (cast(double)(1UL<<63)*2.0) );
    const double sign = cast(double)(-cast(uint)((mantissa>>63)&1));

    … etc

So it seems like a quarter-wave LUT is 27 times faster than sin…

You just have to make sure that the generated instructions 
fill the entire CPU pipeline.



Well, the QuarterWave was supposed to generate just a quarter 
since that is all that is required for these functions due to 
symmetry and periodicity. I started with a half to get that 
working, then figure out the sign flipping.


Essentially one just has to tabulate a quarter of sin, that is, 
from 0 to 90°, and then get the sign right. This allows one to have 
4 times the resolution or 1/4 the size at the same cost.


Or, to put it another way, sin has 4-fold redundancy.

I'll check out your code, thanks for looking in to it.


Re: Performance of tables slower than built in?

2019-05-24 Thread Alex via Digitalmars-d-learn

On Friday, 24 May 2019 at 11:57:44 UTC, Alex wrote:

On Friday, 24 May 2019 at 08:13:00 UTC, Basilez B. wrote:

On Thursday, 23 May 2019 at 10:16:42 UTC, Alex wrote:

On Wednesday, 22 May 2019 at 08:25:58 UTC, Basile B. wrote:

On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:
I am trying to create some fast sin, sinc, and exponential 
routines to speed up some code by using tables... but it 
seems it's slower than the function itself?!?


[...]


Hi, lookup tables ARE faster, but the problem you have here, 
and I'm surprised that nobody has noticed it so far, is that 
YOUR SWITCH LEADS TO A RUNTIME STRING COMPARISON. 
Just replace it with a static if (Method == "Linear") { 
/*...*/ } else { /*...*/ }


Also take care with the types used. With DMD the implicit 
coercion between float and double can lead to extra conversions.


You'll directly see a 15% gain after refactoring the switch.


Surely not?!?! Surely the compiler can optimize that switch 
since the value passed is CT? I thought the whole point of 
not having static switch(analogous to static if) was because 
it would go ahead and optimize these cases for us... and it's 
just a switch, just a jmp table.


Try it yourself, but to be clear, note that I don't like your 
attitude, which I find disrespectful.


Are you an idiot or on medications? My attitude? What did you 
want me to do? Suck your cock? My attitude? Seriously? WHERE? 
WHERE? You quoted everything?


I don't like your attitude? You seem to be extremely 
oversensitive and take things out of context?


It sounds like you just don't like people questioning you in 
any way shape or form even if you are wrong.


"Surely not?!?! Surely the compiler can optimize that switch
since the value passed is CT? I thought the whole point of not
having static switch(analogous to static if) was because it
would go ahead and optimize these cases for us... and it's
just a switch, just a jmp table."

Where? I seriously do not like your attitude though! You attack 
me when I said nothing offensive to you. You have serious 
problems. Get back on or get off the lithium.


[Now, of course, you get to see what an attitude really looks 
like... maybe I've taught you a bit about perspective? I doubt 
it.]


What amazes me is that you Basilez come in and start stuff and 
yet my response to you starting stuff will get deleted and yours 
won't... that is typical.


You really have mental issues. No where in my post was I hostile 
to you and yet you decided to have a hissy fit.  You are a 
snowflake that obviously need a ass kicking. That will teach you 
not to be so sensitive and interpret things in your own little 
cushioned wall universe.


See, no one was attacking you or questioning your intelligence... 
that is something you perceived and made up all on your own. But 
you will be supported by all the other people who love cushy 
walls and koolaid rather than getting the help you need.





Re: Is there a way to disable copying of an alias this'd member? - trying to make a NotNull type

2019-05-24 Thread aliak via Digitalmars-d-learn

On Friday, 24 May 2019 at 12:03:08 UTC, Simen Kjærås wrote:

On Friday, 24 May 2019 at 11:40:20 UTC, aliak wrote:

It's ref so that you can do this, e.g.:

class C { int i; }
auto a = notNull!C;
a.i = 3;


That's a valid concern for a struct, but classes are already 
reference types, so you're only adding a new layer of ref-ness. 
The above will work great with a non-ref alias this.



--
  Simen


Ah true. Yes I guess I can do this for ref types indeed. And 
maybe I can just constrain it to classes, interfaces and pointers 
🤔


Cheers,
- Ali


Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 12:01:55 UTC, Alex wrote:
Well, the QuarterWave was supposed to generate just a quarter 
since that is all that is required for these functions due to 
symmetry and periodicity. I started with a half to get that 
working, then figure out the sign flipping.


Sure, it is a tradeoff. You pollute the cache less this way, but 
you have to figure out the sign and the lookup-direction.


The trick is then to turn the phase into an unsigned integer; then 
you get:


1. the highest bit will tell you that you need to use the inverse 
sign for the result.
2. the next highest bit will tell you that you need to look up 
in the reverse direction


What is key to performance here is that x86 can do many simple 
integer/bit operations in parallel, but only a few floating point 
operations.


Also avoid all conditionals. Use bitmasking instead, something 
along the line of:


const ulong phase = mantissa^((1UL<<63)-((mantissa>>62)&1));
const uint quarterphase = (phase>>53)&511;

(Haven't checked the correctness of this, but this shows the 
general principle.)
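
Those two rules can be sketched as follows (C++ for illustration; the 
shifts mirror the principle above, not the exact constants of any 
final implementation, and decodePhase is an invented name):

```cpp
#include <cstdint>

// Given a phase mapped onto the full uint64 range: the top bit selects
// the negative half of the sine wave (quadrants 3 and 4), and the next
// bit selects reversed indexing into the quarter table (quadrants 2
// and 4). Both fall out of plain bit tests, with no conditionals on
// the floating-point value itself.
struct QuarterDecode {
    double sign;     // +1.0 or -1.0
    bool   reversed; // true -> index the quarter table backwards
};

QuarterDecode decodePhase(uint64_t phase) {
    QuarterDecode q;
    q.sign     = (phase >> 63) ? -1.0 : 1.0;  // quadrants 3,4 are negative
    q.reversed = ((phase >> 62) & 1) != 0;    // quadrants 2,4 run backwards
    return q;
}
```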


Ola.







Re: Performance of tables slower than built in?

2019-05-24 Thread Alex via Digitalmars-d-learn

On Friday, 24 May 2019 at 11:45:46 UTC, Ola Fosheim Grøstad wrote:
On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad 
wrote:

On Thursday, 23 May 2019 at 21:47:45 UTC, Alex wrote:
Either way, sin is still twice as fast. Also, in the code 
the sinTab version is missing the writeln so it would have 
been faster... so it is not being optimized out.


Well, when I run this modified version:

https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c

on https://run.dlang.io/

then I get:

LUT:709
sin(x): 2761

So the LUT is 3-4 times faster even with your quarter-LUT 
overhead.


FWIW, as far as I can tell I managed to get the lookup version 
down to 104 by using bit manipulation tricks like these:


auto fastQuarterLookup(double x){
    const ulong mantissa = cast(ulong)( (x - floor(x)) * (cast(double)(1UL<<63)*2.0) );
    const double sign = cast(double)(-cast(uint)((mantissa>>63)&1));

    … etc

So it seems like a quarter-wave LUT is 27 times faster than sin…



If so then that is great and what I'd expected to achieve 
originally.


I guess this is using LDC though? I wasn't able to compile with 
LDC since after updating I'm getting linker errors that I have to 
go figure out.


You just have to make sure that the generated instructions 
fill the entire CPU pipeline.


What exactly does this mean? I realize the pipeline in CPUs is 
how the CPU decodes and optimizes the instructions, but when you 
say "you have to make sure", that presupposes there is a method 
or algorithm to know.


Are you saying that I did not have enough instructions that the 
pipeline could take advantage of?



In any case, I'll check your code out and try to figure out the 
details and see exactly what is going on.


If it truly is 27x faster then that is very relevant, and 
knowing why is important.


Of course, a lot of that might simply be due to LDC and I wasn't 
able to determine this.


Can you do some stats for dmd and ldc?

You seem to be interested in this, are you up for a challenge?


The idea is to use tables to optimize these functions.

Half sin was done above but quarter sine can be used (there are 4 
quadrants but only one has to be tabulated because all the 
others differ by sign and reversion (1 - x); it's a matter of 
figuring out the sign).


Of course it requires extra computation so it would be 
interesting to see the difference in performance for the extra 
logic.


Then there is exp

exp(x) can be written as exp(floor(x) + {x}) = 
exp(floor(x))*exp({x})


and so one can optimize this by tabulating exp(x) for 0 <= x < 1, 
which covers the fractional part of x.


Then tabulating it for a wide range of integers (probably in 2's).


e.g.,

exp(3.5) = exp(3)*exp(.5)

both come from a lookup table.

or one could do

exp(3) = exp(1 + 1 + 1) = exp(1)*exp(1)*exp(1)

(this requires iteration if we do not tabulate exp(3)).

Hence one would limit the iteration count by tabulating things 
like exp(10^k) and exp(k) for -10 < k < 10.
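
That scheme can be sketched like this (C++ for illustration; the table 
sizes, ranges, and names here are arbitrary choices for the sketch, 
not a tuned implementation):

```cpp
#include <cmath>

// exp(x) = exp(floor(x)) * exp(frac(x)): a tiny table for the integer
// part plus a denser, linearly interpolated table for the fractional
// part. Both tables are filled once at startup.
static double expInt[21];    // exp(k) for k = -10 .. 10
static double expFrac[257];  // exp(i/256) for i = 0 .. 256

void initExpTables() {
    for (int k = -10; k <= 10; ++k)
        expInt[k + 10] = std::exp(static_cast<double>(k));
    for (int i = 0; i <= 256; ++i)
        expFrac[i] = std::exp(i / 256.0);
}

double expLUT(double x) {    // valid for -10 <= x < 10
    double fl = std::floor(x);
    double fr = x - fl;                        // fractional part in [0, 1)
    int    i  = static_cast<int>(fr * 256.0);  // table slot just below fr
    double f  = fr * 256.0 - i;                // linear interpolation weight
    double e  = (1.0 - f) * expFrac[i] + f * expFrac[i + 1];
    return expInt[static_cast<int>(fl) + 10] * e;
}
```

So exp(3.5) really is computed as exp(3)*exp(0.5), both coming from a 
lookup table, exactly as described above.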


The idea is basically one can get really dense LUT's for a small 
range that then are used to build the function for arbitrary 
inputs.


With linear interpolation one can get very accurate (for all 
practical purposes) LUT methods that, if your code is 
right, are at least an order of magnitude faster. The memory 
requirements will be quite small with linear interpolation (and 
ideally quadratic or cubic if the costs are not too high).


That was what I was starting to work on before I got thrown off 
by it being much slower.


It seems you already have the half-sin done.

One could do things like sin, cos (obviously easy), exp, exp()^2, 
erf(x), sinh, cosh, etc. Things like sec could also be done, as it 
would save a division since it seems they take about 30 cycles. 
But it would depend on the memory used.


[I can't mess with this now because I've gotta work other things 
at the moment]


Thanks.


Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 12:24:02 UTC, Alex wrote:
So it seems like a quarter-wave LUT is 27 times faster than 
sin…




If so then that is great and what I'd expected to achieve 
originally.


I guess this is using LDC though? I wasn't able to compile with 
LDC since after updating I'm getting linker errors that I have 
to go figure out.


Yes, the gist linked above is just your code with minor changes, 
that was 4-5 times faster. To get to 27 times faster you need to 
use the integer bit-manipulation scheme that I suggest above. 
Just beware that I haven't checked the code, so it might be off 
by ±1 and such.


Anyway, it is more fun for you to code up your own version than 
to try to figure out mine. Just follow the principles and you 
should get close to that performance, I think. (I'll refine the 
code later, but don't really have time now)



You just have to make sure that the generated instructions 
fill the entire CPU pipeline.


What exactly does this mean? I realize the pipeline in CPUs is 
how the CPU decodes and optimizes the instructions, but when 
you say "you have to make sure", that presupposes there is a 
method or algorithm to know.


Yes, you have to look up information about the CPU in your 
computer. Each core has a set of "lanes" that are computed 
simultaneously. Some instructions can go into many lanes, but not 
all. Then there might be bubbles in the pipeline (the lane) that 
can be filled up with integer/bit manipulation instructions. It 
is tedious to look that stuff up, so as a last resort, just try 
to mix simple integer with simple double computations (and avoid 
division).



Are you saying that I did not have enough instructions that the 
pipeline could take advantage of?


Yes, you most likely got bubbles. Empty space where the core has 
nothing to send down a lane, because it is waiting for some 
computation to finish so that it can figure out what to do next.


Basic optimization:

Step 1: reduce dependencies between computations

Step 2: make sure you generate a mix of simple integer/double 
instructions that can fill up all the computation lanes at the 
same time


Step 3: make sure loops only contain a few instructions; the CPU 
can unroll loops in hardware if they are short (not valid here, 
though)
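
Step 1 can be illustrated with a classic example (C++; sumTwoChains is 
an invented name, and the point is the dependency structure, not this 
particular loop):

```cpp
#include <cstddef>

// A single accumulator forms one long dependency chain: each add must
// wait for the previous one to finish. Two independent accumulators
// give the core two chains it can advance in the same cycle; the
// result is identical for integer data, only the dependency structure
// changes.
long sumTwoChains(const int* a, std::size_t n) {
    long s0 = 0, s1 = 0;            // independent dependency chains
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];                 // chain 0
        s1 += a[i + 1];             // chain 1
    }
    if (i < n) s0 += a[i];          // odd tail element
    return s0 + s1;                 // combine once at the end
}
```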



Of course, a lot of that might simply be due to LDC and I 
wasn't able to determine this.


I think I got better performance because I filled more lanes in 
the pipeline.



Half sin was done above but quarter sine can be used (there are 
4 quadrants but only one has to be tabulated because all the 
others differ by sign and reversion (1 - x); it's a matter of 
figuring out the sign).


Yes, as I mentioned, the first bit of the phase is the sign and 
the second bit of the phase is the reversion of the indexing.



Of course it requires extra computation so it would be 
interesting to see the difference in performance for the extra 
logic.


It adds perhaps 2-5 cycles or so, my guessing.


exp(x) can be written as exp(floor(x) + {x}) = 
exp(floor(x))*exp({x})

[...]
With linear interpolation one can get very accurate (for all 
practical purposes) LUT methods that, if your code is 
right, are at least an order of magnitude faster. The memory 
requirements will be quite small with linear interpolation


I think you need to do something with the x before you look up, 
so that you have some kind of fast nonlinear mapping to the 
indexes.


But for exp() you might prefer an approximation instead, perhaps 
a polynomial Taylor series.


Searching the web should give some ideas.



It seems you already have the half-sin done.


I did the quarter sin though, not the half-sin (but that is 
almost the same, just drop the reversion of the indexing).


(Let's talk about this later, since we both have other things on 
our plate. Fun topic! :-)




Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 12:56:55 UTC, Ola Fosheim Grøstad wrote:
suggest above. Just beware that I haven't checked the code, so 
it might be off by ±1 and such.


So before using such code for anything important, make sure that 
it is tested for the edge cases, like denormal numbers (values 
close to zero). Roundoff-errors where computations "accidentally" 
cross over 1.0 and stuff like that.




Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 12:24:02 UTC, Alex wrote:
If it truly is 27x faster then that is very relevant, and 
knowing why is important.


Of course, a lot of that might simply be due to LDC and I 
wasn't able to determine this.


Just one more thing you really ought to consider:

It isn't obvious that a LUT using double will be more precise 
than computing sin using single precision float.


So when I use single precision float with ldc and "-O3 
-ffast-math" then I get roughly 300ms. So in that case the LUT is 
only 3 times faster.


Perhaps not worth it then. You might as well just use float 
instead of double.


The downside is that -ffast-math makes all your computations more 
inaccurate.


It is also possible that recent processors can do multiple 
sin/cos as SIMD.


Too many options… I know.



Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 11:45:46 UTC, Ola Fosheim Grøstad wrote:
const double sign = cast(double)(-cast(int)((mantissa>>63)&1));


Yep, this was wrong (0 or -1). Should be something like (1 or -1):

const double sign = cast(double)(1 - cast(int)((mantissa>>62)&2));


You'll have to code it up more carefully than I did, just 
following the same principles. (These ±1 errors do not affect 
the performance.)
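The corrected branchless sign extraction is easy to check in isolation; here is the same expression as a C sketch (bit 63 playing the role of the half-period selector in a 64-bit fixed-point phase):

```c
#include <stdint.h>

/* (mantissa >> 62) & 2 is 0 when bit 63 is clear and 2 when it is set,
   so 1 minus that yields +1 or -1 without a branch. The subtraction
   must happen in a signed type, otherwise it wraps to a huge unsigned
   value instead of -1. */
static double phaseSign(uint64_t mantissa) {
    return (double)(1 - (int)((mantissa >> 62) & 2));
}
```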


Also, for comparison, just running the 2 lookups in the loop are 
at 32ms.


So 3 times that sounds reasonable for extracting the phase, 
determining the sign, reversing the phase and doing the linear 
interpolation.







Re: Blog Post #0038 - Dialogs IV - Saving a File

2019-05-24 Thread Ron Tarrant via Digitalmars-d-learn

On Friday, 24 May 2019 at 11:19:14 UTC, Radu wrote:

But there is also the marketing effect: these posts will be 
more inviting for newcomers, especially ones coming with no D 
experience. It might be worth considering for this reason alone, 
as your posts could be a powerful marketing tool for D.


Yup, that's what the research says *in favour of* images.

However, other research indicates that images slow down learning 
because the learner gets complacent. Rather than engaging in deep 
learning, they glance at the image, copy-n-paste the code, and 
never truly understand what they're working with. At some point 
down the road, they get into trouble because they don't have that 
deep understanding and they abandon any further efforts.


So, that's the dilemma. Both arguments seem sound to me, so 
that's why I'm still on the fence.




Re: DLS server can't install on my pc

2019-05-24 Thread greatsam4sure via Digitalmars-d-learn

On Friday, 24 May 2019 at 12:03:37 UTC, Laurent Tréguier wrote:

On Wednesday, 22 May 2019 at 12:59:13 UTC, greatsam4sure wrote:
I am having some difficulty installing DLS for dlang 1.16.4, 
the Visual Studio Code plugin for Dlang, on my PC, a Windows 10 
Lenovo laptop (Core i7). It actually installs on my Core i3 
running Windows 10. It says this app can't install on this PC.



I will appreciate any help


I just launched VSCode on Windows, and Windows Defender started 
telling me that DLS v0.25.9 contained a Trojan. The releases 
are automated and produced by AppVeyor [1], so it should 
probably be a false positive. Maybe this is what happened to 
you?


A new patch (v0.25.10) is out now, and my Windows Defender 
doesn't seem to consider that one a threat.



[1] https://ci.appveyor.com/project/dlanguageserver/dls


No luck yet. I have Visual Studio 2019 installed on my laptop 
(Core i7, 8 GB RAM) running Windows 10.


I have turned off my Windows Defender, yet it still says the app 
can't run on this system.


Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

This appears to get roughly the same results as sin(x):


__gshared double[512+1] QuarterSinTab;

void init(){
    const auto res = QuarterSinTab.length-1;
    for(int i = 0; i < res; i++)
        QuarterSinTab[i] = sin(PI*(0.5*i/cast(double)res));
    QuarterSinTab[$-1] = sin(PI*0.5);
}

auto fastQuarterLookup(double x){
    const ulong mantissa = cast(ulong)((x - floor(x)) * cast(double)(1UL<<54));
    const double sign = cast(double)(1 - cast(int)((mantissa>>52)&2));
    const ulong phase = (mantissa^((1UL<<53)-((mantissa>>52)&1)))&((1UL<<53)-1);
    const uint quarterphase = (phase>>43)&511;
    const double frac = cast(double)(phase&((1UL<<43)-1))*cast(double)(1.0/(1UL<<43));
    return sign*((1.0-frac)*QuarterSinTab[quarterphase] + frac*QuarterSinTab[quarterphase+1]);
}


Ola.


Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn
With linear interpolation you get roughly the same result with 
floats, and it's a little more efficient too (half the memory 
and a bit faster):


__gshared float[512+1] QuarterSinTab;

void init(){
    const auto res = QuarterSinTab.length-1;
    for(int i = 0; i < res; i++)
        QuarterSinTab[i] = sin(PI*(0.5*i/cast(double)res));
    QuarterSinTab[$-1] = sin(PI*0.5);
}

auto fastQuarterLookup(float x){
    const uint mantissa = cast(uint)((x - floor(x)) * cast(float)(1U<<24));
    const float sign = cast(float)(1 - cast(int)((mantissa>>22)&2));
    const uint phase = (mantissa^((1U<<23)-((mantissa>>22)&1)))&((1U<<23)-1);
    const uint quarterphase = (phase>>13)&511;
    const float frac = cast(float)(phase&((1U<<13)-1))*cast(float)(1.0f/(1U<<13));
    return sign*((1.0f-frac)*QuarterSinTab[quarterphase] + frac*QuarterSinTab[quarterphase+1]);
}


Re: Performance of tables slower than built in?

2019-05-24 Thread Alex via Digitalmars-d-learn

On Friday, 24 May 2019 at 13:57:30 UTC, Ola Fosheim Grøstad wrote:

On Friday, 24 May 2019 at 12:24:02 UTC, Alex wrote:
If it truly is 27x faster then that is very relevant 
and knowing why is important.


Of course, a lot of that might simply be due to LDC and I 
wasn't able to determine this.


Just one more thing you really ought to consider:

It isn't obvious that a LUT using double will be more precise 
than computing sin using single precision float.


So when I use single precision float with ldc and "-O3 
-ffast-math" then I get roughly 300ms. So in that case the LUT 
is only 3 times faster.


Perhaps not worth it then. You might as well just use float 
instead of double.


The downside is that -ffast-math makes all your computations 
more inaccurate.


It is also possible that recent processors can do multiple 
sin/cos as SIMD.


Too many options… I know.


The thing is, the LUT can have as much precision as one wants. 
One could even spend several days calculating it and then load 
it from disk.


I'm not sure what the real precision of the built-in functions 
is, but it shouldn't be hard to max out a double using standard 
methods (even if slow, that's irrelevant once the LUT has been 
created).


Right now I'm just using the built-ins... Maybe later I'll get 
back around to all this and make some progress.




Re: Performance of tables slower than built in?

2019-05-24 Thread Ola Fosheim Grøstad via Digitalmars-d-learn

On Friday, 24 May 2019 at 17:04:33 UTC, Alex wrote:
I'm not sure what the real precision of the build in functions 
are but it shouldn't be hard to max out a double using standard 
methods(even if slow, but irrelevant after the LUT has been 
created).


LUTs are primarily useful when you use sin(x) as a signal or when 
a crude approximation is good enough.


One advantage of a LUT is that you can store a more complex 
computation than the basic function, like a filtered square wave.