Re: Why 16Mib static array size limit?

2016-08-18 Thread Ali Çehreli via Digitalmars-d

On 08/18/2016 05:20 AM, Johan Engelen wrote:

> `arr` is a global symbol that can be modified by code
> not seen by the compiler. So no assumptions can be made about the
> contents of arr.ptr and arr.length

Yet there is the following text in the spec:

* https://dlang.org/spec/statement.html#ForeachStatement

  "The aggregate must be loop invariant, meaning that elements to
   the aggregate cannot be added or removed from it in the
   NoScopeNonEmptyStatement."

I think the spirit of the spec includes other code outside of 
NoScopeNonEmptyStatement as well. If so, if elements are added to the 
array from other code (e.g. other threads) it should be a case of "all 
bets are off". So, I think both .ptr and .length could be cached.


Ali



Re: Why 16Mib static array size limit?

2016-08-18 Thread Johan Engelen via Digitalmars-d

On Thursday, 18 August 2016 at 12:20:50 UTC, Johan Engelen wrote:


I don't know if we can mark `private` global variables as 
internal. If so, wow :-)


Nevermind, not possible. Templates, cross-module inlining, and 
probably other reasons.


Re: Why 16Mib static array size limit?

2016-08-18 Thread Johan Engelen via Digitalmars-d

On Thursday, 18 August 2016 at 01:18:03 UTC, Yuxuan Shui wrote:

On Thursday, 18 August 2016 at 00:20:32 UTC, Chris Wright wrote:


The language can analyze all code that affects a local 
variable in many cases. You don't always need the language to 
guarantee it's impossible if the compiler can see that the 
user isn't doing anything funky.


That's right. But for Ali's code, the compiler is clearly not 
smart enough.


Perhaps not smart enough, but it is very close to being smart 
enough. Perhaps we, the LDC devs, weren't smart enough in using 
the LLVM backend. ;-)


For Ali's code, `arr` is a global symbol that can be modified by 
code not seen by the compiler. So no assumptions can be made 
about the contents of arr.ptr and arr.length, and thus `arr[i]` 
could index into `arr` itself.


Now, if we tell [*] the compiler that `arr` is not touched by 
code outside the module... the resulting machine code is 
identical with and without POINTER!


It's an interesting case to look further into. For example with 
Link-Time Optimization, delaying codegen until linktime, we can 
internalize many symbols (i.e. telling the compiler no other 
module is going to do anything with it) allowing the compiler to 
reason better about the code and generate faster executables. 
Ideally, without much user effort. It's something I am already 
looking into.


I don't know if we can mark `private` global variables as 
internal. If so, wow :-)


-Johan


[*] Unfortunately `private arr` does not have the desired effect 
(it does not make it an `internal` LLVM variable), so you have to 
hack into the LLVM IR file to make it happen.




Re: Why 16Mib static array size limit?

2016-08-18 Thread Walter Bright via Digitalmars-d

On 8/17/2016 9:16 PM, Chris Wright wrote:

So it's not surprising that the compiler doesn't handle global variables
as well as it does local ones.


Global variables are pretty much spawn of the devil. You're right that dmd does 
not make a special effort optimizing them.


The way to deal with it:

1. minimize use of globals

2. if using them in a hot loop, copy a reference to the global into a local 
variable, and use the local variable in the loop instead.


This also gets around the problem that thread local storage address calculation 
is slow and inefficient on all platforms.




Re: Why 16Mib static array size limit?

2016-08-17 Thread Chris Wright via Digitalmars-d
On Thu, 18 Aug 2016 01:18:03 +, Yuxuan Shui wrote:

> On Thursday, 18 August 2016 at 00:20:32 UTC, Chris Wright wrote:
>> On Wed, 17 Aug 2016 22:12:25 +, Yuxuan Shui wrote:
>>> But doing so would be incorrect if D doesn't provide strong aliasing
>>> guarantees. And if D does provide these guarantees,
>>> we won't need to do this manually.
>>
>> The language can analyze all code that affects a local variable in many
>> cases. You don't always need the language to guarantee it's impossible
>> if the compiler can see that the user isn't doing anything funky.
> 
> That's right. But for Ali's code, the compiler is clearly not smart
> enough.

Most of the time, the compiler can successfully make those optimizations 
for local variables. The compiler can seldom make them for global 
variables. You get into whole program analysis territory pretty fast.

So it's not surprising that the compiler doesn't handle global variables 
as well as it does local ones.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Yuxuan Shui via Digitalmars-d

On Thursday, 18 August 2016 at 00:20:32 UTC, Chris Wright wrote:

On Wed, 17 Aug 2016 22:12:25 +, Yuxuan Shui wrote:
But doing so would be incorrect if D doesn't provide strong 
aliasing guarantees. And if D does provide these guarantees, 
we won't need to do this manually.


The language can analyze all code that affects a local variable 
in many cases. You don't always need the language to guarantee 
it's impossible if the compiler can see that the user isn't 
doing anything funky.


That's right. But for Ali's code, the compiler is clearly not 
smart enough.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Chris Wright via Digitalmars-d
On Wed, 17 Aug 2016 22:12:25 +, Yuxuan Shui wrote:
> But doing so would be incorrect if D doesn't provide strong aliasing
> guarantees. And if D does provide these guarantees, we won't need to do
> this manually.

The language can analyze all code that affects a local variable in many 
cases. You don't always need the language to guarantee it's impossible if 
the compiler can see that the user isn't doing anything funky.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Chris Wright via Digitalmars-d
On Wed, 17 Aug 2016 21:37:11 +, ketmar wrote:

> On Wednesday, 17 August 2016 at 13:27:14 UTC, deadalnix wrote:
>> Especially since there are already ways to do this is a way that makes
>> the AA happy
> exactly the thing i was writing about. "hey, you, meatbag! i, Teh Great
> Machine, said that you have to use unions, not pointers! what? making a
> pointer to union point into the middle of the buffer is exactly the same
> aliasing problem, so unions doesn't solve anything? I, Teh Great
> Machine, don't care. it is your problems, meatbag, i'm not here to serve
> you."

It makes your intent more obvious. It's more obvious to other humans as 
well as the compiler. For me, it's a win with no downsides. For you, 
don't use @safe and you opt out of this class of optimizations and the 
related restrictions.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Walter Bright via Digitalmars-d

On 8/17/2016 3:12 PM, Yuxuan Shui wrote:

But doing so would be incorrect if D doesn't provide strong aliasing guarantees.
And if D does provide these guarantees, we won't need to do this manually.


It would be correct for that loop if the user does it.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Yuxuan Shui via Digitalmars-d

On Wednesday, 17 August 2016 at 19:36:17 UTC, Walter Bright wrote:

On 8/17/2016 5:20 AM, deadalnix wrote:
Controlling aliasing is really the #1 optimization barrier 
these days, so I

don't think it's that good of a thing.

Almost every single one case where Rust end up being faster 
than C++ is because
their type system allow for more AA information available for 
the optimizer.


AA is also key to do non GC memory management at language 
level.


At least for this case, as I mentioned in another post, if the 
pointer and length of the global is cached in a local, it can 
be cached in a register. The contents of locals don't have 
aliasing problems because if their addresses are not taken, 
nobody can point to them. Optimization relies heavily on that.


But doing so would be incorrect if D doesn't provide strong 
aliasing guarantees. And if D does provide these guarantees, we 
won't need to do this manually.


Re: Why 16Mib static array size limit?

2016-08-17 Thread ketmar via Digitalmars-d

On Wednesday, 17 August 2016 at 13:27:14 UTC, deadalnix wrote:
Especially since there are already ways to do this is a way 
that makes the AA happy
exactly the thing i was writing about. "hey, you, meatbag! i, Teh 
Great Machine, said that you have to use unions, not pointers! 
what? making a pointer to union point into the middle of the 
buffer is exactly the same aliasing problem, so unions doesn't 
solve anything? I, Teh Great Machine, don't care. it is your 
problems, meatbag, i'm not here to serve you."


Re: Why 16Mib static array size limit?

2016-08-17 Thread Walter Bright via Digitalmars-d

On 8/17/2016 5:20 AM, deadalnix wrote:

Controlling aliasing is really the #1 optimization barrier these days, so I
don't think it's that good of a thing.

Almost every single one case where Rust end up being faster than C++ is because
their type system allow for more AA information available for the optimizer.

AA is also key to do non GC memory management at language level.


At least for this case, as I mentioned in another post, if the pointer and 
length of the global is cached in a local, it can be cached in a register. The 
contents of locals don't have aliasing problems because if their addresses are 
not taken, nobody can point to them. Optimization relies heavily on that.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Johan Engelen via Digitalmars-d

How about the specific case of array indexing?
Is it UB to have `arr[i]` ever point into `arr` itself, or can we 
make it UB?


-Johan



Re: Why 16Mib static array size limit?

2016-08-17 Thread Steven Schveighoffer via Digitalmars-d

On 8/17/16 10:38 AM, deadalnix wrote:

On Wednesday, 17 August 2016 at 14:21:32 UTC, Steven Schveighoffer wrote:

void * is almost useless. In D you can assign a void[] from another
void[], but other than that, there's no way to write the memory or
read it.

In C, void * is also allowed to alias any other pointer. But char * is
also allowed to provide arbitrary byte reading/writing.

I'd expect that D also would provide a similar option.



Yes, but everything can alias with void*/void[] . Thus, you can cast
from void* to T* "safely".


Sure, but how do you implement, let's say, byte swapping on an integer?

ubyte[] x = [0 .. 1];
foreach(i; 0 .. x.length / 2)
   swap(x[i], x[$-i-1]);

So if the compiler can assume that x can't point at myInt, and thus 
myInt can't have changed, then we have a problem.


You just can't do this with void (or at least not very easily).

-Steve


Re: Why 16Mib static array size limit?

2016-08-17 Thread deadalnix via Digitalmars-d
On Wednesday, 17 August 2016 at 14:21:32 UTC, Steven 
Schveighoffer wrote:
void * is almost useless. In D you can assign a void[] from 
another void[], but other than that, there's no way to write 
the memory or read it.


In C, void * is also allowed to alias any other pointer. But 
char * is also allowed to provide arbitrary byte 
reading/writing.


I'd expect that D also would provide a similar option.

-Steve


Yes, but everything can alias with void*/void[] . Thus, you can 
cast from void* to T* "safely".


Re: Why 16Mib static array size limit?

2016-08-17 Thread deadalnix via Digitalmars-d

On Wednesday, 17 August 2016 at 13:41:09 UTC, jmh530 wrote:

On Wednesday, 17 August 2016 at 12:20:28 UTC, deadalnix wrote:


Controlling aliasing is really the #1 optimization barrier 
these days, so I don't think it's that good of a thing.


Almost every single one case where Rust end up being faster 
than C++ is because their type system allow for more AA 
information available for the optimizer.


AA is also key to do non GC memory management at language 
level.


AA? Associative Array?


Alias Analysis. This is a common compiler acronym.

Associative array are called map by everybody outside this forum.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Steven Schveighoffer via Digitalmars-d

On 8/16/16 7:23 PM, Charles Hixson via Digitalmars-d wrote:

On 08/16/2016 01:49 PM, Steven Schveighoffer via Digitalmars-d wrote:

On 8/16/16 4:11 PM, Yuxuan Shui wrote:

Wait, doesn't D have strict aliasing rules? ubyte* () should not be
allowed to alias with ubyte** ().


Even if it did, I believe the wildcard is ubyte*. Just like in C,
char* can point at anything, ubyte is D's equivalent.


I think what you say is true (look at the code of std.outbuffer), but
IIRC the documentation says that's supposed to be the job of void*.


void * is almost useless. In D you can assign a void[] from another 
void[], but other than that, there's no way to write the memory or read it.


In C, void * is also allowed to alias any other pointer. But char * is 
also allowed to provide arbitrary byte reading/writing.


I'd expect that D also would provide a similar option.

-Steve


Re: Why 16Mib static array size limit?

2016-08-17 Thread jmh530 via Digitalmars-d

On Wednesday, 17 August 2016 at 12:20:28 UTC, deadalnix wrote:


Controlling aliasing is really the #1 optimization barrier 
these days, so I don't think it's that good of a thing.


Almost every single one case where Rust end up being faster 
than C++ is because their type system allow for more AA 
information available for the optimizer.


AA is also key to do non GC memory management at language level.


AA? Associative Array?


Re: Why 16Mib static array size limit?

2016-08-17 Thread Timon Gehr via Digitalmars-d

On 17.08.2016 15:41, jmh530 wrote:

On Wednesday, 17 August 2016 at 12:20:28 UTC, deadalnix wrote:


Controlling aliasing is really the #1 optimization barrier these days,
so I don't think it's that good of a thing.

Almost every single one case where Rust end up being faster than C++
is because their type system allow for more AA information available
for the optimizer.

AA is also key to do non GC memory management at language level.


AA? Associative Array?


(Alias Analysis.)


Re: Why 16Mib static array size limit?

2016-08-17 Thread deadalnix via Digitalmars-d

On Wednesday, 17 August 2016 at 12:32:20 UTC, ketmar wrote:
from my PoV, this kind of "optimizing" is overrated. i'm 
absolutely unable to understand why should i obey orders from 
machine instead of machine obeys my orders. if i want to go 
wild with pointers, don't tell me that i can't, just compile my 
code! C is literally ridden with this shit, and in the end it 
is a freakin' pain to write correct C code (if it is possible 
at all for something comlex).


Because making 99.9% of the code slower because of a fringe use 
case isn't sound engineering. Especially since there are already 
ways to do this is a way that makes the AA happy, for instance 
using unions.




Re: Why 16Mib static array size limit?

2016-08-17 Thread ketmar via Digitalmars-d

On Wednesday, 17 August 2016 at 12:20:28 UTC, deadalnix wrote:

On Tuesday, 16 August 2016 at 20:19:32 UTC, ketmar wrote:

On Tuesday, 16 August 2016 at 20:11:12 UTC, Yuxuan Shui wrote:

Wait, doesn't D have strict aliasing rules?


luckily, no. at least this is not enforced by dmd. and it is 
great.


Controlling aliasing is really the #1 optimization barrier 
these days, so I don't think it's that good of a thing.


Almost every single one case where Rust end up being faster 
than C++ is because their type system allow for more AA 
information available for the optimizer.


AA is also key to do non GC memory management at language level.


from my PoV, this kind of "optimizing" is overrated. i'm 
absolutely unable to understand why should i obey orders from 
machine instead of machine obeys my orders. if i want to go wild 
with pointers, don't tell me that i can't, just compile my code! 
C is literally ridden with this shit, and in the end it is a 
freakin' pain to write correct C code (if it is possible at all 
for something comlex).


Re: Why 16Mib static array size limit?

2016-08-17 Thread deadalnix via Digitalmars-d

On Tuesday, 16 August 2016 at 20:19:32 UTC, ketmar wrote:

On Tuesday, 16 August 2016 at 20:11:12 UTC, Yuxuan Shui wrote:

Wait, doesn't D have strict aliasing rules?


luckily, no. at least this is not enforced by dmd. and it is 
great.


Controlling aliasing is really the #1 optimization barrier these 
days, so I don't think it's that good of a thing.


Almost every single one case where Rust end up being faster than 
C++ is because their type system allow for more AA information 
available for the optimizer.


AA is also key to do non GC memory management at language level.


Re: Why 16Mib static array size limit?

2016-08-17 Thread Jacob Carlborg via Digitalmars-d

On 2016-08-15 22:28, NX wrote:

https://issues.dlang.org/show_bug.cgi?id=14859

This limitation is put there because of optlink (which fails to link
when you have enough static data), and is actually entirely meaningless
when combined with -m32mscoff & -m64 switches (since other linkers
handle huge static data just fine).


Or using any other platform than Windows :)

--
/Jacob Carlborg


Re: Why 16Mib static array size limit?

2016-08-16 Thread Walter Bright via Digitalmars-d

On 8/15/2016 6:28 PM, Ali Çehreli wrote:

void main() {
auto p = arr.ptr;

foreach (j; 0 .. 100) {
foreach (i; 0..arr.length) {
version (POINTER) {
p[i] += cast(ubyte)i;
}
else {
arr[i] += cast(ubyte)i;
}
}
}
}


When accessing global arrays like this, cache the address of the data in a local 
variable. This will enable the compiler to enregister it. Putting global data in 
registers is problematic because any assignment through a pointer could change 
it, so the compiler takes a pessimistic view of it.


Your POINTER version does cache the pointer, but arr.length needs to be cached 
as well.




Re: Why 16Mib static array size limit?

2016-08-16 Thread Charles Hixson via Digitalmars-d

On 08/16/2016 01:49 PM, Steven Schveighoffer via Digitalmars-d wrote:

On 8/16/16 4:11 PM, Yuxuan Shui wrote:

On Tuesday, 16 August 2016 at 17:51:13 UTC, Johan Engelen wrote:

On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:


With ldc2, the best option is to go with a dynamic array ONLY IF you
access the elements through the .ptr property. As seen in the last
result, using the [] operator on the array is about 4 times slower
than that.


As Yuxuan Shui mentioned the difference is in vectorization. The
non-POINTER version is not vectorized because the semantics of the
code is not the same as the POINTER version. Indexing `arr`, and
writing to that address could change `arr.ptr`, and so the loop would
do something different when "caching" `arr.ptr` in `p` (POINTER
version) versus the case without caching (non-POINTER version).

Evil code demonstrating the problem:
```
ubyte evil;
ubyte[] arr;

void doEvil() {
// TODO: use this in the obfuscated-D contest
arr = ()[0..50];
}
```

The compiler somehow has to prove that `arr[i]` will never point to
`arr.ptr` (it's called Alias Analysis in LLVM).

Perhaps it is UB in D to have `arr[i]` ever point into `arr` itself, I
don't know. If so, the code is vectorizable and we can try to make 
it so.


-Johan


Wait, doesn't D have strict aliasing rules? ubyte* () should not be
allowed to alias with ubyte** ().


Even if it did, I believe the wildcard is ubyte*. Just like in C, 
char* can point at anything, ubyte is D's equivalent.


-Steve

I think what you say is true (look at the code of std.outbuffer), but 
IIRC the documentation says that's supposed to be the job of void*.


Re: Why 16Mib static array size limit?

2016-08-16 Thread Steven Schveighoffer via Digitalmars-d

On 8/16/16 4:11 PM, Yuxuan Shui wrote:

On Tuesday, 16 August 2016 at 17:51:13 UTC, Johan Engelen wrote:

On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:


With ldc2, the best option is to go with a dynamic array ONLY IF you
access the elements through the .ptr property. As seen in the last
result, using the [] operator on the array is about 4 times slower
than that.


As Yuxuan Shui mentioned the difference is in vectorization. The
non-POINTER version is not vectorized because the semantics of the
code is not the same as the POINTER version. Indexing `arr`, and
writing to that address could change `arr.ptr`, and so the loop would
do something different when "caching" `arr.ptr` in `p` (POINTER
version) versus the case without caching (non-POINTER version).

Evil code demonstrating the problem:
```
ubyte evil;
ubyte[] arr;

void doEvil() {
// TODO: use this in the obfuscated-D contest
arr = ()[0..50];
}
```

The compiler somehow has to prove that `arr[i]` will never point to
`arr.ptr` (it's called Alias Analysis in LLVM).

Perhaps it is UB in D to have `arr[i]` ever point into `arr` itself, I
don't know. If so, the code is vectorizable and we can try to make it so.

-Johan


Wait, doesn't D have strict aliasing rules? ubyte* () should not be
allowed to alias with ubyte** ().


Even if it did, I believe the wildcard is ubyte*. Just like in C, char* 
can point at anything, ubyte is D's equivalent.


-Steve


Re: Why 16Mib static array size limit?

2016-08-16 Thread ketmar via Digitalmars-d

On Tuesday, 16 August 2016 at 20:11:12 UTC, Yuxuan Shui wrote:

Wait, doesn't D have strict aliasing rules?


luckily, no. at least this is not enforced by dmd. and it is 
great.


Re: Why 16Mib static array size limit?

2016-08-16 Thread Yuxuan Shui via Digitalmars-d

On Tuesday, 16 August 2016 at 17:51:13 UTC, Johan Engelen wrote:

On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:


With ldc2, the best option is to go with a dynamic array ONLY 
IF you access the elements through the .ptr property. As seen 
in the last result, using the [] operator on the array is 
about 4 times slower than that.


As Yuxuan Shui mentioned the difference is in vectorization. 
The non-POINTER version is not vectorized because the semantics 
of the code is not the same as the POINTER version. Indexing 
`arr`, and writing to that address could change `arr.ptr`, and 
so the loop would do something different when "caching" 
`arr.ptr` in `p` (POINTER version) versus the case without 
caching (non-POINTER version).


Evil code demonstrating the problem:
```
ubyte evil;
ubyte[] arr;

void doEvil() {
// TODO: use this in the obfuscated-D contest
arr = ()[0..50];
}
```

The compiler somehow has to prove that `arr[i]` will never 
point to `arr.ptr` (it's called Alias Analysis in LLVM).


Perhaps it is UB in D to have `arr[i]` ever point into `arr` 
itself, I don't know. If so, the code is vectorizable and we 
can try to make it so.


-Johan


Wait, doesn't D have strict aliasing rules? ubyte* () should 
not be allowed to alias with ubyte** ().


Re: Why 16Mib static array size limit?

2016-08-16 Thread Yuxuan Shui via Digitalmars-d

On Tuesday, 16 August 2016 at 19:50:14 UTC, Yuxuan Shui wrote:

On Tuesday, 16 August 2016 at 18:46:06 UTC, Ali Çehreli wrote:

On 08/16/2016 10:51 AM, Johan Engelen wrote:
> On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli 
> wrote:

>>
>> With ldc2, the best option is to go with a dynamic array
ONLY IF you
>> access the elements through the .ptr property. As seen in
the last
>> result, using the [] operator on the array is about 4 times
slower
>> than that.
>
> [...]
>
> -Johan

Thank you all. That makes sense... Agreeing that the POINTER 
version is applicable only in some cases, looking only at the 
non-POINTER cases, for ldc2, a static array is faster, making 
the "arbitrary" 16MiB limit a performance issue. For ldc2, 
static array is about 40% faster:


[...]

Ali


Actually, the STATIC version is always faster on my machine 
(Core i5 5200U), in both dmd and ldc2 cases


In dmd's case, the non-STATIC version seems to evaluate the loop 
condition (arr.length) **everytime**.


Re: Why 16Mib static array size limit?

2016-08-16 Thread Yuxuan Shui via Digitalmars-d

On Tuesday, 16 August 2016 at 18:46:06 UTC, Ali Çehreli wrote:

On 08/16/2016 10:51 AM, Johan Engelen wrote:
> On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:
>>
>> With ldc2, the best option is to go with a dynamic array
ONLY IF you
>> access the elements through the .ptr property. As seen in
the last
>> result, using the [] operator on the array is about 4 times
slower
>> than that.
>
> [...]
>
> -Johan

Thank you all. That makes sense... Agreeing that the POINTER 
version is applicable only in some cases, looking only at the 
non-POINTER cases, for ldc2, a static array is faster, making 
the "arbitrary" 16MiB limit a performance issue. For ldc2, 
static array is about 40% faster:


[...]

Ali


Actually, the STATIC version is always faster on my machine (Core 
i5 5200U), in both dmd and ldc2 cases


Re: Why 16Mib static array size limit?

2016-08-16 Thread Ali Çehreli via Digitalmars-d

On 08/16/2016 10:51 AM, Johan Engelen wrote:
> On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:
>>
>> With ldc2, the best option is to go with a dynamic array ONLY IF you
>> access the elements through the .ptr property. As seen in the last
>> result, using the [] operator on the array is about 4 times slower
>> than that.
>
> As Yuxuan Shui mentioned the difference is in vectorization. The
> non-POINTER version is not vectorized because the semantics of the code
> is not the same as the POINTER version. Indexing `arr`, and writing to
> that address could change `arr.ptr`, and so the loop would do something
> different when "caching" `arr.ptr` in `p` (POINTER version) versus the
> case without caching (non-POINTER version).
>
> Evil code demonstrating the problem:
> ```
> ubyte evil;
> ubyte[] arr;
>
> void doEvil() {
> // TODO: use this in the obfuscated-D contest
> arr = ()[0..50];
> }
> ```
>
> The compiler somehow has to prove that `arr[i]` will never point to
> `arr.ptr` (it's called Alias Analysis in LLVM).
>
> Perhaps it is UB in D to have `arr[i]` ever point into `arr` itself, I
> don't know. If so, the code is vectorizable and we can try to make it so.
>
> -Johan

Thank you all. That makes sense... Agreeing that the POINTER version is 
applicable only in some cases, looking only at the non-POINTER cases, 
for ldc2, a static array is faster, making the "arbitrary" 16MiB limit a 
performance issue. For ldc2, static array is about 40% faster:


6) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off -d-version=STATIC

  0.472s


8) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off

  0.792s


It's the opposite for dmd:

2) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC

   4.238s


4) dmd deneme.d -ofdeneme -O -boundscheck=off -inline

   3.845s

Ali



Re: Why 16Mib static array size limit?

2016-08-16 Thread Johan Engelen via Digitalmars-d

On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:


With ldc2, the best option is to go with a dynamic array ONLY 
IF you access the elements through the .ptr property. As seen 
in the last result, using the [] operator on the array is about 
4 times slower than that.


As Yuxuan Shui mentioned the difference is in vectorization. The 
non-POINTER version is not vectorized because the semantics of 
the code is not the same as the POINTER version. Indexing `arr`, 
and writing to that address could change `arr.ptr`, and so the 
loop would do something different when "caching" `arr.ptr` in `p` 
(POINTER version) versus the case without caching (non-POINTER 
version).


Evil code demonstrating the problem:
```
ubyte evil;
ubyte[] arr;

void doEvil() {
// TODO: use this in the obfuscated-D contest
arr = ()[0..50];
}
```

The compiler somehow has to prove that `arr[i]` will never point 
to `arr.ptr` (it's called Alias Analysis in LLVM).


Perhaps it is UB in D to have `arr[i]` ever point into `arr` 
itself, I don't know. If so, the code is vectorizable and we can 
try to make it so.


-Johan


Re: Why 16Mib static array size limit?

2016-08-15 Thread Yuxuan Shui via Digitalmars-d

On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:

On 08/15/2016 12:09 PM, Ali Çehreli wrote:

[...]


Could you please help me understand the following results, 
possibly by analyzing the produced assembly?


[...]


There seem to be two things at work here: 1) when not accessed 
via pointer, access to the array is going through TCB everytime 
(for it's a TLS variable) 2) LDC is able to vectorize the pointer 
version but not the array version.




[...]




Re: Why 16Mib static array size limit?

2016-08-15 Thread ketmar via Digitalmars-d

On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:

for x86 dmd, the results are:
0m0.694s
0m1.176s
0m0.722s
0m1.454s


x86_64 codegen looks... not normal.


Re: Why 16Mib static array size limit?

2016-08-15 Thread Ali Çehreli via Digitalmars-d

On 08/15/2016 12:09 PM, Ali Çehreli wrote:

dmd does not allow anything larger.


Could you please help me understand the following results, possibly by 
analyzing the produced assembly?


I wanted to see whether there were any performance penalties when one 
used D's recommendation of using dynamic arrays beyond 16MiB.


Here is the test code:

enum size = 15 * 1024 * 1024;

version (STATIC) {
ubyte[size] arr;
}
else {
ubyte[] arr;

static this() {
arr = new ubyte[](size);
}
}

void main() {
auto p = arr.ptr;

foreach (j; 0 .. 100) {
foreach (i; 0..arr.length) {
version (POINTER) {
p[i] += cast(ubyte)i;
}
else {
arr[i] += cast(ubyte)i;
}
}
}
}

My CPU is an i7 with 4M cache:

$ lscpu
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):4
On-line CPU(s) list:   0-3
Thread(s) per core:2
Core(s) per socket:2
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 78
Model name:Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Stepping:  3
CPU MHz:   513.953
CPU max MHz:   3400.
CPU min MHz:   400.
BogoMIPS:  5615.89
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  4096K

I tried two compilers:

- DMD64 D Compiler v2.071.2-b2

- LDC - the LLVM D compiler (1.0.0):
   based on DMD v2.070.2 and LLVM 3.8.0

As seen in the code, I tried two version identifiers:

-  STATIC: Use static array
-else: Use dynamic array

- POINTER: Access array elements through .ptr
-else: Access array elements through the [] operator

So, that gave me 8 combinations. Below, I list both the compilation 
command lines that I used and the wallclock times that each program 
execution took (as reported by the 'time' utility).


1) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC 
-version=POINTER


   4.332s


2) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC

   4.238s


3) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=POINTER

   4.321s


4) dmd deneme.d -ofdeneme -O -boundscheck=off -inline

   3.845s  <== BEST for dmd


5) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off 
-d-version=POINTER -d-version=STATIC


   0.469s


6) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off -d-version=STATIC

  0.472s


7) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off -d-version=POINTER

  0.182s  <== BEST for ldc2


8) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off

  0.792s


So, for dmd, going with the recommendation of using a dynamic array is 
faster. Interestingly, using .ptr is actually slower. How?


With ldc2, the best option is to go with a dynamic array ONLY IF you 
access the elements through the .ptr property. As seen in the last 
result, using the [] operator on the array is about 4 times slower than 
that.


Does that make sense to you? Why would that be?

Thank you,
Ali



Re: Why 16Mib static array size limit?

2016-08-15 Thread NX via Digitalmars-d
You can apply a patch if you're willing to compile dmd from 
source by doing the following:


Find the following code in file 'mtype.d' at line 4561:

bool overflow = false;
if (mulu(tbn.size(loc), d2, overflow) >= 
0x100 || overflow) // put a 'reasonable' limit on it

goto Loverflow;

And change it to:

bool overflow = false;
mulu(tbn.size(loc), d2, overflow);
if (overflow)
goto Loverflow;

I would make a PR if I had the time (anyone?)...


Re: Why 16Mib static array size limit?

2016-08-15 Thread NX via Digitalmars-d

https://issues.dlang.org/show_bug.cgi?id=14859

This limitation is put there because of optlink (which fails to 
link when you have enough static data), and is actually entirely 
meaningless when combined with -m32mscoff & -m64 switches (since 
other linkers handle huge static data just fine).


Why 16Mib static array size limit?

2016-08-15 Thread Ali Çehreli via Digitalmars-d
dmd does not allow anything larger. Shouldn't the limit depend on the 
.data segment size, which can be specified by the system? Is there a 
technical reason?


Observing that the limit is actually one less (note -1 below), it's 
probably due to an old limitation where indexes were limited to 24 bits?


static ubyte[16 * 1024 * 1024 - 1] arr;

Ali