Re: Better watch out! D runs on watchOS!

2016-01-06 Thread Dan Olson via Digitalmars-d-announce
Joakim  writes:

> On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:
>> On Wednesday, 30 December 2015 at 23:11:06 UTC, Joakim wrote:
>>> That sounds like this issue I ran into with ARM EH:
>>>
>>> https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075
>>>
>>> I was able to work around it by disabling the mentioned llvm
>>> optimization pass:
>>>
>>> https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42
>>>
>>> https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133
>>
>> Yup, that's exactly it!  The approach I took was to leave
>> optimization on, removed the casts, and byte load the data into the
>> uint vars.  If the dwarf data is not guaranteed to be aligned to the
>> data type, then I think this is the approach to take.
>
> Sounds good, submit a PR and let's get it in.

https://github.com/ldc-developers/druntime/pull/51


Re: Better watch out! D runs on watchOS!

2016-01-04 Thread Dan Olson via Digitalmars-d-announce
Joakim  writes:

> On Monday, 4 January 2016 at 09:26:39 UTC, Dan Olson wrote:
>> Joakim  writes:
>>
>>> On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:
 [...]
>>>
>>> Sounds good, submit a PR and let's get it in.
>>
>> Was planning to get that PR going, then got sidetracked by a more
>> difficult ARM exception unwinding bug.  It happens in the std.random
>> unittest at LDC -O2 or higher.  Does this sound familiar, Joakim?
>
> Yep, except tests were failing in three unittest blocks with -O1 too,
> but I never looked into exactly why:
>
> https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3139

I must add, I don't think the optimizer or inliner are the cause of this
unwinding bug.  They are just good at making big functions.  I think I
could create the same bug at -O0.


Re: Better watch out! D runs on watchOS!

2016-01-04 Thread Dan Olson via Digitalmars-d-announce
Joakim  writes:

> On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:
>> On Wednesday, 30 December 2015 at 23:11:06 UTC, Joakim wrote:
>>> That sounds like this issue I ran into with ARM EH:
>>>
>>> https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075
>>>
>>> I was able to work around it by disabling the mentioned llvm
>>> optimization pass:
>>>
>>> https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42
>>>
>>> https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133
>>
>> Yup, that's exactly it!  The approach I took was to leave
>> optimization on, removed the casts, and byte load the data into the
>> uint vars.  If the dwarf data is not guaranteed to be aligned to the
>> data type, then I think this is the approach to take.
>
> Sounds good, submit a PR and let's get it in.

Was planning to get that PR going, then got sidetracked by a more
difficult ARM exception unwinding bug.  It happens in the std.random
unittest at LDC -O2 or higher.  Does this sound familiar, Joakim?

The bug is a bad stack pointer which blows up when the last unittest
returns.  This unittest has all the right conditions to generate stack
adjustments around some of the function calls that throw exceptions.
The exception landing pad does not fix up the stack adjustment, so the
stack leaks on each caught exception.  The unittest function epilog
restores the stack by adding a fixed offset to match the prolog, so the
stack pointer stays wrong when the saved registers and return address
are popped.

Really looks like LLVM is not doing the right thing with landing pads.
In the meantime I patched LLVM to generate epilog that always uses frame
pointer to restore the stack pointer.  WatchOS requires a frame pointer,
so this isn't too bad.  Now all unittests pass at -O3 for watchOS.

I am guessing iOS is not affected since it uses SjLj to restore the
stack after an exception is thrown.  I'll have to pursue this later.  My
mind is freed up for the original PR.
-- 
Dan


Re: Better watch out! D runs on watchOS!

2016-01-04 Thread Joakim via Digitalmars-d-announce

On Monday, 4 January 2016 at 09:26:39 UTC, Dan Olson wrote:

Joakim  writes:


On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:

[...]


Sounds good, submit a PR and let's get it in.


Was planning to get that PR going, then got sidetracked by a 
more difficult ARM exception unwinding bug.  It happens in the 
std.random unittest at LDC -O2 or higher.  Does this sound 
familiar, Joakim?


Yep, except tests were failing in three unittest blocks with -O1 
too, but I never looked into exactly why:


https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3139

The bug is a bad stack pointer which blows up when the last 
unittest returns.  This unittest has all the right conditions 
to generate stack adjustments around some of the function calls 
that throw exceptions. The exception landing pad does not fix up 
the stack adjustment, so the stack leaks on each caught 
exception.  The unittest function epilog restores the stack by 
adding a fixed offset to match the prolog, so the stack pointer 
stays wrong when the saved registers and return address are 
popped.


Really looks like LLVM is not doing the right thing with 
landing pads. In the meantime I patched LLVM to generate epilog 
that always uses frame pointer to restore the stack pointer.  
WatchOS requires a frame pointer, so this isn't too bad.  Now 
all unittests pass at -O3 for watchOS.


Could be the same issue for me, not sure.  If you put your fix 
online, I can try it and see.


I am guessing iOS is not affected since it uses SjLj to restore 
the stack after an exception is thrown.  I'll have to pursue 
this later.  My mind is freed up for the original PR.


That one is much simpler, let's get it in.


Re: Better watch out! D runs on watchOS!

2016-01-04 Thread Dan Olson via Digitalmars-d-announce
Joakim  writes:

> On Monday, 4 January 2016 at 09:26:39 UTC, Dan Olson wrote:
>> Joakim  writes:
>
>> The bug is a bad stack pointer which blows up when the last unittest
>> returns.  This unittest has all the right conditions to generate
>> stack adjustments around some of the function calls that throw
>> exceptions. The exception landing pad does not fix up the stack
>> adjustment, so the stack leaks on each caught exception.  The
>> unittest function epilog restores the stack by adding a fixed offset
>> to match the prolog, so the stack pointer stays wrong when the saved
>> registers and return address are popped.
>>
>> Really looks like LLVM is not doing the right thing with landing
>> pads. In the meantime I patched LLVM to generate epilog that always
>> uses frame pointer to restore the stack pointer.  WatchOS requires a
>> frame pointer, so this isn't too bad.  Now all unittests pass at -O3
>> for watchOS.
>
> Could be the same issue for me, not sure.  If you put your fix online,
> I can try it and see.

It is this commit based on a 2 week old LLVM 3.8 trunk:

https://github.com/smolt/llvm/commit/91a4420615c6ec83b227b63d36054f12ccffb00f

It is a small change, but it took me a long time in the debugger on an
Apple Watch to figure out.  This is something the x86 simulator can't
show.  The patch is tailored to watchOS, which uses thumb2 instructions.
watchOS always has a frame pointer, so hasFP() is always true.  You will
want to add Android to hasFP() or disable frame pointer elimination some
other way.  I noticed that -disable-fp-elim for LDC with LLVM 3.7 and
above is broken, so you can't use that.

The pattern to look for if you have a suspect is this:

A function that throws an exception is codegened with stack adjustment
surrounding the call:

sub     sp, #16
str     r1, [sp]
mov     r1, r2
movs    r0, #66
movw    r2, #2424
blx     __D3std9exception25__T7bailOutHTC9ExceptionZ7bailOutFNaNfAyakxAaZv
add     sp, #16        <--- This adjustment is missed on exception

Epilog without hack (llvm 3.8 git 0838b1f Add iOS TLS support for WatchOS)
ittt    ne
addne.w sp, sp, #9984  <-- stack adjust matches prolog, but stack
                           is off by 16 bytes if above throws
addne   sp, #48
popne.w {r8, r10, r11}
popne   {r4, r5, r6, r7, pc}

Epilog with hack (commit 91a4420)
ittt    ne
subne.w r4, r7, #24    <-- stack set from frame pointer (r7)
movne   sp, r4
popne.w {r8, r10, r11}
popne   {r4, r5, r6, r7, pc}
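
For anyone who wants to reason about the two epilogs without a device, here is a toy model in C. It is only a sketch under simplifying assumptions: the function name sp_before_pop and its parameters are made up for illustration, the frame pointer is treated as pointing exactly at the saved-register area (the real code uses r7 - 24), and every caught exception is assumed to skip exactly one "add sp, #16".

```c
#include <assert.h>
#include <stdint.h>

enum {
    FRAME_BYTES = 9984 + 48, /* locals reserved by the prolog, per the epilog above */
    CALL_ADJUST = 16         /* extra stack reserved around the throwing call */
};

/* Toy model (hypothetical helper, not LLVM code).
   Returns the stack pointer at the point where the saved registers
   would be popped.  The result is correct only if it equals sp_after_push. */
static uint32_t sp_before_pop(uint32_t sp_after_push, int caught,
                              int use_frame_pointer)
{
    assert(caught >= 0);

    /* Value the frame pointer lets the epilog recompute. */
    uint32_t fp_anchor = sp_after_push;

    /* Prolog: reserve the fixed frame. */
    uint32_t sp = sp_after_push - FRAME_BYTES;

    for (int i = 0; i < caught; i++) {
        sp -= CALL_ADJUST; /* "sub sp, #16" before the call */
        /* The call throws, control resumes at the landing pad,
           and the matching "add sp, #16" never runs. */
    }

    if (use_frame_pointer)
        sp = fp_anchor;    /* like "sub r4, r7, #24; mov sp, r4" */
    else
        sp += FRAME_BYTES; /* like "add.w sp, sp, #9984; add sp, #48" */

    return sp;
}
```

With three caught exceptions, the fixed-offset epilog leaves sp 48 bytes too low, so the pops load the wrong saved registers and return address. The frame-pointer epilog lands on the right value no matter how many adjustments were skipped.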

-- 
Dan


Re: Better watch out! D runs on watchOS!

2015-12-31 Thread Jacob Carlborg via Digitalmars-d-announce

On Wednesday, 30 December 2015 at 20:55:44 UTC, Dan Olson wrote:

I'm going to start with Plan B.1 though because LLVM does nice 
optimizations for TLS.


What is Plan B.1?

--
/Jacob Carlborg


Re: Better watch out! D runs on watchOS!

2015-12-31 Thread Joakim via Digitalmars-d-announce
On Thursday, 31 December 2015 at 10:10:20 UTC, Jacob Carlborg 
wrote:

On Wednesday, 30 December 2015 at 20:55:44 UTC, Dan Olson wrote:

I'm going to start with Plan B.1 though because LLVM does nice 
optimizations for TLS.


What is Plan B.1?

--
/Jacob Carlborg


Getting it into llvm:

http://forum.dlang.org/post/m237um75x7@comcast.net


Re: Better watch out! D runs on watchOS!

2015-12-30 Thread Dan Olson via Digitalmars-d-announce
Jacob Carlborg  writes:

> On 2015-12-30 08:02, Dan Olson wrote:
>
>> I know some of it from hacking dyld for iOS, but not all.  How does this
>> fit in with "Plan B.2"?
>
> If you need to figure out how TLS works, I can give you some help,
> that's all I'm saying :)

Oh, good.  Always like help.  I'm going to start with Plan B.1 though
because LLVM does nice optimizations for TLS.


Re: Better watch out! D runs on watchOS!

2015-12-30 Thread Dan Olson via Digitalmars-d-announce

On Wednesday, 30 December 2015 at 23:11:06 UTC, Joakim wrote:

That sounds like this issue I ran into with ARM EH:

https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075

I was able to work around it by disabling the mentioned llvm 
optimization pass:


https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42

https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133


Yup, that's exactly it!  The approach I took was to leave 
optimization on, removed the casts, and byte load the data into 
the uint vars.  If the dwarf data is not guaranteed to be aligned 
to the data type, then I think this is the approach to take.
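
A minimal sketch of the byte-load idea, written in C rather than D so it can be tried anywhere. It is not the code from the druntime PR: the name load_u32_le is hypothetical, and little-endian byte order is an assumption that matches ARM as used on these devices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value one byte at a time.
   Nothing is assumed about the alignment of p, so the optimizer has no
   licence to pick a load instruction that needs a 4-byte-aligned address,
   which is what a cast to a uint pointer would allow. */
static uint32_t load_u32_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Calling load_u32_le(buf + 1) on a byte buffer is well defined even though buf + 1 is not 4-byte aligned. On targets where unaligned loads are cheap, compilers commonly fold the four byte loads back into a single load.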


Re: Better watch out! D runs on watchOS!

2015-12-30 Thread Joakim via Digitalmars-d-announce

On Thursday, 31 December 2015 at 00:11:34 UTC, Dan Olson wrote:

On Wednesday, 30 December 2015 at 23:11:06 UTC, Joakim wrote:

That sounds like this issue I ran into with ARM EH:

https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075

I was able to work around it by disabling the mentioned llvm 
optimization pass:


https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42

https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133


Yup, that's exactly it!  The approach I took was to leave 
optimization on, removed the casts, and byte load the data into 
the uint vars.  If the dwarf data is not guaranteed to be 
aligned to the data type, then I think this is the approach to 
take.


Sounds good, submit a PR and let's get it in.


Re: Better watch out! D runs on watchOS!

2015-12-30 Thread Joakim via Digitalmars-d-announce

On Wednesday, 30 December 2015 at 21:56:46 UTC, Dan Olson wrote:

Dan Olson  writes:

A little progress report. More to come later when I get 
something pushed to github.


I bought a returned Apple Watch yesterday at discount for 
$223.99 US and tried to see how much of D would work on it 
using my iOS fork of LDC. There were a few bumps, like dealing 
with embedded bitcode (a watchOS requirement). After 4 hours 
of baby steps, little D programs with incremental druntime 
support, I was able to download a huge watch app extension 
with all druntime and phobos unittests and run most of them 
alphabetically. Everything zipped along fine, with only a std.math 
error, then mysteriously an exit after running the std.parallelism 
test for a long time. It was late for me, so I decided that was 
enough progress.


This means all of druntime worked and probably most of phobos.


Played with this a little more and learned a bit about watchOS 
memory. A little test that allocated memory in 5 MB chunks was 
terminated at 30 MB data RAM.  The combined unittests in phobos 
suck up much more than that; std.uri by itself uses over 50 MB.  
By tailoring memory usage and running phobos unittests in 
smaller blocks, they all work.  The std.math failure was my own 
coding error, a missing version block for WatchOS.
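
A memory probe of this kind could look roughly like the C sketch below. This is not the test that was actually run on the watch: probe_chunks and CHUNK_BYTES are hypothetical names, and the max_chunks cap exists only so the sketch is safe to run on a desktop.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { CHUNK_BYTES = 5 * 1024 * 1024 }; /* 5 MB per allocation */

/* Allocate and touch up to max_chunks blocks of CHUNK_BYTES each.
   Progress is printed and flushed after every chunk, so if the OS kills
   the process the last line shows how far it got.
   Returns the number of chunks obtained; everything is freed on return. */
static size_t probe_chunks(size_t max_chunks)
{
    assert(max_chunks > 0);

    void **chunks = calloc(max_chunks, sizeof *chunks);
    if (chunks == NULL)
        return 0;

    size_t got = 0;
    while (got < max_chunks) {
        void *p = malloc(CHUNK_BYTES);
        if (p == NULL)
            break;
        memset(p, 0xA5, CHUNK_BYTES); /* write to the block so its pages are really used */
        chunks[got++] = p;
        printf("allocated %zu MB\n", got * (size_t)CHUNK_BYTES / (1024 * 1024));
        fflush(stdout);
    }

    for (size_t i = 0; i < got; i++)
        free(chunks[i]);
    free(chunks);
    return got;
}
```

On a memory-limited device the interesting result is the last line printed before termination, not the return value, since the process may be killed before malloc ever returns NULL.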


In the end, good news: druntime and phobos fully work on watchOS 
with LLVM optimizations disabled.  With optimizations on, there 
are alignment problems.  For example, compact unwind data 
generated by LLVM isn't aligned, but some of our eh unwinding 
code casts these to uint.  That is not so good when the optimizer 
selects instructions requiring special alignment.  I'll track 
these down gradually.


That sounds like this issue I ran into with ARM EH:

https://github.com/ldc-developers/ldc/issues/489#issuecomment-143560075

I was able to work around it by disabling the mentioned llvm 
optimization pass:


https://gist.github.com/joakim-noah/1fb23fba1ba5b7e87e1a#file-android_tls-L42

https://gist.github.com/joakim-noah/63693ead3aa62216e1d9#file-ldc_android_arm-L3133


Re: Better watch out! D runs on watchOS!

2015-12-30 Thread Jacob Carlborg via Digitalmars-d-announce

On 2015-12-30 08:02, Dan Olson wrote:


I know some of it from hacking dyld for iOS, but not all.  How does this
fit in with "Plan B.2"?


If you need to figure out how TLS works, I can give you some help, 
that's all I'm saying :)


--
/Jacob Carlborg


Re: Better watch out! D runs on watchOS!

2015-12-29 Thread Dan Olson via Digitalmars-d-announce
Jacob Carlborg  writes:

> On 2015-12-28 20:02, Dan Olson wrote:
>
>> That is Plan B.2
>
> I'm working on implementing native TLS for OS X in DMD. I think I've
> figured out how everything works. Unless you already know how it
> works, I could tell you what I have figured out.

I know some of it from hacking dyld for iOS, but not all.  How does this
fit in with "Plan B.2"?


Re: Better watch out! D runs on watchOS!

2015-12-29 Thread Jacob Carlborg via Digitalmars-d-announce

On 2015-12-28 20:02, Dan Olson wrote:


That is Plan B.2


I'm working on implementing native TLS for OS X in DMD. I think I've 
figured out how everything works. Unless you already know how it works, 
I could tell you what I have figured out.


--
/Jacob Carlborg


Re: Better watch out! D runs on watchOS!

2015-12-28 Thread Dan Olson via Digitalmars-d-announce
Joakim  writes:
>
> I don't understand how the bitcode requirement works on your own
> device: I thought that was for an Apple-submitted app that they then
> compiled to binary themselves?  Do you have to go through the same
> process even for test apps, ie no sideloading?  Or does the device
> itself take bitcode?

This is all based on my experience and I don't know the full bitcode
story.  I may state erroneous info below.

The device takes normal executables but there is a check to make sure
that each object file has the appropriate bitcode sections. I think the
linker does this, but I did not check which tool in the build chain spit
out the error.

The bitcode is actually two new sections in every object file:

  .section __LLVM,__bitcode
  .section __LLVM,__cmdline

The __bitcode section seems to just be the LLVM IR for the object file
in the .bc binary format. Some sources say it is a xar archive but in my
investigation I used Apple's clang with -fembed-bitcode and inspected
the IR or ARM assembly to see these two sections. Extracting and using
llvm-dis on the __bitcode section gave back the containing module's
IR in human readable format. It exactly matches the LLVM IR for its
object file sans Apple's clang -fembed-bitcode.  So not sure when xar is
used yet.

The __cmdline section appears to be some of the clang options used to
compile the bitcode.

The compile process becomes something like this:
1. Create IR for module as usual.
2. Generate the IR bitcode representation.
3. Add the two new sections, using the bitcode from (2) as the contents of
  the __bitcode section and the compile options as the contents of the
  __cmdline section.
4. Generate object from IR.

But not wanting to figure all that out now, I tried simpler things and
discovered that at least for testing, these sections only need to be
present and the contents don't seem to matter. So for now I skip step 2
and just put a zero in each.
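
To illustrate the placeholder idea from the C side: the sketch below puts a single zero byte into each of the two sections using a section attribute. This is an assumption about how one might reproduce the effect by hand, not what the LDC fork does (there the compiler emits the sections itself). The macro IN_SECTION and the variable names are made up, and the attribute is only applied on Apple targets because the "segment,section" naming is Mach-O specific.

```c
#include <assert.h>

/* On Apple (Mach-O) targets, place the object in the named
   "segment,section" and keep it even though nothing references it.
   Elsewhere the macro expands to nothing so the file still compiles. */
#if defined(__APPLE__)
#define IN_SECTION(name) __attribute__((used, section(name)))
#else
#define IN_SECTION(name)
#endif

/* One zero byte per section: present, but with no meaningful contents. */
static const char bitcode_marker IN_SECTION("__LLVM,__bitcode") = 0;
static const char cmdline_marker IN_SECTION("__LLVM,__cmdline") = 0;

/* Compile-time check that each placeholder really is a single byte. */
static_assert(sizeof bitcode_marker == 1, "placeholder must be one byte");
static_assert(sizeof cmdline_marker == 1, "placeholder must be one byte");
```

As noted above, this kind of placeholder only satisfied the presence check during testing; it carries no real bitcode.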

One implication of Apple requiring bitcode: if Apple is compiling the
bitcode with their clang or llc, then it means using a modified LLVM like
I do to support thread-locals on watchOS, tvOS, or iOS is only good for
side loading.  Probably going to have to work on plan B for
thread-locals.
-- 
Dan


Re: Better watch out! D runs on watchOS!

2015-12-28 Thread Jacob Carlborg via Digitalmars-d-announce

On 2015-12-28 09:45, Dan Olson wrote:


One implication of Apple requiring bitcode: if Apple is compiling the
bitcode with their clang or llc, then it means using a modified LLVM like
I do to support thread-locals on watchOS, tvOS, or iOS is only good for
side loading.  Probably going to have to work on plan B for
thread-locals.


Would it be possible to bypass LLVM and do the thread local specific 
parts in LDC?


--
/Jacob Carlborg


Re: Better watch out! D runs on watchOS!

2015-12-28 Thread Dan Olson via Digitalmars-d-announce
Jacob Carlborg  writes:
>
> Would it be possible to bypass LLVM and do the thread local specific
> parts in LDC?

That is Plan B.2


Re: Better watch out! D runs on watchOS!

2015-12-28 Thread Dan Olson via Digitalmars-d-announce
Joakim  writes:
>
> Thanks for the detailed answer; I'm sure this will now become the
> definitive answer online.  I've gone googling for technical info only
> to sometimes be directed back to a post in these D forums. :)

Me too!  It's very funny when that happens.

> Time to get emulated TLS for Mach-O into llvm, as one google engineer
> did with ELF for Android, which will be released in the upcoming llvm
> 3.8:
>
> https://code.google.com/p/android/issues/detail?id=78122

That is Plan B.1


Re: Better watch out! D runs on watchOS!

2015-12-27 Thread Joakim via Digitalmars-d-announce

On Monday, 28 December 2015 at 01:17:15 UTC, Dan Olson wrote:
A little progress report. More to come later when I get 
something pushed to github.


I bought a returned Apple Watch yesterday at discount for 
$223.99 US and tried to see how much of D would work on it 
using my iOS fork of LDC. There were a few bumps, like dealing 
with embedded bitcode (a watchOS requirement). After 4 hours of 
baby steps, little D programs with incremental druntime 
support, I was able to download a huge watch app extension with 
all druntime and phobos unittests and run most of them 
alphabetically. Everything zipped along fine, with only a std.math 
error, then mysteriously an exit after running the std.parallelism 
test for a long time. It was late for me, so I decided that was enough 
progress.


This means all of druntime worked and probably most of phobos.

The Apple Watch uses a new chip with armv7k, a different ABI, 
and different exception handling than iOS, so kinda surprised 
it worked as well as it did.  Of course much thanks goes to 
LLVM with recently added watchOS, tvOS support, and all the LDC 
contributors that have kept master building with the latest 3.8 
LLVM.


Heh, nice, thanks for keeping us up to date.

I don't understand how the bitcode requirement works on your own 
device: I thought that was for an Apple-submitted app that they 
then compiled to binary themselves?  Do you have to go through 
the same process even for test apps, ie no sideloading?  Or does 
the device itself take bitcode?


btw, Dan's iOS/tvOS/watchOS work will require these dmd/druntime 
github pulls to be reviewed and merged:


https://github.com/D-Programming-Language/dmd/pull/5231
https://github.com/D-Programming-Language/druntime/pull/1448