Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-08 Thread Lennart Sorensen
On Sat, Aug 06, 2016 at 04:25:13PM +0100, Luke Kenneth Casson Leighton wrote:
>  did i hear right that there's also a core design difference between
> the A7 and the A53 which results in a performance/watt loss of around
> 15%?  so you're actually *worse off* going to 64-bit at the moment, if
> power (battery life) really matters.  i think it was on anandtech or
> something.

Well, the 64 bit equivalent of the A7 would be the A35.  The A53 is a step
up, more like an A9 I would think, while the A57 is at the A15
performance level and the A72 probably maps to about the A17, or maybe
even better.

Certainly in terms of MIPS/MHz, the A53 is between the A8 and A9, while
the A57 is A15/A17 level, and the A72 is quite a bit faster than the rest.
Apparently there is an A73 coming that adds another 30% performance over
the A72.

The A35 is supposed to deliver 6 to 40% better performance than the A7 at
the same power.

-- 
Len Sorensen



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-07 Thread Jeffrey Walton
On Tue, Jul 26, 2016 at 11:28 PM, Jeffrey Walton  wrote:
> Hi Everyone,
>
> I recently purchased a Raspberry Pi 3. It's got a Broadcom SoC, and it's
> ARMv8. It's running a Debian-lite kernel, which I believe is a modified
> 4.4 kernel.
>
> Below is the output from cpuinfo. I see ARMv8's crc32 is available,
> but I don't see pmull, aes or sha. At the moment, I'm not sure if they're
> truly missing, or the execution environment is not quite correct.
>
> My question is, what's going on with the device? Is the hardware truly
> lacking the features, or is the image lagging behind capabilities?

In case anyone is interested...

The Raspberry Pi (Broadcom SoC) and the ODROID C2 (Amlogic SoC)
include CRC32, but lack the Crypto extensions.

The HiKey and the Pine64 have both CRC32 and the Crypto extensions.

Jeff



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-06 Thread Stefan Monnier
>  the reason i ask that is, i'm not seeing any real difference: you
> still have to download the linux kernel source (to submit dtsi
> patches), the linux git repo is still the central location for dtsi
> management... unless you're happy to set up an alternative parallel
> repository (and compile infrastructure) for dtsi management...  thus
> you still have to download the full git repo, you still have to
> compile stuff *from* that same git repo...  so where's the actual benefit
> to having moved to dtsi, in terms of "work needed to maintain it"?

I don't do much kernel hacking, myself.  So I'm talking about use of
apt-get.  The difference is very significant because Debian never
maintained enough different kernels.


Stefan



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-06 Thread Luke Kenneth Casson Leighton
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


On Sat, Aug 6, 2016 at 8:15 PM, Stefan Monnier  wrote:
>>  the only big advantage of dtb files (binary compiled) is *IF* the
>>  decision is made to respect dtb files and treat them as inviolate
>>  and supported forever without needing recompiles, you stand a
>>  chance of being able to upgrade linux kernels *without* replacing
>>  the dtb file.
>
> That might be true when compared to some potential replacement of DTBs,
> but when compared to what we had before DTBs, then the benefit is much
> more clear: a single linux-image-armhf package which works for "all"
> machines.  Personally I don't mind changing the DTB every time I change
> the kernel.  Hell, that could/should be integrated with the process
> which refreshes the initrd file anyway.

 ... are you _sure_ it's clear? :)

 the reason i ask that is, i'm not seeing any real difference: you
still have to download the linux kernel source (to submit dtsi
patches), the linux git repo is still the central location for dtsi
management... unless you're happy to set up an alternative parallel
repository (and compile infrastructure) for dtsi management...  thus
you still have to download the full git repo, you still have to
compile stuff *from* that same git repo...  so where's the actual benefit
to having moved to dtsi, in terms of "work needed to maintain it"?

i appreciate you don't *mind* changing the DTB file each time you
change the kernel, but that defeats one of the very purposes *of* the
DTB file.

 also, i don't know if you've looked in arch/arm/boot/dts but it's
already alarmingly full.   i appreciate that there's some includes
(dtsi) but realistically over time the sharing process is going to
begin to look like the selinux m4 macro includes or the openembedded
infrastructure: an unintelligible and unmaintainable dog's dinner
that only a handful of people in the world can understand.

 anyway to get back to the original topic, there's very little that
can actually be shared - even with devicetree - between different
devices.  it's the "N product design types" times "M processors"
thing.  which is why i'm designing a hardware standard that's similar
to how things are in the x86 world, so that we can get back to "N PLUS
M" at the linux kernel level.

l.



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-06 Thread Stefan Monnier
>  the only big advantage of dtb files (binary compiled) is *IF* the
>  decision is made to respect dtb files and treat them as inviolate
>  and supported forever without needing recompiles, you stand a
>  chance of being able to upgrade linux kernels *without* replacing
>  the dtb file.

That might be true when compared to some potential replacement of DTBs,
but when compared to what we had before DTBs, then the benefit is much
more clear: a single linux-image-armhf package which works for "all"
machines.  Personally I don't mind changing the DTB every time I change
the kernel.  Hell, that could/should be integrated with the process
which refreshes the initrd file anyway.


Stefan



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-06 Thread Luke Kenneth Casson Leighton

On Sat, Aug 6, 2016 at 2:57 PM, Stefan Monnier  wrote:


> Note also that you will sometimes *lose* performance by going to 64bit
> because the pointers use up twice as much space, so if your program
> needs to store many pointers, it will use up more cache space
> and memory bandwidth, which will tend to slow it down.

 did i hear right that there's also a core design difference between
the A7 and the A53 which results in a performance/watt loss of around
15%?  so you're actually *worse off* going to 64-bit at the moment, if
power (battery life) really matters.  i think it was on anandtech or
something.

l.



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-06 Thread Luke Kenneth Casson Leighton

On Thu, Jul 28, 2016 at 5:35 PM, Gunnar Wolf  wrote:

> Keep in mind it's not different Debian images we are talking about —
> "real" Debian cannot be booted on Raspberry hardware. I run a Debian
> userland on top of their provided kernel (with the mystery blobs to
> control its hardware), started by their mystery bootloader. And yes,
> for us people coming from the x86 world, we expect similar devices to
> "just work", but in ARM it *is* really a different way of doing things
> per each kind of board.

 i did find it very funny to learn that Linus did not understand why there
 were so many ARM developers at the Cambridge Linux Conference
 back in... when was it... 2007?  it coincided with UKUUG at the time.
 he's famously on record as saying "why are there so many of you?
 go away, choose one representative and come back with just one
 person!"

 likewise, i _am_ on record as pointing out a long long time ago that
 device-tree will not stop the proliferation or complexity of developing
 device drivers for ARM: it merely *moves* the proliferation and
 complexity... into dtsi files.

 the only big advantage of dtb files (binary compiled) is *IF* the
 decision is made to respect dtb files and treat them as inviolate
 and supported forever without needing recompiles, you stand a
 chance of being able to upgrade linux kernels *without* replacing
 the dtb file.  however i seriously doubt that the stringent testing
 needed to make that work will ever be put in place.  oh well.

l.



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-08-06 Thread Stefan Monnier
> physically).  What you often gain going to a 64 bit CPU is the ability
> to do 64 bit arithmetic in one instruction, and store the variables
[...]
> 32 bit calculations, then it doesn't matter, so in many cases it isn't an
> issue, but when it matters it can really make a difference in performance.

AFAIK the difference is only visible for operations on *integers* of
64 bits (and more).  Some programs make significant use of such
operations, but in general they're not that common.  So I'd be surprised
if "you often gain".

Note also that you will sometimes *lose* performance by going to 64bit
because the pointers use up twice as much space, so if your program
needs to store many pointers, it will use up more cache space
and memory bandwidth, which will tend to slow it down.

IOW unless you know your workload very well, the best prediction I could
make is "you won't notice any difference".

In the x86 world, moving from i686 to amd64 has the additional advantage
that the amd64 mode has more registers which is useful in many more
cases than just the manipulation of large integers.  And yet, even there
I find it hard to notice any difference.


Stefan



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread peter green

On 28/07/16 17:35, Gunnar Wolf wrote:
> I'm far from an absolute expert in this area... But I am fairly
> certain of what I say - That is, I have a RPi 1 and 2B, and they
> cannot boot from the same images.

That depends what is in the image.

The current raspberry pi firmware works on all pi models (older firmware 
will only work with older pi models).


The Pi1 needs a specific kernel. The Pi2 and the Pi3 can run the same
32-bit kernel (at least with foundation kernels; I don't know what the
situation is with upstream kernels).


The firmware by default selects a suitable kernel (kernel.img or 
kernel7.img) and device tree (each Pi model has a different one though 
IIRC they are pretty similar) based on the detected hardware.


There is some experimental 64-bit kernel/bootloader stuff out there for 
the raspberry pi 3 
https://www.raspberrypi.org/forums/viewtopic.php?f=72=137963=aarch64 
https://www.raspberrypi.org/forums/viewtopic.php?f=72=143765


Userland obviously has to be compatible with the hardware and kernel. 
Raspbian or Debian armel userlands should be usable on any Pi model. 
Debian armhf userland should be usable on a pi2 or pi3 but clearly not a 
pi1. Debian arm64 will obviously require a 64-bit kernel and a pi3.




Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Vagrant Cascadian
On 2016-07-28, Gunnar Wolf wrote:
> Alan Corey dijo [Thu, Jul 28, 2016 at 12:22:23PM -0400]:
>> Huh?  I thought they claimed they were interchangeable.  I had an
>> image from my model B days 3 years ago that I booted on my 3B.  And I
>> cloned a working current 3B SD card and booted a Zero from it.  There
>> isn't a different Debian image for every brand of motherboard and CPU,
>> they probe to see what hardware is there.  I wouldn't expect older
>> images to contain drivers for newer hardware maybe.
...
> I'm far from an absolute expert in this area... But I am fairly
> certain of what I say — That is, I have a RPi 1 and 2B, and they
> cannot boot from the same images.

I believe they ship different kernels for different boards all on one
image. Or, at least (used to) ship an rpi1 and rpi2 kernel; not sure if
the rpi3 uses the same kernel as the rpi2 (possibly with a different
device-tree).


> Keep in mind it's not different Debian images we are talking about —
> "real" Debian cannot be booted on Raspberry hardware. I run a Debian
> userland on top of their provided kernel (with the mystery blobs to
> control its hardware), started by their mystery bootloader.

Well, I've got three Raspberry Pi 2 boards running kernels shipped by
Debian (either jessie-backports or experimental) and u-boot shipped by
Debian, but it does require the GPU firmware to bootstrap the CPU.


live well,
  vagrant



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Lennart Sorensen
On Thu, Jul 28, 2016 at 12:22:23PM -0400, Alan Corey wrote:
> Huh?  I thought they claimed they were interchangeable.  I had an
> image from my model B days 3 years ago that I booted on my 3B.  And I
> cloned a working current 3B SD card and booted a Zero from it.  There
> isn't a different Debian image for every brand of motherboard and CPU,
> they probe to see what hardware is there.  I wouldn't expect older
> images to contain drivers for newer hardware maybe.
> 
> I guess I wouldn't make too much of the jump to 64 bit just yet.  I
> remember when i386 jumped to 32 bit.  16 bit had a messy segmented
> memory addressing scheme I was glad to get away from.  I can't afford
> more than 32 bits worth of RAM anyway, especially since I've usually
> got about 4 machines running.

Well, it isn't actually just a question of memory (most 8 bit CPUs had a 16
bit address space, many 16 bit CPUs had a 24 or 32 bit address space,
and some 32 bit x86 and arm chips can do 36 or 40 bit address space
physically).  What you often gain going to a 64 bit CPU is the ability
to do 64 bit arithmetic in one instruction, and to store each variable
in one register rather than two, instead of relying on a bunch of stuff
the compiler generates for you.  After all, if you take two 64 bit integers
and try to multiply them on a 32 bit CPU, most of the time you end up
with numerous multiply, shift, add and mask instructions to implement
the calculation using 32 bit only instructions, while on a 64 bit CPU
it is usually just one instruction.  So the 64 bit CPU will probably do
the calculation faster than the 32 bit CPU.  Of course if you only need
32 bit calculations, then it doesn't matter, so in many cases it isn't an
issue, but when it matters it can really make a difference in performance.

-- 
Len Sorensen



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Gunnar Wolf
Alan Corey dijo [Thu, Jul 28, 2016 at 12:22:23PM -0400]:
> Huh?  I thought they claimed they were interchangeable.  I had an
> image from my model B days 3 years ago that I booted on my 3B.  And I
> cloned a working current 3B SD card and booted a Zero from it.  There
> isn't a different Debian image for every brand of motherboard and CPU,
> they probe to see what hardware is there.  I wouldn't expect older
> images to contain drivers for newer hardware maybe.
> 
> I guess I wouldn't make too much of the jump to 64 bit just yet.  I
> remember when i386 jumped to 32 bit.  16 bit had a messy segmented
> memory addressing scheme I was glad to get away from.  I can't afford
> more than 32 bits worth of RAM anyway, especially since I've usually
> got about 4 machines running.

I'm far from an absolute expert in this area... But I am fairly
certain of what I say — That is, I have a RPi 1 and 2B, and they
cannot boot from the same images.

Keep in mind it's not different Debian images we are talking about —
"real" Debian cannot be booted on Raspberry hardware. I run a Debian
userland on top of their provided kernel (with the mystery blobs to
control its hardware), started by their mystery bootloader. And yes,
for us people coming from the x86 world, we expect similar devices to
"just work", but in ARM it *is* really a different way of doing things
per each kind of board.

I suggest watching the talk Martin Michlmayr delivered on this topic
at DebConf some weeks ago:


http://ftp.acc.umu.se/pub/debian-meetings/2016/debconf16/Debian_on_ARM_devices_2.webm



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Alan Corey
Huh?  I thought they claimed they were interchangeable.  I had an
image from my model B days 3 years ago that I booted on my 3B.  And I
cloned a working current 3B SD card and booted a Zero from it.  There
isn't a different Debian image for every brand of motherboard and CPU,
they probe to see what hardware is there.  I wouldn't expect older
images to contain drivers for newer hardware maybe.

I guess I wouldn't make too much of the jump to 64 bit just yet.  I
remember when i386 jumped to 32 bit.  16 bit had a messy segmented
memory addressing scheme I was glad to get away from.  I can't afford
more than 32 bits worth of RAM anyway, especially since I've usually
got about 4 machines running.

On 7/28/16, Gunnar Wolf  wrote:
> Alan Corey dijo [Wed, Jul 27, 2016 at 01:28:31PM -0400]:
>> > 64-bit/ARMv8 on the RPi3 is still in progress.
>>
>> Yes, so they claim and I wonder how they're going to deal with the
>> fact that some Pis are 32 bit and some 64.  I posted this question
>> there but I haven't looked into the links in the response a lot:
>> https://www.raspberrypi.org/forums/viewtopic.php?f=63=154497=1010500#p1010500
>
> It should not be that much of a deal. After all, images for the
> different generations of Raspberries are not interchangeable: an RPi1
> won't boot an RPi2 image, nor vice versa. Of course, the earlier
> generations could share all compiled binaries, while now that won't be
> the case (when it actually runs 64-bit, that is).
>


-- 
Credit is the root of all evil.  - AB1JX



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Gunnar Wolf
Alan Corey dijo [Wed, Jul 27, 2016 at 01:28:31PM -0400]:
> > 64-bit/ARMv8 on the RPi3 is still in progress.
> 
> Yes, so they claim and I wonder how they're going to deal with the
> fact that some Pis are 32 bit and some 64.  I posted this question
> there but I haven't looked into the links in the response a lot:
> https://www.raspberrypi.org/forums/viewtopic.php?f=63=154497=1010500#p1010500

It should not be that much of a deal. After all, images for the
different generations of Raspberries are not interchangeable: an RPi1
won't boot an RPi2 image, nor vice versa. Of course, the earlier
generations could share all compiled binaries, while now that won't be
the case (when it actually runs 64-bit, that is).



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Jeffrey Walton
> Using '.byte' below rather than '.inst' or '.inst.w' is another can of 
> worms...
>
> $ gcc -g3 -O0 -march=armv7-a -mfpu=neon test.cc -o test.exe
> $ ./test.exe
> $
>
> $ cat test.cc
> #include 
> int main(int argc, char* argv[])
> {
>   __asm__ __volatile__
>   (
> ".code 32"
>
> // CRC using word
> ".byte 0x1a, 0xc1, 0x58, 0x00;\n"
> // CRC using half word
> ".byte 0x1a, 0xc1, 0x54, 0x00;\n"
> // CRC using byte
> ".byte 0x1a, 0xc1, 0x50, 0x00;\n"
> // PMULL
> ".byte 0x0e, 0xe1, 0xe0, 0x00;\n"
> // PMULL2
> ".byte 0x4e, 0xe1, 0xe0, 0x00;\n"
> // AES (aese)
> ".byte 0x4e, 0x28, 0x48, 0x20;\n"
> // AES (aesd)
> ".byte 0x4e, 0x28, 0x58, 0x20;\n"
> // SHA1 (sha1c)
> ".byte 0x5e, 0x02, 0x00, 0x20;\n"
> // SHA1 (sha1m)
> ".byte 0x5e, 0x02, 0x20, 0x20;\n"
> // SHA1 (sha1p)
> ".byte 0x5e, 0x02, 0x30, 0x20;\n"
>   :
>   :
>   : "cc", "d0", "d1", "d2", "q0", "q1", "q2"
>   );
>
>   return 0;
> }

All that silliness was not needed. All that was needed was (and maybe
a float ABI flag):

   gcc -march=armv8-a+crc -mtune=cortex-a53 -mfpu=crypto-neon-fp-armv8 ...

I can't believe I could not piece that together from the man pages
(Thanks to the GCC and SO folks).

Jeff



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Jeffrey Walton
On Thu, Jul 28, 2016 at 3:06 AM, Tixy  wrote:
> On Thu, 2016-07-28 at 02:38 -0400, Jeffrey Walton wrote:
> [...]
>> >> // AES (aese)
>> >> ".byte 0x4e, 0x28, 0x48, 0x20;\n"
>> >
>> > So as instructions are little-endian that's 0x2048284e for a 32-bit
>> > instruction, or 0x284e2048 if it's a Thumb2 instruction (I'm showing
>> > that the same way as the ARM ARM does).
>>
>> I pulled the encodings from a known good machine that used intrinsics.
>> I did not hand encode them (too much work).
>>
>> > According to my copy of the ARM ARM, the AESE instruction has these
>> > encodings:
>> >
>> > For Thumb:
>> >
>> > 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm
>> >
>> > For ARM
>> >
>> > 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm
>> >
>> > For AArch64
>> >
>> > 0 1 0 0 1 1 1 0 size 1 0 1 0 0 0 0 1 0 0 1 0 Rn Rd
>> >
>> > So it looks like you've used the AArch64 encoding (for something
>> > compiled and presumably run as AArch32?!) and gotten the byte order the
>> > wrong way around.
>>
>> I'm not sure if it matters, but this is an ARMv8 device running a 32-bit OS.
>
> So it's running in AArch32 mode, and you want the encodings for that,
> not the AArch64 version. I.e. the second encoding I mentioned, which
> would be
>
> .inst 0xf3b00300

OK, thanks. I *think* what may have happened is that the disassembly
occurred on the host machine, not the target machine.

> or better, find a compiler version and options that knows about the
> instructions you want to test (which I see you already asked about
> below). Sorry I can't help with that, I know little about toolchains,
> and have also never used the newer ARM instruction features like VFP,
> SIMD, crypto etc.

Yeah, this is quite painful at the moment. I think there's a
disconnect between what's advertised to work, and what works in
practice.

Let me see if GCC 6.0 is available; or if Clang has better success.
I'm doing my best to avoid building GCC myself. I have very bad
memories from that experience.

Jeff



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Tixy
On Thu, 2016-07-28 at 02:38 -0400, Jeffrey Walton wrote:
[...]
> >> // AES (aese)
> >> ".byte 0x4e, 0x28, 0x48, 0x20;\n"
> >
> > So as instructions are little-endian that's 0x2048284e for a 32-bit
> > instruction, or 0x284e2048 if it's a Thumb2 instruction (I'm showing
> > that the same way as the ARM ARM does).
> 
> I pulled the encodings from a known good machine that used intrinsics.
> I did not hand encode them (too much work).
> 
> > According to my copy of the ARM ARM, the AESE instruction has these
> > encodings:
> >
> > For Thumb:
> >
> > 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm
> >
> > For ARM
> >
> > 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm
> >
> > For AArch64
> >
> > 0 1 0 0 1 1 1 0 size 1 0 1 0 0 0 0 1 0 0 1 0 Rn Rd
> >
> > So it looks like you've used the AArch64 encoding (for something
> > compiled and presumably run as AArch32?!) and gotten the byte order the
> > wrong way around.
> 
> I'm not sure if it matters, but this is an ARMv8 device running a 32-bit OS.

So it's running in AArch32 mode, and you want the encodings for that,
not the AArch64 version. I.e. the second encoding I mentioned, which
would be

.inst 0xf3b00300

or better, find a compiler version and options that knows about the
instructions you want to test (which I see you already asked about
below). Sorry I can't help with that, I know little about toolchains,
and have also never used the newer ARM instruction features like VFP,
SIMD, crypto etc.

> I'm still trying to figure out how to build test cases for an Aarch32
> execution on Aarch64. Eventually it will go into an open source
> library's test script. Also see
> https://gcc.gnu.org/ml/gcc-help/2016-06/msg00097.html.

-- 
Tixy



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Jeffrey Walton
>> Using '.byte' below rather than '.inst' or '.inst.w' is another can of 
>> worms...
>
> And if I'm not mistaken, part of the reason why you got the
> instructions wrong...
>
>> $ gcc -g3 -O0 -march=armv7-a -mfpu=neon test.cc -o test.exe
>> $ ./test.exe
>> $
>
> Does the tool-chain default to ARM or Thumb? I assume ARM code.

I believe it's ARM.

>> $ cat test.cc
>> #include 
>> int main(int argc, char* argv[])
>> {
>>   __asm__ __volatile__
>>   (
>> ".code 32"
>
> BTW, the above selects ARM code generation, but won't have any effect
> because you don't specify any labels or instruction mnemonics to
> assemble.

That's the only thing that managed to get a good disassembly from objdump -d.

>> // AES (aese)
>> ".byte 0x4e, 0x28, 0x48, 0x20;\n"
>
> So as instructions are little-endian that's 0x2048284e for a 32-bit
> instruction, or 0x284e2048 if it's a Thumb2 instruction (I'm showing
> that the same way as the ARM ARM does).

I pulled the encodings from a known good machine that used intrinsics.
I did not hand encode them (too much work).

> According to my copy of the ARM ARM, the AESE instruction has these
> encodings:
>
> For Thumb:
>
> 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm
>
> For ARM
>
> 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm
>
> For AArch64
>
> 0 1 0 0 1 1 1 0 size 1 0 1 0 0 0 0 1 0 0 1 0 Rn Rd
>
> So it looks like you've used the AArch64 encoding (for something
> compiled and presumably run as AArch32?!) and gotten the byte order the
> wrong way around.

I'm not sure if it matters, but this is an ARMv8 device running a 32-bit OS.

I'm still trying to figure out how to build test cases for an Aarch32
execution on Aarch64. Eventually it will go into an open source
library's test script. Also see
https://gcc.gnu.org/ml/gcc-help/2016-06/msg00097.html.

If you know how to do it, then please email me on a sidebar. I'm happy
to test theories. I think -mcpu=... factors into it somewhere.

Jeff



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-28 Thread Tixy
On Thu, 2016-07-28 at 00:48 -0400, Jeffrey Walton wrote:

> Using '.byte' below rather than '.inst' or '.inst.w' is another can of 
> worms...

And if I'm not mistaken, part of the reason why you got the
instructions wrong...

> $ gcc -g3 -O0 -march=armv7-a -mfpu=neon test.cc -o test.exe
> $ ./test.exe
> $

Does the tool-chain default to ARM or Thumb? I assume ARM code.

> $ cat test.cc
> #include 
> int main(int argc, char* argv[])
> {
>   __asm__ __volatile__
>   (
> ".code 32"

BTW, the above selects ARM code generation, but won't have any effect
because you don't specify any labels or instruction mnemonics to
assemble.

> // AES (aese)
> ".byte 0x4e, 0x28, 0x48, 0x20;\n"

So as instructions are little-endian that's 0x2048284e for a 32-bit
instruction, or 0x284e2048 if it's a Thumb2 instruction (I'm showing
that the same way as the ARM ARM does).

According to my copy of the ARM ARM, the AESE instruction has these
encodings:

For Thumb:

1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm

For ARM

1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 0 1 1 0 0 M 0 Vm

For AArch64

0 1 0 0 1 1 1 0 size 1 0 1 0 0 0 0 1 0 0 1 0 Rn Rd

So it looks like you've used the AArch64 encoding (for something
compiled and presumably run as AArch32?!) and gotten the byte order the
wrong way around.

Disclaimer, I'm only on my first coffee of the morning, so quite likely
not 100% accurate in my statements above. ;-)

-- 
Tixy



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-27 Thread Jeffrey Walton
On Wed, Jul 27, 2016 at 2:18 AM, Paul Wise  wrote:
> On Wed, Jul 27, 2016 at 11:28 AM, Jeffrey Walton wrote:
>
>> I recently purchased a Raspberry Pi 3. It's got a Broadcom SoC, and it's
>> ARMv8.
> ...
>> model name: ARMv7 Processor rev 4 (v7l)
>
> Looks like you are running it in ARMv7 32-bit mode, perhaps that
> disables the ARMv8 features.
>
> I recently watched the DebConf16 ARM talk and, from memory, support for
> 64-bit/ARMv8 on the RPi3 is still in progress.

I've had some time to kick the tires, so to speak.

CPU flags indicate only crc32 from the ARMv8 instruction set. And I
can't get the stock toolchain to consume other intrinsics, like PMULL
and PMULL2.

However, dropping into the GCC extended assembler, the program
executes CRC, PMULL, PMULL2, AES, SHA1 and SHA2 without causing an
illegal instruction.

It would be nice if the Raspberry folks enabled the intrinsics and
instructions in the toolchain for the devs who have the specialized
code to take advantage of it.

***

Using '.byte' below rather than '.inst' or '.inst.w' is another can of worms...

$ gcc -g3 -O0 -march=armv7-a -mfpu=neon test.cc -o test.exe
$ ./test.exe
$

$ cat test.cc
#include 
int main(int argc, char* argv[])
{
  __asm__ __volatile__
  (
".code 32"

// CRC using word
".byte 0x1a, 0xc1, 0x58, 0x00;\n"
// CRC using half word
".byte 0x1a, 0xc1, 0x54, 0x00;\n"
// CRC using byte
".byte 0x1a, 0xc1, 0x50, 0x00;\n"
// PMULL
".byte 0x0e, 0xe1, 0xe0, 0x00;\n"
// PMULL2
".byte 0x4e, 0xe1, 0xe0, 0x00;\n"
// AES (aese)
".byte 0x4e, 0x28, 0x48, 0x20;\n"
// AES (aesd)
".byte 0x4e, 0x28, 0x58, 0x20;\n"
// SHA1 (sha1c)
".byte 0x5e, 0x02, 0x00, 0x20;\n"
// SHA1 (sha1m)
".byte 0x5e, 0x02, 0x20, 0x20;\n"
// SHA1 (sha1p)
".byte 0x5e, 0x02, 0x30, 0x20;\n"
  :
  :
  : "cc", "d0", "d1", "d2", "q0", "q1", "q2"
  );

  return 0;
}



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-27 Thread Paul Wise
On Thu, Jul 28, 2016 at 1:28 AM, Alan Corey wrote:

> Yes, so they claim and I wonder how they're going to deal with the
> fact that some Pis are 32 bit and some 64.

ISTR that they plan on keeping 32-bit for the official stuff
recommended by the RPi folks for simplification.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-27 Thread Alan Corey
> 64-bit/ARMv8 on the RPi3 is still in progress.

Yes, so they claim and I wonder how they're going to deal with the
fact that some Pis are 32 bit and some 64.  I posted this question
there but I haven't looked into the links in the response a lot:
https://www.raspberrypi.org/forums/viewtopic.php?f=63=154497=1010500#p1010500
-- 
Credit is the root of all evil.  - AB1JX



Re: Broadcom BCM2709, ARMv8, and missing CPU features

2016-07-27 Thread Paul Wise
On Wed, Jul 27, 2016 at 11:28 AM, Jeffrey Walton wrote:

> I recently purchased a Raspberry Pi 3. It's got a Broadcom SoC, and it's
> ARMv8.
...
> model name: ARMv7 Processor rev 4 (v7l)

Looks like you are running it in ARMv7 32-bit mode, perhaps that
disables the ARMv8 features.

I recently watched the DebConf16 ARM talk and, from memory, support for
64-bit/ARMv8 on the RPi3 is still in progress.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise