Re: RE: RE: Java acceleration/Jazelle

2007-12-08 Thread Simon Pickering

Good to see some interest in this still :)

>> Do we need to set some bit to enable Jazelle?
>
> No, the ARM people like to have special instructions to change   
> processor modes, like "ENTERX" or "BXJ" or the like. (I think. :))

No, we do have to set up Jazelle mode explicitly (IMHO).

I stumbled upon these pages on the ARM site:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344d/Chdiciaj.html

There is a coprocessor which controls the Jazelle hardware, and this
might well imply that it needs to be set up/enabled for Jazelle to work
(rather than just branching). I believe the curious mix of faults we
were seeing when trying to BXJ was simply because BXJ does a plain BX
branch when Jazelle is not enabled, so it was branching to our
bytecodes but interpreting them as ARM instructions. I tested this
(it was a bit painful working backwards from the binary to find the
instruction), but I need to dig out the test code if you want to see it.
Otherwise, just craft your bytecodes to be the same as an ARM binary
instruction and check the fault you generate.

I was pointed towards a later version of the ARM Technical reference  
manual than the one we get in this country  
(https://www.jp.arm.com/document/manual/files/051020DDI0100HJ_v6_1.pdf) which  
contains a fair bit of information about Jazelle and these registers,  
etc.

I spent some time on Google translating it, here's my result (still  
needs some tweaking, any Japanese speakers out there fancy that?):  
http://people.bath.ac.uk/enpsgp/Japanese_jazelle_pdf_translation.rtf

I must admit I've been a bit busy hacking on other things so haven't  
done much else with this (it doesn't tell us how to enable Jazelle,  
nor about the structure of the bytecode lookup table, etc.). I suppose  
the next step is to experiment with the coprocessor and see if we can  
work out what to feed it to enable Jazelle.

Cheers,


Simon


___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers


Re: RE: RE: Java acceleration/Jazelle

2007-12-07 Thread Danny Milosavljevic
Hello,

On Wed, 15 Aug 2007 12:43:43 +0100, Simon Pickering wrote:

> Thank you for the links, these are things I've not seen before. 

No problem :-)

> bytecodes are byte-length (or variable length depending on their arguments), 
> not 32bit as we were thinking. 

Yeah, the confusion arises because the CPU does a _prefetch_ - always fetching 
words - in Java mode. That doesn't mean the instruction is as big as a word, 
quite the opposite. Usually it will fetch 4 java instructions that way.
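To make the prefetch point concrete, here is a minimal sketch of how one fetched word can hold up to four single-byte bytecodes. It is only an illustration: the helper names bytecode_at and fetch_address are made up for this example, and it assumes little-endian memory (lowest address in the least significant byte), which is not something the manuals quoted in this thread state for Java state.

```c
#include <stdint.h>

/* Hypothetical helper: pick the bytecode for a byte-granular Java PC out of
   an already fetched 32-bit word.
   Assumption: little-endian memory, so the byte at the lowest address sits
   in the least significant 8 bits of the word. */
uint8_t bytecode_at(uint32_t fetched_word, uint32_t java_pc)
{
    unsigned lane = java_pc & 3u;            /* byte position inside the word */
    return (uint8_t)(fetched_word >> (8u * lane));
}

/* Hypothetical helper: address of the aligned word that a word-sized fetch
   would read for a given byte PC. */
uint32_t fetch_address(uint32_t java_pc)
{
    return java_pc & ~3u;
}
```

With the word 0xcfcecdcc fetched from 0x106a8, bytecode_at() returns 0xcc for PC 0x106a8 and 0xcf for PC 0x106ab, and both PCs map to the same fetch address.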

> This does assume that the Address
> column is showing the address in terms of bytes and not some other unit, but I
> think this is a fair assumption.

Yeah.

> Do my original R12 and R14 mappings mean anything I wonder (see last section
> of this email), or were they just random names for the patent?

I think they changed them afterwards... not sure, though.

> these values are not altered when the exception occurs. I've looked at this in
> passing and it doesn't seem to show anything (that I expected - see my
> previous long email to see for yourselves) in the registers after the
> exception handler has been run.

Yeah, I think we need to set up some more complicated vector table, see below.

> Obviously we ought to be setting the pointer to the Jazelle exception table
> (if we knew which register to put it in and what form it takes!),

See below ;-)

> but do we also
> need to setup things like the Java stack pointer, pointer to variables area
> and constant pool pointer? Even if we don't need to actually initialise the
> data at these addresses, do we need to allocate some memory and then provide
> pointers?

Good question. I guess for testing it wouldn't hurt to use valid addresses for 
these until we get something running and then start ripping stuff out again?

> I created some test code for this:
> http://people.bath.ac.uk/enpsgp/nokia770/jazelle/jalimo6a.c

Cool. I'm pulling my hair out trying to get the inline assembler to nicely
align things (pad handler code to 32 bytes). Help? :)

> I don't know whether the BXJ instruction requires the condition code suffix,
> but it certainly compiles without complaint.

No, that's an ARM special feature where almost every instruction can carry a
condition. If the condition does not hold, the instruction is skipped. I
think this is so that small conditional sequences need no branch at all, so
the total number of branches goes down a lot and branch prediction has an
easier time.

> The output is:
> 
> Jalimo6a.bin
> 
> 1: x/i $pc  0x841c :  bxjne   r0
> (gdb) info registers
> r0 0xbef68640   -1091140032
> r1 0x8428   33832
> r2 0x8428   33832
> r3 0x8428   33832
> r4 0x8428   33832
> r5 0x8428   33832
> r6 0x8428   33832
> r7 0x8428   33832
> r8 0x8428   33832
> r9 0x8428   33832
> r10 0x8428   33832
> r11 0x8428   33832
> r12 0x8428   33832
> sp 0x8428   33832
> lr 0x8428   33832
> pc 0x841c   33820
> fps 0x1001000   16781312
> cpsr   0x20000010   536870928
> (gdb) si
> 
> Program received signal SIGILL, Illegal instruction.

I think it's disliking that the SW table is not set up.

> Do we need to set some bit to enable Jazelle?

No, the ARM people like to have special instructions to change processor modes, 
like "ENTERX" or "BXJ" or the like. (I think. :))

> All of the unused byte codes are handled as exceptions. 

Well not exactly normal exceptions, but there is a vector table with handler 
code, yes.

> One of the unused byte codes is used as the means to return to the calling 
> program.

> This confirms Scott's idea. From the wording it looks like it's up to the 
> handler software to actually perform the return operation, rather than the 
> Jazelle hardware doing it itself.

Yeah...

> So we know(?) R14 is the bytecode PC.

Some say it's R15. Well, we don't need to touch it manually yet...

Anyway, new info from
 
integrated:

R0..R3   cache Java expression stack
R4       local variable 0 ("this" pointer) (mentioned in 2 different sources)
R5       pointer to table of SW handlers
R6       Java stack pointer (mentioned in 2 different sources)
R7       Java variables pointer
R8       Java constant pool pointer
R9..R11  reserved for JVM (hardware doesn't use them)
R12,R14  sometimes Java return address for some instructions. Some say Java PC
         is in R14.
R13      Machine stack pointer (not Java)
R14      interrupt handler saves PC here (although doesn't it do that in the
         shadow register?)
R15      Java PC


R5 is maybe a table that can be used by Thumb-EE, which would be (got this from 
some ARM manual on their web site, I think):

>Handlers: HB{L} #handler
>A new 16-bit instruction, which performs a branch, with optional link, to one 
>of 

Re: RE: Java acceleration/Jazelle

2007-08-15 Thread P. Durante
Hi,
sorry for chiming in, incidentally I've just started to study the arm
architecture and the ARM1136J-S manual seems clear about a couple of
things

On 8/15/07, Simon Pickering <[EMAIL PROTECTED]> wrote:
> Thank you for the links, these are things I've not seen before.
>
> > So let me dump the stuff I turned up so far:
> >
> > URL: 
> > Here you can see the size and alignment of the java instructions.
> > (the entire document is
> > )
>
> Looking at the Memory Processor view in Jazelle state (fig 5-39 on page 5-33
> of the pdf), the left-hand column showing the Address of the bytecodes
> indicates that bytecodes are byte-length (or variable length depending on
> their arguments), not 32bit as we were thinking. This does assume that the
> Address column is showing the address in terms of bytes and not some other
> unit, but I think this is a fair assumption.
>
> The same thing is seen in the disassembler shown in Fig 5-52 on page 5-41.
> Section 6.5 on page 6-9 specifically states that Jazelle assembly instructions
> are 8-bit. So we can conclude that they are byte aligned rather than word
> aligned. I wonder why the word aligned code appeared to work?
>
>
[quote]
The ARM1136JF-S processor has three operating states:

ARM state     32-bit, word-aligned ARM instructions are
              executed in this state.
Thumb state   16-bit, halfword-aligned Thumb instructions.
Java state    Variable length, byte-aligned Java instructions.

...
In Java state, all instruction fetches are in words.
[/quote]

> "The key to making this approach work lies in a single new ARM instruction,
> "BXJ Rm," for entering Java state. This instruction first performs a test on
> one of the condition codes. If the condition is met, it then stores the
> current program counter (PC), puts the processor into Java state, branches to
> the specified target address and begins executing Java byte codes."
>
> Performs a test on one of the condition codes... Which one, I wonder? Or is
> this where a Java flag is checked (I'll have to take another look in the chip
> manual pdf)? Anyone have any thoughts?
>

Almost all ARM instructions can be conditionally executed: the 4
most significant bits of almost every ARM opcode hold a condition
code. If the condition is met, the instruction is executed; otherwise
it is equivalent to a NOP.

[quote]
Branch and exchange to Java state: BXJ{cond} 
[/quote]

if you don't specify a condition, the assembler assumes AL and the
branch is unconditionally executed, nothing fancier than that

some other possible values for {cond} are: EQ (Z=1), NE(Z=0), LT(N!=V), VS (V=1)
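A small sketch of the condition check described above, in case it helps to see it spelled out. The function name cond_passes is invented for this example; the condition encodings and the N/Z/C/V bit positions (31..28 of the CPSR) are the standard ARM ones. This only models the decision "execute or treat as NOP", nothing Jazelle-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: would an instruction whose top four opcode bits are
   `cond` be executed under the given CPSR flags? */
bool cond_passes(uint32_t cond, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1u;   /* Negative */
    bool z = (cpsr >> 30) & 1u;   /* Zero     */
    bool c = (cpsr >> 29) & 1u;   /* Carry    */
    bool v = (cpsr >> 28) & 1u;   /* oVerflow */

    switch (cond & 0xFu) {
    case 0x0: return z;             /* EQ */
    case 0x1: return !z;            /* NE */
    case 0x2: return c;             /* CS/HS */
    case 0x3: return !c;            /* CC/LO */
    case 0x4: return n;             /* MI */
    case 0x5: return !n;            /* PL */
    case 0x6: return v;             /* VS */
    case 0x7: return !v;            /* VC */
    case 0x8: return c && !z;       /* HI */
    case 0x9: return !c || z;       /* LS */
    case 0xA: return n == v;        /* GE */
    case 0xB: return n != v;        /* LT */
    case 0xC: return !z && n == v;  /* GT */
    case 0xD: return z || n != v;   /* LE */
    case 0xE: return true;          /* AL: what a bare "BXJ" assembles to */
    default:  return false;         /* 0xF is not an ordinary condition */
    }
}
```

For the "bxjne r0" in the gdb dump quoted in this thread, the CPSR is printed as 536870928 (only C set, Z clear), so cond_passes(0x1, 536870928u) is true and the branch is attempted; no CMP is needed just to make it execute.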

> My understanding is that condition codes are N(egative), Z(ero), C(arried
> over) and (o)V(erflow) and that the J bit, which is also in CPSR (and isn't a
> condition code afaik), is set by the BXJ instruction, rather than needing to
> be set before the BXJ instruction. In fact setting this bit is explicitly
> advised against wherever it's mentioned. Therefore do we need to do a CMP
> before the BXJ to get it to do something?
>

[quote]
You can switch the operating state of the ARM1136JF-S processor
between ARM state and Java state using the BXJ instruction.
[/quote]

and also

[quote]
MSR cannot be used to change the J bit in the CPSR.
[/quote]

so, you don't set it, the CPU does, it seems no other setup is
required, but I can't be sure about that

> I don't know whether the BXJ instruction requires the condition code suffix,
> but it certainly compiles without complaint.
>

no suffix means BXJAL

regards,
Paolo


RE: RE: Java acceleration/Jazelle

2007-08-15 Thread Simon Pickering
Thank you for the links, these are things I've not seen before. 

> So let me dump the stuff I turned up so far:
> 
> URL: 
> Here you can see the size and alignment of the java instructions.
> (the entire document is 
> )

Looking at the Memory Processor view in Jazelle state (fig 5-39 on page 5-33 of
the pdf), the left-hand column showing the Address of the bytecodes indicates
that bytecodes are byte-length (or variable length depending on their
arguments), not 32bit as we were thinking. This does assume that the Address
column is showing the address in terms of bytes and not some other unit, but I
think this is a fair assumption.

The same thing is seen in the disassembler shown in Fig 5-52 on page 5-41.
Section 6.5 on page 6-9 specifically states that Jazelle assembly instructions
are 8-bit. So we can conclude that they are byte aligned rather than word
aligned. I wonder why the word aligned code appeared to work? 




> 
+arm+bxj&hl=en&ct=clnk&cd=4&client=firefox-a>:



> In Java state, the processor assigns several ARM registers to 
> functions specific to the Java machine (for example, R6 = 
> stack pointer, R0-R3 = top elements of stack, R4 = local 
> variable 0). This hardware reuse contributes to the small 
> size of the additional logic (12,000 gates) required to 
> implement the Java machine, and keeps all of the states 
> required by the Jazelle extension in ARM registers. In 
> addition, it ensures compatibility with existing operating 
> systems, interrupt handlers and exception code.
> 
> Keeping the top four elements of the stack in ARM registers [...]. 
> 
> The extension we've added divides Java byte codes into three 
> classes: directly executed, emulated and undefined. The 
> majority of the Java byte codes (138 on the ARM926EJ-S 
> microprocessor core) are executed directly in hardware; the 
> remainder are emulated by short sequences of highly optimized 
> ARM instructions. 
> --

So we now have the following register mappings:

Top elements of stack - probably R0, R1, R2, R3
local variable 0 - might be R4
Pointer to exception table - ??
Pointer to Java stack - ??
Pointer to Java variables area - ??
Pointer to the constant pool - ??

Do my original R12 and R14 mappings mean anything I wonder (see last section of
this email), or were they just random names for the patent?

I suppose we could try testing some of these other register mappings by pushing
things to the stack and setting the value of local variable 0 and then looking
at the registers once the code returns from the BXJ call. This assumes that
these values are not altered when the exception occurs. I've looked at this in
passing and it doesn't seem to show anything (that I expected - see my previous
long email to see for yourselves) in the registers after the exception handler
has been run. This may be an effect of the ARM exception handler overwriting
though.

Obviously we ought to be setting the pointer to the Jazelle exception table (if
we knew which register to put it in and what form it takes!), but do we also
need to setup things like the Java stack pointer, pointer to variables area and
constant pool pointer? Even if we don't need to actually initialise the data at
these addresses, do we need to allocate some memory and then provide pointers?

There's another interesting bit in this article:

"The key to making this approach work lies in a single new ARM instruction, "BXJ
Rm," for entering Java state. This instruction first performs a test on one of
the condition codes. If the condition is met, it then stores the current program
counter (PC), puts the processor into Java state, branches to the specified
target address and begins executing Java byte codes."

Performs a test on one of the condition codes... Which one, I wonder? Or is this
where a Java flag is checked (I'll have to take another look in the chip manual
pdf)? Anyone have any thoughts?

My understanding is that condition codes are N(egative), Z(ero), C(arried over)
and (o)V(erflow) and that the J bit, which is also in CPSR (and isn't a
condition code afaik), is set by the BXJ instruction, rather than needing to be
set before the BXJ instruction. In fact setting this bit is explicitly advised
against wherever it's mentioned. Therefore do we need to do a CMP before the BXJ
to get it to do something?

I created some test code for this:
http://people.bath.ac.uk/enpsgp/nokia770/jazelle/jalimo6a.c

I don't know whether the BXJ instruction requires the condition code suffix, but
it certainly compiles without complaint.

The output is:

Jalimo6a.bin

1: x/i $pc  0x841c :  bxjne   r0
(gdb) info registers
r0 0xbef68640   -1091140032
r1 0x8428   33832
r2 0x8428   3

Re: RE: Java acceleration/Jazelle

2007-08-13 Thread Danny Milosavljevic
Hi,

great to see someone tinkering with jazelle.

So let me dump the stuff I turned up so far:

URL: 
Here you can see the size and alignment of the java instructions.
(the entire document is )

:

Once in Java state, the ARM PC is extended to 32 bits to address Java byte 
code. Byte codes are fetched and decoded in two stages (compared with a single 
decode stage when in ARM Thumb instruction-set state). A new Current Processor 
Status Register (CPSR) bit records the processor state. This is an important 
feature, as the CPSR is automatically saved and restored when handling 
interrupts and exceptions, so Jazelle technology is compatible with the 
existing ARM interrupt/exception model used by operating systems.

[Further investigation shows this new register format to be called JPSR 

"A typical display of a Jazelle Program Status Register might show
nZCvqIFtJSVC, giving information about:
• 5 condition code flags (NZCVQ)
• 2 interrupt enable flags (IF)
• 2 state indicators (TJ)
• 1 processor mode name (SVC)."

  (so it seems they cut the mode to 3 bits and added a "J" bit, I would have 
expected it in "x – the extension field PSR[15:8]", but no)

]
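For what it's worth, here is a rough sketch of how such a status display could be produced from a raw PSR value. The names mode_name and format_psr are made up for this example, and the "upper case means bit set" convention is my assumption about the debugger display. The bit positions are the standard ARM ones: N, Z, C, V, Q in bits 31..27, J in bit 24, I, F, T in bits 7..5, and the mode in bits 4..0 (so the J bit does not come out of the mode field).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>   /* strcmp, used in the usage note below */

/* Hypothetical helper: name of the processor mode held in PSR bits [4:0]. */
static const char *mode_name(uint32_t psr)
{
    switch (psr & 0x1Fu) {
    case 0x10: return "USR";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "SVC";
    case 0x17: return "ABT";
    case 0x1B: return "UND";
    case 0x1F: return "SYS";
    default:   return "???";
    }
}

/* Hypothetical helper: render a PSR as e.g. "nZCvqIFtJSVC".
   Assumption: upper case = bit set, lower case = bit clear. */
void format_psr(uint32_t psr, char *out, size_t outlen)
{
    static const struct { char name; unsigned bit; } flags[] = {
        {'N', 31}, {'Z', 30}, {'C', 29}, {'V', 28}, {'Q', 27},
        {'I', 7},  {'F', 6},  {'T', 5},  {'J', 24}
    };
    char text[10];
    size_t i;

    for (i = 0; i < 9; i++) {
        int set = (int)((psr >> flags[i].bit) & 1u);
        text[i] = set ? flags[i].name
                      : (char)tolower((unsigned char)flags[i].name);
    }
    text[9] = '\0';
    snprintf(out, outlen, "%s%s", text, mode_name(psr));
}
```

Under these assumptions the value 0x610000D3 formats as "nZCvqIFtJSVC", and the CPSR printed as 536870928 in the gdb dump elsewhere in this thread formats as "nzCvqiftjUSR" (user mode, J clear), which can be checked with strcmp().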

In Java state, the processor assigns several ARM registers to functions 
specific to the Java machine (for example, R6 = stack pointer, R0-R3 = top 
elements of stack, R4 = local variable 0). This hardware reuse contributes to 
the small size of the additional logic (12,000 gates) required to implement the 
Java machine, and keeps all of the states required by the Jazelle extension in 
ARM registers. In addition, it ensures compatibility with existing operating 
systems, interrupt handlers and exception code.

Keeping the top four elements of the stack in ARM registers [...]. 

The extension we've added divides Java byte codes into three classes: directly 
executed, emulated and undefined. The majority of the Java byte codes (138 on 
the ARM926EJ-S microprocessor core) are executed directly in hardware; the 
remainder are emulated by short sequences of highly optimized ARM instructions. 
--
:
For efficiency, ARM keeps one of the local variables at zero in one of the ARM 
registers. Java applications frequently use the local variable at zero as a 
pointer to data. By keeping it in a register rather than in memory, the 
processor can perform better.

Additionally, ARM uses other registers for other pointers. A pointer to the 
exception table holds the instruction sequences for the instructions that are 
not executed directly. Also, there is a pointer to the Java stack, a pointer to 
the Java variables area, and a pointer to the constant pool. Java programs 
access these groups of data all the time and keep them in existing ARM 
registers. 
--
+ /* V5J instruction.  */
+ {0x012fff20, 0x0ffffff0, "bxj%c\t%0-3r"},
--
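The disassembler table entry above can be read as "value, mask, format". A minimal sketch of how such an entry is matched follows. The function names are invented for this example, and the mask 0x0ffffff0 is my assumption of the full mask: the %c field is the condition in bits 31..28 and %0-3r is the register in bits 3..0, so every other bit has to be fixed.

```c
#include <stdbool.h>
#include <stdint.h>

#define BXJ_VALUE 0x012fff20u   /* fixed bits of BXJ, from the table entry   */
#define BXJ_MASK  0x0ffffff0u   /* assumed: all bits except cond and Rm      */

/* True if the 32-bit ARM instruction word is some form of BXJ. */
bool is_bxj(uint32_t insn)
{
    return (insn & BXJ_MASK) == BXJ_VALUE;
}

/* Condition field, bits 31..28 (0xE = AL, 0x1 = NE, ...). */
unsigned insn_cond(uint32_t insn)
{
    return (unsigned)(insn >> 28);
}

/* Register operand Rm, bits 3..0. */
unsigned bxj_rm(uint32_t insn)
{
    return (unsigned)(insn & 0xFu);
}

/* Build a BXJ instruction word from a condition code and a register number. */
uint32_t encode_bxj(unsigned cond, unsigned rm)
{
    return ((uint32_t)(cond & 0xFu) << 28) | BXJ_VALUE | (uint32_t)(rm & 0xFu);
}
```

With this, encode_bxj(0xE, 12) gives 0xe12fff2c for "bxj r12", while the plain "bx r12" word 0xe12fff1c is rejected by is_bxj(). That may be handy when working backwards from a binary to find the instruction.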

I hope this wasn't too useless, but I just wanted to post this before I forget 
it again (I can  already feel it all vanish :-))

cheers,
   Danny



RE: Java acceleration/Jazelle

2007-07-30 Thread Simon Pickering
Hello all.

My apologies this is going to be a long one...

All the code mentioned in this email can be found under this directory:
http://people.bath.ac.uk/enpsgp/nokia770/jazelle/

After reading the patent I wrote a piece of code to test whether Jazelle works,
as Scott Bambrough suggested. The patent indicated that R14 should hold the
address of the Java bytecodes while R12 might possibly hold the address of a
handler. The code I wrote performs BXJ R12 (with R12 pointing to the handler and
R14 pointing to the Java code). In the handler I was trying to get the current
bytecode value printed out by calling printf inside assembler.

Here's the code:
(http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle9.c), I can't seem
to get printf to work in this code, even though it quite happily works in
another piece of test code
(http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_printf2.c). I've no idea
what's causing the difference, can anyone see something I've missed? I should
note that there is an error in the test_jazelle9.c code which means it won't
work correctly after the printf anyway. Branching away to this function alters
the value of R14 (which should contain the address of the Java bytecode). This
can be easily fixed by saving R14 to another unused register or memory location
over the call.

So I then removed all of the printf business and started using gdb to step
through the code and look at the registers as the code progresses. The idea is
that within the handler the value in R14 should change as the bytecodes are
handled, and if some known bytecodes are encountered, the value of R14 should
jump by more than sizeof(java bytecode).

This code is here:
(http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle10.c).

I've stepped through this code with gdb and the bad news is that the handler is
called all the time (even for the instructions that should be handleable).
Results below:

Java code address=67240

I added a breakpoint just after the start of the handler code. Most of the
registers are uninteresting as they do not change at all. The exceptions are R1,
into which we load the bytecode value (the dump shows the previous value, as
the load at the breakpoint hasn't had a chance to run yet - poor choice on my
part), and LR (R14), which contains the pointer to the current bytecode.

This is the first run through the handler, that's why R1 doesn't contain a
bytecode value.

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x40008000   1073774592
lr 0x106a8  67240

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0xcc 204
lr 0x106ac  67244

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0xcd 205
lr 0x106b0  67248

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0xce 206
lr 0x106b4  67252

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0xcf 207
lr 0x106b8  67256

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x10 16
lr 0x106bc  67260

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x0  0
lr 0x106c0  67264

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x0  0
lr 0x106c4  67268

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x0  0
lr 0x106c8  67272

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x2a 42
lr 0x106cc  67276

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x3b 59
lr 0x106d0  67280

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0x1a 26
lr 0x106d4  67284

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0xd0 208
lr 0x106d8  67288

1: x/i $pc  0x8444 :  ldr r1, [lr]
(gdb) info registers
r1 0xd1 209
lr 0x106dc  67292

Program exited normally.

So, the thing to see from these results is that even the bytecodes that we'd
expect to be handled were not; control was always passed straight to the
handler. This raises a couple of questions. Are the register choices correct for
passing the handler and Java bytecode addresses? Does Java need to be enabled
somehow before the Jazelle hardware starts working (i.e. is this BXJ currently
working as a simple B instruction)? Another question is whether I need to use
a byte array rather than an int array for the bytecodes.
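On the int array versus byte array question: the trace above shows LR stepping by 4 between handler entries (0x106a8, 0x106ac, ...), which is what one opcode per int gives. A minimal sketch of repacking such an array into a dense byte stream is below. The function name pack_bytecodes is made up for this example and the sample values are just the first few R1 values from the trace; whether the hardware wants exactly this layout is still an open question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: take opcodes stored one per 32-bit word and write
   them as consecutive bytes, so each bytecode occupies 1 byte, not 4.
   Returns the number of bytes written. */
size_t pack_bytecodes(const uint32_t *words, size_t count, uint8_t *out)
{
    size_t i;

    for (i = 0; i < count; i++)
        out[i] = (uint8_t)(words[i] & 0xFFu);   /* keep only the opcode byte */
    return count;
}
```

For example, packing the four words {0xcc, 0xcd, 0xce, 0xcf} yields four bytes in a row, where the int version spreads them over 16 bytes. On a little-endian machine each int also leaves three zero bytes after its opcode, and 0x00 is the Java nop bytecode.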

Staying with R12 and R14 in the hope that the patent almost gave us the correct
information, I wrote and tested some other pieces of code:

test_jazelle10.c - use int array for bytecodes, R12=handler address,
R14=bytecode address, call BXJ R12

test_jazelle10b.c - use char array for bytecodes, R12=handler address,
R14=bytecode address, call BXJ R12

test_jazelle1

Re: Java acceleration/Jazelle

2007-07-18 Thread Simon Pickering
I've adjusted the code I wrote to test Scott Bambrough's suggestion 
(see earlier in the thread). The name and URL is still the same: 
http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle7.c

After the code starts and BXJ R12 is issued, the code does branch to
the handler whose address is contained in R12 (this can be seen because the
first entry of the output buffer is equal to the first entry of the code
(input) buffer). This may or may not indicate that Jazelle is working -
on the one hand this is what we'd expect, on the other if Jazelle is 
disabled for some reason this is also what we'd see. We need to get it 
to process more than one instruction to determine which is the case. 
Unfortunately the code segfaults on the second BXJ R12 instruction (the 
one issued at the end of the handler starting at label 2:).

It is definitely this instruction causing the problem, I've spent all 
evening debugging the code to check I'd not made some silly mistake 
(which is not to say I just haven't spotted it) and this instruction is 
the one that produces the segfault.

Therefore, the question is why does this happen? The only apparent 
difference between this BXJ and the first one is that the address is to 
a backwards label rather than a forwards label. I don't think this 
should cause any problems. The other possibility is that the Jazelle 
hardware is now fired up and therefore needs something different to be 
done to it, or I have managed to overwrite one of the registers that 
it's using and it's therefore not happy. This sort of indicates that 
Jazelle is "working", though I'll wait and see if anyone has any other 
ideas.

Any suggestions before I try reducing my register usage and trying 
other registers?

Thanks,


Simon


Re: Java acceleration/Jazelle

2007-07-18 Thread Simon Pickering
Hi Siarhei,

>> Does anyone know whether there are any good docs/books on ARM asm
>> programming, telling people these sort of things? This is an interesting
>> (and hopefully useful) learning experience, but can be really frustrating
>> when I know what I want to do, and pretty much how to, but not quite! :)
>> E.g. calling functions in linked libraries, how to call .s file functions
>> from C, what is and isn't allowed in in-line asm, etc.
>
> I would recommend checking the following documentation from ARM website:



> http://www.arm.com/pdfs/aapcs.pdf
> "Procedure Call Standard for the ARM Architecture" for the information about
> calling conventions and arguments passing between asm and C and generally
> about ABI.

I'd not spotted this one, thank you for the pointer.

Simon


Re: Java acceleration/Jazelle

2007-07-18 Thread Siarhei Siamashka
On Wednesday 18 July 2007 13:01, Simon Pickering wrote:

> Does anyone know whether there are any good docs/books on ARM asm
> programming, telling people these sort of things? This is an interesting
> (and hopefully useful) learning experience, but can be really frustrating
> when I know what I want to do, and pretty much how to, but not quite! :)
> E.g. calling functions in linked libraries, how to call .s file functions
> from C, what is and isn't allowed in in-line asm, etc.

I would recommend checking the following documentation from ARM website:

http://www.arm.com/documentation/Instruction_Set/index.html
"ARM v5TE Architecture Reference Manual" for the detailed information about
the instruction set (up to ARMv5TE). Unfortunately it does not cover new ARMv6
instructions (I used Quick Reference Card to get some information about them).

http://www.arm.com/pdfs/aapcs.pdf
"Procedure Call Standard for the ARM Architecture" for the information about
calling conventions and arguments passing between asm and C and generally
about ABI.

http://www.arm.com/documentation/ARMProcessor_Cores/index.html
"ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual" for ARM11
pipeline description and instruction timings (useful when optimizing for 
N800).

"ARM9EJ-S Revision r1p2 Technical Reference Manual" for ARM9E pipeline
description and instruction timings (useful when optimizing for 770).

These four pdf files cover almost everything needed if you are interested in
assembly programming for Nokia 770 and N800. But surely ARM website 
provides many other interesting documents worth reading.


RE: Java acceleration/Jazelle

2007-07-18 Thread Simon Pickering
> >> The second thought is learn the ABI convention for calling C methods
> >> from assembly and you can pass whatever data you need to a function
> >> that will do the printing for you.  I'd suggest going with this route
> >> since it will be the most straightforward without soldering but also
> >> the least versatile.  --You may end up rebooting many, many times and
> >> the less overhead there is to initialize before playing around the
> >> faster you can try a new idea.  At any rate, I believe the ABI changed
> >> when we went to the armel format and so there's a handy description on
> >> changes to system calls and so forth here:
> >> http://wiki.debian.org/ArmEabiPort  Sorry if I'm pointing out anything
> >> that might be obvious, these are just things I had to work through
> >> when I was working on this type of thing on OS-less boards.
> >> 
> > Thanks for that pointer, I'd forgotten about that page. It does say how
> > registers are passed (but I think it says to system calls rather than some
> > random function). I'll have to look at it again and do some more digging
> > (and in the meantime save the value of R14 to an array each time the handler
> > code is called, then print it out in C after the asm has completed).
> >   
> That part is talking about normal function calls. So you have 4 registers
> to pass arguments, rest is passed in stack.

The title of that section is "System call interface" and it describes how to
load registers with arguments and how to get return values. That's fair enough,
thinking about it, printing to stdout will involve a system call that I should
be able to generate using this and other information (like a list of system call
numbers, so I can work out what to call), but it's still a system call, rather
than a call to a C function that I've written (my explanation may not have been
clear enough, my apologies). Or have I completely mis-read the page?

What I was really after was a way of calling a C function from within some
inline asm. I was wondering whether the assembler was able to provide a way of
jumping to C functions written in the same source file as the asm() itself? I
suppose there may be a system call that will take a function name as an argument
and call that, this would work for library functions and probably for functions
in the same source file. I need to do some Googling.

This has got me thinking, what happens when the end of a piece of inline asm is
reached? Do I need to always jump to a label I've placed at the end of the
inline asm, or is there a way to return to the C code from in the middle of the
asm. Another one: can I know the address of a piece of asm (within the asm or
surrounding C code)? So that I could jump from one piece to another? This is
probably trying to use inline asm for things it's not designed for, I suppose I
may be better writing a function in asm and putting it in a .S file and calling
that from the C code. Again Google here I come.
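As a rough sketch of the "call a C function from inline asm" idea with GCC: the function is an ordinary global symbol, the argument goes in r0, the result comes back in r0, and r1-r3, ip and lr have to be listed as clobbered because the callee may change them (that is the procedure call standard, as far as I understand it). The names add_one and call_add_one are made up for this example, and on a non-ARM build the sketch simply falls back to a normal C call so it still compiles and runs.

```c
/* Ordinary C function we want to reach from assembly. It is not static and
   is marked `used` so the symbol exists for the `bl` below to find. */
__attribute__((used)) int add_one(int x)
{
    return x + 1;
}

/* Calls add_one() from inline asm when built for ARM with GCC; otherwise
   does a plain C call. Illustration only, not tested on a 770/N800. */
int call_add_one(int x)
{
#if defined(__arm__) && defined(__GNUC__)
    register int r0 __asm__("r0") = x;      /* first argument / return value */

    __asm__ volatile(
        "bl add_one"                        /* branch with link to the C symbol */
        : "+r"(r0)                          /* r0 is read and written */
        :
        : "r1", "r2", "r3", "ip", "lr", "cc", "memory");
    return r0;
#else
    return add_one(x);                      /* non-ARM fallback */
#endif
}
```

Falling off the end of an asm() block just continues with the C code that follows it, so no label is needed to get back. Listing lr as clobbered is what addresses the "branching away alters R14" problem: the compiler then knows not to keep anything live in it across the call.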

> > Does anyone know whether there are any good docs/books on ARM asm
> > programming, telling people these sort of things? This is an interesting
> > (and hopefully useful) learning experience, but can be really frustrating
> > when I know what I want to do, and pretty much how to, but not quite! :)
> > E.g. calling functions in linked libraries, how to call .s file functions
> > from C, what is and isn't allowed in in-line asm, etc.
> >   
> I don't think there is really such a book, but you can use 
> gcc -S to see what kind of assembler
> gcc generates to call functions. 

Yes, that's a good idea. The only problem I can see is that the function I may
want to call has already been converted into asm, with a label denoting the
start of the function, rather than being in C. It may be that there is actually
no way of branching from inline asm to a C function. This is the kind of thing
I'd like a book to be able to tell me (so I don't waste time trying to find out
how to do something that's not possible, etc.)

On the other hand it will work for calling printf, so I'll take a look at the
asm of a "hello world" program.

Thanks,


Simon

___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers


Re: Java acceleration/Jazelle

2007-07-18 Thread Riku Voipio
Simon Pickering wrote:
>>   The second thought is learn the ABI convention for calling C methods
>> from assembly and you can pass whatever data you need to a function
>> that will do the printing for you.  I'd suggest going with this route
>> since it will be the most straightforward without soldering but also
>> the least versatile.  --You may end up rebooting many, many times and
>> the less overhead there is to initialize before playing around the
>> faster you can try a new idea.  At any rate, I believe the ABI changed
>> when we went to the armel format and so there's a handy description on
>> changes to system calls and so forth here:
>> http://wiki.debian.org/ArmEabiPort  Sorry if I'm pointing out anything
>> that might be obvious, these are just things I had to work through
>> when I was working on this type of thing on OS-less boards.
>> 
>
> Thanks for that pointer, I'd forgotten about that page. It does say how
> registers are passed (but I think it says to system calls rather than some
> random function). I'll have to look at it again and do some more digging
> (and in the meantime save the value of R14 to an array each time the handler
> code is called, then print it out in C after the asm has completed).
>   
That part is talking about normal function calls. So you have 4 registers
to pass arguments; the rest are passed on the stack.

> Does anyone know whether there are any good docs/books on ARM asm
> programming, telling people these sort of things? This is an interesting (and
> hopefully useful) learning experience, but can be really frustrating when I
> know what I want to do, and pretty much how to, but not quite! :) E.g. calling
> functions in linked libraries, how to call .s file functions from C, what is
> and isn't allowed in in-line asm, etc.
>   
I don't think there is really such a book, but you can use gcc -S to see
what kind of assembler gcc generates to call functions. OTOH it sounds like
what you want to know is already in jalimo's jamvm, in
src/os/linux/arm/callNative.S =)

Since the virtual machine does more than just run the bytecode, taking jamvm
(or something else) as the base could make sense.


Aw: RE: Java acceleration/Jazelle

2007-07-18 Thread Jason
Hi,
the best book I know about the ARM is "ARM System-on-Chip Architecture" by
Steve Furber, who was one of the developers of the ARM. But I don't know
whether there is a current edition. Mine is from 2002.
Ciao
Jason

- Original Message -
From: Simon Pickering <[EMAIL PROTECTED]>
To: 'Larry Battraw' <[EMAIL PROTECTED]>
Cc: maemo-developers@maemo.org
Sent: Wed, 18 Jul 2007 12:01:23 CEST
Subject: RE: Java acceleration/Jazelle

...

Does anyone know whether there are any good docs/books on ARM asm
programming, telling people these sort of things? This is an interesting (and
hopefully useful) learning experience, but can be really frustrating when I know
what I want to do, and pretty much how to, but not quite! :) E.g. calling
functions in linked libraries, how to call .s file functions from C, what is and
isn't allowed in in-line asm, etc.

...


Re: Java acceleration/Jazelle

2007-07-18 Thread Brian Waite
>
> A couple thoughts from a former hardware hacker here: first, serial
> ports are your friend so if you can find a sacrificial device that has
> a cracked screen or some other serious but non- life-threatening
> defect you should probably invest in a level-shifter chip and a DB-9
> connector (and some soldering cleverness) to be able to communicate
> through the serial port.  
Serial port will make life MUCH better!

Serial (and USB) PDA cables usually have a level shifter built in. Just snip
the PDA connector off and figure out the wires (I am sure Google is your
friend).
> Sending bytes back and forth that way is 
> trivial and will allow you to seriously goof around with an otherwise
> worthless device at the kernel or bootloader level.



RE: Java acceleration/Jazelle

2007-07-18 Thread Igor Stoppa
On Wed, 2007-07-18 at 11:01 +0100, ext Simon Pickering wrote:

> Yes, in an ideal world this would be nice, but I'm doing okay with ssh/sftp
> over wifi for the time being. Out of interest, does the N800 actually have
> solder points for a serial port at some known location?

There are websites, such as mobilchips that distribute schematics and
assembly instructions for phones and other devices. They haven't managed
to get an N800 manual, but they have the 770 one.

I don't know how legal it is for them to distribute such documents and
for people to download them, so this is _not_ a written permission or
anything similar, just a comment.

-- 
Cheers, Igor

Igor Stoppa <[EMAIL PROTECTED]>
(Nokia Multimedia - CP - OSSO / Helsinki, Finland)


RE: Java acceleration/Jazelle

2007-07-18 Thread Simon Pickering
Hi Larry, 

> A couple thoughts from a former hardware hacker here: first, serial
> ports are your friend so if you can find a sacrificial device that has
> a cracked screen or some other serious but non- life-threatening
> defect you should probably invest in a level-shifter chip and a DB-9
> connector (and some soldering cleverness) to be able to communicate
> through the serial port.  Sending bytes back and forth that way is
> trivial and will allow you to seriously goof around with an otherwise
> worthless device at the kernel or bootloader level.

Yes, in an ideal world this would be nice, but I'm doing okay with ssh/sftp over
wifi for the time being. Out of interest, does the N800 actually have solder
points for a serial port at some known location?

>   The second thought is learn the ABI convention for calling C methods
> from assembly and you can pass whatever data you need to a function
> that will do the printing for you.  I'd suggest going with this route
> since it will be the most straightforward without soldering but also
> the least versatile.  --You may end up rebooting many, many times and
> the less overhead there is to initialize before playing around the
> faster you can try a new idea.  At any rate, I believe the ABI changed
> when we went to the armel format and so there's a handy description on
> changes to system calls and so forth here:
> http://wiki.debian.org/ArmEabiPort  Sorry if I'm pointing out anything
> that might be obvious, these are just things I had to work through
> when I was working on this type of thing on OS-less boards.

Thanks for that pointer, I'd forgotten about that page. It does say how
registers are passed (but I think it says to system calls rather than some
random function). I'll have to look at it again and do some more digging (and in
the meantime save the value of R14 to an array each time the handler code is
called, then print it out in C after the asm has completed). 

Does anyone know whether there are any good docs/books on ARM asm
programming, telling people these sort of things? This is an interesting (and
hopefully useful) learning experience, but can be really frustrating when I know
what I want to do, and pretty much how to, but not quite! :) E.g. calling
functions in linked libraries, how to call .s file functions from C, what is and
isn't allowed in in-line asm, etc.

> Good luck!
> Larry

Thanks!


Simon



Re: Java acceleration/Jazelle

2007-07-17 Thread Larry Battraw
On 7/17/07, Simon Pickering <[EMAIL PROTECTED]> wrote:
> Hi Scott (& all),
>
> > The following describes an experiment I suggested:
> > Create an array with opcodes 204 to 255 in it.  Create one handler for
> > all opcodes.
> > Set up R14 to point to opcode 204.
(snip)
> I sat down and wrote a bit of code that I hope will do this
> (eventually). The one issue I've not sorted out yet is how to print
> something to the screen from assembly. Is there some secret incantation
> (CALL, for example), or do I need to define my own printf function in
> assembly?

A couple thoughts from a former hardware hacker here: first, serial
ports are your friend so if you can find a sacrificial device that has
a cracked screen or some other serious but non- life-threatening
defect you should probably invest in a level-shifter chip and a DB-9
connector (and some soldering cleverness) to be able to communicate
through the serial port.  Sending bytes back and forth that way is
trivial and will allow you to seriously goof around with an otherwise
worthless device at the kernel or bootloader level.

  The second thought is learn the ABI convention for calling C methods
from assembly and you can pass whatever data you need to a function
that will do the printing for you.  I'd suggest going with this route
since it will be the most straightforward without soldering but also
the least versatile.  --You may end up rebooting many, many times and
the less overhead there is to initialize before playing around the
faster you can try a new idea.  At any rate, I believe the ABI changed
when we went to the armel format and so there's a handy description on
changes to system calls and so forth here:
http://wiki.debian.org/ArmEabiPort  Sorry if I'm pointing out anything
that might be obvious, these are just things I had to work through
when I was working on this type of thing on OS-less boards.


> http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle7.c
>
> > Simon, you have done a great deal of work, and gotten quite far.   Good
> > work!
>
> Many thanks and thank you for your detailed summary of the patent, it's
> good to have you on board,
>
>
> Simon

Good luck!
Larry


Re: Java acceleration/Jazelle

2007-07-17 Thread Simon Pickering
Hi Scott (& all),

> The following describes an experiment I suggested:
> Create an array with opcodes 204 to 255 in it.  Create one handler for
> all opcodes.
> Set up R14 to point to opcode 204.
> Set up R12 to your handler.
> Push the address you want to return to onto the stack.
> Write your handler in C and printf to the console what opcode you are
> handling as long as the opcode is <= 253.  Setup R14 to point to the
> next opcode, and R12 to point to your handler.  For opcodes 254, 255 pop
> the return address off the stack and continue.
>
> I believe this will chew through all the opcodes in the array, dumping
> output to the console until opcode 254 is encountered.  At that point
> execution of Java bytecodes will stop.  This should occur whether
> Jazelle is enabled or not.

I sat down and wrote a bit of code that I hope will do this
(eventually). The one issue I've not sorted out yet is how to print
something to the screen from assembly. Is there some secret incantation
(CALL, for example), or do I need to define my own printf function in
assembly?

http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle7.c

> Simon, you have done a great deal of work, and gotten quite far.   Good
> work!

Many thanks and thank you for your detailed summary of the patent, it's 
good to have you on board,


Simon


Re: Java acceleration/Jazelle

2007-07-17 Thread Scott Bambrough
Folks,

This is a summary of a conversation Simon and I had off line.  We
decided it would be a good idea to post it here to the list so others
could see the discussion and comment.   A couple of caveats to keep in
mind.  I haven't had a chance to compile and try the code yet, I've been
reading the patent.  I'm also not through the entire patent yet.  This
means the following could require revision.

Simon and I also agreed it would be worthwhile (and make Quim happy) if
we started a Wiki page to condense our knowledge.  Following email
threads and pulling out the useful nuggets gets tedious when the thread
gets long.

From what I've seen in the patent the Jazelle hardware treats Java
opcodes similar to Thumb instructions.   It switches to Jazelle mode and
processes the Java opcodes directly in the CPU pipeline in sequence with
other Thumb and ARM opcodes.

According to the patent, a program could start out executing 32 bit
opcodes, switch to Thumb instructions to load a sequence of Java byte
codes,  switch to Jazelle mode and execute them, return to Thumb mode,
then return to ARM mode and exit.  The program is then really a sequence
of three different types of opcodes (ARM 32, Thumb, Java).

The above is only meant to illustrate that Jazelle is not a coprocessor
implementation like the old FPA11 FPU supported by the NWFPE in the
kernel, and that execution of Java is interleaved with ARM and Thumb
code.  The processor is basically executing Java bytecodes once started
until it is told to stop.

Simon pointed out that the actual transition has to be ARM->Java->ARM
according to the processor manual
(http://www.arm.com/pdfs/DDI0211I_arm1136_r1p3_trm.pdf).  This is not
what the patent suggests, but the manual will better describe the actual
implementation.

My take on a sequence for a JVM bytecode processing loop is:

load a stream of bytecodes into a buffer.
load r14 with address of first bytecode to execute in the buffer
load r12 with the address of the code to handle the bytecode
bxj r12


The program then proceeds to run executing the byte codes in the
buffer.  For this to happen, each handler for a particular byte code must:

a) load the address of the next byte code to execute
b) load the address of the software code to handle the next byte code.
c) process the current opcode
d) call bxj r12 to loop and execute the next bytecode.

An opcode handler thus looks like this:

load r14 with address of next byte code to execute
load r12 with the address of the software code to handle the byte code
process the current opcode
bxj r12

This type of architecture makes sense as each opcode knows what data
follows it in the byte code stream and can adjust the byte code pointer
in r14 to point to the next opcode correctly.   Basically as long as r14
and r12 are filled before the bxj opcode is called things should be
fine.  The patent author is a little long winded about interleaving the
fills of these registers with the processing to avoid pipeline stalls.
Fine, but this is an optimization for performance that could be done after.

ARM expects a Jazelle enabled JVM to have a software handler for all
byte codes.  The reason for this is that Jazelle can be enabled/disabled
by software via a bit in CPSR.  You can check whether it is
enabled/disabled by looking at a bit in CP14.  If Jazelle is disabled,
bxj r12 calls the software routine in r12.  As long as Jazelle is
enabled you should be able to execute any of the first 203 opcodes.  One
caveat is the floating point opcodes, which may require special handling
if no VFP is present.

It is implied that register r12 should always point into the JVM, either
to a software handler for an opcode or to an unhandled byte code
handler.  A simple implementation is to always load the address of the
same routine in r12, and use it for a jump table to execute any byte
code that hits it.  This however incurs the overhead of a comparison,
and  a couple of indirect jumps to process every opcode not handled by
hardware.

To alleviate this overhead, the patent also talks about a program
translation table, and the JVM's ability to program the table.  It is
implied the Jazelle hardware is able to look up the address of the
handler for a byte code in the Jazelle translation table more efficiently.

The patent isn't clear about the form this table takes, how to program
it, or if one is actually provided with the CPU core.   From the way the
patent is written it is possible to program the translation table with a
mapping between a byte code and the address of its handler for the
opcodes (in the range 203-253) supported by the JVM and load r12 with
the address of an unhandled opcode exception handler always.

The question is how does the Jazelle hardware know where to find the
translation table?  One thought is that the translation table base
address is provided in a register (RExec in the patent), then the
Jazelle hardware simply adds the bytecode value to this address and
j

Re: Java acceleration/Jazelle

2007-07-14 Thread Simon Pickering

>>> Did you make a typo in your declaration (on p34) of int code[]? Should
>>> this not be unsigned char code[] as bytecodes are 1x byte not n x byte
>>> long (I'm assuming you're running on a machine with sizeof(int)>1)?
>> This is not a typo. I don't understand the reason, but it had only
>> worked with the integer array, not with bytes - don't know why.
>
> Thinking about it for a minute, it sounds logical to me to have an int
> array, since normal ARM instructions are 32 bits long.
> So by stretching each bytecode to 4 bytes, the instructions and PC can
> internally be handled as in normal ARM operation mode.

Yes it does sound logical, but surely this would require more fiddling 
about when loading the bytecode into memory, or at least an 
intermediate step to split up the bytecodes. I don't know.

Relating to this (all page/section numbers refer to the ARM1136JF-S™ 
and ARM1136J-S Technical Reference Manual available from 
http://www.arm.com/pdfs/DDI0211I_arm1136_r1p3_trm.pdf):

p95.
"2.2 Processor operating states
The ARM1136JF-S processor has three operating states:
ARM state    32-bit, word-aligned ARM instructions are executed in this state.
Thumb state  16-bit, halfword-aligned Thumb instructions.
Java state   Variable length, byte-aligned Java instructions.
In Thumb state, the Program Counter (PC) uses bit 1 to select between
alternate halfwords. In Java state, all instruction fetches are in words."

So we can see from this that the instructions are byte aligned and that 
fetches are performed in words (i.e. 32bits). But does this indicate 
that instructions should be in 8bit bytes? I have the feeling it's 
saying that 32bit's worth of data are fetched at a time (because this 
is the fastest way to do it), but that the instructions are actually 
8bit.

Quoted from your other email:
> This is not a typo. I don't understand the reason, but it had only
> worked with the integer array, not with bytes - don't know why.

The code you use:
int code[] = {0x10, // bipush
0x00, 0x00, 0x00, 0x2a, // dummy int: 42
0x3B, // istore_0
0x1A, // iload_0
0xB1, // return
0x00, 0x00, 0x00, 0x00, 0x00 // any rubbish
};

I've only thought about this quickly (and my apologies if my 
little-endian conversion is wrong). If interpreted as bytes, I think 
this would actually still be valid code:

The first int would turn into {0x10, 0x00, 0x00, 0x00}, and adding on 
an extra byte from the second int would produce the same effect as your 
first instruction, but pushing 0x00 rather than 0x2a. A bytecode of 
0x00 is NOP iirc, 0x2a pushes a register onto the stack, etc. So it 
might just work when interpreted as bytes and manage to run all the way 
through. Just a guess though.

There are some other interesting points in the pdf:

On p128 (section 2.11.3 Exception entry and exit summary) there's a 
table telling us what happens to the registers and what return 
instruction to use for various exceptions when in ARM, Thumb or Java 
state. The interesting point is that the table says that SWI and UNDEF 
exceptions are not used in Java state. So this confirms that the 
unknown bytecode handler is not related to the usual undefined 
instruction handler as used in ARM mode.

The document also tells us about the bits in the CPSR and how to read 
them, and the bits in the CP14 register and how to read/write them. So 
I'll test this and see whether a bit needs to be set before invoking 
bxj.

Cheers,


Simon





Re: Java acceleration/Jazelle

2007-07-13 Thread Sebastian Mancke
Sebastian Mancke schrieb:
> Hi Simon,
> 
> unfortunately I haven't found the time to test your code yet.
> 
>> Did you make a typo in your declaration (on p34) of int code[]? Should
>> this not be unsigned char code[] as bytecodes are 1x byte not n x byte
>> long (I'm assuming you're running on a machine with sizeof(int)>1)?
> This is not a typo. I don't understand the reason, but it had only
> worked with the integer array, not with bytes - don't know why.

Thinking about it for a minute, it sounds logical to me to have an int
array, since normal ARM instructions are 32 bits long.
So by stretching each bytecode to 4 bytes, the instructions and PC can
internally be handled as in normal ARM operation mode.


> 
> --Sebastian
> 
> Simon Pickering schrieb:
>> Hi all,
>>
>> Sebastian, in this link:
>>
>>> You can find a small example in my jalimo slides from linuxtag2007
>>> (slide 33ff).
>>>
>>> http://www.jalimo.org/wiki/doku.php?id=news:linuxtag2007 
>> (direct link:
>> http://www.jalimo.org/documents/jalimo-slides_english_linuxtag2007.pdf)
>>
>> Did you make a typo in your declaration (on p34) of int code[]? Should this
>> not be unsigned char code[] as bytecodes are 1x byte not n x byte long (I'm
>> assuming you're running on a machine with sizeof(int)>1)? Just wondering if
>> this is a typo when you were writing the presentation or whether it might
>> have affected your results. On the other hand it may be my misunderstanding
>> the declaration.
>>
>>> It might be possible to test at least some aspects of my 'new improved' 
>>> theory. The one that comes to mind is to try a BXJ to an unhandled Java 
>>> instruction immediately. This should then branch back to whatever ARM 
>>> code is at R12 straight away (not needing to know the pointer table 
>>> base address). This could prove a number of things, including 
>>> the stack pointer, R12 & R14 contents, etc.
>> I've written a piece of test code, but not tested it yet (no compiler here at
>> work). I must admit that I've only learned extended inline asm this week, so
>> it may not work correctly ;). If anyone spots any mistakes, please let me know.
>>
>> http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle.c
>> http://people.bath.ac.uk/enpsgp/nokia770/jazelle/to_compile_do_this.txt
>>
>> I'll post some results tomorrow/later on, one way or the other.
>>
>> Cheers,
>>
>>
>> Si
>>
> 
> 


-- 
tarent Gesellschaft für Softwareentwicklung und IT-Beratung mbH

Heilsbachstr. 24, 53123 Bonn| Poststr. 4-5, 10178 Berlin
fon: +49(228) / 52675-0 | fon: +49(30) / 27594853
fax: +49(228) / 52675-25| fax: +49(30) / 78709617
durchwahl: +49(228) / 52675-17  | mobil: +49(171) / 7673249

Geschäftsführer:
Boris Esser, Elmar Geese, Thomas Müller-Ackermann
HRB AG Bonn 5168
Ust-ID: DE122264941


Re: Java acceleration/Jazelle

2007-07-13 Thread Sebastian Mancke
Hi Simon,

unfortunately I haven't found the time to test your code yet.

> Did you make a typo in your declaration (on p34) of int code[]? Should
> this not be unsigned char code[] as bytecodes are 1x byte not n x byte
> long (I'm assuming you're running on a machine with sizeof(int)>1)?
This is not a typo. I don't understand the reason, but it had only
worked with the integer array, not with bytes - don't know why.

--Sebastian

Simon Pickering schrieb:
> Hi all,
> 
> Sebastian, in this link:
> 
>> You can find a small example in my jalimo slides from linuxtag2007
>> (slide 33ff).
>>
>> http://www.jalimo.org/wiki/doku.php?id=news:linuxtag2007 
> 
> (direct link:
> http://www.jalimo.org/documents/jalimo-slides_english_linuxtag2007.pdf)
> 
> Did you make a typo in your declaration (on p34) of int code[]? Should this
> not be unsigned char code[] as bytecodes are 1x byte not n x byte long (I'm
> assuming you're running on a machine with sizeof(int)>1)? Just wondering if
> this is a typo when you were writing the presentation or whether it might
> have affected your results. On the other hand it may be my misunderstanding
> the declaration.
> 
>> It might be possible to test at least some aspects of my 'new improved' 
>> theory. The one that comes to mind is to try a BXJ to an unhandled Java 
>> instruction immediately. This should then branch back to whatever ARM 
>> code is at R12 straight away (not needing to know the pointer table 
>> base address). This could prove a number of things, including 
>> the stack pointer, R12 & R14 contents, etc.
> 
> I've written a piece of test code, but not tested it yet (no compiler here at
> work). I must admit that I've only learned extended inline asm this week, so
> it may not work correctly ;). If anyone spots any mistakes, please let me know.
> 
> http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle.c
> http://people.bath.ac.uk/enpsgp/nokia770/jazelle/to_compile_do_this.txt
> 
> I'll post some results tomorrow/later on, one way or the other.
> 
> Cheers,
> 
> 
> Si
> 




Re: Java acceleration/Jazelle

2007-07-13 Thread Simon Pickering

> New improved code here:
> http://people.bath.ac.uk/enpsgp/nokia770/jazelle/jazelle5.c

Sorry, wrong file name, it should be 
http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle5.c


Simon


Re: Java acceleration/Jazelle

2007-07-13 Thread Simon Pickering
Ah, looks like I spoke too soon.

My original branching code seems to have been misguided, hence the segfaults.

New improved code here:
http://people.bath.ac.uk/enpsgp/nokia770/jazelle/jazelle5.c

Now, when I call bxj r12, with r12 pointing to the handler code and r14 
pointing to the Java code (or anything for that matter), I get a jump 
to the handler code at r12. Not sure whether this is actually a good 
thing, as I get a jump to the code in r12 no matter whether I have an 
unhandleable Java bytecode or what I understand are valid bytecodes 
(from Sebastian Mancke's presentation) at the address pointed to by 
r14. I'm not sure this should happen. It almost appears that the bxj 
instruction is acting as a simple branch instruction.

Example output:

Nokia-N800-26:/home/user# ./test_jazelle5.bin
Start
R14 is 67192
R12 is 33752
R6 is -1090742540
R4 is 0
End

If I try running bxj r14 instead, I get a segfault.

Not quite giving up the thread of hope, it may be that the Java 
hardware needs to be enabled by one of the flags in CPSR (though this 
requires privileges, so hopefully not) or what's called CP14 in fig1 
of the patent. Now I'm not sure if this agrees with what Sebastian 
found, as my impression was that no flag tweaking was required to make 
something happen for his test case?

It's too late in the evening to go looking for info about CP14 now, 
perhaps over the weekend.

Cheers,


Simon


Re: Java acceleration/Jazelle

2007-07-13 Thread Simon Pickering
Not such good news.

I have learned a lot about extended inline asm though.

Anyway, my final code is here: 
http://people.bath.ac.uk/enpsgp/nokia770/jazelle/jazelle1.c

It's changed a fair bit from my first untested code and should now work 
as planned. Unfortunately what happens is a segfault. So this 
immediately rules out the premise that the code pointed to by R12 is 
jumped to (I called bxj with both r12 and r14 as its argument, same 
results both times). If the bxj Rn is removed then the segfault goes 
away, so this appears to be causing it.

I suppose the next step is to check and see which register is required 
to pass the Java bytecode address, and then try all other registers 
pointing to the bytecode 'handler' arm code to see if one doesn't 
produce a segfault. I'm assuming the segfault is caused because the 
process is trying to jump to an illegal address (because the relevant 
register is not set). Any other suggestions?

Thanks,


Simon



RE: Java acceleration/Jazelle

2007-07-13 Thread Simon Pickering
Hi all,

Sebastian, in this link:

> You can find a small example in my jalimo slides from linuxtag2007
> (slide 33ff).
> 
> http://www.jalimo.org/wiki/doku.php?id=news:linuxtag2007 

(direct link:
http://www.jalimo.org/documents/jalimo-slides_english_linuxtag2007.pdf)

Did you make a typo in your declaration (on p34) of int code[]? Should this not
be unsigned char code[] as bytecodes are 1x byte not n x byte long (I'm assuming
you're running on a machine with sizeof(int)>1)? Just wondering if this is a
typo when you were writing the presentation or whether it might have affected
your results. On the other hand it may be my misunderstanding the declaration.

> It might be possible to test at least some aspects of my 'new improved' 
> theory. The one that comes to mind is to try a BXJ to an unhandled Java 
> instruction immediately. This should then branch back to whatever ARM 
> code is at R12 straight away (not needing to know the pointer table 
> base address). This could prove a number of things, including 
> the stack pointer, R12 & R14 contents, etc.

I've written a piece of test code, but not tested it yet (no compiler here at
work). I must admit that I've only learned extended inline asm this week, so it
may not work correctly ;). If anyone spots any mistakes, please let me know.

http://people.bath.ac.uk/enpsgp/nokia770/jazelle/test_jazelle.c
http://people.bath.ac.uk/enpsgp/nokia770/jazelle/to_compile_do_this.txt

I'll post some results tomorrow/later on, one way or the other.

Cheers,


Si



Re: Java acceleration/Jazelle

2007-07-12 Thread Simon Pickering
Hi Sebastian,

> nice research !!! This seems easier (and more OS independent) than my
> thoughts with an interrupt in kernel mode. Although we cannot say for
> sure that Jazelle is implemented the way the patent describes, this is
> a very good starting point for making progress.

Thanks. I had also been thinking about exceptions and needing a kernel 
driver to hook into the Linux exception handler or to simply rewrite 
the process' interrupt vector, and I wasn't filled with hope because it 
seemed that it might cause some security issues (and therefore is that 
really the way it'd be done?).

It might be possible to test at least some aspects of my 'new improved' 
theory. The one that comes to mind is to try a BXJ to an unhandled Java 
instruction immediately. This should then branch back to whatever ARM 
code is at R12 straight away (not needing to know the pointer table 
base address). This could prove a number of things, including the stack 
pointer, R12 & R14 contents, etc.

I made a mistake in assuming that Rexc is only needed by the user:

>> The other register that needs to be determined is Rexc, which is the
>> one that points to the base of the pointer table which contains the
>> pointers to the ARM code snippets. But it looks like accesses to this
>> table are only handled in the ARM snippets (which 'we' would provide),
>> so it then becomes a question of which spare register can be used to
>> store this address and won't be overwritten.

Although it is used in the code snippets, unless the first bytecode (and 
every one after it) is unhandled, at some point the hardware will be 
advancing the Java program counter, R14, itself. When an unhandled 
bytecode then occurs, the hardware has to know the base address of the 
pointer table so that it can jump to the code that handles that 
instruction (where we set up R12 with the address of the code snippet 
to handle the *next* instruction, should it be needed). So we need to 
work out which register Rexc is. Trial and error?
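
As a rough illustration of the lookup the hardware (or our snippets) would have to do, here is the address arithmetic implied by the patent's `LDR R12,[Rexc, R4, LSL #2]`. This is only a sketch: the names `handler_slot_address`, `handler_table_size` and `rexc_base` are made up for this example, and 4-byte pointers (32-bit ARM) are assumed.

```c
#include <stdint.h>

/* Hypothetical helper: address of the pointer-table slot for one bytecode.
   Mirrors Rexc + (opcode << 2), i.e. one 4-byte pointer per bytecode. */
static uint32_t handler_slot_address(uint32_t rexc_base, uint8_t opcode)
{
    return rexc_base + ((uint32_t)opcode << 2);
}

/* Size in bytes of a full table: 256 entries, one per possible bytecode. */
static uint32_t handler_table_size(void)
{
    return 256u * 4u;
}
```

For example, with a table assumed to sit at 0x1000, the slot for iadd (opcode 0x60) would be at 0x1180, and the whole table occupies 1 KB.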

>> I've changed my mind about exception handlers (see my previous posts
>> with the same subject line). I believe that the processor internally
>> handles any exception caused by an unrecognised Java bytecode, and (and
>> this is where a bit of interpretation/reading between the lines comes
>> in) automatically switches to ARM mode and jumps to an address
>> specified in a pointer table provided by the application running the
>> JVM (i.e. us).

Just to clarify why I've reached this conclusion. The patent also talks 
a fair bit about handling floating point Java operations. It talks 
about causing VFP exceptions and jumping to the ARM vfp code (assuming 
you either have no VFP hardware, or for those instructions that are 
still emulated). The difference between the Java unhandled bytecode and 
ARM unhandled floating point instruction is made reasonably clear and 
specifically that the VFP code can be jumped to (and back from) once 
the processor is running in ARM mode and running the code snippet to 
emulate the Java floating point operation. I hope that makes some 
sense, to me it said that the same mechanism is not used for unhandled 
Java bytecodes as is for unhandled ARM instructions.

There are other aspects covered in the patent, for example what happens 
when a page fault is caused by an instruction & operands extending over 
two pages. I'm not sure if this is something we'd need to handle, or 
describing the hardware. Hopefully the latter!

Anyway, there should be enough hints and ideas in what we have to try 
some hacking at least :)

Cheers,


Simon


Re: Java acceleration/Jazelle

2007-07-11 Thread Sebastian Mancke
Hi Simon,

Nice research! This seems easier (and more OS-independent) than my
idea of using an interrupt in kernel mode. Although we cannot say for
sure that Jazelle is implemented the way the patent describes, this is
a very good starting point for making progress.

Regards,
  Sebastian



RE: Java acceleration/Jazelle

2007-07-11 Thread Simon Pickering
Hi all,

If this is all already known please let me know and I'll stop waffling, 
but some Googling couldn't find anything, so I'll present what I've 
found thus far.

> I had a look around for some more information and found the patent 
> for Jazelle: US patent number 7089539.

> Google link here (with figures, which makes understanding it far 
> easier): http://www.google.com/patents?id=iMt6EBAJ&dq=7089539

I've finally got round to reading all of the patent in question.

I've changed my mind about exception handlers (see my previous posts 
with the same subject line). I believe that the processor internally 
handles any exception caused by an unrecognised Java bytecode, and (and 
this is where a bit of interpretation/reading between the lines comes 
in) automatically switches to ARM mode and jumps to an address 
specified in a pointer table provided by the application running the 
JVM (i.e. us).

If you look at Fig.4 in the patent, for example, you can see a snippet 
of code. This is one of the chunks of code that would be jumped to 
(pointed to by the pointer table, which will be 256 pointers long, one 
for each bytecode) and implements the iadd bytecode.

I've repeated the code fragment below with comments above each line:

/* increment bytecode pointer (R14) and load value pointed to by R14 into R4.
This is performed so that we have the bytecode value of the *next* bytecode
(i.e. not the one that couldn't be handled) in R4. We move the Java
'program counter' register, R14, along by one before we do this (in fact it
does R4=*(R14+1) then R14=R14+1, but same in the end). */
LDRB R4, [R14, #1]!
/* decrement Rstack by 4, then pop the first operand from the stack into R1.
This is part of handling the actual instruction, in this case an iadd, so we
need to pop the values from the stack */
LDR R1,[Rstack, #-4]!
/* decrement Rstack by 4, then pop the second operand from the stack into R0.
As above */
LDR R0,[Rstack, #-4]!
/* Get address of next code fragment for next bytecode
load into R12 the value from Rexc + (R4 x 2^2).
In this step we look into the pointer table (which contains pointers to
ARM code fragments that handle each Java bytecode) and we load the address
of the code fragment for the next Java instruction (not the one we're
currently handling) which we loaded into R4 on the first line */
LDR R12,[Rexc, R4, LSL #2]
/* R0 = R0 + R1
Here we simply perform the add operation that this code fragment is
handling*/
ADD R0,R0,R1
/* Store R0 at the address in Rstack, then increment Rstack by 4 (this is
post-indexed addressing, so the increment happens after the store).
Here we save the results of the add operation that we're handling in this
code snippet*/
STR R0,[Rstack],#4
/* Branch to Java
This command takes as its operand the address of the ARM software snippet
used to handle the bytecode. Not (it would appear) the address of the Java
bytecode to execute. The address of the bytecode should be in R14. The
Jazelle hardware decides whether Jazelle is present and enabled and chooses
whether to jump to the bytecode and enter Java mode, or to stay in ARM mode
and 'emulate' the instruction */
BXJ R12

Note that the actual implementation of the iadd instruction (popping 
twice, adding then pushing) is mixed in with the preparations to handle 
the next bytecode/re-enter Java mode. Afaiu, this is done to make the 
code more efficient and avoid stalls.

So you can see that the code has already worked out where it needs to 
jump to in either Java or ARM 'emulation' mode (the latter is done to 
speed up processing should Jazelle be disabled or the bytecode be 
another one that's not handled by the hardware).
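
To make the flow of the fragment easier to follow, here is a small portable C model of the same dispatch scheme. It is a sketch only and says nothing about what the real hardware does: `vm_t`, `prepare_next`, `run` and the handler names are invented for this example, `pc` stands in for R14, `sp` for Rstack, `table` for the pointer table at Rexc and `next` for R12. The final BXJ is modelled by simply returning to a loop that calls `next`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard JVM opcode values for the few bytecodes used here. */
enum { OP_ICONST_1 = 0x04, OP_ICONST_2 = 0x05, OP_IADD = 0x60, OP_RETURN = 0xb1 };

typedef struct vm vm_t;
typedef void (*handler_t)(vm_t *);

struct vm {
    const uint8_t *pc;     /* models R14, the Java bytecode pointer */
    int32_t stack[16];     /* operand stack in memory */
    size_t sp;             /* next free slot, models Rstack */
    handler_t table[256];  /* models the pointer table at Rexc */
    handler_t next;        /* models R12, handler for the next bytecode */
};

/* LDRB R4,[R14,#1]! followed by LDR R12,[Rexc,R4,LSL #2] */
static void prepare_next(vm_t *vm)
{
    uint8_t next_op = *++vm->pc;
    vm->next = vm->table[next_op];
}

static void op_iconst_1(vm_t *vm)
{
    prepare_next(vm);
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = 1;
}

static void op_iconst_2(vm_t *vm)
{
    prepare_next(vm);
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = 2;
}

static void op_iadd(vm_t *vm)
{
    prepare_next(vm);
    assert(vm->sp >= 2);
    int32_t b = vm->stack[--vm->sp];  /* LDR R1,[Rstack,#-4]! */
    int32_t a = vm->stack[--vm->sp];  /* LDR R0,[Rstack,#-4]! */
    vm->stack[vm->sp++] = a + b;      /* ADD R0,R0,R1 ; STR R0,[Rstack],#4 */
}

/* Stop dispatching; does not read past the end of the bytecode. */
static void op_return(vm_t *vm)
{
    vm->next = NULL;
}

/* Runs bytecode until a return (or an opcode with no handler) is reached
   and gives back the value left on top of the stack. */
static int32_t run(const uint8_t *code)
{
    vm_t vm = {0};
    vm.table[OP_ICONST_1] = op_iconst_1;
    vm.table[OP_ICONST_2] = op_iconst_2;
    vm.table[OP_IADD] = op_iadd;
    vm.table[OP_RETURN] = op_return;
    vm.pc = code;
    vm.next = vm.table[code[0]];
    while (vm.next != NULL) {
        handler_t h = vm.next;  /* stands in for BXJ R12 with Jazelle unavailable */
        h(&vm);
    }
    assert(vm.sp >= 1);
    return vm.stack[vm.sp - 1];
}
```

Feeding `run` the four bytes iconst_1, iconst_2, iadd, return leaves 3 on top of the stack. Note how each handler fetches the next opcode and looks up its handler before doing its own work, the same interleaving as in the patent's fragment.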

One curious point about this code is that I was under the impression 
that the stack was held in registers [1], rather than at some address 
as is indicated by the pop instructions. It may be that the Jazelle 
hardware does in fact hold the top stack elements in registers and 
flushes them to memory when the unrecognised Java bytecode exception is 
caused. The question then is which register actually holds this memory 
address (i.e. Rstack in the code above)? The same article says that R6 
holds the stack pointer, so perhaps this is used...?

The other register that needs to be determined is Rexc, which is the 
one that points to the base of the pointer table which contains the 
pointers to the ARM code snippets. But it looks like accesses to this 
table are only handled in the ARM snippets (which 'we' would provide), 
so it then becomes a question of which spare register can be used to 
store this address and won't be overwritten.

There are of course a number of other patents that were from around the 
same time. Not sure whether any of the others will have any useful info 
in them. For that matter I wonder why ARM provided some of the register 
names/numbers, but not others?

Time to brush up on my inline asm, and to dig out my Java 2 Virtual 
Machine book and make up some test (byte)codes...


Si

Re: Java acceleration/Jazelle

2007-07-01 Thread Simon Pickering

>> The next question is how to implement the undefined instruction
>> exception handler. Is 0x00000004 (or optionally 0xFFFF0004) writable (I
>> need to write some test code really) from a user program? Assuming it
>> is, then it should be reasonably straightforward to write an exception
>> handler and to use this to branch to some code to handle the
>> un-implemented Java instructions. If it's not possible to write to this
>> memory, how do programs like gdb hook exceptions? Have I missed some
>> unseen stumbling-block here?
>
> GDB doesn't handle exceptions because they are mapped to signals at kernel
> level. This is a POSIX abstraction. Unexpected ones are handled as
> 'Segmentation Fault'.
>
> Exceptions must be handled at ring0. You should compile a new kernel.
> No idea if it is easy to do the same in a kernel module, but you can
> directly write assembly through /dev/mem into the interrupt vector and
> launch these new syscalls to trigger your snippets.
>
> I have never hooked an exception on ARM or Linux, so I probably
> need to read more kernel code :)

Oh, knew it couldn't be all that easy. Well the old style floating 
point emulation code used to trap the floating point instructions, so 
there's code out there. On the other hand, is enough information passed 
with the signal produced by the kernel to be able to write a handler 
and successfully return without killing the process?
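
As a starting point for answering that, here is a minimal sketch of a SIGILL handler installed with the POSIX `sigaction` call and the `SA_SIGINFO` flag, which gives the handler a `siginfo_t` and a context pointer. The names `on_sigill`, `install_sigill_handler`, `last_signal` and `last_code` are just for this example. The big caveat: for a real undefined instruction, returning from the handler re-executes the faulting instruction unless the handler changes the saved PC through the context argument, and how to do that is architecture-specific and not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t last_signal = 0;
static volatile sig_atomic_t last_code = 0;

/* Three-argument handler form selected by SA_SIGINFO. */
static void on_sigill(int signo, siginfo_t *info, void *context)
{
    /* A real handler would cast context to ucontext_t * to inspect and
       adjust the saved registers; that part is left out of this sketch. */
    (void)context;
    last_signal = signo;
    last_code = info->si_code;
}

/* Returns 0 on success, -1 on failure (as sigaction does). */
static int install_sigill_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, NULL);
}
```

The handler can be exercised safely with `raise(SIGILL)`, which delivers the signal without a real faulting instruction, so the process carries on after the handler returns.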

>> I seem to remember seeing a list of those instructions that are handled
>> by the hardware, does anyone have a link? (though obviously writing a
>> piece of code to iterate through and see which cause exceptions is
>> quite possible).
>
> Take a look here:
>
>   http://www.gelato.unsw.edu.au/lxr/source/arch/arm/mm/fault.c
>

Ah, thanks for that, but I was meaning the Java instructions that are 
not implemented in hardware (my fault for the poor explanation). The 
Java hardware stores the top 4 stack entries in registers R0 to R3, and 
the hardware does the manipulation of these entries (pushing/popping 
and pulling/pushing those that don't fit to somewhere else). So I was 
wondering what would happen if any of the non-hardware instructions 
needed to pop/push to the stack. It may be (hopefully) that none of the 
non-hardware instructions actually need to access the stack.
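
To think about what a software handler would need, here is a toy C model of a stack whose top four entries live in "registers" and spill to memory. It is purely a guess at the general idea, not a description of the Jazelle hardware: the names `cached_stack_t`, `cs_push`, `cs_pop` and `cs_flush` are invented, and the spill and refill policy (oldest entry out first, one entry refilled at a time) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4   /* models R0..R3 */
#define MEM_SLOTS 64    /* arbitrary size for the in-memory part */

typedef struct {
    int32_t cache[CACHE_SLOTS];  /* cache[0] is the oldest cached entry */
    size_t cached;               /* number of valid cache entries */
    int32_t mem[MEM_SLOTS];      /* the part of the stack held in memory */
    size_t mem_depth;            /* number of entries held in memory */
} cached_stack_t;

static void cs_push(cached_stack_t *s, int32_t v)
{
    if (s->cached == CACHE_SLOTS) {
        /* Cache full: spill the oldest entry to memory and shift down. */
        assert(s->mem_depth < MEM_SLOTS);
        s->mem[s->mem_depth++] = s->cache[0];
        for (size_t i = 1; i < CACHE_SLOTS; i++)
            s->cache[i - 1] = s->cache[i];
        s->cached--;
    }
    s->cache[s->cached++] = v;
}

static int32_t cs_pop(cached_stack_t *s)
{
    if (s->cached == 0) {
        /* Cache empty: refill one entry from memory. */
        assert(s->mem_depth > 0);
        s->cache[s->cached++] = s->mem[--s->mem_depth];
    }
    return s->cache[--s->cached];
}

/* Write every cached entry back to memory, oldest first, so that code
   which only knows about the in-memory stack sees the complete stack. */
static void cs_flush(cached_stack_t *s)
{
    for (size_t i = 0; i < s->cached; i++) {
        assert(s->mem_depth < MEM_SLOTS);
        s->mem[s->mem_depth++] = s->cache[i];
    }
    s->cached = 0;
}
```

If the hardware does something like `cs_flush` before handing over to ARM code, then a handler could work on the in-memory stack alone. If it does not, the handler would have to know how many of R0 to R3 are live.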

Oh, I managed to reply to myself last night rather than the list 
regarding this:

> The text says it should run "BXJ R14" which effectively returns control
> to the next instruction of the Java program, which is what one would
> assume. Fig.3 also shows a check on whether Java hw interpretation is
> available, and it is this check that determines whether a normal branch
> is made to the address in R12 (to emulate the next instruction) or a
> branch to Java to R14 (to continue hw interpretation). This makes
> sense, so I think that box 34 is simply wrong.

I take it back, the diagram was correct.

BXJ should be called with the address of the emulation code (which may 
or may not need to also be present in R12) and R14 should contain the 
address of the next Java bytecode. The hardware makes the decision as 
to whether to do a normal branch to R12 or to enter Java mode and 
execute the Java instruction at R14.
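
Put as code, my current reading of the BXJ decision is the following. This is a model of the behaviour as I understand it from the patent, not something verified on hardware. The names `bxj`, `branch_result_t` and `jazelle_usable` are invented for the sketch, and it ignores the ARM/Thumb selection that an ordinary BX performs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { STATE_ARM, STATE_JAVA } cpu_state_t;

typedef struct {
    cpu_state_t state;  /* instruction set state after the branch */
    uint32_t pc;        /* where execution continues */
} branch_result_t;

/* Model of "BXJ Rm": rm holds the address of the ARM emulation snippet,
   r14 holds the address of the next Java bytecode. */
static branch_result_t bxj(bool jazelle_usable, uint32_t rm, uint32_t r14)
{
    branch_result_t r;
    if (jazelle_usable) {
        /* Enter Java state and execute the bytecode at R14. */
        r.state = STATE_JAVA;
        r.pc = r14;
    } else {
        /* Behave like a normal branch to the emulation code. */
        r.state = STATE_ARM;
        r.pc = rm;
    }
    return r;
}
```

So the caller always supplies both addresses, and only the hardware decides which one is used.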

Serves me right for not reading the whole document!

Cheers,


Simon



Re: Java acceleration/Jazelle

2007-06-30 Thread pancake
On Sat, 30 Jun 2007 23:36:23 +0100
Simon Pickering <[EMAIL PROTECTED]> wrote:

> The next question is how to implement the undefined instruction 
> exception handler. Is 0x00000004 (or optionally 0xFFFF0004) writable (I 
> need to write some test code really) from a user program? Assuming it 
> is, then it should be reasonably straightforward to write an exception 
> handler and to use this to branch to some code to handle the 
> un-implemented Java instructions. If it's not possible to write to this 
> memory, how do programs like gdb hook exceptions? Have I missed some 
> unseen stumbling-block here?

GDB doesn't handle exceptions because they are mapped to signals at kernel
level. This is a POSIX abstraction. Unexpected ones are handled as
'Segmentation Fault'.

Exceptions must be handled at ring0. You should compile a new kernel.
No idea if it is easy to do the same in a kernel module, but you can
directly write assembly through /dev/mem into the interrupt vector and
launch these new syscalls to trigger your snippets.

I have never hooked an exception on ARM or Linux, so I probably
need to read more kernel code :)

> I seem to remember seeing a list of those instructions that are handled 
> by the hardware, does anyone have a link? (though obviously writing a 
> piece of code to iterate through and see which cause exceptions is 
> quite possible).

Take a look here:

  http://www.gelato.unsw.edu.au/lxr/source/arch/arm/mm/fault.c

  --pancake