Re: RFR: 8275063: Implementation of Memory Access API (Second incubator)

Maurizio Cimadamore Tue, 12 Oct 2021 09:39:07 -0700

On Tue, 12 Oct 2021 11:16:51 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> 
wrote:


> This PR contains the API and implementation changes for JEP-419 [1]. A more 
> detailed description of such changes, to avoid repetitions during the review 
> process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

This PR contains mainly API changes. We have tried to simplify the API by 
removing redundant classes and moving functionality to where it belongs. Below 
we list the main changes introduced in this PR. A big thanks to all who helped 
along the way: @briangoetz, @ChrisHegarty, @FrauBoes, @iwanowww, @JornVernee, 
@PaulSandoz, @sundararajana and @rose00.

### Value layouts and carriers

This is perhaps the biggest change in the API, which has a knock-on effect in 
other areas as well. In the past, a value layout used to be a fairly *neutral* 
description of a piece of memory containing a scalar value. A value layout had 
a size, alignment and endianness. Since a value layout contained no information 
on *how* the value was to be dereferenced by Java clients, said clients had to 
specify a *carrier* when obtaining a var handle from a value layout.

In this iteration, we have decided to attach carrier types to value layouts - 
so, in addition to size, alignment and endianness, all value layouts now have a 
`carrier` accessor, which returns a `j.l.Class`. You will find several 
hand-specialized versions of `ValueLayout`, one for each main carrier type (e.g. 
`ValueLayout.OfInt` for the `int` carrier).

Attaching carrier information to layouts simplifies the API in many ways:

* When obtaining a var handle from a layout, it is no longer necessary to 
provide a carrier; the layout in fact contains all the necessary information 
for the dereference operation to occur.
* Similarly, when linking downcall method handles, the `MethodType` parameter 
is no longer necessary, as now carrier information can be inferred from the 
provided `FunctionDescriptor`.
* We can express dereference operations in a more general fashion - e.g. 
`get(ValueLayout.OfInt)` instead of `getInt(ByteOrder)`. Note how the new form 
is more *complete*.

This iteration also adds support for `boolean` and `MemoryAddress` carriers in 
memory access var handles.
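
To make the idea concrete, here is a toy sketch; it is not the 
`jdk.incubator.foreign` implementation. A hypothetical `ToyIntLayout` class 
bundles the byte order and the carrier, so a var handle can be derived from the 
layout alone. It relies only on the standard `ByteBuffer` and 
`MethodHandles.byteBufferViewVarHandle`; every other name is an assumption made 
for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Toy model only: a layout that knows its own carrier and byte order.
final class ToyIntLayout {
    private final ByteOrder order;

    ToyIntLayout(ByteOrder order) {
        this.order = order;
    }

    // The carrier is part of the layout, so clients never have to pass it.
    Class<?> carrier() {
        return int.class;
    }

    long byteSize() {
        return Integer.BYTES;
    }

    // Everything needed to build the accessor is already in the layout.
    VarHandle varHandle() {
        return MethodHandles.byteBufferViewVarHandle(int[].class, order);
    }

    // Layout-driven dereference; byteOffset is expressed in bytes.
    int get(ByteBuffer buffer, int byteOffset) {
        return (int) varHandle().get(buffer, byteOffset);
    }

    void set(ByteBuffer buffer, int byteOffset, int value) {
        varHandle().set(buffer, byteOffset, value);
    }
}
```

Two such layouts that differ only in byte order read the same four bytes as 
different `int` values, which is why a separate `ByteOrder` argument is not 
needed at the dereference site.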

### Layout attributes

To help the `CLinker` distinguish between a 32-bit layout modelling a C 
`int` and a similar layout modelling a C `float`, we have in the past resorted 
to *layout attributes* - that is, we injected custom classification information 
in layouts, and then required clients working with the `CLinker` API to only 
work with such augmented layouts.

Because of the changes described above, this is no longer necessary: a layout 
is always associated with a Java carrier, so the `CLinker` can always 
disambiguate between `ValueLayout.OfInt` and `ValueLayout.OfFloat` even though 
they have the same size. For this reason, API support for custom layout 
attributes has been dropped in this iteration.

Similarly, platform-dependent layout constants in `CLinker` have been removed; 
clients interacting with the foreign linker can simply use the basic layout 
constants defined in `ValueLayout` (e.g. `JAVA_INT`, `JAVA_FLOAT`, ...) - which 
is not too different from using the JNI types `jint` and `jfloat`. Of course 
tools (such as `jextract`) are free to define *custom* layouts which model C 
types for a specific platform.

### Memory dereference

Previous iterations of the API provided ready-made dereference operations as 
static methods in the `MemoryAccess` class. This class is now removed, and all 
the dereference operations have been moved to `MemorySegment` and 
`MemoryAddress`. As mentioned before, the new dereference operations have a new 
form. Instead of:


MemorySegment segment = ...
MemoryAccess.getIntAtOffset(segment, /* offset */ 100,
                            /* endianness */ ByteOrder.nativeOrder());


The new API works as follows:


MemorySegment segment = ...
int val = segment.get(JAVA_INT, /* offset */ 100);

Note that the new dereference method is not static, and that parameters such as 
endianness are now omitted, since clients can just specify the value layout 
they want to work with. Also, since the new dereference methods are not static, 
we no longer need the workaround to enable VM argument type profiling (this was 
necessary to make static methods in the `MemoryAccess` class perform reasonably 
well in the face of profile pollution).

The same dereference operations are also available in `MemoryAddress`; when 
working with native code it might be necessary to dereference a raw pointer. In 
Java 17, to write a basic comparator function for qsort, the following code is 
needed:


static int qsortCompare(MemoryAddress addr1, MemoryAddress addr2) {
    return MemoryAccess.getIntAtOffset(MemorySegment.globalNativeSegment(), addr1.toRawLongValue()) -
           MemoryAccess.getIntAtOffset(MemorySegment.globalNativeSegment(), addr2.toRawLongValue());
}

With the proposed changes, the above code simplifies to:

static int qsortCompare(MemoryAddress addr1, MemoryAddress addr2) {
    return addr1.get(C_INT, 0) - addr2.get(C_INT, 0);
}

This is far more readable. Note that dereferencing a memory address is still a 
potentially unsafe operation, as an address has no spatial/temporal bounds. For 
this reason, dereference operations on `MemoryAddress` are marked as 
*restricted*.

### Memory copy

This iteration adds more support for copying Java arrays to and from memory 
segments. In Java 17, it is possible to copy a Java array into a memory 
segment, as follows:


int[] array = ....
MemorySegment segment = ...
MemorySegment heapView = MemorySegment.ofArray(array);
segment.asSlice(startSegmentOffset, nelems * 4)
       .copyFrom(heapView.asSlice(startArrayOffset * 4, nelems * 4));


This code snippet is suboptimal for several reasons:

* three temporary segments have to be created: the heap view, plus the two 
slices
* note how the code has to carefully slice the source/target segment to make 
sure that only the desired elements are copied, and at the desired target 
offset in the segment.
* offset in arrays is expressed in elements, whereas offset in segments is 
expressed in bytes - which invites mistakes.
* it is not possible to specify custom endianness/alignment for the copy 
operation

With the changes in this PR, the above code becomes:


int[] array = ....
MemorySegment segment = ...
MemorySegment.copy(array, startArrayOffset, segment, JAVA_INT,
                   startSegmentOffset, nelems);


The above code is much simpler, with less potential for mistakes. Also, the 
extra value layout allows clients to inject additional alignment constraints and 
endianness (if required).
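
The unit mismatch is easy to see in a small stand-in written against 
`java.nio.ByteBuffer` rather than `MemorySegment`. This is only a sketch: 
`ArrayCopySketch.copyInts` is a hypothetical helper, not part of the API under 
review, and its parameter order is merely modelled on the copy call shown 
earlier. The array index and count are in elements, the destination offset is 
in bytes, and the byte order is supplied explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ArrayCopySketch {
    // srcIndex and count are in elements; dstByteOffset is in bytes.
    static void copyInts(int[] src, int srcIndex,
                         ByteBuffer dst, ByteOrder order,
                         int dstByteOffset, int count) {
        // duplicate() leaves the caller's position, limit and byte order untouched.
        ByteBuffer view = dst.duplicate().order(order);
        for (int i = 0; i < count; i++) {
            // Absolute put: element i lands i * 4 bytes past the start offset.
            view.putInt(dstByteOffset + i * Integer.BYTES, src[srcIndex + i]);
        }
    }
}
```

Copying three elements starting at array index 1 to byte offset 8 writes bytes 
8 through 19 of the buffer and nothing else; no temporary slices are created.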

### Role of `MemoryAddress`

In Java 17, `MemoryAddress` has a scope accessor. This is useful when reasoning 
about an address obtained from a memory segment, but is far less useful when 
thinking about an address received from a native call. While it is possible to 
model the latter as an address associated with the *global scope*, 
`MemoryAddress` supports several operations which allow clients to attach 
spatial and temporal bounds to an address, turning it into a `MemorySegment`. 
If the address already has a scope, the semantics of some of these operations 
become confusing.

For this reason, in this iteration `MemoryAddress` is only used to model raw 
native addresses. There is no scope associated with a memory address - 
dereferencing a raw address is always unsafe. This change brings more clarity 
to the API, as `MemoryAddress` is nothing but a simple wrapper around a `long` 
value. This also means that obtaining a `MemoryAddress` from a heap segment is 
no longer a possibility; in other words, clients that don't care about native 
interop should probably just use `MemorySegment` and forget about 
`MemoryAddress`.

### Downcall method handle safety

In Java 17, by-reference parameters to downcall method handles are passed as 
`MemoryAddress` arguments. This means that e.g. passing a segment by-reference 
requires a conversion (from segment to memory address, using 
`MemorySegment::address`). This conversion is lossy, as we lose information 
about the original memory segment (spatial and temporal bounds). As a result, 
passing parameters by reference to downcall method handles is less safe.

The changes described in this PR introduce stronger safety guarantees for 
by-reference parameters. The `CLinker` will now map any such parameter to the 
`Addressable` type - a common super interface of all things that can be passed 
by reference. This means clients can pass a memory segment *directly* to a 
downcall method handle, no conversion required. Because of that, the `CLinker` 
runtime can make sure that e.g. arguments passed by reference are kept alive 
for the entire duration of the native call.

### Native symbols

In Java 17, looking up a symbol on a native library is done using the 
`SymbolLookup` interface. This interface used to return a plain `MemoryAddress` 
(the address of the native function). Given the changes described above, 
`MemoryAddress` is no longer a great choice:

* a `MemoryAddress` now models a raw native memory address, and has no scope
* a `MemoryAddress` can be easily dereferenced

For this reason, a new abstraction is added, namely `NativeSymbol`, which is 
used to model the entry point of a symbol (a function or a global variable) in 
a native library. A native symbol is `Addressable`, has a name and a resource 
scope. Since native symbols have a scope, the `CLinker` runtime can make sure 
that the scope of the native symbol corresponding to the native function being 
executed cannot be closed prematurely. This effectively allows clients to 
support safe library loading abstractions which support [deterministic library 
unloading](https://github.com/sundararajana/panama-jextract-samples/blob/master/dlopen/Dlopen.java).

Additionally, we have tweaked the `CLinker::upcallStub` method to also return 
an *anonymous* `NativeSymbol`, rather than a raw `MemoryAddress`.

### Resource scope tweaks

This PR removes the distinction between *implicit* and *explicit* scopes. Now 
all scopes (except for the *global scope*) are closeable, and can be associated 
with a `Cleaner`, if required.

Another change in the resource scope API is in how temporal scope dependencies 
are handled. In Java 17 scopes could be acquired and released:


MemorySegment segment = ...
ResourceScope.Handle segmentHandle = segment.scope().acquire();
try {
   <critical operation on segment>
} finally {
   segment.scope().release(segmentHandle);
}

This PR removes the `ResourceScope::acquire` and `ResourceScope::release` 
methods, and instead allows clients to capture dependencies between scopes in a 
more explicit fashion:


MemorySegment segment = ...
try (ResourceScope criticalScope = ResourceScope.newConfinedScope()) {
    criticalScope.keepAlive(segment.scope());
    <critical operation on segment>
}
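
The dependency semantics can be sketched with a toy model. `ToyScope` below is 
hypothetical and is not how `ResourceScope` is implemented; it assumes that a 
scope kept alive by another, still open, scope refuses to close, and that the 
dependency is released when the dependent scope closes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; not the ResourceScope implementation.
final class ToyScope implements AutoCloseable {
    // Number of open scopes that currently depend on this one.
    private int keptAliveBy = 0;
    // Scopes this scope keeps alive until it is closed.
    private final List<ToyScope> targets = new ArrayList<>();
    private boolean closed = false;

    // Record that 'target' must stay open at least as long as this scope.
    void keepAlive(ToyScope target) {
        if (closed || target.closed) {
            throw new IllegalStateException("scope already closed");
        }
        target.keptAliveBy++;
        targets.add(target);
    }

    boolean isAlive() {
        return !closed;
    }

    @Override
    public void close() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        if (keptAliveBy > 0) {
            throw new IllegalStateException("kept alive by another scope");
        }
        closed = true;
        // Closing this scope releases every dependency it created.
        for (ToyScope target : targets) {
            target.keptAliveBy--;
        }
    }
}
```

In this model, closing the segment's scope inside the try-with-resources block 
fails, and succeeds once the critical scope has been closed.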

API javadoc:
http://cr.openjdk.java.net/~mcimadamore/8275064/javadoc/jdk/incubator/foreign/package-summary.html

Specdiff:
http://cr.openjdk.java.net/~mcimadamore/8275064/specdiff_out/overview-summary.html

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907
