Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-06-11 Thread David Lloyd
On Tue, Jun 11, 2024 at 12:57 PM Alan Bateman 
wrote:

>
>
> On 11/06/2024 18:19, David Lloyd wrote:
>
> :
>
> I thought that might be where Alan was headed with this. I would support
> this solution; it would solve the problem for conformant serialization
> libraries. If a class has a `readObject`/etc. then we use it - we wouldn't
> care if it was "natural" or generated. This also gives us the option to
> allow the user to use `opens` selectively to opt-in to special
> optimizations, without a major penalty if they do not.
>
> Is there already someone assigned for this task
>
>
> Not to my knowledge so you have cycles to prototype and have these methods
> return a MH that work like a "default" readObject/writeObject then it would
> help the discussion.
>

I think I can come up with *at least* a prototype in the next week or two.
I have one concern though. The `readObject` method, when defined by users,
must by specification do one of two things before it may read values from
the stream:

* Call `OIS.defaultReadObject()`
* Call `OIS.readFields()`

In the latter case, we don't have a problem, but if the user class calls
`defaultReadObject` then we're back in the boat of having to reflectively
set the fields of the class. We could possibly solve this by creating a
*new* accessor which creates a parallel version of the method which
*always* sets fields reflectively, even if a `readObject` already exists on
the class. I'm not sure if there is a better solution. There is a similar
problem on the write side, however this involves reading fields rather than
writing them (obviously).

-- 
- DML • he/him


Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-06-11 Thread David Lloyd
On Tue, Jun 11, 2024 at 11:58 AM Ron Pressler 
wrote:

>
>
> > On 11 Jun 2024, at 17:27, David Lloyd  wrote:
> >
> >
> >
> > On Tue, Jun 11, 2024 at 10:17 AM Alan Bateman 
> wrote:
> > On 06/06/2024 18:37, David Lloyd wrote:
> >> Just bumping this one more time. I intend to start by opening a JIRA to
> add the two proposed methods to `ReflectionFactory`, and go from there. I
> guess that we might need a JEP for the proposed serialization restrictions,
> which is going to be considerably more involved, so I'm putting that off as
> a second step for now, pending further discussion.
> >>
> >
> > I don't think the JDK should be adding another backdoor for
> serialization libs to do deep reflection.
> >
> > I'm curious, does your serialization library uses the ReflectionFactory
> to get method handles to the readObject/writeObject methods (if they are
> defined)?
> >
> > Yes, all of the method-access methods on ReflectionFactory are used, not
> just for readObject/writeObject but also readObjectNoData, readResolve, and
> writeReplace, the constructor accessors, and the factory methods for
> OptionalDataException. We don't use the static initializer one though
> (maybe the ORB does, I'm not sure).
> >
> > --
> > - DML • he/him
>
>
> Ok, good.
>
> So the way the JDK’s built-in serialization mechanism allows custom
> implementation is by extending ObjectInputStream/ObjectOutputStream (and to
> that purpose it offers protected constructors of those two classes).
> ReflectionFactory offers access to non-public readObject/writeObject
> methods, but it does not currently offer a way to exploit the default
> field-based reflection when such methods are not explicitly declared.
>
> Given that, I think that an appropriate mechanism to consider is having
> ReflectionFactory offer access to MethodHandles that can perform
> de/serialization using the default JDK field-based serialization, but still
> requiring the serialization library to extend
> ObjectInputStream/ObjectOutputStream — i.e. those MethodHandles will invoke
> the appropriate OIS/OOS methods for writing fields — but not offering any
> direct access to non-public fields. This would still be an interim
> mechanism, and it may still require significant work by serialization
> libraries that don’t wish to offer custom serializers for JDK classes and
> don’t wish to use --add-opens.


I thought that might be where Alan was headed with this. I would support
this solution; it would solve the problem for conformant serialization
libraries. If a class has a `readObject`/etc. then we use it - we wouldn't
care if it was "natural" or generated. This also gives us the option to
allow the user to use `opens` selectively to opt-in to special
optimizations, without a major penalty if they do not.

Is there already someone assigned for this task, or is it a
hot-off-the-presses new idea?

Looking further ahead, a future JDK mechanism to support custom
> serialization would still be based on invoking public constructors. I.e.,
> the same effect could be achieved today — albeit with some more work — by
> the serialization library offering custom serializers for JDK classes that
> invoke their public constructors (the difference being that such a future
> mechanism would make it easier to automatically identify the public
> constructor to be invoked, whereas today it would need to be individually
> determined on a per-class basis). Serialization libraries that want to be
> better prepared for that future and not do further significant work when it
> arrives, should use custom serializers for JDK classes that invoke public
> constructors rather than rely on the mechanism I proposed above or on
> --add-opens.
>

It would be great to see this. I've prototyped a few ideas for
constructor-based deserialization in the past (essentially, using
caller-sensitivity to control stream field access), but the issue always
comes down to needing the entire class hierarchy to participate and play
nicely, hence my suspicion that some amount of language/JDK changes would
be needed. I would support any new work in this direction either way
though; it gets rid of a lot of hackiness on the deserialization end.

-- 
- DML • he/him


Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-06-11 Thread David Lloyd
On Tue, Jun 11, 2024 at 11:38 AM Alan Bateman 
wrote:

>
>
> On 11/06/2024 17:27, David Lloyd wrote:
>
> :
>
> Yes, all of the method-access methods on ReflectionFactory are used, not
> just for readObject/writeObject but also readObjectNoData, readResolve, and
> writeReplace, the constructor accessors, and the factory methods for
> OptionalDataException. We don't use the static initializer one though
> (maybe the ORB does, I'm not sure).
>
> Okay, next question is whether you are also using Unsafe to access the
> fields of classes where you've got a MH to invoke the readXXX/writeXXX
> methods? Ignore SVUID for the moment.
>

For "normal" objects, no; we invoke readXXX/writeXXX if they're available
with our OIS/OOS proxy object. We are using Unsafe to initialize the
handler of reflection proxy instances though, on the deserialization side,
and similarly for certain fields of some "known" classes for optimization
purposes, though if these classes had readXXX/writeXXX we could probably do
that another way (albeit at a performance penalty, in all likelihood).

-- 
- DML • he/him


Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-06-11 Thread David Lloyd
On Tue, Jun 11, 2024 at 10:17 AM Alan Bateman 
wrote:

> On 06/06/2024 18:37, David Lloyd wrote:
>
> Just bumping this one more time. I intend to start by opening a JIRA to
> add the two proposed methods to `ReflectionFactory`, and go from there. I
> guess that we might need a JEP for the proposed serialization restrictions,
> which is going to be considerably more involved, so I'm putting that off as
> a second step for now, pending further discussion.
>
>
> I don't think the JDK should be adding another backdoor for serialization
> libs to do deep reflection.
>
> I'm curious, does your serialization library uses the ReflectionFactory to
> get method handles to the readObject/writeObject methods (if they are
> defined)?
>

Yes, all of the method-access methods on ReflectionFactory are used, not
just for readObject/writeObject but also readObjectNoData, readResolve, and
writeReplace, the constructor accessors, and the factory methods for
OptionalDataException. We don't use the static initializer one though
(maybe the ORB does, I'm not sure).

-- 
- DML • he/him


Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-06-10 Thread David Lloyd
On Fri, Jun 7, 2024 at 1:19 PM Ron Pressler  wrote:

>
> > On 6 Jun 2024, at 18:37, David Lloyd  wrote:
> >
> > Just bumping this one more time. I intend to start by opening a JIRA to
> add the two proposed methods to `ReflectionFactory`, and go from there. I
> guess that we might need a JEP for the proposed serialization restrictions,
> which is going to be considerably more involved, so I'm putting that off as
> a second step for now, pending further discussion.
>
>
> Hi.
>
> Seven years before the upcoming delivery of JEP 471 we [announced that the
> world of accessing JDK internals would be coming to an end](
> https://openjdk.org/jeps/260). Four years later, the JDK [started
> enforcing that](https://openjdk.org/jeps/403), but to give extra time to
> laggards who have not yet adapted to either refraining from accessing
> internals or instructing their users on the proper configuration of the JDK
> to allow doing so, we left some unsupported mechanisms in place to
> temporarily allow hacking around the restrictions until they can properly
> adapt. Now, three years later, we're starting the process of removing the
> temporary hacks, although we don't yet know how long the process should
> last.
>
> My first question is, how many more years do you think we should wait for
> libraries to finish the process by which they either refrain from accessing
> internals or instruct their users on the proper configuration that allows
> them to do so? Knowing how long we should wait for the libraries that have
> not yet finished their adaptation in the past seven years, and how far
> along they are in the process, could help inform how long we should wait
> until the actual removal of Unsafe.
>
> My second question has to do with the consideration that any serialization
> procedure should no longer bypass constructors (at least not without
> `--add-opens`, that is). I'd be interested to know about the difficulties
> serialization libraries have encountered in their process of migrating away
> from accessing internals and toward custom serializers for JDK classes,
> such as what public constructors are missing. This would help us identify
> what constructors we should add to support serialization that doesn't
> violate integrity.
>

The first thing to understand is that, unlike APIs in the JDK which may
only be implemented *in* the JDK, serialization is governed by a
specification [1]. This specification is still implemented by multiple
external libraries (including the OpenJDK ORB) as well as the JDK itself.
We aren't talking about libraries which invent some arbitrary kind of
serialization that requires unsafe object creation. We're talking about
libraries which conform to some or all of the serialization specification.

If you look back at the history of the discussion of serialization support
with regards to the JDK (which is referenced in the history of this email
thread), it was determined at the time of Java 9 that serialization would
be considered an exception to the rule of reflection restrictions, for
purely pragmatic reasons. Brian Goetz has talked frequently about the
limitations of existing serialization and how this impacts the platform in
various ways. He's also hinted at a future where some alternative
serialization mechanism exists which would presumably avoid the need for
the things we find in `ReflectionFactory`.

However we don't live in that world just yet. So, to answer your first
question ("How many more years...") I would start by modifying the
question. It's not about waiting for libraries; I think this greatly
mischaracterizes the problem. It's about waiting for OpenJDK to have a new
serialization mechanism. Until then, there is no replacement, so we're in a
holding pattern, like it or not. At best we're talking about small,
monotonic improvements which bring the platform closer to the goal of
integrity.

And so the answer to the first question is, "however many years it takes
for there to be an alternative, blessed OpenJDK serialization mechanism",
which presumably will involve both language and core library modifications.

For your second question, serialization *must*, by specification, bypass
constructors in the bottom part of the serializable class's type hierarchy
which correspond to serializable classes (most frequently this means
everything other than `Object` itself, but occasionally there are more
levels involved). If the platform were to drop that support, then all
serialization libraries would break, including the OpenJDK ORB, and the
specification would be unimplementable.

The serialization specification is not deprecated. Nor are the classes
which support serialization in the JDK. So in my opinion it would not yet
be appropriate to break these things. In order to tackle this particular
corner of platform integrity, those t

Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-06-06 Thread David Lloyd
Just bumping this one more time. I intend to start by opening a JIRA to add
the two proposed methods to `ReflectionFactory`, and go from there. I guess
that we might need a JEP for the proposed serialization restrictions, which
is going to be considerably more involved, so I'm putting that off as a
second step for now, pending further discussion.

On Wed, May 15, 2024 at 9:00 PM -  wrote:

> Indeed, ReflectionFactory. newConstructorForSerialization can be used to
> access otherwise-private constructors. Before JDK-8315810, it could even
> allocate one class and call the constructor of another class.
>
> I strongly support your proposal to restrict ReflectionFactory.
>
> Regards, Chen
>
> On Wed, May 15, 2024 at 6:23 PM David Lloyd 
> wrote:
>
>>
>>
>> On Tue, May 14, 2024 at 10:01 AM David Lloyd 
>> wrote:
>>
>>> (Moving to core-libs-dev)
>>>
>>> On Tue, May 14, 2024 at 9:40 AM Alan Bateman 
>>> wrote:
>>>
>>>> On 14/05/2024 14:42, David Lloyd wrote:
>>>>
>>>> ReflectionFactory allows access to serialization facilities without any
>>>> access checking (other than the defunct SecurityManager checks). Perhaps
>>>> this class could gain some more methods, like this:
>>>>
>>>> * `newGetterForSerialization(Field field)` - includes ability to access
>>>> `objectStreamFields` and `serialVersionUID`, or these could be separate
>>>> methods
>>>> * `newSetterForSerialziation(Field field)`
>>>>
>>>> Does this seem workable?
>>>>
>>>> The intention with ReflectionFactory is that serialization libraries go
>>>> through the readObject/writeObject and other magic methods, to avoid field
>>>> access.
>>>>
>>>> Probably best to being this to core-libs-dev for further discussion.
>>>>
>>>
>>> This doesn't match my recollection. The `readObject` and `writeObject`
>>> methods are optional private methods which serializable classes *may*
>>> provide when they want a bespoke serialization strategy (and the other
>>> methods that are accessed via this special class are similar in this
>>> regard). They are not in any way magical in that they do not provide the
>>> means to perform this part of the serialization process, and thus they are
>>> not a substitute for field access in serialization. According to
>>> JDK-8164908, these methods were added because at the time, Unsafe was still
>>> state of the art for custom serialization and IIOP to access fields (with
>>> libraries using Field actively moving to Unsafe instead). However Unsafe
>>> did not and does not provide access to methods, so in order to avoid the
>>> aforementioned `add-opens` explosion, these methods were added as a
>>> compromise. Now that Unsafe is being deprecated, it is my opinion that this
>>> does need to be revisited because at the time, Unsafe was the recommended
>>> approach.
>>>
>>> [1] https://bugs.openjdk.org/browse/JDK-8164908
>>>
>> It seems to me that it might be a good idea (as part of 8305968
>> (Integrity by default)) to put the `ReflectionFactory` API behind a
>> restricted method call somehow, even considered separately from the changes
>> suggested above. Maybe in conjunction with a non-standard switch that works
>> similarly to `--enable-native-access`, e.g.
>> `--enable-serialization=org.serial.framework`? That would also somewhat
>> mitigate the security risk of adding the aforementioned methods or
>> something like them to this class.
>>
>> Should I open a bug for either (or both) of these suggestions? Or perhaps
>> there is a better way to proceed?
>>
>> --
>> - DML • he/him
>>
>

-- 
- DML • he/him


Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-05-15 Thread David Lloyd
On Tue, May 14, 2024 at 10:01 AM David Lloyd  wrote:

> (Moving to core-libs-dev)
>
> On Tue, May 14, 2024 at 9:40 AM Alan Bateman 
> wrote:
>
>> On 14/05/2024 14:42, David Lloyd wrote:
>>
>> ReflectionFactory allows access to serialization facilities without any
>> access checking (other than the defunct SecurityManager checks). Perhaps
>> this class could gain some more methods, like this:
>>
>> * `newGetterForSerialization(Field field)` - includes ability to access
>> `objectStreamFields` and `serialVersionUID`, or these could be separate
>> methods
>> * `newSetterForSerialziation(Field field)`
>>
>> Does this seem workable?
>>
>> The intention with ReflectionFactory is that serialization libraries go
>> through the readObject/writeObject and other magic methods, to avoid field
>> access.
>>
>> Probably best to being this to core-libs-dev for further discussion.
>>
>
> This doesn't match my recollection. The `readObject` and `writeObject`
> methods are optional private methods which serializable classes *may*
> provide when they want a bespoke serialization strategy (and the other
> methods that are accessed via this special class are similar in this
> regard). They are not in any way magical in that they do not provide the
> means to perform this part of the serialization process, and thus they are
> not a substitute for field access in serialization. According to
> JDK-8164908, these methods were added because at the time, Unsafe was still
> state of the art for custom serialization and IIOP to access fields (with
> libraries using Field actively moving to Unsafe instead). However Unsafe
> did not and does not provide access to methods, so in order to avoid the
> aforementioned `add-opens` explosion, these methods were added as a
> compromise. Now that Unsafe is being deprecated, it is my opinion that this
> does need to be revisited because at the time, Unsafe was the recommended
> approach.
>
> [1] https://bugs.openjdk.org/browse/JDK-8164908
>
It seems to me that it might be a good idea (as part of 8305968 (Integrity
by default)) to put the `ReflectionFactory` API behind a restricted method
call somehow, even considered separately from the changes suggested above.
Maybe in conjunction with a non-standard switch that works similarly to
`--enable-native-access`, e.g.
`--enable-serialization=org.serial.framework`? That would also somewhat
mitigate the security risk of adding the aforementioned methods or
something like them to this class.

Should I open a bug for either (or both) of these suggestions? Or perhaps
there is a better way to proceed?

-- 
- DML • he/him


Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-05-14 Thread David Lloyd
(Moving to core-libs-dev)

On Tue, May 14, 2024 at 9:40 AM Alan Bateman 
wrote:

> On 14/05/2024 14:42, David Lloyd wrote:
>
> ReflectionFactory allows access to serialization facilities without any
> access checking (other than the defunct SecurityManager checks). Perhaps
> this class could gain some more methods, like this:
>
> * `newGetterForSerialization(Field field)` - includes ability to access
> `objectStreamFields` and `serialVersionUID`, or these could be separate
> methods
> * `newSetterForSerialziation(Field field)`
>
> Does this seem workable?
>
> The intention with ReflectionFactory is that serialization libraries go
> through the readObject/writeObject and other magic methods, to avoid field
> access.
>
> Probably best to being this to core-libs-dev for further discussion.
>

This doesn't match my recollection. The `readObject` and `writeObject`
methods are optional private methods which serializable classes *may*
provide when they want a bespoke serialization strategy (and the other
methods that are accessed via this special class are similar in this
regard). They are not in any way magical in that they do not provide the
means to perform this part of the serialization process, and thus they are
not a substitute for field access in serialization. According to
JDK-8164908, these methods were added because at the time, Unsafe was still
state of the art for custom serialization and IIOP to access fields (with
libraries using Field actively moving to Unsafe instead). However Unsafe
did not and does not provide access to methods, so in order to avoid the
aforementioned `add-opens` explosion, these methods were added as a
compromise. Now that Unsafe is being deprecated, it is my opinion that this
does need to be revisited because at the time, Unsafe was the recommended
approach.

[1] https://bugs.openjdk.org/browse/JDK-8164908

-- 
- DML • he/him


Re: [External] : Re: New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

2024-05-04 Thread David Lloyd
On Fri, May 3, 2024 at 1:20 PM Ron Pressler  wrote:

>
>
> > On 3 May 2024, at 18:33, David Lloyd  wrote:
> > It seems to me that the JDK could fill this gap by introducing some API
> which can construct and provide access to an array or something like it,
> with striding and/or alignment guarantees that each element will reside on
> a separate data cache line (or barring that, perhaps using a minimum
> per-element size and/or alignment that is given as an argument to the
> factory), and with the gamut of atomic accessors via a `VarHandle` or
> similar. This could be especially valuable if/when objects start coming in
> a variety of shapes and sizes in memory, once value types hit.
> >
> > Could such a thing be added into the plan?
>
> [redirecting to core-libs]
>

Good call, I should have thought of that.

Adding some VarHandle operation that takes into account the cache lines
> size is interesting — although preserving cache-line *alignment* could be
> tricky as the GC relocates arrays, so an array element that’s at the start
> of a cache line at time t0 might not be at the start of a cache line at
> time t1 — but that’s unrelated to this JEP.
>
> What is related to this JEP is that you’re using Unsafe to determine the
> size of an oop (in particular, to tell if oops are compressed or no)t. Is
> that what you’re asking for?
>

What I need - my actual use case - is some way to guarantee that some piece
of data is stored alone on its own cache line. This may be a subset of a
broader use case where some set of values should be stored alone on its own
cache line.

To achieve this use case I am presently making assumptions about objects in
an array - specifically that they are stored as oops, and that an oop is
smaller than a cache line (okay that's a fairly reasonable assumption I
guess), and that I can find out the size of an oop so that I can scale
things appropriately. But recognizing that this is not a good kind of
assumption to make, I would rather have an API that (for example) lets me
create and access an array (probably via VarHandle) such that I can
guarantee that each item is a of minimum size (and that size, for my use
case, is going to be what I can determine as the minimum data cache line
size).

A general API which does this would be cool, but I'd be hard-pressed to
imagine a use case for target size other than the data cache line size.

On preserving larger alignment of array contents - that would be generally
pretty interesting. It is not necessary for my use case though, and
probably would be very difficult to implement anyway.

As John suggests, I can probably work around the problem by being very
conservative in my estimates. I can assume that an oop is 4 bytes for
example, and this would work (possibly at the cost of a bit of extra empty
space, and in the hope that nobody will invent some even smaller oop
encoding). I can also assume that a `long` is 8 bytes (this is a safer
assumption) without too much risk.  But it feels like there could be a more
optimal solution in the JDK for this.

-- 
- DML • he/him


Re: Generalizing binary search

2024-04-25 Thread David Lloyd
On Thu, Apr 25, 2024 at 2:09 PM Pavel Rappo  wrote:

>
>
> > On 25 Apr 2024, at 19:41, David Lloyd  wrote:
> >
> > The JDK contains a few collection- and array-oriented implementations of
> binary search. For the most part, they all work the same way: search the
> target "thing" by index using the obvious bisection strategy, returning
> either /index/ or /-index-1/ depending on whether it was found at the end
> of the search.
> >
> > However, if you're doing a binary search on a thing that is not a list
> or an array, you have two choices: try to make the thing you're searching
> on implement List (often infeasible) or write your own binary search.
> >
> > I'm really tired of writing my own binary search. I've probably done it
> 50 times by now, each one using a slightly different access and/or compare
> strategy, and every time is a new chance to typo something or get something
> backwards, leading to irritating debugging sessions and higher dentist
> bills.
>
> Can we safely say that it sets your teeth on edge?
>

You could say that. :)

Have a look at this recently filed issue:
> https://bugs.openjdk.org/browse/JDK-8326330


Oh, nice, I didn't see that come across. It's even more general than my
version: I like that. My only complaint is (once again) that the lambda
must be capturing in order to work with a plain
`IntPredicate`/`LongPredicate`.

-- 
- DML • he/him


Generalizing binary search

2024-04-25 Thread David Lloyd
The JDK contains a few collection- and array-oriented implementations of
binary search. For the most part, they all work the same way: search the
target "thing" by index using the obvious bisection strategy, returning
either /index/ or /-index-1/ depending on whether it was found at the end
of the search.

However, if you're doing a binary search on a thing that is not a list or
an array, you have two choices: try to make the thing you're searching on
implement List (often infeasible) or write your own binary search.

I'm really tired of writing my own binary search. I've probably done it 50
times by now, each one using a slightly different access and/or compare
strategy, and every time is a new chance to typo something or get something
backwards, leading to irritating debugging sessions and higher
dentist bills.

It got me to thinking that it wouldn't be too hard to make a "general"
binary search which can search on anything, so that's what I did. I was
thinking that it might be interesting to try adding this into the JDK
somehow.

This implementation is more or less what I now copy & paste to different
projects at the moment:

public static  int binarySearch(C collection, int start, int end,
T key, Comparator cmp, IntGetter getter) {
int low = start;
int high = end - 1;
while (low <= high) {
int mid = low + high >>> 1;
int res = cmp.compare(getter.get(collection, mid), key);
if (res < 0) {
low = mid + 1;
} else if (res > 0) {
high = mid - 1;
} else {
return mid;
}
}
return -low - 1;
}
// (Plus variations for `Comparable` keys and long indices)

A key problem with this approach is that in the JDK, there is no
`ObjIntFunction` or `ObjLongFunction` which would be necessary
to implement the "getter" portion of the algorithm (despite the existence
of the analogous `ObjIntConsumer` and `ObjLongConsumer`). So, I also
have to copy/paste a `IntGetter`/`LongGetter` as well, which is
annoying.

A slightly less-good approach is for the general `binarySearch` method to
accept a `IntFunction`/`LongFunction`, and drop the `C collection`
parameter, like this:

public static  int binarySearch(int start, int end, T key,
Comparator cmp, IntFunction getter) { ... }

In this case, we can use the function types that the JDK already provides,
but we would very likely have to also create a capturing lambda (e.g.
`myList::get` instead of `List::get`). Maybe this isn't that bad of a
compromise.

It would be possible to replace the existing `binarySearch` implementations
with delegating calls to a generalized implementation. For `Collections`,
the indexed version uses `List::get` and the iterator version uses a helper
lambda to move the iterator and get the result. For arrays, a lambda would
be provided which gets the corresponding array element. If the
non-capturing variant is used, then (on paper at least) this version should
perform similarly to the existing implementations, with less code needed
overall. However, if a capturing lambda is required (due to the
aforementioned lack of `ObjXFunction`), then this could be slightly
worse-performing than the existing implementation due to the construction
(and maybe dispatch) overhead of the lambda. Some real-world benchmarks
would have to be written with various-sized data sets.

It would also be possible to produce primitive variations which operate on
int, float, long, and double values, using existing functions if capturing
is deemed "OK". It is also possible to produce a variation which uses a
`long` for the index, for huge data sets (for example, very large mapped
files using `MemorySegment`).

Also unclear is: where would it live? `Collections`? Somewhere else?

Any thoughts/opinions would be appreciated (even if they are along the
lines of "it's not worth the effort"). Particularly, any insight would be
appreciated as to whether or not this kind of hypothetical enhancement
would warrant a JEP (I wouldn't think so, but I'm no expert at such
assessments).

--
- DML • he/him


Re: Vector (and integer) API: unsigned min/max

2024-04-18 Thread David Lloyd
Presently, I'm implementing operations for a WASM emulator/runtime, so it's
more a case of transitively acquiring use cases from WASM, though I'm also
doing this specifically to practice accomplishing useful things with the
vector API.

Regarding saturating arithmetic: WASM has operations for that as well (both
signed and unsigned), so on that basis alone it would be a welcome addition
as far as I'm concerned. FWIW, another simple use case for saturating
arithmetic is audio processing/mixing (when processing many channels at
once) because saturation closely matches analog clipping, and also perhaps
image compositing for similar reasons.

On Thu, Apr 18, 2024 at 1:40 PM Paul Sandoz  wrote:

> Hi David,
>
> It’s not at all outlandish, but would caution it's more work than one
> might initially think.
>
> Could you describe a little more about your use case? that can be helpful
> to understand the larger picture and demand. Using unsigned comparison
> would be my recommended approach with the current API. CC'ing Sandhya for
> her opinion.
>
> Generally when we add new Vector operations we also consider the impact on
> the scalar types and try to fill any gaps there, so the vector operation
> behavior is composed from the scalar operation behavior (so like you
> indicated regarding symmetry).
>
> We are seeing demand for saturated arithmetic primarily for vector (not
> “vector” as in hardware vector) search, so we may do something there for
> specific integral types.
>
> Paul.
>
> > On Apr 17, 2024, at 7:13 AM, David Lloyd  wrote:
> >
> > I've been trying the the incubating Vector API and one thing I've missed
> on integer-typed vectors is having unsigned min/max operations. There is a
> signed min/max operation, and unsigned comparison, but no unsigned min/max.
> >
> > I guess this is for symmetry with `Math`, which only has signed
> `min`/`max`. However, I would point out that while it's not very hard to
> implement one's own unsigned min/max for integer types using
> `compareUnsigned`, it is a bit harder to do so with vectors. The best I've
> come up with is to take one of two approaches:
> >
> > 1. Zero-extend the vector to the next-larger size, perform the min/max,
> and reduce it back down again, or
> > 2. Add .MIN_VALUE, min/max with a value or vector also offset
> by .MIN_VALUE, and then subtract the offset again
> >
> > I guess a third approach could be to do a comparison using unsigned
> compares, and then use the resultant vector mask to select items from the
> original two vectors, but I didn't bother to work out this solution given I
> already had the other two options.
> >
> > Would it be feasible to add unsigned min/max operations for vectors? It
> looks like at least AArch64 has support for this as a single instruction,
> so it doesn't seem too outlandish.
> >
> > And as a separate (but related) question, what about
> `Math.minUnsigned`/`Math.maxUnsigned` of `int` and `long` for symmetry?
> >
> > --
> > - DML • he/him
>
>

-- 
- DML • he/him


Vector (and integer) API: unsigned min/max

2024-04-17 Thread David Lloyd
I've been trying the the incubating Vector API and one thing I've missed on
integer-typed vectors is having unsigned min/max operations. There is a
signed min/max operation, and unsigned comparison, but no unsigned min/max.

I guess this is for symmetry with `Math`, which only has signed
`min`/`max`. However, I would point out that while it's not very hard to
implement one's own unsigned min/max for integer types using
`compareUnsigned`, it is a bit harder to do so with vectors. The best I've
come up with is to take one of two approaches:

1. Zero-extend the vector to the next-larger size, perform the min/max, and
reduce it back down again, or
2. Add .MIN_VALUE, min/max with a value or vector also offset by
.MIN_VALUE, and then subtract the offset again

I guess a third approach could be to do a comparison using unsigned
compares, and then use the resultant vector mask to select items from the
original two vectors, but I didn't bother to work out this solution given I
already had the other two options.

Would it be feasible to add unsigned min/max operations for vectors? It
looks like at least AArch64 has support for this as a single instruction,
so it doesn't seem too outlandish.

And as a separate (but related) question, what about
`Math.minUnsigned`/`Math.maxUnsigned` of `int` and `long` for symmetry?

-- 
- DML • he/him


Re: Revisiting JDK-4512189: ZipConstants leaking into public APIs

2023-11-28 Thread David Lloyd
On Tue, Nov 28, 2023 at 10:38 AM Eirik Bjørsnøs  wrote:

>
>
> On Tue, Nov 28, 2023 at 5:28 PM David Lloyd 
> wrote:
>
>> I agree, I always thought that this was a bizarre thing to expose in a
>> public API so it being accidental does explain things. But maybe the
>> appropriate fix is even simpler now in these modern times: one could remove
>> the `implements` and add a `static import` of `ZipConstants.*`, which at
>> approximately one or two lines is a much more minimal (and
>> conflict-resistant, for those who care about backporting subsequent fixes
>> of this class) change IMO.
>>
>
> Thanks David!
>
> Not entirely sure I understand what you suggest this as an alternative to.
> Removing "implements" and replacing that with static imports is surely
> where I would eventually see this going.
>

I was "responding" to the (22-year-old?) suggested fix on the JIRA
bug, perhaps unnecessarily; my apologies.


> But before we get there, I think we need to discuss the compatibility
> impact of such a change, and how to go about it with regards to the
> deprecation process etc.
>

Makes sense to me. FWIW (maybe not much) I think your analysis of the
source vs binary compatibility impact is correct. Peeking at
`java.util.zip.ZipConstants` with `javap` reveals that the fields do indeed
have the expected `ConstantValue` attribute which seems to support your
conclusion per JLS § 13.1 [1].

[1] https://docs.oracle.com/javase/specs/jls/se21/html/jls-13.html#jls-13.1

-- 
- DML • he/him


Re: Revisiting JDK-4512189: ZipConstants leaking into public APIs

2023-11-28 Thread David Lloyd
On Tue, Nov 28, 2023 at 10:01 AM Eirik Bjørsnøs  wrote:

> Hi,
>
> JDK-4512189 [1] raised the issue that ZipFile, ZipEntry, ZipInputStream
> and ZipOutputStream all implement the non-public ZipConstants interface.
> While the interface itself is non-public, the constants defined
> unfortunately leak into the public API as constants of the mentioned
> classes. This means they show up in Javadoc, and maybe worse in code
> completions.
>
> JDK-4512189 was closed in 2001 because of concerns of binary compatibility:
>
> Unfortunately, whether or not a class implements an interface is part of
>> the classes public API and having those classes *not* implement
>> ZipConstants will break binary compatibility. Therefore, unfortunately,
>> this bug is being closed as will not fix.
>
>
> However, will this really break binary compatibility? Any reference to
> ZipFile.CENSIG or any other constant (which are all of type long and int)
> will be compiled into byte code in the referencing class, with or without a
> copy of the constant in the constant pool of the referencing class. Thus,
> any compiled reference should continue to work.
>
> There is of course a source compatibility issue, but these constants
> should generally not be of interest to anyone outside the implementation,
> and if anyone is accidentally using them, it's most probably a bug. In any
> case, they would get warned when trying to recompile their sources. The
> easy solution then will be to define their own constants or fix the bug.
>
> In light of this, I would like to revisit this issue, 22 years later:
>
> - Is my assessment that this change is actually not binary incompatible
> sound, or did I miss something?
> - Would it in any case make sense to mark ZipConstants as @Deprecated,
> maybe for removal to alert people we want to remove the constants?
> - Could we aim to make the mentioned classes *not* implement ZipConstants,
> following the regular deprecation process, CSR and release note etc?
>

I agree, I always thought that this was a bizarre thing to expose in a
public API so it being accidental does explain things. But maybe the
appropriate fix is even simpler now in these modern times: one could remove
the `implements` and add a `static import` of `ZipConstants.*`, which at
approximately one or two lines is a much more minimal (and
conflict-resistant, for those who care about backporting subsequent fixes
of this class) change IMO.

-- 
- DML • he/him


Re: Unsigned long to double and back

2022-11-11 Thread David Lloyd
I'd be hesitant to do that in decimal; it's probably better to add
`0x1.0p63` instead. But I guess the approaches are similar: shave off one
bit, and then test that bit and adjust the result accordingly.

On Fri, Nov 11, 2022 at 9:59 AM Remi Forax  wrote:

> Hi all,
> I don't know if it's the correct way to do it or if there is a better way,
> but i'm using
>
>   var result = (double) (value & 0x7fffL);
>   if (value < 0) {
> result = result + 9.223372036854776E18;
>   }
>
>
> The idea is to mask the sign bit, do the conversion to double and if the
> sign bit was present (if value < 0) add Long.MAX_VALUE + 1
> (9223372036854775808) as a double.
>
> I've got 9.223372036854776E18 as the result of
>   BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE).doubleValue()
>
> Rémi
>
> --
>
> *From: *"David Lloyd" 
> *To: *"Johannes Kuhn" 
> *Cc: *"core-libs-dev" 
> *Sent: *Friday, November 11, 2022 3:38:31 PM
> *Subject: *Re: Unsigned long to double and back
>
> Well, I typed this out from memory so there's an error, of course. `(tmp &
> 1)` should be `(input & 1)`.
>
> On Fri, Nov 11, 2022 at 8:31 AM David Lloyd 
> wrote:
>
>> I encountered this issue as well; for now I'm using the following
>> transformation:
>>
>> long tmp = input >>> 1;
>> double output = ((double) tmp) * 2.0 + (tmp & 1);
>>
>> I... *think* it's correct but I'm not 100% sure and have a long-standing
>> TODO to try and figure it out...
>>
>> On Sat, Nov 5, 2022 at 7:17 PM Johannes Kuhn  wrote:
>>
>>> When I tried to implement an WASM transpiler, I noticed some missing
>>> conversion methods from unsigned types to floating point, for example
>>> from unsigned long to a double.
>>>
>>> For the meaning of unsigned long, see Long.toUnsignedString(long i).
>>>
>>> Converting between unsigned long and floating point is not a trivial
>>> task, as it probably requires some bit manipulation.
>>>
>>> In particular, I would love to see the following methods added*:
>>>
>>> - double Double.fromUnsignedLong(long i)
>>> - long Double.toUnsignedLong(double d)
>>> - float Float.fromUnsignedLong(long i)
>>> - long Float.toUnsignedLong(float f)
>>>
>>> * Subject to bikeshedding - I don't care about the name, or if it is
>>> added to the Long class.
>>>
>>> Currently, I don't think that additional methods for unsigned int are
>>> necessary - as it is possible to cast between long and int, but feel
>>> free to correct me.
>>>
>>> In WASM, the specification for those methods can be found here:
>>>
>>> https://www.w3.org/TR/wasm-core-1/#op-trunc-u
>>> https://www.w3.org/TR/wasm-core-1/#op-convert-u
>>>
>>> Note that the WASM specification is undefined for some values, notably
>>> NaN, infinities, and values that fall out of the range.
>>>
>>> As *I* want to use it to implement WASM instructions, I do not have any
>>> strong opinion on the undefined cases - for example, returning the
>>> nearest unsigned long value or throwing an exception is fine for me.
>>>
>>> What do you think?
>>>
>>> - Johannes
>>>
>>>
>>
>> --
>> - DML • he/him
>>
>
>
> --
> - DML • he/him
>
>

-- 
- DML • he/him


Re: Unsigned long to double and back

2022-11-11 Thread David Lloyd
Well, I typed this out from memory so there's an error, of course. `(tmp &
1)` should be `(input & 1)`.

On Fri, Nov 11, 2022 at 8:31 AM David Lloyd  wrote:

> I encountered this issue as well; for now I'm using the following
> transformation:
>
> long tmp = input >>> 1;
> double output = ((double) tmp) * 2.0 + (tmp & 1);
>
> I... *think* it's correct but I'm not 100% sure and have a long-standing
> TODO to try and figure it out...
>
> On Sat, Nov 5, 2022 at 7:17 PM Johannes Kuhn  wrote:
>
>> When I tried to implement an WASM transpiler, I noticed some missing
>> conversion methods from unsigned types to floating point, for example
>> from unsigned long to a double.
>>
>> For the meaning of unsigned long, see Long.toUnsignedString(long i).
>>
>> Converting between unsigned long and floating point is not a trivial
>> task, as it probably requires some bit manipulation.
>>
>> In particular, I would love to see the following methods added*:
>>
>> - double Double.fromUnsignedLong(long i)
>> - long Double.toUnsignedLong(double d)
>> - float Float.fromUnsignedLong(long i)
>> - long Float.toUnsignedLong(float f)
>>
>> * Subject to bikeshedding - I don't care about the name, or if it is
>> added to the Long class.
>>
>> Currently, I don't think that additional methods for unsigned int are
>> necessary - as it is possible to cast between long and int, but feel
>> free to correct me.
>>
>> In WASM, the specification for those methods can be found here:
>>
>> https://www.w3.org/TR/wasm-core-1/#op-trunc-u
>> https://www.w3.org/TR/wasm-core-1/#op-convert-u
>>
>> Note that the WASM specification is undefined for some values, notably
>> NaN, infinities, and values that fall out of the range.
>>
>> As *I* want to use it to implement WASM instructions, I do not have any
>> strong opinion on the undefined cases - for example, returning the
>> nearest unsigned long value or throwing an exception is fine for me.
>>
>> What do you think?
>>
>> - Johannes
>>
>>
>
> --
> - DML • he/him
>


-- 
- DML • he/him


Re: Unsigned long to double and back

2022-11-11 Thread David Lloyd
I encountered this issue as well; for now I'm using the following
transformation:

long tmp = input >>> 1;
double output = ((double) tmp) * 2.0 + (tmp & 1);

I... *think* it's correct but I'm not 100% sure and have a long-standing
TODO to try and figure it out...

On Sat, Nov 5, 2022 at 7:17 PM Johannes Kuhn  wrote:

> When I tried to implement an WASM transpiler, I noticed some missing
> conversion methods from unsigned types to floating point, for example
> from unsigned long to a double.
>
> For the meaning of unsigned long, see Long.toUnsignedString(long i).
>
> Converting between unsigned long and floating point is not a trivial
> task, as it probably requires some bit manipulation.
>
> In particular, I would love to see the following methods added*:
>
> - double Double.fromUnsignedLong(long i)
> - long Double.toUnsignedLong(double d)
> - float Float.fromUnsignedLong(long i)
> - long Float.toUnsignedLong(float f)
>
> * Subject to bikeshedding - I don't care about the name, or if it is
> added to the Long class.
>
> Currently, I don't think that additional methods for unsigned int are
> necessary - as it is possible to cast between long and int, but feel
> free to correct me.
>
> In WASM, the specification for those methods can be found here:
>
> https://www.w3.org/TR/wasm-core-1/#op-trunc-u
> https://www.w3.org/TR/wasm-core-1/#op-convert-u
>
> Note that the WASM specification is undefined for some values, notably
> NaN, infinities, and values that fall out of the range.
>
> As *I* want to use it to implement WASM instructions, I do not have any
> strong opinion on the undefined cases - for example, returning the
> nearest unsigned long value or throwing an exception is fine for me.
>
> What do you think?
>
> - Johannes
>
>

-- 
- DML • he/him


Re: Handling of USR2 signal in JVM on Linux

2022-06-21 Thread David Lloyd
On Fri, Jun 17, 2022 at 1:05 AM Thomas Stüfe  wrote:
>>
>>
>> > >
>> > > I see your point. I dislike crashes, especially ones they can be
>> > > reliably triggered from outside. You never can be sure about the
>> > > security implications either.
>> > >
>> > > If you dislike ignoring the signal, and since the default action for
>> > > SIGUSR1+2 is process termination, why not exit the VM with an error
>> > > message instead? At least we don’t have what looks like a random
>> > > segfault, and we could print out the sender pid  too.
>> >
>> > We could ... but then do we do that for every other signal that can be
>> > thrown at the JVM from externally and which will leads to crashes? Where
>> > do you stop?
>>
>
> It's a very limited set. I'd say we limit this to
>
> 1) signal for which we registered handlers: the handlers should at least not 
> crash or vanish the VM without a trace
> 2) signals which may conceivably be sent to the VM in the normal course of 
> events: if the default action is to terminate the VM, we should handle them
>
> Note that for (2), we already do this for some signals. We explicitly handle 
> SIGPIPE (https://bugs.openjdk.org/browse/JDK-4229104) and SIGXFSZ 
> (https://bugs.openjdk.org/browse/JDK-6499219) because they kill the VM.
>
>>
>> Well, playing devil's advocate, why not? Would it be more complex than 
>> adding a
>>
>> if (si.si_pid != my_cached_pid) {
>> // it came from Outside, let's exit with 128+signum
>> // ...
>> }
>>
>> to the top of the signal handler(s)?
>>
>> --
>> - DML • he/him
>>
>
> I filed a JBS issue and a PR: 
> https://github.com/openjdk/jdk/pull/9181#issuecomment-1157353776.
>
> I went with a different approach, just using Thread::current=NULL as an 
> indicator. I was worried that Unix implementations may leave the si_pid field 
> empty, or that it gets filled with the sender's kernel thread id, not the 
> process pid. Just verified that on Linux, its really actually the process id 
> (so the main threads id).

`siginfo_t` is POSIX so I wouldn't worry much about it being anything
other than the sender's process ID in the year 2022. Testing that
`Thread::current=NULL` tells you that the thread that the signal was
delivered to doesn't have a Java thread, I guess, but it doesn't tell
you if the signal was invalid in the case where the target thread does
have a corresponding Java thread; after all, a process signal sent
from outside could be sent to any thread as you say. Checking the
sending PID tells you with certainty whether or not the signal did
come from outside, which is really the salient question AFAICT.

-- 
- DML • he/him



Re: Handling of USR2 signal in JVM on Linux

2022-06-16 Thread David Lloyd
On Thu, Jun 16, 2022 at 8:17 AM David Holmes  wrote:
> On 16/06/2022 12:09 am, Thomas Stüfe wrote:
> > On Wed 15. Jun 2022 at 15:06, David Holmes  > > wrote:
> > Nor do I. SIGUSR2 is reserved for use by the VM and we document
> > that. If
> > you start throwing around random signals at processes then they can
> > crash - so don't do that. I'm not sure ignoring signals that shouldn't
> > be raised actually aids the end-user who obviously has an issue in
> > their
> > environment that needs to be addressed.
> >
> >
> > I see your point. I dislike crashes, especially ones they can be
> > reliably triggered from outside. You never can be sure about the
> > security implications either.
> >
> > If you dislike ignoring the signal, and since the default action for
> > SIGUSR1+2 is process termination, why not exit the VM with an error
> > message instead? At least we don’t have what looks like a random
> > segfault, and we could print out the sender pid  too.
>
> We could ... but then do we do that for every other signal that can be
> thrown at the JVM from externally and which will leads to crashes? Where
> do you stop?

Well, playing devil's advocate, why not? Would it be more complex than adding a

if (si.si_pid != my_cached_pid) {
// it came from Outside, let's exit with 128+signum
// ...
}

to the top of the signal handler(s)?

-- 
- DML • he/him