Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-04-20 Thread Srinivas Vamsi Parasa
On Tue, 9 Apr 2024 21:36:46 GMT, Vladimir Yaroslavskiy  wrote:

>>> Hi Vamsi (@vamsi-parasa), few questions on your test environment:
>>> 
>>> * what are the hardware specs of your server ?
>>> * bare-metal or virtual ?
>>> * are other services or big processes running ?
>>> * os tuning ? CPU HT: off? Fixed CPU governor or frequency ?
>>> * isolation using taskset ?
>>> 
>>> Maybe C2 JIT (+ CDS archive) gives more performance to the stock jdk sort
>>> than to the same code running outside the jdk...
>>> 
>>> Thanks, Laurent
>> 
>> Hi Laurent,
>> 
>> The benchmarks are run on an Intel Tiger Lake Core i7 machine. It's bare
>> metal without any virtualization. HT is ON and there is no other specific
>> OS tuning or isolation using taskset.
>> 
>> Thanks,
>> Vamsi
>
> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run the new benchmarking?
> To save time and avoid patching the JDK several times, I've created a
> JavaBenchmarkHarness class which is under the package java.util and
> compares several versions of DPQS. I also prepared several versions of the
> current sorting source from the JDK to detect what is going wrong.
> 
> What you need is to compile and run JavaBenchmarkHarness once:
> 
> javac --patch-module java.base=. -d classes *.java
> java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module 
> java.base=classes -cp classes java.util.JavaBenchmarkHarness
> 
> Find the sources there:
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java
>   
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_ins.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_mrg.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_piv.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_prt.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p5.java
> 
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the data below:

Thanks,
Vamsi



name | builder | size | mode | count | score
-- | -- | -- | -- | -- | --
b01 | RANDOM | 600 | avg | 325677 | 6.862
b01 | RANDOM | 3000 | avg | 52041 | 82.233
b01 | RANDOM | 9 | avg | 1217 | 4456.51
b01 | RANDOM | 40 | avg | 242 | 22923.28
b01 | RANDOM | 100 | avg | 90 | 60598.84
b01 | REPEATED | 600 | avg | 651354 | 1.933
b01 | REPEATED | 3000 | avg | 104083 | 13.753
b01 | REPEATED | 9 | avg | 2435 | 723.039
b01 | REPEATED | 40 | avg | 484 | 3084.416
b01 | REPEATED | 100 | avg | 180 | 8234.428
b01 | STAGGER | 600 | avg | 1954062 | 1.005
b01 | STAGGER | 3000 | avg | 312251 | 4.945
b01 | STAGGER | 9 | avg | 7305 | 133.126
b01 | STAGGER | 40 | avg | 1453 | 592.144
b01 | STAGGER | 100 | avg | 542 | 1493.876
b01 | SHUFFLE | 600 | avg | 325677 | 5.12
b01 | SHUFFLE | 3000 | avg | 52041 | 29.252
b01 | SHUFFLE | 9 | avg | 1217 | 1396.664
b01 | SHUFFLE | 40 | avg | 242 | 5743.489
b01 | SHUFFLE | 100 | avg | 90 | 15490.81
b01_ins | RANDOM | 600 | avg | 325677 | 7.594
b01_ins | RANDOM | 3000 | avg | 52041 | 78.631
b01_ins | RANDOM | 9 | avg | 1217 | 4312.511
b01_ins | RANDOM | 40 | avg | 242 | 22108.18
b01_ins | RANDOM | 100 | avg | 90 | 58467.16
b01_ins | REPEATED | 600 | avg | 651354 | 1.569
b01_ins | REPEATED | 3000 | avg | 104083 | 11.313
b01_ins | REPEATED | 9 | avg | 2435 | 720.838
b01_ins | REPEATED | 40 | avg | 484 | 3003.673
b01_ins | REPEATED | 100 | avg | 180 | 8144.944
b01_ins | STAGGER | 600 | avg | 1954062 | 0.98
b01_ins | STAGGER | 3000 | avg | 312251 | 4.948
b01_ins | STAGGER | 9 | avg | 7305 | 132.909
b01_ins | STAGGER | 40 | avg | 1453 | 592.572
b01_ins | STAGGER | 100 | avg | 542 | 1492.627
b01_ins | SHUFFLE | 600 | avg | 325677 | 4.092
b01_ins | SHUFFLE | 3000 | avg | 52041 | 27.138
b01_ins | SHUFFLE | 9 | avg | 1217 | 1304.326
b01_ins | SHUFFLE | 40 | avg | 242 | 5465.745
b01_ins | SHUFFLE | 100 | avg | 90 | 14585.08
b01_mrg | RANDOM | 600 | avg | 325677 | 7.139
b01_mrg | RANDOM | 3000 | avg | 52041 | 81.01
b01_mrg | RANDOM | 9 | avg | 1217 | 4266.084
b01_mrg | RANDOM | 40 | avg | 242 | 21937.77
b01_mrg | RANDOM | 100 | avg | 90 | 58177.72
b01_mrg | REPEATED | 600 | avg | 651354 | 1.36
b01_mrg | REPEATED | 3000 | avg | 104083 | 9.013
b01_mrg | REPEATED | 9 | avg | 2435 | 737.684
b01_mrg | REPEATED | 40 | avg | 484 | 3152.447
b01_mrg | REPEATED | 100 | avg | 180 | 8366.866
b01_mrg | STAGGER | 600 | avg | 1954062 | 0.73
b01_mrg | STAGGER | 3000 | avg | 312251 | 3.733
b01_mrg | STAGGER | 9 | avg | 7305 | 114.35
b01_mrg | STAGGER | 40 | avg | 1453 | 524.821
b01_mrg | STAGGER | 100 | avg | 542 | 1351.504
b01_mrg | SHUFFLE | 600 | avg | 325677 | 4.986
b01_mrg | SHUFF

Re: EnumeratedStream

2024-04-20 Thread -
On Sat, Apr 20, 2024 at 8:29 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> Gatherers could be effectively terminal, but I don't think Gatherer API
> designers intended it to be. In related JEP, gatherers are described as a
> way to declare custom intermediate operations, and introducing "terminal"
> gatherers would be misleading.
>
I will show you an API that expands on collectors too.

>
> Talking about performance, not even considering gather() method itself,
> creating an instance of Indexed object for each value in stream is costy
> and might turn into a nightmare for infinite streams.
>
We will have value objects soon (Project Valhalla); by then, creating such an
object will be much more lightweight, as it's effectively a tuple. This is
not really a concern.

>
> As for internals of Stream API right now, I am not aware about its current
> state, so not much to say here except that adding a new type of streams
> that just slightly extends existing functionality might not be that
> harmful, I guess.
>
> Regarding API exposure, I don't think that moving it from stream directly
> to Gatherers factory would be much of a deal since to see index-aware
> methods user must explicitly convert stream to enumerated.
>
Having another set of EnumeratedStream APIs and implementations is going to
be a nightmare to maintain. It also has poor interoperability when converting
itself back into regular streams for other usages, while Stream<Indexed<V>>
is much better in that respect.

>
> I also think the I didn't express my point about index consistency clear
> enough. Consider following:
>
Notice that Stream.filter retains, rather than drops, the matching elements;
otherwise your results don't make sense. I will assume you mean "filter" in
the sense of "dropIf".

>
> List.of(1,2,3).stream().enumerated().filter((idx, i) -> i*idx <
> 2).map((idx, val) -> idx * val)
>
> Result : (4, 9)
>
Does idx start from 0 or 1? All Java indices start from 0; having it start
from 1 in enumerated is really weird. I will assume you mean [2, 6] = [1*2, 2*3].
Your use case is even simpler; it can be covered by a simple gatherer like this:

Gatherer.ofSequential(() -> new int[] {0},
        Gatherer.Integrator.ofGreedy((counter, v, sink) ->
                sink.push(new Indexed<>(counter[0]++, v))));

Better than writing thousands of lines of new code. The functionalities are
already implemented in my sample project in
https://github.com/liachmodded/IndexedStream/tree/main/src/main/java/com/github/liachmodded/indexedstream
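
For concreteness, a minimal self-contained sketch of that counter-based
gatherer in use (assuming a simple Indexed record here; the sample project
above defines its own Indexed type, and class/method names below are just
for illustration):

import java.util.List;
import java.util.stream.Gatherer;

public class ZipWithIndexDemo {

    // Hypothetical pair type standing in for the sample project's Indexed class.
    record Indexed<V>(int index, V value) {}

    // Sequential gatherer that zips each element with a running 0-based index.
    static <V> Gatherer<V, ?, Indexed<V>> zipWithIndex() {
        return Gatherer.ofSequential(
                () -> new int[] {0},   // mutable counter as the gatherer state
                Gatherer.Integrator.<int[], V, Indexed<V>>ofGreedy(
                        (counter, v, sink) ->
                                sink.push(new Indexed<>(counter[0]++, v))));
    }

    public static void main(String[] args) {
        var out = List.of("a", "b", "c").stream()
                .gather(zipWithIndex())
                .map(p -> p.index() + ":" + p.value())
                .toList();
        System.out.println(out);   // [0:a, 1:b, 2:c]
    }
}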

>
> With your approach
>
> List.of(1,2,3).stream().gather(Gatherers.filterIndexed((idx, val) ->
> idx*val < 2)).gather(Gatherers.mapIndexed((idx, val) -> idx * val))
>
> Result : (2, 6)
>
I will assume you mean [0, 3]=[0*2, 1*3]

>
> Not only second option is much more verbose, but also indexes of stream
> are inconsistent between operations.
>
Then what purpose does EnumeratedStream serve? We would assume the enumeration
refers to the current stage, so we want the number immediately. If you want
the enumeration at a particular stage, it's better to zip your stream items
with an index, like my gatherer above, at an explicit step.

Also, I have already made a showcase Git repository with your two test cases
at
https://github.com/liachmodded/IndexedStream/blob/main/src/test/java/com/github/liachmodded/indexedstream/IndexedGatherersTest.java
Feel free to rant and tell me what you really want that isn't served by these
few simple classes.

>
> PS: regarding findIndex, I did some benchmarking today, you might like to
> take a look. On average list-based version outperform collector and
> gatherer-based in more then 10 times. And also lists in stdlib doesn't have
> any hashcode-based implementations. I wrote 19 implementations in list
> subclasses and none of them had anything but simple traversing logic inside
> indexOf. But that's topic of another thread
>
Thanks for the prompt, I will look at it and reply in the other thread.


Re: EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
Gatherers could be effectively terminal, but I don't think the Gatherer API
designers intended them to be. In the related JEP, gatherers are described as
a way to declare custom intermediate operations, and introducing "terminal"
gatherers would be misleading.

Talking about performance, not even considering the gather() method itself,
creating an Indexed instance for each value in the stream is costly and might
turn into a nightmare for infinite streams.

As for the internals of the Stream API right now, I am not aware of its
current state, so there is not much to say here except that adding a new type
of stream that just slightly extends existing functionality might not be that
harmful, I guess.

Regarding API exposure, I don't think that moving it from the stream directly
to a Gatherers factory would be much of a deal, since to see the index-aware
methods the user must explicitly convert the stream to an enumerated one.

I also think I didn't express my point about index consistency clearly
enough. Consider the following:

List.of(1,2,3).stream().enumerated().filter((idx, i) -> i*idx <
2).map((idx, val) -> idx * val)

Result : (4, 9)

With your approach

List.of(1,2,3).stream().gather(Gatherers.filterIndexed((idx, val) ->
idx*val < 2)).gather(Gatherers.mapIndexed((idx, val) -> idx * val))

Result : (2, 6)

Not only is the second option much more verbose, but the indexes of the
stream are also inconsistent between operations.

PS: regarding findIndex, I did some benchmarking today; you might like to
take a look. On average, the list-based version outperforms the collector-
and gatherer-based ones by more than 10 times. Also, lists in the stdlib
don't have any hashcode-based implementations: I went through 19
implementations in list subclasses, and none of them had anything but simple
traversal logic inside indexOf. But that's a topic for another thread.

On Sun, Apr 21, 2024, 04:03 -  wrote:

> On Sat, Apr 20, 2024 at 7:44 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> Also enumerated stream should also support index-aware terminal
>> operations, which getherers are incapable of, so it will also require to
>> create index-aware collectors. I am not aware if this is even possible, but
>> this looks like another separate functionality in different place, and some
>> developers might just don't be aware of its existence. I think that we as
>> language devs should think not only about what is possible in language, but
>> also about is it comfortable and is it obvious for user
>>
> Gatherers can become many-to-one; notice it has a state A, it can totally
> choose to only emit a single element R in its finisher (i.e. its integrator
> only touches state and element and ignores the downstream), then you can
> use findAny().orElseThrow() to access that single collector result. That
> said, the factory I proposed can try to wrap Collectors the same way it
> wraps Gatherers. See conclusion below.
>
>> On Sun, Apr 21, 2024, 03:36 ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> Yes, I think every possible intermediate operation could be made index
>>> aware using gatherers. The point is: should it be turned?
>>>
>>> As a developers of jdk itself, we are not limited in a ways we could
>>> provide tools for Java users, especially when it comes to adding completely
>>> new features and not modifying existing apis.
>>>
>> Creating a new type of pipeline is only going to blow up the complexity
> of Streams; having IntStream, LongStream, DoubleStream and Stream, together
> with the 4-way spliterators (Spliterator.OfInt etc.) and iterators
> (PrimitiveIterator.OfInt etc.), is already an API nightmare. And the
> indexing will multiply all the intermediate and terminal operations by 2,
> further blowing up the API, which I don't believe is the right direction to
> go.
>
>>
>>>
>> Gatherer-based approach looks like we are developers of third party
>>> library that has to look for workarounds instead of directly adding
>>> features we need. It's syntax is more wordy without any payoff in
>>> flexibility, and obviously would be slower and memory-costy.
>>>
>> Gatherer is not a "third party hook", but an essential API that
> represents all possible stream operations, including Collector. Gatherer
> would not be slow; it already supports short-circuiting and should not add
> extra overheads, as Gatherer is like an API for all possible stream
> operations.
>
>>
>>> For me seems that implementing this using gatherer would only introduce
>>> unnecessary intermediate steps in operation internal pipeline without any
>>> visible payoff.
>>>
>> Implementing with Gatherer would reduce useless API exposure, as indexed
> operations are't that frequently used and are useless in parallel
> scenarios. Especially that these indexed operations aren't optimizable by
> stream subclasses, much like how findIndex is not helpful in Lists as
> Predicates can't be easily decoded like Object equivalence/hashCode, which
> some Lists can use to speed up indexOf.
>
>>
>>> Also, indexes created this way

Re: EnumeratedStream

2024-04-20 Thread -
On Sat, Apr 20, 2024 at 7:44 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> Also enumerated stream should also support index-aware terminal
> operations, which getherers are incapable of, so it will also require to
> create index-aware collectors. I am not aware if this is even possible, but
> this looks like another separate functionality in different place, and some
> developers might just don't be aware of its existence. I think that we as
> language devs should think not only about what is possible in language, but
> also about is it comfortable and is it obvious for user
>
Gatherers can be many-to-one: notice that a Gatherer has a state A, and it
can choose to emit only a single element R in its finisher (i.e. its
integrator only touches the state and the element and ignores the
downstream); then you can use findAny().orElseThrow() to access that single
collector result. That said, the factory I proposed can try to wrap
Collectors the same way it wraps Gatherers. See the conclusion below.

> On Sun, Apr 21, 2024, 03:36 ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> Yes, I think every possible intermediate operation could be made index
>> aware using gatherers. The point is: should it be turned?
>>
>> As a developers of jdk itself, we are not limited in a ways we could
>> provide tools for Java users, especially when it comes to adding completely
>> new features and not modifying existing apis.
>>
> Creating a new type of pipeline is only going to blow up the complexity of
Streams; having IntStream, LongStream, DoubleStream and Stream, together
with the 4-way spliterators (Spliterator.OfInt etc.) and iterators
(PrimitiveIterator.OfInt etc.), is already an API nightmare. And the
indexing will multiply all the intermediate and terminal operations by 2,
further blowing up the API, which I don't believe is the right direction to
go.

>
>>
> Gatherer-based approach looks like we are developers of third party
>> library that has to look for workarounds instead of directly adding
>> features we need. It's syntax is more wordy without any payoff in
>> flexibility, and obviously would be slower and memory-costy.
>>
> Gatherer is not a "third party hook", but an essential API that represents
all possible stream operations, including Collector. Gatherer would not be
slow; it already supports short-circuiting and should not add extra
overheads, as Gatherer is like an API for all possible stream operations.

>
>> For me seems that implementing this using gatherer would only introduce
>> unnecessary intermediate steps in operation internal pipeline without any
>> visible payoff.
>>
Implementing this with a Gatherer would reduce useless API exposure, as
indexed operations aren't that frequently used and are useless in parallel
scenarios. In particular, these indexed operations aren't optimizable by
stream subclasses, much like how findIndex is not helpful in Lists: Predicates
can't be easily decoded the way Object equivalence/hashCode can, which some
Lists use to speed up indexOf.

>
>> Also, indexes created this way will be inconsistent between operations,
>> and I am not sure if that is what we are looking for.
>>
> We declare an index-aware gatherer and my said factory converts it to an
index-unaware, sequential gatherer; the factory gatherer prepares indices
before calling our index-aware gatherer.

For my factory, if you think my 4-line syntax above is too verbose, we can
encapsulate those to become
public static <T> Gatherer<T, ?, T> filter(Predicate<Indexed<T>> predicate)
etc.

And the primary methods will be:
public static <T, A, R> Gatherer<T, A, R> indexed(Gatherer<Indexed<T>, A, R> gatherer)
public static <T, A, R> Collector<T, A, R> indexed(Collector<Indexed<T>, A, R> collector)
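
As a rough sketch of how those factories could be put together (assuming an
Indexed record, a hypothetical IndexedGatherers holder class, and composition
via Gatherer::andThen, which collapses the state type to ?; this is an
illustration, not the final API):

import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Gatherer;

public final class IndexedGatherers {

    record Indexed<V>(int index, V value) {}   // hypothetical pair type

    // Sequential gatherer attaching a running 0-based index to every element.
    static <T> Gatherer<T, ?, Indexed<T>> zipWithIndex() {
        return Gatherer.ofSequential(
                () -> new int[] {0},
                Gatherer.Integrator.<int[], T, Indexed<T>>ofGreedy(
                        (counter, element, sink) ->
                                sink.push(new Indexed<>(counter[0]++, element))));
    }

    // Wraps an index-aware gatherer into a plain element gatherer by running
    // the index-attaching step in front of it.
    public static <T, R> Gatherer<T, ?, R> indexed(Gatherer<Indexed<T>, ?, R> gatherer) {
        return IndexedGatherers.<T>zipWithIndex().andThen(gatherer);
    }

    // Convenience wrapper: an index-aware filter expressed through indexed(...).
    public static <T> Gatherer<T, ?, T> filter(Predicate<Indexed<T>> predicate) {
        return indexed(Gatherer.ofSequential(
                Gatherer.Integrator.<Void, Indexed<T>, T>ofGreedy(
                        (state, indexed, sink) ->
                                !predicate.test(indexed) || sink.push(indexed.value()))));
    }

    public static void main(String[] args) {
        // Keep elements whose index * value >= 2 (0-based indices): [2, 3]
        List<Integer> out = List.of(1, 2, 3).stream()
                .gather(IndexedGatherers.<Integer>filter(p -> p.index() * p.value() >= 2))
                .toList();
        System.out.println(out);
    }
}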


Re: EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
Also, an enumerated stream should support index-aware terminal operations,
which gatherers are incapable of, so it will also require creating
index-aware collectors. I am not aware whether this is even possible, but it
looks like yet another separate piece of functionality in a different place,
and some developers might simply not be aware of its existence. I think that
we as language devs should think not only about what is possible in the
language, but also about whether it is comfortable and obvious for the user.

On Sun, Apr 21, 2024, 03:36 ІП-24 Олександр Ротань 
wrote:

> Yes, I think every possible intermediate operation could be made index
> aware using gatherers. The point is: should it be turned?
>
> As a developers of jdk itself, we are not limited in a ways we could
> provide tools for Java users, especially when it comes to adding completely
> new features and not modifying existing apis.
>
> Gatherer-based approach looks like we are developers of third party
> library that has to look for workarounds instead of directly adding
> features we need. It's syntax is more wordy without any payoff in
> flexibility, and obviously would be slower and memory-costy.
>
> For me seems that implementing this using gatherer would only introduce
> unnecessary intermediate steps in operation internal pipeline without any
> visible payoff.
>
> Also, indexes created this way will be inconsistent between operations,
> and I am not sure if that is what we are looking for.
>
> On Sun, Apr 21, 2024, 03:25 -  wrote:
>
>> My point is that we can create Gatherers that gather indexed elements
>> (indexed gatherer), and another factory wraps the Gatherer so the index
>> boxing and automatically performed before sending to the indexed gatherer.
>> All other stream operations, like indexed filter, indexed collect, etc. can
>> all be done through the indexed gatherer.
>>
>> On Sat, Apr 20, 2024 at 7:13 PM ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> I am sorry, but I feel like I am missing the point of your response and
>>> how is it related to what I have said.
>>>
>>> Regarding wrapping and unwrapping indexed pairs, one of advantages of
>>> the approach I have suggested is that EnumeratedStream is still a stream
>>> and all index-unaware ops could still be applied to it
>>>
>>> On Sun, Apr 21, 2024, 03:08 -  wrote:
>>>
 We must convert index-processing operations to a
 `gather(Gatherers.scan(/* index gathering */))` immediate before the
 operation that uses the index, and immediately unwrap the indices
 afterwards.

 Syntactically writing such 3 lines for every index-aware operation
 would be weird; I actually wonder if we can find a way to convert a
 Gatherer, ?, R> into a Gatherer; the nested gatherer
 itself will receive the enumerated values. I don't know if such a way to
 compose Gatherer has been studied, but this would allow us to define all
 indexed operations as gatherers instead of having to make new
 BiConsumers for indices. This would be interesting to look into. So the
 ideal form would be:

 BiPredicate filter = ...
 Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed,
 sink) -> return (!filter.test(indexed.index(), indexed.value())) ||
 sink.push(indexed.value()));
 Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a
 factory method performing wrapping
 return stream.gather(wrapped) // and more

 On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань <
 rotan.olexa...@gmail.com> wrote:

> I would like to correct myself a bit: indexes should be assigned at
> the moment an element is received from upstream, not when it is passed to
> processing. Not sure if it will effectively change order of execution in
> actual implementation, but to put it this way would be more precise
>
> вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com>:
>
>> Hello again.
>>
>> I have imagined implementation of enumerated streams in a slightly
>> different way. I think enumerated streams should be a separate kind of
>> stream like parallel, sequential or some primitive stream. This way
>> enumeration could be deferred to the time that index is being consumed,
>> This would eliminate issue with stream merging and also kind of smoothen
>> the issue with parallel streams, as the indexed will follow the order
>> elements have been passed to processing.
>>
>> This could potentially lead to another issue with consistency of
>> indexes through stream pipeline, but there could be a workaround like
>> IdentityHashMaps that is received from previous enumerated streams.
>>
>> This way we could make streams more efficient and also introduce a
>> more user-friendly API, which is essentially a win-win situation.
>>
>> Regarding concatenation specifically, we could either introduce
>> separate methods f

Re: Bag (Multiset) Collection

2024-04-20 Thread -
>
> 1. Batch task processing, especially in concurrent environments, when
> tasks can have different priorities. Sometimes tasks that are stored in a
> queue are processed in batches. Using today's Java stdlib, acquiring such a
> batch of tasks from the collection would be O(n * log n), because each
> heapification is O(log n) and is performed after each task is popped. Using
> a TreeMultiset, this could be achieved in O(log n). This also applies to
> messaging.
>
This is a good point. There's no way for users to attach tree node metadata
that is maintained when a BST is rebalanced, so having a rank operation for a
tree Multiset is quite helpful. Unfortunately, not even Google Guava supports
this operation, but it is technically easy to implement.

I think this is the main advantage of a Multiset over regular
Map<T, Collection<T>> implementations. However, given that Java's TreeMap
doesn't even support getting an element by rank (its RBTree nodes don't
record size), we might have to start with adding size to the RBTree nodes first.
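
For illustration, a minimal sketch of how a size-augmented node supports an
O(log n) rank query (hypothetical SizedNode class, not the JDK's TreeMap
internals; rebalancing would also have to keep the size fields up to date):

// Hypothetical size-augmented BST node.
final class SizedNode<K extends Comparable<K>> {
    K key;
    SizedNode<K> left, right;
    int count = 1;   // multiplicity of this key (multiset semantics)
    int size = 1;    // total elements in this subtree, duplicates included

    static <K extends Comparable<K>> int size(SizedNode<K> node) {
        return node == null ? 0 : node.size;
    }

    // Number of elements strictly less than key, i.e. the key's rank.
    static <K extends Comparable<K>> int rank(SizedNode<K> node, K key) {
        if (node == null) return 0;
        int cmp = key.compareTo(node.key);
        if (cmp < 0) return rank(node.left, key);
        if (cmp > 0) return size(node.left) + node.count + rank(node.right, key);
        return size(node.left);
    }
}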

>


Re: EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
Yes, I think every possible intermediate operation could be made index-aware
using gatherers. The point is: should it be done that way?

As developers of the JDK itself, we are not limited in the ways we can
provide tools for Java users, especially when it comes to adding completely
new features rather than modifying existing APIs.

The Gatherer-based approach makes it look like we are developers of a
third-party library who have to look for workarounds instead of directly
adding the features we need. Its syntax is more wordy without any payoff in
flexibility, and it would obviously be slower and more memory-costly.

To me it seems that implementing this using a gatherer would only introduce
unnecessary intermediate steps into the operation's internal pipeline without
any visible payoff.

Also, indexes created this way will be inconsistent between operations, and
I am not sure if that is what we are looking for.

On Sun, Apr 21, 2024, 03:25 -  wrote:

> My point is that we can create Gatherers that gather indexed elements
> (indexed gatherer), and another factory wraps the Gatherer so the index
> boxing and automatically performed before sending to the indexed gatherer.
> All other stream operations, like indexed filter, indexed collect, etc. can
> all be done through the indexed gatherer.
>
> On Sat, Apr 20, 2024 at 7:13 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> I am sorry, but I feel like I am missing the point of your response and
>> how is it related to what I have said.
>>
>> Regarding wrapping and unwrapping indexed pairs, one of advantages of the
>> approach I have suggested is that EnumeratedStream is still a stream and
>> all index-unaware ops could still be applied to it
>>
>> On Sun, Apr 21, 2024, 03:08 -  wrote:
>>
>>> We must convert index-processing operations to a
>>> `gather(Gatherers.scan(/* index gathering */))` immediate before the
>>> operation that uses the index, and immediately unwrap the indices
>>> afterwards.
>>>
>>> Syntactically writing such 3 lines for every index-aware operation would
>>> be weird; I actually wonder if we can find a way to convert a
>>> Gatherer, ?, R> into a Gatherer; the nested gatherer
>>> itself will receive the enumerated values. I don't know if such a way to
>>> compose Gatherer has been studied, but this would allow us to define all
>>> indexed operations as gatherers instead of having to make new
>>> BiConsumers for indices. This would be interesting to look into. So the
>>> ideal form would be:
>>>
>>> BiPredicate filter = ...
>>> Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed,
>>> sink) -> return (!filter.test(indexed.index(), indexed.value())) ||
>>> sink.push(indexed.value()));
>>> Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a
>>> factory method performing wrapping
>>> return stream.gather(wrapped) // and more
>>>
>>> On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань <
>>> rotan.olexa...@gmail.com> wrote:
>>>
 I would like to correct myself a bit: indexes should be assigned at the
 moment an element is received from upstream, not when it is passed to
 processing. Not sure if it will effectively change order of execution in
 actual implementation, but to put it this way would be more precise

 вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань <
 rotan.olexa...@gmail.com>:

> Hello again.
>
> I have imagined implementation of enumerated streams in a slightly
> different way. I think enumerated streams should be a separate kind of
> stream like parallel, sequential or some primitive stream. This way
> enumeration could be deferred to the time that index is being consumed,
> This would eliminate issue with stream merging and also kind of smoothen
> the issue with parallel streams, as the indexed will follow the order
> elements have been passed to processing.
>
> This could potentially lead to another issue with consistency of
> indexes through stream pipeline, but there could be a workaround like
> IdentityHashMaps that is received from previous enumerated streams.
>
> This way we could make streams more efficient and also introduce a
> more user-friendly API, which is essentially a win-win situation.
>
> Regarding concatenation specifically, we could either introduce
> separate methods for merging enumerated streams, where we define how
> conflicts are resolved, or just treat them as usual streams, so the result
> of concatenation of enumerated streams will result in a normal stream that
> should be enumerated once again. Also i don't really see how gathering
> elements into index-value pairs solves issue with conflicting indexes
>
> вс, 21 апр. 2024 г. в 02:06, - :
>
>> Hi Oleksandr,
>> I fear that enumeration might not be applicable to all streams,
>> especially parallel ones. If we have a parallel stream, we might not 
>> always
>> process all elements in order, and even index generatio

Re: EnumeratedStream

2024-04-20 Thread -
My point is that we can create Gatherers that gather indexed elements
(indexed gatherers), and another factory wraps such a Gatherer so that the
index boxing is automatically performed before elements are sent to the
indexed gatherer. All other stream operations, like indexed filter, indexed
collect, etc., can then be done through the indexed gatherer.

On Sat, Apr 20, 2024 at 7:13 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> I am sorry, but I feel like I am missing the point of your response and
> how is it related to what I have said.
>
> Regarding wrapping and unwrapping indexed pairs, one of advantages of the
> approach I have suggested is that EnumeratedStream is still a stream and
> all index-unaware ops could still be applied to it
>
> On Sun, Apr 21, 2024, 03:08 -  wrote:
>
>> We must convert index-processing operations to a
>> `gather(Gatherers.scan(/* index gathering */))` immediate before the
>> operation that uses the index, and immediately unwrap the indices
>> afterwards.
>>
>> Syntactically writing such 3 lines for every index-aware operation would
>> be weird; I actually wonder if we can find a way to convert a
>> Gatherer, ?, R> into a Gatherer; the nested gatherer
>> itself will receive the enumerated values. I don't know if such a way to
>> compose Gatherer has been studied, but this would allow us to define all
>> indexed operations as gatherers instead of having to make new
>> BiConsumers for indices. This would be interesting to look into. So the
>> ideal form would be:
>>
>> BiPredicate filter = ...
>> Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed,
>> sink) -> return (!filter.test(indexed.index(), indexed.value())) ||
>> sink.push(indexed.value()));
>> Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a
>> factory method performing wrapping
>> return stream.gather(wrapped) // and more
>>
>> On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> I would like to correct myself a bit: indexes should be assigned at the
>>> moment an element is received from upstream, not when it is passed to
>>> processing. Not sure if it will effectively change order of execution in
>>> actual implementation, but to put it this way would be more precise
>>>
>>> вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань <
>>> rotan.olexa...@gmail.com>:
>>>
 Hello again.

 I have imagined implementation of enumerated streams in a slightly
 different way. I think enumerated streams should be a separate kind of
 stream like parallel, sequential or some primitive stream. This way
 enumeration could be deferred to the time that index is being consumed,
 This would eliminate issue with stream merging and also kind of smoothen
 the issue with parallel streams, as the indexed will follow the order
 elements have been passed to processing.

 This could potentially lead to another issue with consistency of
 indexes through stream pipeline, but there could be a workaround like
 IdentityHashMaps that is received from previous enumerated streams.

 This way we could make streams more efficient and also introduce a more
 user-friendly API, which is essentially a win-win situation.

 Regarding concatenation specifically, we could either introduce
 separate methods for merging enumerated streams, where we define how
 conflicts are resolved, or just treat them as usual streams, so the result
 of concatenation of enumerated streams will result in a normal stream that
 should be enumerated once again. Also i don't really see how gathering
 elements into index-value pairs solves issue with conflicting indexes

 вс, 21 апр. 2024 г. в 02:06, - :

> Hi Oleksandr,
> I fear that enumeration might not be applicable to all streams,
> especially parallel ones. If we have a parallel stream, we might not 
> always
> process all elements in order, and even index generation can be 
> unreliable.
> In addition, stream merging will become a headache. I think
> Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new
> Indexed<>(indexed.index() + 1, value)) can create a Stream>
> which should serve your purpose.
>
> And for expanded operations for enumerated streams, there are similar
> features for primitive streams already, where they have extra methods like
> summaryStatistics() compared to object streams. We most likely will have
> Gatherers that explicitly work on Stream> to support
> type-specific operations, partly due to Java's type limits and that Stream
> hierarchy is effectively closed.
>
> Best,
> Chen Liang
>
> On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> My proposal regarding findIndex brought up a topic that, as I see,
>> has been brought up here multiple times.
>>
>> My idea is to extend th

Re: EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
I am sorry, but I feel like I am missing the point of your response and how
it is related to what I have said.

Regarding wrapping and unwrapping indexed pairs, one of the advantages of the
approach I have suggested is that an EnumeratedStream is still a stream, and
all index-unaware ops could still be applied to it.

On Sun, Apr 21, 2024, 03:08 -  wrote:

> We must convert index-processing operations to a `gather(Gatherers.scan(/*
> index gathering */))` immediate before the operation that uses the index,
> and immediately unwrap the indices afterwards.
>
> Syntactically writing such 3 lines for every index-aware operation would
> be weird; I actually wonder if we can find a way to convert a
> Gatherer, ?, R> into a Gatherer; the nested gatherer
> itself will receive the enumerated values. I don't know if such a way to
> compose Gatherer has been studied, but this would allow us to define all
> indexed operations as gatherers instead of having to make new
> BiConsumers for indices. This would be interesting to look into. So the
> ideal form would be:
>
> BiPredicate filter = ...
> Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed, sink)
> -> return (!filter.test(indexed.index(), indexed.value())) ||
> sink.push(indexed.value()));
> Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a
> factory method performing wrapping
> return stream.gather(wrapped) // and more
>
> On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> I would like to correct myself a bit: indexes should be assigned at the
>> moment an element is received from upstream, not when it is passed to
>> processing. Not sure if it will effectively change order of execution in
>> actual implementation, but to put it this way would be more precise
>>
>> вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com>:
>>
>>> Hello again.
>>>
>>> I have imagined implementation of enumerated streams in a slightly
>>> different way. I think enumerated streams should be a separate kind of
>>> stream like parallel, sequential or some primitive stream. This way
>>> enumeration could be deferred to the time that index is being consumed,
>>> This would eliminate issue with stream merging and also kind of smoothen
>>> the issue with parallel streams, as the indexed will follow the order
>>> elements have been passed to processing.
>>>
>>> This could potentially lead to another issue with consistency of indexes
>>> through stream pipeline, but there could be a workaround like
>>> IdentityHashMaps that is received from previous enumerated streams.
>>>
>>> This way we could make streams more efficient and also introduce a more
>>> user-friendly API, which is essentially a win-win situation.
>>>
>>> Regarding concatenation specifically, we could either introduce separate
>>> methods for merging enumerated streams, where we define how conflicts are
>>> resolved, or just treat them as usual streams, so the result of
>>> concatenation of enumerated streams will result in a normal stream that
>>> should be enumerated once again. Also i don't really see how gathering
>>> elements into index-value pairs solves issue with conflicting indexes
>>>
>>> вс, 21 апр. 2024 г. в 02:06, - :
>>>
 Hi Oleksandr,
 I fear that enumeration might not be applicable to all streams,
 especially parallel ones. If we have a parallel stream, we might not always
 process all elements in order, and even index generation can be unreliable.
 In addition, stream merging will become a headache. I think
 Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new
 Indexed<>(indexed.index() + 1, value)) can create a Stream>
 which should serve your purpose.

 And for expanded operations for enumerated streams, there are similar
 features for primitive streams already, where they have extra methods like
 summaryStatistics() compared to object streams. We most likely will have
 Gatherers that explicitly work on Stream> to support
 type-specific operations, partly due to Java's type limits and that Stream
 hierarchy is effectively closed.

 Best,
 Chen Liang

 On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань <
 rotan.olexa...@gmail.com> wrote:

> My proposal regarding findIndex brought up a topic that, as I see, has
> been brought up here multiple times.
>
> My idea is to extend the existing stream interface in the new
> EnumeratedStream interface. Such an approach has a number of advantages.
>
> 1. Some methods will be able to explicitly specify that they require
> an enumerated stream.
>
> 2. This would allow us to use an enumerated stream just as the default
> value stream, if the index is redundant at a certain processing step.
>
> 3. This could help introduce a more user-friendly interface: instead
> of passing pairs of index and value, they could be passed as separate

Re: EnumeratedStream

2024-04-20 Thread -
We must convert index-processing operations to a `gather(Gatherers.scan(/*
index gathering */))` immediately before the operation that uses the index,
and immediately unwrap the indices afterwards.

Syntactically, writing such 3 lines for every index-aware operation would be
weird; I actually wonder if we can find a way to convert a
Gatherer<Indexed<T>, ?, R> into a Gatherer<T, ?, R>; the nested gatherer
itself will receive the enumerated values. I don't know if such a way to
compose Gatherer has been studied, but this would allow us to define all
indexed operations as gatherers instead of having to make new
BiConsumers for indices. This would be interesting to look into. So the
ideal form would be:

BiPredicate<Integer, T> filter = ...
Gatherer<Indexed<T>, ?, R> filterGatherer = Gatherer.of((_, indexed, sink) ->
        !filter.test(indexed.index(), indexed.value()) || sink.push(indexed.value()));
Gatherer<T, ?, R> wrapped = IndexedGatherer.wrap(filterGatherer); // a factory method performing the wrapping
return stream.gather(wrapped) // and more

On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> I would like to correct myself a bit: indexes should be assigned at the
> moment an element is received from upstream, not when it is passed to
> processing. Not sure if it will effectively change order of execution in
> actual implementation, but to put it this way would be more precise
>
> вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com>:
>
>> Hello again.
>>
>> I have imagined implementation of enumerated streams in a slightly
>> different way. I think enumerated streams should be a separate kind of
>> stream like parallel, sequential or some primitive stream. This way
>> enumeration could be deferred to the time that index is being consumed,
>> This would eliminate issue with stream merging and also kind of smoothen
>> the issue with parallel streams, as the indexed will follow the order
>> elements have been passed to processing.
>>
>> This could potentially lead to another issue with consistency of indexes
>> through stream pipeline, but there could be a workaround like
>> IdentityHashMaps that is received from previous enumerated streams.
>>
>> This way we could make streams more efficient and also introduce a more
>> user-friendly API, which is essentially a win-win situation.
>>
>> Regarding concatenation specifically, we could either introduce separate
>> methods for merging enumerated streams, where we define how conflicts are
>> resolved, or just treat them as usual streams, so the result of
>> concatenation of enumerated streams will result in a normal stream that
>> should be enumerated once again. Also i don't really see how gathering
>> elements into index-value pairs solves issue with conflicting indexes
>>
>> вс, 21 апр. 2024 г. в 02:06, - :
>>
>>> Hi Oleksandr,
>>> I fear that enumeration might not be applicable to all streams,
>>> especially parallel ones. If we have a parallel stream, we might not always
>>> process all elements in order, and even index generation can be unreliable.
>>> In addition, stream merging will become a headache. I think
>>> Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new
>>> Indexed<>(indexed.index() + 1, value)) can create a Stream>
>>> which should serve your purpose.
>>>
>>> And for expanded operations for enumerated streams, there are similar
>>> features for primitive streams already, where they have extra methods like
>>> summaryStatistics() compared to object streams. We most likely will have
>>> Gatherers that explicitly work on Stream> to support
>>> type-specific operations, partly due to Java's type limits and that Stream
>>> hierarchy is effectively closed.
>>>
>>> Best,
>>> Chen Liang
>>>
>>> On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань <
>>> rotan.olexa...@gmail.com> wrote:
>>>
 My proposal regarding findIndex brought up a topic that, as I see, has
 been brought up here multiple times.

 My idea is to extend the existing stream interface in the new
 EnumeratedStream interface. Such an approach has a number of advantages.

 1. Some methods will be able to explicitly specify that they require an
 enumerated stream.

 2. This would allow us to use an enumerated stream just as the default
 value stream, if the index is redundant at a certain processing step.

 3. This could help introduce a more user-friendly interface: instead of
 passing pairs of index and value, they could be passed as separate
 variables, which will make code more concise.

 Consider following example:
 List.of(1, 2, 3).stream()
 .enumerated()
 .map(idx, value -> idx % 2 == 0 ? value : -value);

 looks much better than
 List.of(1, 2, 3).stream()
 .enumerated()
 .map(pair -> pair.idx % 2 == 0 ? pair.value :
 -pair.value);

 However, I see some caveats with this appr

Re: EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
I would like to correct myself a bit: indexes should be assigned at the
moment an element is received from upstream, not when it is passed to
processing. Not sure if it will effectively change order of execution in
actual implementation, but to put it this way would be more precise

вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com>:

> Hello again.
>
> I have imagined implementation of enumerated streams in a slightly
> different way. I think enumerated streams should be a separate kind of
> stream like parallel, sequential or some primitive stream. This way
> enumeration could be deferred to the time that index is being consumed,
> This would eliminate issue with stream merging and also kind of smoothen
> the issue with parallel streams, as the indexed will follow the order
> elements have been passed to processing.
>
> This could potentially lead to another issue with consistency of indexes
> through stream pipeline, but there could be a workaround like
> IdentityHashMaps that is received from previous enumerated streams.
>
> This way we could make streams more efficient and also introduce a more
> user-friendly API, which is essentially a win-win situation.
>
> Regarding concatenation specifically, we could either introduce separate
> methods for merging enumerated streams, where we define how conflicts are
> resolved, or just treat them as usual streams, so the result of
> concatenation of enumerated streams will result in a normal stream that
> should be enumerated once again. Also i don't really see how gathering
> elements into index-value pairs solves issue with conflicting indexes
>
> вс, 21 апр. 2024 г. в 02:06, - :
>
>> Hi Oleksandr,
>> I fear that enumeration might not be applicable to all streams,
>> especially parallel ones. If we have a parallel stream, we might not always
>> process all elements in order, and even index generation can be unreliable.
>> In addition, stream merging will become a headache. I think
>> Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new
>> Indexed<>(indexed.index() + 1, value)) can create a Stream>
>> which should serve your purpose.
>>
>> And for expanded operations for enumerated streams, there are similar
>> features for primitive streams already, where they have extra methods like
>> summaryStatistics() compared to object streams. We most likely will have
>> Gatherers that explicitly work on Stream> to support
>> type-specific operations, partly due to Java's type limits and that Stream
>> hierarchy is effectively closed.
>>
>> Best,
>> Chen Liang
>>
>> On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> My proposal regarding findIndex brought up a topic that, as I see, has
>>> been brought up here multiple times.
>>>
>>> My idea is to extend the existing stream interface in the new
>>> EnumeratedStream interface. Such an approach has a number of advantages.
>>>
>>> 1. Some methods will be able to explicitly specify that they require an
>>> enumerated stream.
>>>
>>> 2. This would allow us to use an enumerated stream just as the default
>>> value stream, if the index is redundant at a certain processing step.
>>>
>>> 3. This could help introduce a more user-friendly interface: instead of
>>> passing pairs of index and value, they could be passed as separate
>>> variables, which will make code more concise.
>>>
>>> Consider following example:
>>> List.of(1, 2, 3).stream()
>>> .enumerated()
>>> .map(idx, value -> idx % 2 == 0 ? value : -value);
>>>
>>> looks much better than
>>> List.of(1, 2, 3).stream()
>>> .enumerated()
>>> .map(pair -> pair.idx % 2 == 0 ? pair.value :
>>> -pair.value);
>>>
>>> However, I see some caveats with this approach, which relate to parallel
>>> streams:
>>> when a stream is backed by a collection, you might expect assigned
>>> indexes to represent order of iteration through collection, especially when
>>> talking about sequential collections. Assigning indexes in such streams
>>> could be a difficult task in a parallel environment. It should either
>>> assign index to a stream elements at the moment when stream becomes
>>> parallelized, which is also impossible if already parallelized stream is
>>> being enumerated, and also wont work for infinite streams, or make
>>> enumerated stream at least partially sequential by passing elements to
>>> processing in order they were passed to stream at the first place, which is
>>> also hard or maybe even impossible to achieve.
>>>
>>> Also, side note: methods that already accept 2 parameters might become
>>> kinda verbose, like reduce that will have to accept for params (idx1, val1,
>>> idx2, val2), but i think that is not a big deal
>>>
>>>
>>>


Re: EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
Hello again.

I have imagined the implementation of enumerated streams in a slightly
different way. I think enumerated streams should be a separate kind of
stream, like parallel, sequential, or the primitive streams. This way,
enumeration could be deferred to the time the index is actually consumed.
This would eliminate the issue with stream merging and also somewhat smooth
over the issue with parallel streams, as the indexes will follow the order in
which elements were passed to processing.

This could potentially lead to another issue with consistency of indexes
through the stream pipeline, but there could be a workaround, like an
IdentityHashMap that is received from previous enumerated streams.

This way we could make streams more efficient and also introduce a more
user-friendly API, which is essentially a win-win situation.

Regarding concatenation specifically, we could either introduce separate
methods for merging enumerated streams, where we define how conflicts are
resolved, or just treat them as usual streams, so that concatenating
enumerated streams results in a normal stream that has to be enumerated once
again. Also, I don't really see how gathering elements into index-value pairs
solves the issue with conflicting indexes.

вс, 21 апр. 2024 г. в 02:06, - :

> Hi Oleksandr,
> I fear that enumeration might not be applicable to all streams, especially
> parallel ones. If we have a parallel stream, we might not always process
> all elements in order, and even index generation can be unreliable. In
> addition, stream merging will become a headache. I think Gatherers.scan(()
> -> new Indexed<>(0, dummy), (indexed, value) -> new
> Indexed<>(indexed.index() + 1, value)) can create a Stream>
> which should serve your purpose.
>
> And for expanded operations for enumerated streams, there are similar
> features for primitive streams already, where they have extra methods like
> summaryStatistics() compared to object streams. We most likely will have
> Gatherers that explicitly work on Stream> to support
> type-specific operations, partly due to Java's type limits and that Stream
> hierarchy is effectively closed.
>
> Best,
> Chen Liang
>
> On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> My proposal regarding findIndex brought up a topic that, as I see, has
>> been brought up here multiple times.
>>
>> My idea is to extend the existing stream interface in the new
>> EnumeratedStream interface. Such an approach has a number of advantages.
>>
>> 1. Some methods will be able to explicitly specify that they require an
>> enumerated stream.
>>
>> 2. This would allow us to use an enumerated stream just as the default
>> value stream, if the index is redundant at a certain processing step.
>>
>> 3. This could help introduce a more user-friendly interface: instead of
>> passing pairs of index and value, they could be passed as separate
>> variables, which will make code more concise.
>>
>> Consider following example:
>> List.of(1, 2, 3).stream()
>> .enumerated()
>> .map(idx, value -> idx % 2 == 0 ? value : -value);
>>
>> looks much better than
>> List.of(1, 2, 3).stream()
>> .enumerated()
>> .map(pair -> pair.idx % 2 == 0 ? pair.value :
>> -pair.value);
>>
>> However, I see some caveats with this approach, which relate to parallel
>> streams:
>> when a stream is backed by a collection, you might expect assigned
>> indexes to represent order of iteration through collection, especially when
>> talking about sequential collections. Assigning indexes in such streams
>> could be a difficult task in a parallel environment. It should either
>> assign index to a stream elements at the moment when stream becomes
>> parallelized, which is also impossible if already parallelized stream is
>> being enumerated, and also wont work for infinite streams, or make
>> enumerated stream at least partially sequential by passing elements to
>> processing in order they were passed to stream at the first place, which is
>> also hard or maybe even impossible to achieve.
>>
>> Also, side note: methods that already accept 2 parameters might become
>> kinda verbose, like reduce that will have to accept for params (idx1, val1,
>> idx2, val2), but i think that is not a big deal
>>
>>
>>


Re: Bag (Multiset) Collection

2024-04-20 Thread ІП-24 Олександр Ротань
Hello!

The example you have provided is indeed a workaround. Such a setup covers
most of the problems; nothing comes to mind off the top of my head that can't
be accomplished using it.

However, there are a few downsides:

1. Obvious performance and memory overhead. I think no elaboration is needed
here; I'll just point out that, from my experience, we are talking about
multiples of 10 when describing the gap in performance.

2. Extremely uncomfortable API. Most of the operations with multisets, at
the end of the day, converge to iteration over elements. Every data
processing pipeline requires additional flattening before processing in
most of the cases, or using nested loops. If, for example, you have a web
API that works with data processing and visualization, every IO operation
would require mapping from list or array to map back and forth. This could
(and will) introduce some errors and make code less maintainable.

Also, which is important for me, this approach is very frustrating. My goal
will always be developer experience improvement in the first place. I know
that is not a top-level priority for Java maintainers, but I will always
try my best to make the life of Java users more comfortable.

вс, 21 апр. 2024 г. в 01:57, - :

> Hi IP-24 Oleksandr Rotan',
> You can use a Map>. This map stores T as the object
> equivalence key, and each Collection in this map stores the values known
> to be equal to the key. Since map compares keys by equals, any key in the
> collection can be used to find the collection in the Map>.
>
> If you want a Tree/Navigable multiset, you can just make it a
> NavigableMap>. If you use this concurrently, you can
> replace the value Collection with something like CopyOnWriteArrayList.
>
> Is there any Multiset functionality not covered by such a Map Collection> setup?
>
> P.S. For the other versions of tree collections, I think JDK uses Red
> Black tree due to its superior performance when there's frequent removal.
> AVL is known for faster additions but involves a lot more balance
> operations during removal. If we are looking at immutable navigable/sorted
> collections, we can use an array and perform binary search, which works
> just like a tree.
>
> Regards,
> Chen Liang
>
>
> On Sat, Apr 20, 2024 at 3:25 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> In this letter I would like to express some of my thoughts regarding the
>> potential Multiset interface.
>>
>> I, personally, have encountered a few situations where such an interface
>> could come in handy, mostly when I needed an ordered collection that
>> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
>> itself should have such a collection in its std library. While it is not a
>> very common problem, there are still a bunch of use cases where such things
>> could come in handy.
>>
>> I am willing to take on development of such thing, but there are a few
>> concerns about Multiset:
>>
>> 1. Is there any other use for this besides ordered collection that
>> permits duplicates? I can't remember anything else from the top of my head.
>>
>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet
>> class hierarchy, while not being directly connected. I think introducing
>> any other ordered collection will require some refactoring of existing
>> collection interfaces (for example extract SortedCollection from SortedSet,
>> Navigable Collection from NavigableSet etc.). From the perspective of clean
>> code, this would be the right decision, but I feel like this will be a very
>> complex task to accomplish.
>>
>> 3. Maybe there should be few versions of Tree collection (for example,
>> for regular tasks red-black tree and B-Tree for large amounts of data). I
>> have some expirience implementing both, but is it really needed in standard
>> library?
>>
>


Re: EnumeratedStream

2024-04-20 Thread -
Hi Oleksandr,
I fear that enumeration might not be applicable to all streams, especially
parallel ones. If we have a parallel stream, we might not always process
all elements in order, and even index generation can be unreliable. In
addition, stream merging will become a headache. I think Gatherers.scan(()
-> new Indexed<>(0, dummy), (indexed, value) -> new
Indexed<>(indexed.index() + 1, value)) can create a Stream<Indexed<V>>,
which should serve your purpose.
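
For concreteness, a minimal self-contained sketch of that scan-based zipping
(assuming a simple Indexed record; the seed index only determines where the
numbering starts):

import java.util.List;
import java.util.stream.Gatherers;

public class ScanIndexDemo {

    // Hypothetical pair type; any record carrying an index and a value would do.
    record Indexed<V>(int index, V value) {}

    public static void main(String[] args) {
        // Seed with index -1 so the first emitted pair carries index 0; the
        // seed's value is a placeholder and is never emitted by scan itself.
        var indexed = List.of("a", "b", "c").stream()
                .gather(Gatherers.scan(
                        () -> new Indexed<String>(-1, null),
                        (prev, value) -> new Indexed<>(prev.index() + 1, value)))
                .toList();
        // [Indexed[index=0, value=a], Indexed[index=1, value=b], Indexed[index=2, value=c]]
        System.out.println(indexed);
    }
}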

And for expanded operations for enumerated streams, there are similar
features for primitive streams already, where they have extra methods like
summaryStatistics() compared to object streams. We most likely will have
Gatherers that explicitly work on Stream<Indexed<T>> to support
type-specific operations, partly due to Java's type limits and the fact that the Stream
hierarchy is effectively closed.

Best,
Chen Liang

On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> My proposal regarding findIndex brought up a topic that, as I see, has
> been brought up here multiple times.
>
> My idea is to extend the existing stream interface in the new
> EnumeratedStream interface. Such an approach has a number of advantages.
>
> 1. Some methods will be able to explicitly specify that they require an
> enumerated stream.
>
> 2. This would allow us to use an enumerated stream just as the default
> value stream, if the index is redundant at a certain processing step.
>
> 3. This could help introduce a more user-friendly interface: instead of
> passing pairs of index and value, they could be passed as separate
> variables, which will make code more concise.
>
> Consider following example:
> List.of(1, 2, 3).stream()
> .enumerated()
> .map(idx, value -> idx % 2 == 0 ? value : -value);
>
> looks much better than
> List.of(1, 2, 3).stream()
> .enumerated()
> .map(pair -> pair.idx % 2 == 0 ? pair.value : -pair.value);
>
> However, I see some caveats with this approach, which relate to parallel
> streams:
> when a stream is backed by a collection, you might expect assigned indexes
> to represent order of iteration through collection, especially when talking
> about sequential collections. Assigning indexes in such streams could be a
> difficult task in a parallel environment. It should either assign index to
> a stream elements at the moment when stream becomes parallelized, which is
> also impossible if already parallelized stream is being enumerated, and
> also wont work for infinite streams, or make enumerated stream at least
> partially sequential by passing elements to processing in order they were
> passed to stream at the first place, which is also hard or maybe even
> impossible to achieve.
>
> Also, side note: methods that already accept 2 parameters might become
> kinda verbose, like reduce that will have to accept for params (idx1, val1,
> idx2, val2), but i think that is not a big deal
>
>
>


Re: Bag (Multiset) Collection

2024-04-20 Thread -
Hi IP-24 Oleksandr Rotan',
You can use a Map<T, Collection<T>>. This map stores T as the object
equivalence key, and each Collection in this map stores the values known
to be equal to the key. Since map compares keys by equals, any key in the
collection can be used to find the collection in the Map<T, Collection<T>>.

If you want a Tree/Navigable multiset, you can just make it a
NavigableMap<T, Collection<T>>. If you use this concurrently, you can
replace the value Collection with something like CopyOnWriteArrayList.

Is there any Multiset functionality not covered by such a Map<T, Collection<T>> setup?
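A minimal sketch of that idea (MapMultiset is a hypothetical name, not a JDK or
Guava type):

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class MapMultiset<T> {
    // Each key maps to the bucket of values known to be equal to it.
    private final Map<T, Collection<T>> map = new HashMap<>();

    void add(T value) {
        // Any key equal to 'value' finds the same bucket, since HashMap compares keys by equals().
        map.computeIfAbsent(value, k -> new ArrayList<>()).add(value);
    }

    int count(T value) {
        Collection<T> bucket = map.get(value);
        return bucket == null ? 0 : bucket.size();
    }

    boolean removeOne(T value) {
        Collection<T> bucket = map.get(value);
        if (bucket == null) {
            return false;
        }
        bucket.remove(value);
        if (bucket.isEmpty()) {
            map.remove(value);
        }
        return true;
    }
}

Swapping HashMap for a NavigableMap such as TreeMap gives the navigable variant,
and a concurrent map with CopyOnWriteArrayList buckets gives a concurrent one, as
described above.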

P.S. For the other versions of tree collections, I think the JDK uses a Red-Black
tree due to its superior performance when there's frequent removal. AVL is
known for faster additions but involves a lot more balance operations
during removal. If we are looking at immutable navigable/sorted
collections, we can use an array and perform binary search, which works
just like a tree.
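To make the binary-search point concrete, a small sketch of a NavigableSet-style
floor lookup over an immutable sorted array (the class and method names here are
illustrative only):

import java.util.Arrays;

class SortedArrayFloor {
    // Greatest element <= key, or null if none; mirrors NavigableSet.floor().
    static Integer floor(int[] sorted, int key) {
        int i = Arrays.binarySearch(sorted, key);
        if (i >= 0) {
            return sorted[i];            // exact match
        }
        int insertion = -i - 1;          // index of the first element > key
        return insertion == 0 ? null : sorted[insertion - 1];
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7};
        System.out.println(floor(a, 4)); // 3
        System.out.println(floor(a, 0)); // null
    }
}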

Regards,
Chen Liang


On Sat, Apr 20, 2024 at 3:25 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> In this letter I would like to express some of my thoughts regarding the
> potential Multiset interface.
>
> I, personally, have encountered a few situations where such an interface
> could come in handy, mostly when I needed an ordered collection that
> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
> itself should have such a collection in its std library. While it is not a
> very common problem, there are still a bunch of use cases where such things
> could come in handy.
>
> I am willing to take on development of such thing, but there are a few
> concerns about Multiset:
>
> 1. Is there any other use for this besides ordered collection that permits
> duplicates? I can't remember anything else from the top of my head.
>
> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class
> hierarchy, while not being directly connected. I think introducing any
> other ordered collection will require some refactoring of existing
> collection interfaces (for example extract SortedCollection from SortedSet,
> Navigable Collection from NavigableSet etc.). From the perspective of clean
> code, this would be the right decision, but I feel like this will be a very
> complex task to accomplish.
>
> 3. Maybe there should be few versions of Tree collection (for example, for
> regular tasks red-black tree and B-Tree for large amounts of data). I have
> some expirience implementing both, but is it really needed in standard
> library?
>


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v25]

2024-04-20 Thread Scott Gibbons
On Sat, 20 Apr 2024 19:09:43 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix UnsafeCopyMemoryMark scope issue

Merge done.

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067803696


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v26]

2024-04-20 Thread Scott Gibbons
> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
> this change.
> 
> Overall, making this an intrinsic improves overall performance of 
> `Unsafe::setMemory` by up to 4x for all buffer sizes.
> 
> Tested with tier-1 (and full CI).  I've added a table of the before and after 
> numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
> 
> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)

Scott Gibbons has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now contains 37 commits:

 - Merge branch 'openjdk:master' into setMemory
 - Fix UnsafeCopyMemoryMark scope issue
 - Long to short jmp; other cleanup
 - Review comments
 - Address review comments; update copyright years
 - Add enter() and leave(); remove Windows-specific register stuff
 - Fix memory mark after sync to upstream
 - Merge branch 'openjdk:master' into setMemory
 - Set memory test (#23)
   
   * Even more review comments
   
   * Re-write of atomic copy loops
   
   * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark}
   
   * Only add a memory mark for byte unaligned fill
   
   * Remove MUSL_LIBC ifdef
   
   * Remove MUSL_LIBC ifdef
 - Set memory test (#22)
   
   * Even more review comments
   
   * Re-write of atomic copy loops
   
   * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark}
   
   * Only add a memory mark for byte unaligned fill
 - ... and 27 more: https://git.openjdk.org/jdk/compare/6d569961...1122b500

-

Changes: https://git.openjdk.org/jdk/pull/18555/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=25
  Stats: 507 lines in 36 files changed: 420 ins; 5 del; 82 mod
  Patch: https://git.openjdk.org/jdk/pull/18555.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555

PR: https://git.openjdk.org/jdk/pull/18555


Re: Bag (Multiset) Collection

2024-04-20 Thread ІП-24 Олександр Ротань
Also, after some research, I found that HashMultisets have some areas
of application in image detection and event simulations.

Sun, Apr 21, 2024 at 01:19, ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com>:

> Sorry for duplicate, I accidentally sent last message only to you
>
> Sun, Apr 21, 2024 at 01:19, ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com>:
>
>> Firstly, about queues. As far as I know, priority queue is implemented as
>> heap, not tree, so things like subQueue doesn't make much sense in this
>> context.
>>
>> About popularity: this is indeed not very popular, might even be
>> negligible outside of sorted multiset (lets adopt java naming conventions
>> for those from now on). Although, I could provide at least few examples:
>>
>> 1. Batch tasks processing, especially in concurrent environments, when
>> tasks could have different priorities. Sometimes tasks that are stored in a
>> queue are processed in batches. Using today's stdlib of Java,
>> acquiring such a batch of tasks from collection would be a O(n * log n)
>> complexity, because each heapification is log n and called after each task
>> is popped. Using TreeMultiset, this could be achieved in log (n). This also
>> applies to messaging.
>>
>> 2. Implementing custom data structures. Not a task most of commercial
>> developers encounter, although for specific types of tasks, like big data
>> processing, custom datastructs could be a thing. Some collections,
>> expeciall ordered, like B-tree, could require sorted multiset as part of
>> internal implementation.
>>
>> 3. Again, for sorted multisets, counting frequency is a common task for
>> commercial developers. Counting element frequencies using multisets is the
>> thing that simply couldn't be achieved by set and is inefficient in list.
>> This could have various use cases, from data displaying in histograms in
>> generated reports to data validation is stateful systems (checking if user
>> has not exceeded permitted amount of attempts, for example).
>>
>> 4. Storage for the result of db queries. Another common task is to fetch
>> some ordered data from the database, and then modify it to preserve order.
>> This category of tasks could be optimized using ordered multisets.
>>
>> 5. Concurent bag, as you just mentioned. This could be used in concurrent
>> environment as I mentioned above, or just as less restricting
>> concurrent collection as preserving sequence of elements could be a complex
>> and resource consuming task in mutable concurrent collections
>>
>> As you might see, almost all of those indeed converge to a sorted
>> collection of duplicating elements. I am not sure If there is much more to
>> it, but what i would like to say is that feature like this would definitely
>> not be redundant for "enterprise" language like Java, where optimizing some
>> operations from O(n) or even O(n * log n) to O (log n) could introduce
>> significant performance improvement, as well as reducing amount of possible
>> bugs by taking on some of responsibilities related to preserving order of
>> elements sorted.
>>
>> Sun, Apr 21, 2024 at 00:33, Holo The Sage Wolf :
>>
>> Sun, Apr 21, 2024 at 01:06, ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com>:
>>
>>> Firstly, about queues. As far as I know, priority queue is implemented
>>> as heap, not tree, so things like subQueue doesn't make much sense in this
>>> context.
>>>
>>> About popularity: this is indeed not very popular, might even be
>>> negligible outside of sorted multiset (lets adopt java naming conventions
>>> for those from now on). Although, I could provide at least few examples:
>>>
>>> 1. Batch tasks processing, especially in concurrent environments, when
>>> tasks could have different priorities. Sometimes tasks that are stored in a
>>> queue are processed in batches. Using today's stdlib of Java,
>>> acquiring such a batch of tasks from collection would be a O(n * log n)
>>> complexity, because each heapification is log n and called after each task
>>> is popped. Using TreeMultiset, this could be achieved in log (n). This also
>>> applies to messaging.
>>>
>>> 2. Implementing custom data structures. Not a task most of commercial
>>> developers encounter, although for specific types of tasks, like big data
>>> processing, custom datastructs could be a thing. Some collections,
>>> expeciall ordered, like B-tree, could require sorted multiset as part of
>>> internal implementation.
>>>
>>> 3. Again, for sorted multisets, counting frequency is a common task for
>>> commercial developers. Counting element frequencies using multisets is the
>>> thing that simply couldn't be achieved by set and is inefficient in list.
>>> This could have various use cases, from data displaying in histograms in
>>> generated reports to data validation is stateful systems (checking if user
>>> has not exceeded permitted amount of attempts, for example).
>>>
>>> 4. Storage for the result of db queries. Another common task is to fetch
>>> some 

Re: Bag (Multiset) Collection

2024-04-20 Thread ІП-24 Олександр Ротань
Firstly, about queues. As far as I know, a priority queue is implemented as
a heap, not a tree, so things like subQueue don't make much sense in this
context.

About popularity: this is indeed not very popular, and might even be negligible
outside of a sorted multiset (let's adopt Java naming conventions for those
from now on). Still, I can provide at least a few examples:

1. Batch task processing, especially in concurrent environments, where
tasks can have different priorities. Sometimes tasks that are stored in a
queue are processed in batches. Using today's Java standard library,
acquiring such a batch of tasks from the collection is O(n * log n),
because each heapification costs log n and happens after each task
is popped. Using a TreeMultiset, this could be achieved in O(log n). This also
applies to messaging.

2. Implementing custom data structures. Not a task most commercial
developers encounter, although for specific kinds of work, like big data
processing, custom data structures can be a thing. Some collections,
especially ordered ones like B-trees, could require a sorted multiset as part of
their internal implementation.

3. Again, for sorted multisets, counting frequency is a common task for
commercial developers. Counting element frequencies is something
that simply can't be achieved with a set and is inefficient with a list.
This has various use cases, from displaying data in histograms in
generated reports to data validation in stateful systems (checking that a user
has not exceeded the permitted number of attempts, for example), as sketched below.

4. Storage for the results of DB queries. Another common task is to fetch
some ordered data from the database and then modify it while preserving order.
This category of tasks could be optimized using ordered multisets.

5. Concurrent bag, as you just mentioned. This could be used in concurrent
environments as I mentioned above, or just as a less restrictive
concurrent collection, since preserving the sequence of elements can be a complex
and resource-consuming task in mutable concurrent collections.

As you might see, almost all of those indeed converge to a sorted
collection of duplicate elements. I am not sure if there is much more to
it, but a feature like this would definitely not be redundant for an
"enterprise" language like Java, where optimizing some operations from
O(n) or even O(n * log n) to O(log n) could bring a significant performance
improvement, as well as reduce the number of possible bugs by taking over some of
the responsibilities related to keeping elements sorted.
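For the frequency-counting case in point 3, a short sketch using Guava's
TreeMultiset (assuming Guava is on the classpath; a JDK multiset would presumably
look similar):

import com.google.common.collect.TreeMultiset;

public class AttemptCounter {
    public static void main(String[] args) {
        TreeMultiset<String> attempts = TreeMultiset.create();
        attempts.add("alice");
        attempts.add("alice");
        attempts.add("bob");

        int limit = 3;
        // count() is a tree lookup rather than a scan over a list.
        System.out.println(attempts.count("alice") > limit); // false

        // Distinct elements with their frequencies, iterated in sorted order.
        attempts.entrySet().forEach(e ->
                System.out.println(e.getElement() + " -> " + e.getCount()));
    }
}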

Sun, Apr 21, 2024 at 00:33, Holo The Sage Wolf :

Sun, Apr 21, 2024 at 01:06, ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com>:

> Firstly, about queues. As far as I know, priority queue is implemented as
> heap, not tree, so things like subQueue doesn't make much sense in this
> context.
>
> About popularity: this is indeed not very popular, might even be
> negligible outside of sorted multiset (lets adopt java naming conventions
> for those from now on). Although, I could provide at least few examples:
>
> 1. Batch tasks processing, especially in concurrent environments, when
> tasks could have different priorities. Sometimes tasks that are stored in a
> queue are processed in batches. Using today's stdlib of Java,
> acquiring such a batch of tasks from collection would be a O(n * log n)
> complexity, because each heapification is log n and called after each task
> is popped. Using TreeMultiset, this could be achieved in log (n). This also
> applies to messaging.
>
> 2. Implementing custom data structures. Not a task most of commercial
> developers encounter, although for specific types of tasks, like big data
> processing, custom datastructs could be a thing. Some collections,
> expeciall ordered, like B-tree, could require sorted multiset as part of
> internal implementation.
>
> 3. Again, for sorted multisets, counting frequency is a common task for
> commercial developers. Counting element frequencies using multisets is the
> thing that simply couldn't be achieved by set and is inefficient in list.
> This could have various use cases, from data displaying in histograms in
> generated reports to data validation is stateful systems (checking if user
> has not exceeded permitted amount of attempts, for example).
>
> 4. Storage for the result of db queries. Another common task is to fetch
> some ordered data from the database, and then modify it to preserve order.
> This category of tasks could be optimized using ordered multisets.
>
> 5. Concurent bag, as you just mentioned. This could be used in concurrent
> environment as I mentioned above, or just as less restricting
> concurrent collection as preserving sequence of elements could be a complex
> and resource consuming task in mutable concurrent collections
>
> As you might see, almost all of those indeed converge to a sorted
> collection of duplicating elements. I am not sure If there is much more to
> it, bu

Re: EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
Yes, your point about enumeration is the best way to do it, I guess; I am
also voting for this. It makes the most sense considering the way that
method invocation chains should be handled.

Sun, Apr 21, 2024 at 00:55, David Alayachew :

> I am in full support of this idea. I do also appreciate the functionality
> of using a BiFunction<Integer, T, R> on the map method instead of a normal
> Function<Pair<Integer, T>, R>.
>
> As for the actual enumeration logic, my vote is that it should simply
> enumerate as it arrives, with no context or care given to what came before
> it. Consider the following examples.
>
> final List<Integer> list = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
>
> System.out.println(list.stream().enumerated().filter((idx, val) -> val % 2
> == 0).toList());
> //0:0, 2:2, 4:4, 6:6, 8:8
>
> System.out.println(list.stream().filter(val -> val % 2 ==
> 0).enumerated().toList());
> //0:0, 1:2, 2:4, 3:6, 4:8
>
> Enumeration is applied as it arrives, and no sooner. That, I feel, will
> keep the implementation simplest, and allow the streaming logic to make the
> most sense. And I added helpful reinterpretations of filter() method (like
> you did with map()), but I don't think either is necessary.
>
> Of course, the big question looming over all of this is -- does this
> feature carry its weight? You and I vote yes, but let's see what the
> community says.
>
> What do you all think? Is this feature worth it -- with or without some
> changes?
>
> On Sat, Apr 20, 2024 at 5:07 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> My proposal regarding findIndex brought up a topic that, as I see, has
>> been brought up here multiple times.
>>
>> My idea is to extend the existing stream interface in the new
>> EnumeratedStream interface. Such an approach has a number of advantages.
>>
>> 1. Some methods will be able to explicitly specify that they require an
>> enumerated stream.
>>
>> 2. This would allow us to use an enumerated stream just as the default
>> value stream, if the index is redundant at a certain processing step.
>>
>> 3. This could help introduce a more user-friendly interface: instead of
>> passing pairs of index and value, they could be passed as separate
>> variables, which will make code more concise.
>>
>> Consider following example:
>> List.of(1, 2, 3).stream()
>> .enumerated()
>> .map(idx, value -> idx % 2 == 0 ? value : -value);
>>
>> looks much better than
>> List.of(1, 2, 3).stream()
>> .enumerated()
>> .map(pair -> pair.idx % 2 == 0 ? pair.value :
>> -pair.value);
>>
>> However, I see some caveats with this approach, which relate to parallel
>> streams:
>> when a stream is backed by a collection, you might expect assigned
>> indexes to represent order of iteration through collection, especially when
>> talking about sequential collections. Assigning indexes in such streams
>> could be a difficult task in a parallel environment. It should either
>> assign index to a stream elements at the moment when stream becomes
>> parallelized, which is also impossible if already parallelized stream is
>> being enumerated, and also wont work for infinite streams, or make
>> enumerated stream at least partially sequential by passing elements to
>> processing in order they were passed to stream at the first place, which is
>> also hard or maybe even impossible to achieve.
>>
>> Also, side note: methods that already accept 2 parameters might become
>> kinda verbose, like reduce that will have to accept for params (idx1, val1,
>> idx2, val2), but i think that is not a big deal
>>
>>
>>


Re: Bag (Multiset) Collection

2024-04-20 Thread David Alayachew
The biggest use case is definitely creating different histograms from the
same dataset. Our friends in SciPy land spend A LOT of time doing this.
Our R friends also use this frequently.

I can imagine this bag implementation would not just be good for
Collections, but would play well with the Collectors framework, as well as
Gatherers.

On Sat, Apr 20, 2024 at 5:57 PM Holo The Sage Wolf 
wrote:

> By "ordered collection" you mean "unordered collection"?
>
> How common is it to actually use a multiset in general? Of course there
> are use cases, and there are areas in which it is used a lot, but I don't
> believe it is common enough to be part of the std with the one exception of
> concurrent bag.
>
> On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, <
> rotan.olexa...@gmail.com> wrote:
>
>> In this letter I would like to express some of my thoughts regarding the
>> potential Multiset interface.
>>
>> I, personally, have encountered a few situations where such an interface
>> could come in handy, mostly when I needed an ordered collection that
>> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
>> itself should have such a collection in its std library. While it is not a
>> very common problem, there are still a bunch of use cases where such things
>> could come in handy.
>>
>> I am willing to take on development of such thing, but there are a few
>> concerns about Multiset:
>>
>> 1. Is there any other use for this besides ordered collection that
>> permits duplicates? I can't remember anything else from the top of my head.
>>
>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet
>> class hierarchy, while not being directly connected. I think introducing
>> any other ordered collection will require some refactoring of existing
>> collection interfaces (for example extract SortedCollection from SortedSet,
>> Navigable Collection from NavigableSet etc.). From the perspective of clean
>> code, this would be the right decision, but I feel like this will be a very
>> complex task to accomplish.
>>
>> 3. Maybe there should be few versions of Tree collection (for example,
>> for regular tasks red-black tree and B-Tree for large amounts of data). I
>> have some expirience implementing both, but is it really needed in standard
>> library?
>>
>


Re: EnumeratedStream

2024-04-20 Thread David Alayachew
I am in full support of this idea. I do also appreciate the functionality
of using a BiFunction<Integer, T, R> on the map method instead of a normal
Function<Pair<Integer, T>, R>.

As for the actual enumeration logic, my vote is that it should simply
enumerate as it arrives, with no context or care given to what came before
it. Consider the following examples.

final List<Integer> list = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

System.out.println(list.stream().enumerated().filter((idx, val) -> val % 2
== 0).toList());
//0:0, 2:2, 4:4, 6:6, 8:8

System.out.println(list.stream().filter(val -> val % 2 ==
0).enumerated().toList());
//0:0, 1:2, 2:4, 3:6, 4:8

Enumeration is applied as it arrives, and no sooner. That, I feel, will
keep the implementation simplest, and allow the streaming logic to make the
most sense. And I added helpful reinterpretations of filter() method (like
you did with map()), but I don't think either is necessary.

Of course, the big question looming over all of this is -- does this
feature carry its weight? You and I vote yes, but let's see what the
community says.

What do you all think? Is this feature worth it -- with or without some
changes?

On Sat, Apr 20, 2024 at 5:07 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> My proposal regarding findIndex brought up a topic that, as I see, has
> been brought up here multiple times.
>
> My idea is to extend the existing stream interface in the new
> EnumeratedStream interface. Such an approach has a number of advantages.
>
> 1. Some methods will be able to explicitly specify that they require an
> enumerated stream.
>
> 2. This would allow us to use an enumerated stream just as the default
> value stream, if the index is redundant at a certain processing step.
>
> 3. This could help introduce a more user-friendly interface: instead of
> passing pairs of index and value, they could be passed as separate
> variables, which will make code more concise.
>
> Consider following example:
> List.of(1, 2, 3).stream()
> .enumerated()
> .map(idx, value -> idx % 2 == 0 ? value : -value);
>
> looks much better than
> List.of(1, 2, 3).stream()
> .enumerated()
> .map(pair -> pair.idx % 2 == 0 ? pair.value : -pair.value);
>
> However, I see some caveats with this approach, which relate to parallel
> streams:
> when a stream is backed by a collection, you might expect assigned indexes
> to represent order of iteration through collection, especially when talking
> about sequential collections. Assigning indexes in such streams could be a
> difficult task in a parallel environment. It should either assign index to
> a stream elements at the moment when stream becomes parallelized, which is
> also impossible if already parallelized stream is being enumerated, and
> also wont work for infinite streams, or make enumerated stream at least
> partially sequential by passing elements to processing in order they were
> passed to stream at the first place, which is also hard or maybe even
> impossible to achieve.
>
> Also, side note: methods that already accept 2 parameters might become
> kinda verbose, like reduce that will have to accept for params (idx1, val1,
> idx2, val2), but i think that is not a big deal
>
>
>


Re: Bag (Multiset) Collection

2024-04-20 Thread Holo The Sage Wolf
Let me prefix this by saying that I'm a mathematician, so maybe the lingo I
use is a bit different. A multiset has no structure apart from "counting
duplication", just like a set has no structure at all; an ordered set **is not
a set** and an ordered multiset **is not a multiset**.

With the clarification you gave, I understand what you mean and retract my
first question (although I believe that "sortedX" or "priorityX" are the
names Java uses for what you called "orderedX").

> The whole point of the bag is to store duplicate elements without
preserving order, which allows things like ordered collections.

No, the whole point of a bag is to store elements. The difference between
a bag and a list is that a bag has less structure; anything you can do with a bag you
can do with a list.

The benefit of having a bag is to have a more restrictive interface for the
times you don't need the extra structure, as well as a possibly more
performant implementation that doesn't need to respect as strict a contract.

CopyOnWriteArrayList is more than just a bag.


> then that just adds one more use case to potential Bag interface

Indeed, I gave a concurrent bag as an example of a possible implementation of
a bag interface, but you didn't answer my main question: just how common is
the use of a bag to justify changing the std for it?

To me it sounds like you are interested only in the sorted/navigable
variations, so an alternative solution would be to improve PriorityQueue to
have "subQueue" methods (and friends), as well as extending it to a
Navigable variation.




On Sun, 21 Apr 2024, 00:01 ІП-24 Олександр Ротань, 
wrote:

> Concurrent Bag is something like CopyOnWriteArrayList or Vector, don't we
> have it already? (I know vectors aren't optimized). The whole point of the
> bag is to store duplicate elements without preserving order, which allows
> things like ordered collections. There could be mutable collections for
> concurrency optimized with read-write locks though, if that is what you are
> referring to as "concurrent bag", and I am missing the point why this
> can't be a part of list interface, then that just adds one more use case to
> potential Bag interface.
>
> PS: Ordered means it preserves order based on comparator, what you are
> referring to as ordered essentially means sequential, or am I missing a
> point?
>
>> Sat, Apr 20, 2024 at 23:52, Holo The Sage Wolf :
>
>> By "ordered collection" you mean "unordered collection"?
>>
>> How common is it to actually use a multiset in general? Of course there
>> are use cases, and there are areas in which it is used a lot, but I don't
>> believe it is common enough to be part of the std with the one exception of
>> concurrent bag.
>>
>> On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> In this letter I would like to express some of my thoughts regarding the
>>> potential Multiset interface.
>>>
>>> I, personally, have encountered a few situations where such an interface
>>> could come in handy, mostly when I needed an ordered collection that
>>> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
>>> itself should have such a collection in its std library. While it is not a
>>> very common problem, there are still a bunch of use cases where such things
>>> could come in handy.
>>>
>>> I am willing to take on development of such thing, but there are a few
>>> concerns about Multiset:
>>>
>>> 1. Is there any other use for this besides ordered collection that
>>> permits duplicates? I can't remember anything else from the top of my head.
>>>
>>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet
>>> class hierarchy, while not being directly connected. I think introducing
>>> any other ordered collection will require some refactoring of existing
>>> collection interfaces (for example extract SortedCollection from SortedSet,
>>> Navigable Collection from NavigableSet etc.). From the perspective of clean
>>> code, this would be the right decision, but I feel like this will be a very
>>> complex task to accomplish.
>>>
>>> 3. Maybe there should be few versions of Tree collection (for example,
>>> for regular tasks red-black tree and B-Tree for large amounts of data). I
>>> have some expirience implementing both, but is it really needed in standard
>>> library?
>>>
>>


EnumeratedStream

2024-04-20 Thread ІП-24 Олександр Ротань
My proposal regarding findIndex brought up a topic that, as I see, has been
brought up here multiple times.

My idea is to extend the existing Stream interface with a new
EnumeratedStream interface. Such an approach has a number of advantages.

1. Some methods will be able to explicitly specify that they require an
enumerated stream.

2. This would allow us to use an enumerated stream just like the default
value stream, if the index is redundant at a certain processing step.

3. This could help introduce a more user-friendly interface: instead of
passing pairs of index and value, they could be passed as separate
variables, which will make code more concise.

Consider the following example:
List.of(1, 2, 3).stream()
.enumerated()
.map((idx, value) -> idx % 2 == 0 ? value : -value);

looks much better than
List.of(1, 2, 3).stream()
.enumerated()
.map(pair -> pair.idx % 2 == 0 ? pair.value : -pair.value);

However, I see some caveats with this approach, which relate to parallel
streams:
when a stream is backed by a collection, you might expect the assigned indexes
to represent the order of iteration through the collection, especially when talking
about sequential collections. Assigning indexes in such streams could be a
difficult task in a parallel environment. It should either assign indexes to
stream elements at the moment the stream becomes parallelized, which is
impossible if an already parallelized stream is being enumerated and
also won't work for infinite streams, or make the enumerated stream at least
partially sequential by passing elements to processing in the order they were
passed to the stream in the first place, which is also hard or maybe even
impossible to achieve.

Also, a side note: methods that already accept two parameters might become
kind of verbose, like reduce, which will have to accept four params (idx1, val1,
idx2, val2), but I think that is not a big deal.
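For comparison, the map example can be approximated today without any new API,
assuming a random-access List source and IntStream.range (a sketch, not a
replacement for the proposal):

import java.util.List;
import java.util.stream.IntStream;

public class EnumeratedToday {
    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3);
        // Pair each index with its element up front; the index is simply the list position.
        List<Integer> result = IntStream.range(0, list.size())
                .mapToObj(idx -> idx % 2 == 0 ? list.get(idx) : -list.get(idx))
                .toList();
        System.out.println(result); // [1, -2, 3]
    }
}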


Re: Bag (Multiset) Collection

2024-04-20 Thread David Alayachew
I won't stop you, but remember, you are not the one maintaining the large
number of collection implementations in the JDK. The ones who are may feel
differently. This FAQ gives a good hint of their opinion, even to your
retorts.

On Sat, Apr 20, 2024, 4:44 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> I agree with the point made in the FAQ about the popularity of such
> problems. That said, I don't think that it is that unpopular to be ignored.
>
> Regarding Map.values(), this is the case, but, In my experience, one of
> the main advantages of using TreeMultiset was O(long n) modification
> complexity. Collection views returned by map, for obvious reasons, forbid
> modification. Also, correct me if I`m wrong, but while TreeMultiset is an
> unpopular requirement, it is at least as popular as TreeSet.
>
> Also, I may not have a full view of how such Bag collections can be
> implemented and used, however, I feel like there could be more to it.
>
> I am not insisting on anything, I just feel that if there is someone (like
> me lol) who is willing to take on full development and integration cycle,
> there aren't much reason to reject such enhancements.
>
> Sat, Apr 20, 2024 at 23:43, ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com>:
>
>> I agree with the point made in the FAQ about the popularity of such
>> problems. That said, I don't think that it is that unpopular to be ignored.
>>
>> Regarding Map.values(), this is the case, but, In my experience, one of
>> the main advantages of using TreeMultiset was O(long n) modification
>> complexity. Collection views returned by map, for obvious reasons, forbid
>> modification. Also, correct me if I`m wrong, but while TreeMultiset is an
>> unpopular requirement, it is at least as popular as TreeSet.
>>
>> Also, I may not have a full view of how such Bag collections can be
>> implemented and used, however, I feel like there could be more to it.
>>
>> I am not insisting on anything, I just feel that if there is someone
>> (like me lol) who is willing to take on full development and integration
>> cycle, there aren't much reason to reject such enhancements.
>>
>> Sat, Apr 20, 2024 at 23:31, David Alayachew :
>>
>>> Your Bag suggestion has been asked so frequently that there is an FAQ
>>> entry in the official Java Docs.
>>>
>>>
>>> https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a3
>>>
>>> On Sat, Apr 20, 2024 at 4:25 PM ІП-24 Олександр Ротань <
>>> rotan.olexa...@gmail.com> wrote:
>>>
 In this letter I would like to express some of my thoughts regarding
 the potential Multiset interface.

 I, personally, have encountered a few situations where such an
 interface could come in handy, mostly when I needed an ordered collection
 that permits duplicates. That time I used guava`s TreeMultiset, but I think
 Java itself should have such a collection in its std library. While it is
 not a very common problem, there are still a bunch of use cases where such
 things could come in handy.

 I am willing to take on development of such thing, but there are a few
 concerns about Multiset:

 1. Is there any other use for this besides ordered collection that
 permits duplicates? I can't remember anything else from the top of my head.

 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet
 class hierarchy, while not being directly connected. I think introducing
 any other ordered collection will require some refactoring of existing
 collection interfaces (for example extract SortedCollection from SortedSet,
 Navigable Collection from NavigableSet etc.). From the perspective of clean
 code, this would be the right decision, but I feel like this will be a very
 complex task to accomplish.

 3. Maybe there should be few versions of Tree collection (for example,
 for regular tasks red-black tree and B-Tree for large amounts of data). I
 have some expirience implementing both, but is it really needed in standard
 library?

>>>


Re: Bag (Multiset) Collection

2024-04-20 Thread ІП-24 Олександр Ротань
A concurrent Bag is something like CopyOnWriteArrayList or Vector, don't we
have it already? (I know Vectors aren't optimized.) The whole point of the
bag is to store duplicate elements without preserving order, which allows
things like ordered collections. There could be mutable collections
optimized for concurrency with read-write locks though; if that is what you are
referring to as a "concurrent bag", and I am missing the point of why this
can't be part of the List interface, then that just adds one more use case to
a potential Bag interface.

PS: Ordered means it preserves order based on a comparator; what you are
referring to as ordered essentially means sequential, or am I missing the
point?

Sat, Apr 20, 2024 at 23:52, Holo The Sage Wolf :

> By "ordered collection" you mean "unordered collection"?
>
> How common is it to actually use a multiset in general? Of course there
> are use cases, and there are areas in which it is used a lot, but I don't
> believe it is common enough to be part of the std with the one exception of
> concurrent bag.
>
> On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, <
> rotan.olexa...@gmail.com> wrote:
>
>> In this letter I would like to express some of my thoughts regarding the
>> potential Multiset interface.
>>
>> I, personally, have encountered a few situations where such an interface
>> could come in handy, mostly when I needed an ordered collection that
>> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
>> itself should have such a collection in its std library. While it is not a
>> very common problem, there are still a bunch of use cases where such things
>> could come in handy.
>>
>> I am willing to take on development of such thing, but there are a few
>> concerns about Multiset:
>>
>> 1. Is there any other use for this besides ordered collection that
>> permits duplicates? I can't remember anything else from the top of my head.
>>
>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet
>> class hierarchy, while not being directly connected. I think introducing
>> any other ordered collection will require some refactoring of existing
>> collection interfaces (for example extract SortedCollection from SortedSet,
>> Navigable Collection from NavigableSet etc.). From the perspective of clean
>> code, this would be the right decision, but I feel like this will be a very
>> complex task to accomplish.
>>
>> 3. Maybe there should be few versions of Tree collection (for example,
>> for regular tasks red-black tree and B-Tree for large amounts of data). I
>> have some expirience implementing both, but is it really needed in standard
>> library?
>>
>


Re: Bag (Multiset) Collection

2024-04-20 Thread Holo The Sage Wolf
By "ordered collection" you mean "unordered collection"?

How common is it to actually use a multiset in general? Of course there are
use cases, and there are areas in which it is used a lot, but I don't
believe it is common enough to be part of the std with the one exception of
concurrent bag.

On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, 
wrote:

> In this letter I would like to express some of my thoughts regarding the
> potential Multiset interface.
>
> I, personally, have encountered a few situations where such an interface
> could come in handy, mostly when I needed an ordered collection that
> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
> itself should have such a collection in its std library. While it is not a
> very common problem, there are still a bunch of use cases where such things
> could come in handy.
>
> I am willing to take on development of such thing, but there are a few
> concerns about Multiset:
>
> 1. Is there any other use for this besides ordered collection that permits
> duplicates? I can't remember anything else from the top of my head.
>
> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class
> hierarchy, while not being directly connected. I think introducing any
> other ordered collection will require some refactoring of existing
> collection interfaces (for example extract SortedCollection from SortedSet,
> Navigable Collection from NavigableSet etc.). From the perspective of clean
> code, this would be the right decision, but I feel like this will be a very
> complex task to accomplish.
>
> 3. Maybe there should be few versions of Tree collection (for example, for
> regular tasks red-black tree and B-Tree for large amounts of data). I have
> some expirience implementing both, but is it really needed in standard
> library?
>


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v25]

2024-04-20 Thread Vladimir Kozlov
On Sat, 20 Apr 2024 19:09:43 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix UnsafeCopyMemoryMark scope issue

Before I do testing, please sync with mainline.

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-206569


Re: Bag (Multiset) Collection

2024-04-20 Thread ІП-24 Олександр Ротань
I agree with the point made in the FAQ about the popularity of such
problems. That said, I don't think it is so unpopular that it should be ignored.

Regarding Map.values(), this is the case, but, in my experience, one of the
main advantages of using TreeMultiset was O(log n) modification
complexity. Collection views returned by the map, for obvious reasons, forbid
modification. Also, correct me if I'm wrong, but while TreeMultiset is an
unpopular requirement, it is at least as popular as TreeSet.

Also, I may not have a full view of how such Bag collections can be
implemented and used; however, I feel like there could be more to it.

I am not insisting on anything, I just feel that if there is someone (like
me lol) who is willing to take on the full development and integration cycle,
there isn't much reason to reject such enhancements.

Sat, Apr 20, 2024 at 23:43, ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com>:

> I agree with the point made in the FAQ about the popularity of such
> problems. That said, I don't think that it is that unpopular to be ignored.
>
> Regarding Map.values(), this is the case, but, In my experience, one of
> the main advantages of using TreeMultiset was O(long n) modification
> complexity. Collection views returned by map, for obvious reasons, forbid
> modification. Also, correct me if I`m wrong, but while TreeMultiset is an
> unpopular requirement, it is at least as popular as TreeSet.
>
> Also, I may not have a full view of how such Bag collections can be
> implemented and used, however, I feel like there could be more to it.
>
> I am not insisting on anything, I just feel that if there is someone (like
> me lol) who is willing to take on full development and integration cycle,
> there aren't much reason to reject such enhancements.
>
> Sat, Apr 20, 2024 at 23:31, David Alayachew :
>
>> Your Bag suggestion has been asked so frequently that there is an FAQ
>> entry in the official Java Docs.
>>
>>
>> https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a3
>>
>> On Sat, Apr 20, 2024 at 4:25 PM ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> In this letter I would like to express some of my thoughts regarding the
>>> potential Multiset interface.
>>>
>>> I, personally, have encountered a few situations where such an interface
>>> could come in handy, mostly when I needed an ordered collection that
>>> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
>>> itself should have such a collection in its std library. While it is not a
>>> very common problem, there are still a bunch of use cases where such things
>>> could come in handy.
>>>
>>> I am willing to take on development of such thing, but there are a few
>>> concerns about Multiset:
>>>
>>> 1. Is there any other use for this besides ordered collection that
>>> permits duplicates? I can't remember anything else from the top of my head.
>>>
>>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet
>>> class hierarchy, while not being directly connected. I think introducing
>>> any other ordered collection will require some refactoring of existing
>>> collection interfaces (for example extract SortedCollection from SortedSet,
>>> Navigable Collection from NavigableSet etc.). From the perspective of clean
>>> code, this would be the right decision, but I feel like this will be a very
>>> complex task to accomplish.
>>>
>>> 3. Maybe there should be few versions of Tree collection (for example,
>>> for regular tasks red-black tree and B-Tree for large amounts of data). I
>>> have some expirience implementing both, but is it really needed in standard
>>> library?
>>>
>>


Re: Bag (Multiset) Collection

2024-04-20 Thread David Alayachew
Your Bag suggestion has been asked so frequently that there is an FAQ entry
in the official Java Docs.

https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a3

On Sat, Apr 20, 2024 at 4:25 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> In this letter I would like to express some of my thoughts regarding the
> potential Multiset interface.
>
> I, personally, have encountered a few situations where such an interface
> could come in handy, mostly when I needed an ordered collection that
> permits duplicates. That time I used guava`s TreeMultiset, but I think Java
> itself should have such a collection in its std library. While it is not a
> very common problem, there are still a bunch of use cases where such things
> could come in handy.
>
> I am willing to take on development of such thing, but there are a few
> concerns about Multiset:
>
> 1. Is there any other use for this besides ordered collection that permits
> duplicates? I can't remember anything else from the top of my head.
>
> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class
> hierarchy, while not being directly connected. I think introducing any
> other ordered collection will require some refactoring of existing
> collection interfaces (for example extract SortedCollection from SortedSet,
> Navigable Collection from NavigableSet etc.). From the perspective of clean
> code, this would be the right decision, but I feel like this will be a very
> complex task to accomplish.
>
> 3. Maybe there should be few versions of Tree collection (for example, for
> regular tasks red-black tree and B-Tree for large amounts of data). I have
> some expirience implementing both, but is it really needed in standard
> library?
>


Bag (Multiset) Collection

2024-04-20 Thread ІП-24 Олександр Ротань
In this letter I would like to express some of my thoughts regarding the
potential Multiset interface.

I, personally, have encountered a few situations where such an interface
could come in handy, mostly when I needed an ordered collection that
permits duplicates. At the time I used Guava's TreeMultiset, but I think Java
itself should have such a collection in its std library. While it is not a
very common problem, there are still a bunch of use cases where such things
could come in handy.

I am willing to take on the development of such a thing, but there are a few
concerns about Multiset:

1. Is there any other use for this besides an ordered collection that permits
duplicates? I can't remember anything else off the top of my head.

2. Guava's TreeMultiset class hierarchy pretty much imitates the TreeSet class
hierarchy, while not being directly connected. I think introducing any
other ordered collection will require some refactoring of the existing
collection interfaces (for example, extracting SortedCollection from SortedSet,
NavigableCollection from NavigableSet, etc.). From the perspective of clean
code, this would be the right decision, but I feel like this will be a very
complex task to accomplish.

3. Maybe there should be a few versions of the Tree collection (for example, a
red-black tree for regular tasks and a B-Tree for large amounts of data). I have
some experience implementing both, but is it really needed in the standard
library?


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]

2024-04-20 Thread Scott Gibbons
On Sat, 20 Apr 2024 04:28:43 GMT, Vladimir Kozlov  wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Long to short jmp; other cleanup
>
> `runtime/Unsafe/InternalErrorTest.java` test SIGBUS when run with `-Xcomp` 
> (and other flags in test's @run command):
> 
> #  SIGBUS (0xa) at pc=0x000119514760, pid=63021, tid=28163
> #
> # JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 
> 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 
> 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2, compiled mode, 
> sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
> # Problematic frame:
> # v  ~StubRoutines::jbyte_fill 0x000119514760

@vnkozlov Thanks for the feedback.  Can you please start the testing again?  
I'd appreciate it.

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067759300


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]

2024-04-20 Thread Scott Gibbons
On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Long to short jmp; other cleanup

The SIGBUS was due to improper scoping of the UnsafeCopyMemoryMark: the original
statement constructed an unnamed temporary that was destroyed immediately, so the
mark did not cover generate_fill. The change is:

   {
     // Add set memory mark to protect against unsafe accesses faulting
-    UnsafeCopyMemoryMark(this, ((t == T_BYTE) && !aligned), true);
+    UnsafeCopyMemoryMark usmm(this, ((t == T_BYTE) && !aligned), true);
     __ generate_fill(t, aligned, to, value, r11, rax, xmm0);
   }

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067758164


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]

2024-04-20 Thread Scott Gibbons
On Sat, 20 Apr 2024 14:14:59 GMT, Jatin Bhateja  wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Long to short jmp; other cleanup
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530:
> 
>> 2528: switch (type) {
>> 2529:   case USM_SHORT:
>> 2530: __ movw(Address(dest, (2 * i)), wide_value);
> 
> MOVW emits an extra Operand Size Override prefix byte compared to 32 and 64 
> bit stores, any specific reason for keeping same unroll factor for all the 
> stores.

My understanding is that the spec requires the appropriate-sized write based on 
alignment and size.  This is why there are no 128-bit or 256-bit store loops.

> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539:
> 
>> 2537: break;
>> 2538: }
>> 2539:   }
> 
> I understand we want to be as accurate as possible in filling the tail in an 
> event of SIGBUS, but we are anyways creating a wide value for 8 packed bytes 
> if destination segment was quadword aligned, aligned quadword stores are 
> implicitly atomic on x86 targets, what's your thoughts on using a vector 
> instruction based loop.

I believe the spec is specific about the size of the store required given 
alignment and size.  I want to honor that spec even though wider stores could 
be done in many cases.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573373720
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573374108


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v25]

2024-04-20 Thread Scott Gibbons
> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
> this change.
> 
> Overall, making this an intrinsic improves overall performance of 
> `Unsafe::setMemory` by up to 4x for all buffer sizes.
> 
> Tested with tier-1 (and full CI).  I've added a table of the before and after 
> numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
> 
> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)

Scott Gibbons has updated the pull request incrementally with one additional 
commit since the last revision:

  Fix UnsafeCopyMemoryMark scope issue

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18555/files
  - new: https://git.openjdk.org/jdk/pull/18555/files/19616244..c1290169

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=23-24

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18555.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555

PR: https://git.openjdk.org/jdk/pull/18555


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]

2024-04-20 Thread Jatin Bhateja
On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Long to short jmp; other cleanup

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530:

> 2528: switch (type) {
> 2529:   case USM_SHORT:
> 2530: __ movw(Address(dest, (2 * i)), wide_value);

MOVW emits an extra Operand Size Override prefix byte compared to 32- and 64-bit 
stores; any specific reason for keeping the same unroll factor for all the stores?

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539:

> 2537: break;
> 2538: }
> 2539:   }

I understand we want to be as accurate as possible in filling the tail in the 
event of SIGBUS, but we are anyway creating a wide value of 8 packed bytes if 
the destination segment was quadword aligned, and aligned quadword stores are 
implicitly atomic on x86 targets. What are your thoughts on using a 
vector-instruction-based loop?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573297441
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573299069


Re: Addition of Predicate-based findIndex and findLastIndex methods to java.util.List

2024-04-20 Thread ІП-24 Олександр Ротань
Regarding what Holo said, I think enumerated() would be a more suitable
name.

I have a few ideas about the implementation of such things, such as a separate
EnumeratedStream interface, which extends Stream and provides methods like
forEach(BiConsumer), etc. This would eliminate unnecessary object instantiation
and also make the stream interface more comfortable, as enumerating a stream
would not require consuming the index in every operation.
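
As a rough illustration of the no-wrapper-object point, a minimal sketch of a
hypothetical static helper (not the proposed EnumeratedStream interface): the
index is passed as a primitive int alongside the element, so no pair object is
allocated per element.

import java.util.List;
import java.util.function.ObjIntConsumer;

final class EnumeratedSketch {
    // Indexed iteration without allocating a wrapper object per element.
    static <T> void forEachIndexed(List<T> list, ObjIntConsumer<? super T> action) {
        for (int i = 0; i < list.size(); i++) {
            action.accept(list.get(i), i);
        }
    }

    public static void main(String[] args) {
        forEachIndexed(List.of("a", "b", "c"),
                       (value, index) -> System.out.println(index + " -> " + value));
    }
}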

Although it is an interesting and important topic, I have to ask you guys to
move it to another thread if possible. This thread has already shifted
significantly from the original topic, which was the potential compatibility
and API issues that I need to fix before applying for a CSR.

Sat, 20 Apr 2024 at 14:10, Holo The Sage Wolf :

> Hello,
>
> I want to challenge the idea that every stream may be indexed, for three
> reasons:
>
> Any sensible type to use for an index (int/long) will be bounded, but
> streams need not be, e.g.
> new Random().ints().withIndex().forEach(v -> ...);
> may produce (after a very long time) nonsensical values. Using BigInteger
> would solve this problem, but that would be silly, as in practice long is
> more than enough 99.9% of the time (and int is probably enough 90% of the
> time).
>
> The two other problems have to do with the semantics of "index":
> An index gives the impression that it is a real position, but the stream
> may be truly non-deterministic (using e.g. a hardware random stream).
> An index gives the impression that it respects the canonical order
> of the underlying collection (if one exists), that is:
> var x = myList.stream().parallel()
> .withIndex().findAny().get();
> assert x.value() == myList.get(x.index()); // implementation dependent
>
> I don't think that is possible to guarantee without enormous performance
> problems.
>
> ---
>
> To be clear about my position: I like and want the idea. The first
> problem, in my opinion, shouldn't block it, but we should document it. That
> being said, I don't think we should be calling this "index".
>
> On Sat, 20 Apr 2024, 11:54 -,  wrote:
>
>> Hi Rotan',
>> Interface methods are both exposed to callers and to implementations. Is
>> there any scenario where the implementation can be more efficient than the
>> stream/gatherer scanning approach?
>>
>> indexOf or lastIndexOf may be faster if a list has some tricks like a
>> hashmap to quickly look up the index of an item. In contrast, the predicate
>> looks too generic and barely optimizable by list implementations; even if
>> we can stream in parallel, this is already covered by the stream API, so we
>> can simply declare a Gatherer to perform this findIndex operation on a
>> stream of elements (also note that a list can be reversed with
>> List::reversed() since JDK 21, so the same Gatherer can be reused for
>> findLastIndex)
>>
>> I think that if there is no scenario where an implementation can do better
>> with findIndex, then findIndex should become a Gatherer, which has better
>> compatibility.
>>
>> Best,
>> Chen Liang
>>
>> On Fri, Apr 19, 2024 at 8:51 PM ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> Turned out this whole time I was only replying to David. I have to thank
>>> him for his patience in explaining to me how this whole mailing thing
>>> works.
>>>
>>> To summarize what has been missed, besides what you have seen in the
>>> quotes: I have pointed out that, while flexibility and simplicity are
>>> important, the main reason why I have decided to propose this feature is
>>> to enhance the developer experience.
>>>
>>> I know that conciseness is not a main goal of Java, and that my proposal
>>> therefore somewhat contradicts the general direction of the development of
>>> Java as a language, but my desire to make Java more user friendly is based
>>> on personal experience of introducing fresh devs or switchers to Java and
>>> on their feedback. Therefore, while it is not a priority for the
>>> community, I really hope you guys don't mind me working in the back ranks
>>> of Java, making it a bit prettier.
>>>
>>> On Sat, Apr 20, 2024, 02:56 David Alayachew 
>>> wrote:
>>>
 > That was what I referred to as "Map-solvable" problems,
 > but if this way of doing it is a thing, then I agree with
 > the point.

 Doing it the Map way would not only be slower, but it would force you
 to include more concepts than you need. Your identifier is your index.
 Lists/arrays have index. Bringing a map in only complicates things.

 As for the rest of your response, I think I understand the trouble
 here. You put it best when you said that you are introducing this feature
 with the intent of improving the developer experience, specifically through
 making code more concise.

 I can understand that goal, and it is a good goal to strive for, but
 conciseness is one of the last things Java aims to provide as a language or
 through its standard library. I won't tell you not to do it, but I will
 tell you that there will be significant 

RFR: 8330624: Inconsistent use of semicolon in the code snippet of java.io.Serializable javadoc

2024-04-20 Thread Korov
Fix the description of java.io.Serializable.

-

Commit messages:
 - Inconsistent use of semicolon in the code snippet of java.io.Serializable 
javadoc

Changes: https://git.openjdk.org/jdk/pull/18874/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18874&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330624
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18874.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18874/head:pull/18874

PR: https://git.openjdk.org/jdk/pull/18874


Re: Addition of Predicate-based findIndex and findLastIndex methods to java.util.List

2024-04-20 Thread Holo The Sage Wolf
Hello,

I want to challenge the idea that every stream may be indexed, for three
reasons:

Any sensible type to use for an index (int/long) will be bounded, but
streams need not be, e.g.
new Random().ints().withIndex().forEach(v -> ...);
may produce (after a very long time) nonsensical values. Using BigInteger
would solve this problem, but that would be silly, as in practice long is
more than enough 99.9% of the time (and int is probably enough 90% of the
time).

The two other problems have to do with the semantics of "index":
An index gives the impression that it is a real position, but the stream
may be truly non-deterministic (using e.g. a hardware random stream).
An index gives the impression that it respects the canonical order of
the underlying collection (if one exists), that is:
var x = myList.stream().parallel()
.withIndex().findAny().get();
assert x.value() == myList.get(x.index()); // implementation dependent

I don't think that is possible to guarantee without enormous performance
problems.

---

To be clear about my position: I like and want the idea. The first problem,
in my opinion, shouldn't block it, but we should document it. That being
said, I don't think we should be calling this "index".

On Sat, 20 Apr 2024, 11:54 -,  wrote:

> Hi Rotan',
> Interface methods are both exposed to callers and to implementations. Is
> there any scenario where the implementation can be more efficient than the
> stream/gatherer scanning approach?
>
> indexOf or lastIndexOf may be faster if a list has some tricks like a
> hashmap to quickly look up the index of an item. In contrast, the predicate
> looks too generic and barely optimizable by list implementations; even if
> we can stream in parallel, this is already covered by the stream API, so we
> can simply declare a Gatherer to perform this findIndex operation on a
> stream of elements (also note that a list can be reversed with
> List::reversed() since JDK 21, so the same Gatherer can be reused for
> findLastIndex)
>
> I think that if there is no scenario where an implementation can do better
> with findIndex, then findIndex should become a Gatherer, which has better
> compatibility.
>
> Best,
> Chen Liang
>
> On Fri, Apr 19, 2024 at 8:51 PM ІП-24 Олександр Ротань <
> rotan.olexa...@gmail.com> wrote:
>
>> Turned out this whole time I was only replying to David. I have to thank
>> him for his patience in explaining to me how this whole mailing thing
>> works.
>>
>> To summarize what has been missed, besides what you have seen in the
>> quotes: I have pointed out that, while flexibility and simplicity are
>> important, the main reason why I have decided to propose this feature is
>> to enhance the developer experience.
>>
>> I know that conciseness is not a main goal of Java, and that my proposal
>> therefore somewhat contradicts the general direction of the development of
>> Java as a language, but my desire to make Java more user friendly is based
>> on personal experience of introducing fresh devs or switchers to Java and
>> on their feedback. Therefore, while it is not a priority for the
>> community, I really hope you guys don't mind me working in the back ranks
>> of Java, making it a bit prettier.
>>
>> On Sat, Apr 20, 2024, 02:56 David Alayachew 
>> wrote:
>>
>>> > That was what I referred to as "Map-solvable" problems,
>>> > but if this way of doing it is a thing, then I agree with
>>> > the point.
>>>
>>> Doing it the Map way would not only be slower, but it would force you to
>>> include more concepts than you need. Your identifier is your index.
>>> Lists/arrays have index. Bringing a map in only complicates things.
>>>
>>> As for the rest of your response, I think I understand the trouble here.
>>> You put it best when you said that you are introducing this feature with
>>> the intent of improving the developer experience, specifically through
>>> making code more concise.
>>>
>>> I can understand that goal, and it is a good goal to strive for, but
>>> conciseness is one of the last things Java aims to provide as a language or
>>> through its standard library. I won't tell you not to do it, but I will
>>> tell you that there will be significant push back from folks here, and for
>>> good reason -- conciseness is not the priority here at Java. At Java, the
>>> lack of conciseness is not a weakness. Sure, unneeded verbosity can be a
>>> weakness, but the solution I suggested above is basically the tempo that
>>> Java strives for -- compose simple concepts so that you can create what you
>>> need.
>>>
>>> All of that is to say, you have made your point clear. Your goal is
>>> conciseness and ease of access. Understanding that, it's clear that my
>>> solution is not concise enough to meet that goal, so I won't push it
>>> further.
>>>
>>> On Fri, Apr 19, 2024 at 7:26 PM ІП-24 Олександр Ротань <
>>> rotan.olexa...@gmail.com> wrote:
>>>
 " 1. Fetching data from same-sized lists -- This is a trick from
 Data-Oriented Design (not to be confused with Data-Oriented Programming)
 where you filter 

RFR: 8330686: Fix typos in the DatabaseMetaData javadoc

2024-04-20 Thread Jin Kwon
Fix typos within the `DatabaseMetaData.java`.

-

Commit messages:
 - Merge branch 'master' of github.com:onacit/jdk
 - Merge branch 'master' of github.com:onacit/jdk
 - Fix more typos
 - Fix typo
 - Fix wrongly typed constant name
 - Fix more typos
 - Fix typo
 - Fix wrongly typed constant name
 - Fix more typos
 - Fix typo

Changes: https://git.openjdk.org/jdk/pull/18860/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18860&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330686
  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/18860.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18860/head:pull/18860

PR: https://git.openjdk.org/jdk/pull/18860


Re: RFR: 8330686: Fix typos in the DatabaseMetaData javadoc

2024-04-20 Thread Jaikiran Pai
On Fri, 19 Apr 2024 11:36:48 GMT, Jin Kwon  wrote:

>> Fix typos within the `DatabaseMetaData.java`.
>
> @openjdk with issue from where?

Hello @onacit, it appears a JBS issue has been filed for this: 
https://bugs.openjdk.org/browse/JDK-8330686. Please change the title of this PR 
to `8330686: Fix typos in the DatabaseMetaData javadoc` so that it triggers the 
official review process.

-

PR Comment: https://git.openjdk.org/jdk/pull/18860#issuecomment-2067548609


Re: RFR: 8330686: Fix typos in the DatabaseMetaData javadoc

2024-04-20 Thread Jin Kwon
On Fri, 19 Apr 2024 11:33:48 GMT, Jin Kwon  wrote:

> Fix typos within the `DatabaseMetaData.java`.

@openjdk with issue from where?

-

PR Comment: https://git.openjdk.org/jdk/pull/18860#issuecomment-2066389888


Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v22]

2024-04-20 Thread Viktor Klang
On Fri, 5 Apr 2024 23:13:39 GMT, Brent Christian  wrote:

>> Classes in the `java.lang.ref` package would benefit from an update to bring 
>> the spec in line with how the VM already behaves. The changes would focus on 
>> _happens-before_ edges at some key points during reference processing.
>> 
>> A couple key things we want to be able to say are:
>> - `Reference.reachabilityFence(x)` _happens-before_ reference processing 
>> occurs for 'x'.
>> - `Cleaner.register()` _happens-before_ the Cleaner thread runs the 
>> registered cleaning action.
>> 
>> This will bring Cleaner in line (or close) with the memory visibility 
>> guarantees made for finalizers in [JLS 
>> 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5):
>> _"There is a happens-before edge from the end of a constructor of an object 
>> to the start of a finalizer (§12.6) for that object."_
>
> Brent Christian has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Another update to reachabilityFence() @apiNote

Thanks Brent, I have no objections. Good job!
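
For reference, a minimal Cleaner sketch (hypothetical class names, not code from 
the PR) illustrating the guarantee quoted above: the writes made before 
Cleaner.register() are visible to the registered cleaning action when it runs.

import java.lang.ref.Cleaner;

final class Resource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // State captured by the cleaning action; it must not reference the
    // Resource itself, or the Resource would never become unreachable.
    private record State(long handle) implements Runnable {
        @Override public void run() {
            // The write of `handle` in the constructor happens-before this
            // call, so the cleaning thread sees the fully initialized value.
            System.out.println("releasing handle " + handle);
        }
    }

    private final Cleaner.Cleanable cleanable;

    Resource(long handle) {
        State state = new State(handle);              // written before register()
        this.cleanable = CLEANER.register(this, state);
    }

    @Override public void close() {
        cleanable.clean();                            // runs the action at most once
    }

    public static void main(String[] args) {
        try (Resource r = new Resource(42L)) { }      // prints "releasing handle 42"
    }
}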

-

PR Comment: https://git.openjdk.org/jdk/pull/16644#issuecomment-2067611311


Re: RFR: 8330681: Explicit hashCode and equals for java.lang.runtime.SwitchBootstraps$TypePairs

2024-04-20 Thread Chen Liang
On Sat, 20 Apr 2024 00:00:54 GMT, ExE Boss  wrote:

>> We can reduce the overhead of the first use of a switch bootstrap by moving 
>> `typePairToName` into `TypePairs` and by explicitly overriding `hashCode` 
>> and `equals`. The first change avoids loading and initializing the 
>> `TypePairs` class in actual cases; the second removes some excess code 
>> generation on first use.
>
> src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 685:
> 
>> 683: record TypePairs(Class from, Class to) {
>> 684: 
>> 685: private static final Map typePairToName = 
>> initialize();
> 
> If `TypePairs.typePairToName` is never modified after initialisation, then it 
> should probably be made immutable:
> Suggestion:
> 
> private static final Map typePairToName = 
> Map.copyOf(initialize());

If you really think about it, the `initialize` method itself is somewhat 
problematic, as it's initializing with byte/short/char on the left, all of 
which are already converted to int in the of() factory. This should be done in 
a separate issue.
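
For context, a minimal standalone sketch (hypothetical names, not the JDK 
source) of what explicitly overriding equals and hashCode on such a record 
looks like, the intent being to avoid the generated implementations mentioned 
above:

// Hypothetical sketch, not the JDK's TypePairs record.
record TypePair(Class<?> from, Class<?> to) {

    @Override
    public boolean equals(Object o) {
        return o instanceof TypePair other
                && from == other.from     // Class identity comparison is enough
                && to == other.to;
    }

    @Override
    public int hashCode() {
        return from.hashCode() * 31 + to.hashCode();
    }
}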

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18865#discussion_r1573195851


Re: Addition of Predicate-based findIndex and findLastIndex methods to java.util.List

2024-04-20 Thread -
Hi Rotan',
Interface methods are both exposed to callers and to implementations. Is
there any scenario where the implementation can be more efficient than the
stream/gatherer scanning approach?

indexOf or lastIndexOf may be faster if a list has some tricks like a
hashmap to quickly look up the index of an item. In contrast, the predicate
looks too generic and barely optimizable by list implementations; even if
we can stream in parallel, this is already covered by the stream API, so we
can simply declare a Gatherer to perform this findIndex operation on a
stream of elements (also note that a list can be reversed with
List::reversed() since JDK 21, so the same Gatherer can be reused for
findLastIndex)

I think that if there is no scenario where an implementation can do better
with findIndex, then findIndex should become a Gatherer, which has better
compatibility.
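
For illustration, a minimal sketch of such a Gatherer, assuming the Gatherers 
preview API (java.util.stream.Gatherer and Stream::gather, JDK 22 with 
--enable-preview); the findIndex helper name is hypothetical:

import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Gatherer;

final class FindIndexSketch {

    // Emits the index of the first element matching the predicate, then stops.
    static <T> Gatherer<T, ?, Integer> findIndex(Predicate<? super T> predicate) {
        return Gatherer.ofSequential(
                () -> new int[] {0},               // running index as mutable state
                (state, element, downstream) -> {
                    if (predicate.test(element)) {
                        downstream.push(state[0]); // emit the matching index
                        return false;              // short-circuit: stop consuming
                    }
                    state[0]++;
                    return true;                   // keep going
                });
    }

    public static void main(String[] args) {
        List<String> players = List.of("Player1", "Player2", "Player4");
        Optional<Integer> index = players.stream()
                .gather(findIndex("Player4"::equals))
                .findFirst();
        System.out.println(index);                 // Optional[2]
    }
}

A findLastIndex could reuse the same Gatherer over list.reversed().stream() and 
map the result back to size - 1 - i.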

Best,
Chen Liang

On Fri, Apr 19, 2024 at 8:51 PM ІП-24 Олександр Ротань <
rotan.olexa...@gmail.com> wrote:

> Turned out this whole time I was only replying to David. I have to thank
> him for his patience in explaining to me how this whole mailing thing
> works.
>
> To summarize what has been missed, besides what you have seen in the
> quotes: I have pointed out that, while flexibility and simplicity are
> important, the main reason why I have decided to propose this feature is
> to enhance the developer experience.
>
> I know that conciseness is not a main goal of Java, and that my proposal
> therefore somewhat contradicts the general direction of the development of
> Java as a language, but my desire to make Java more user friendly is based
> on personal experience of introducing fresh devs or switchers to Java and
> on their feedback. Therefore, while it is not a priority for the
> community, I really hope you guys don't mind me working in the back ranks
> of Java, making it a bit prettier.
>
> On Sat, Apr 20, 2024, 02:56 David Alayachew 
> wrote:
>
>> > That was what I referred to as "Map-solvable" problems,
>> > but if this way of doing it is a thing, then I agree with
>> > the point.
>>
>> Doing it the Map way would not only be slower, but it would force you to
>> include more concepts than you need. Your identifier is your index.
>> Lists/arrays have index. Bringing a map in only complicates things.
>>
>> As for the rest of your response, I think I understand the trouble here.
>> You put it best when you said that you are introducing this feature with
>> the intent of improving the developer experience, specifically through
>> making code more concise.
>>
>> I can understand that goal, and it is a good goal to strive for, but
>> conciseness is one of the last things Java aims to provide as a language or
>> through its standard library. I won't tell you not to do it, but I will
>> tell you that there will be significant push back from folks here, and for
>> good reason -- conciseness is not the priority here at Java. At Java, the
>> lack of conciseness is not a weakness. Sure, unneeded verbosity can be a
>> weakness, but the solution I suggested above is basically the tempo that
>> Java strives for -- compose simple concepts so that you can create what you
>> need.
>>
>> All of that is to say, you have made your point clear. Your goal is
>> conciseness and ease of access. Understanding that, it's clear that my
>> solution is not concise enough to meet that goal, so I won't push it
>> further.
>>
>> On Fri, Apr 19, 2024 at 7:26 PM ІП-24 Олександр Ротань <
>> rotan.olexa...@gmail.com> wrote:
>>
>>> " 1. Fetching data from same-sized lists -- This is a trick from
>>> Data-Oriented Design (not to be confused with Data-Oriented Programming)
>>> where you filter down to the indexes that you want, and use the indexes to
>>> fetch related values from the respective lists. So, I would filter down to
>>> Player1, Player4, and Player6, then I would use their indexes to do lookups
>>> in PlayerItemList, PlayerSkinList, etc. This tactic is especially useful
>>> when working with structs (or value classes when Java gets them)."
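
A minimal sketch of that same-sized-lists trick (hypothetical data): filter 
down to the matching indexes once, then reuse them to look up the parallel 
lists.

import java.util.List;
import java.util.stream.IntStream;

final class ParallelListsSketch {
    public static void main(String[] args) {
        List<String> players = List.of("Player1", "Player2", "Player4", "Player6");
        List<String> items   = List.of("Sword",   "Bow",     "Axe",     "Staff");

        // Keep only the indexes whose player matches the filter.
        int[] selected = IntStream.range(0, players.size())
                .filter(i -> !players.get(i).equals("Player2"))
                .toArray();

        // Reuse the indexes across the same-sized lists.
        for (int i : selected) {
            System.out.println(players.get(i) + " carries " + items.get(i));
        }
    }
}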
>>>
>>> That was what I referred to as "Map-solvable" problems, but if this way
>>> of doing it is a thing, then I agree with the point.
>>>
>>> "By this logic, many of the methods in the Stream library should
>>> instead be in the Collection library. There are several emails on the
>>> lambda mailing list that address why this wasn't done. I imagine that Rémi
>>> himself could point them out to you."
>>>
>>> It may be just my opinion, but I don't see anything harmful in some
>>> shortcuts that could be optimized, especially if they still converge to a
>>> single point under the hood. I would love to read the mails about the
>>> maintainers' opinions on why some features of Streams couldn't also be
>>> part of the Collection library if that would enhance the developer
>>> experience, so, if it isn't too much trouble, could you attach some
>>> links? Also, I guess a good compromise would be extension methods; I am
>>> currently exploring possibilities for their implementation.
>>>
>>> "Well