Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Tue, 9 Apr 2024 21:36:46 GMT, Vladimir Yaroslavskiy wrote:

>>> Hi Vamsi (@vamsi-parasa), few questions on your test environment:
>>>
>>> * what are the hardware specs of your server ?
>>> * bare-metal or virtual ?
>>> * are other services or big processes running ?
>>> * os tuning ? CPU HT: off? Fixed CPU governor or frequency ?
>>> * isolation using taskset ?
>>>
>>> Maybe C2 JIT (+ CDS archive) are given more performance on stock jdk sort than same code running outside jdk...
>>>
>>> Thanks, Laurent
>>
>> Hi Laurent,
>>
>> The benchmarks are run on Intel TigerLake Core i7 machine. It's bare-metal without any virtualization. HT is ON and there is no other specific OS tuning or isolation using taskset.
>>
>> Thanks,
>> Vamsi
>
> Hello Vamsi (@vamsi-parasa),
>
> Could you please run the new benchmarking?
> To save time and don't patch JDK several times, I've created JavaBenchmarkHarness class which is under package java.util and compares several versions of DPQS. Also I prepared several versions of current sorting source from JDK to detect what is going wrong.
>
> What you need is to compile and run JavaBenchmarkHarness once:
>
> javac --patch-module java.base=. -d classes *.java
> java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module java.base=classes -cp classes java.util.JavaBenchmarkHarness
>
> Find the sources there:
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_ins.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_mrg.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_piv.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_prt.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p5.java
>
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the data below:

Thanks,
Vamsi

name | builder | size | mode | count | score
-- | -- | -- | -- | -- | --
b01 | RANDOM | 600 | avg | 325677 | 6.862
b01 | RANDOM | 3000 | avg | 52041 | 82.233
b01 | RANDOM | 9 | avg | 1217 | 4456.51
b01 | RANDOM | 40 | avg | 242 | 22923.28
b01 | RANDOM | 100 | avg | 90 | 60598.84
b01 | REPEATED | 600 | avg | 651354 | 1.933
b01 | REPEATED | 3000 | avg | 104083 | 13.753
b01 | REPEATED | 9 | avg | 2435 | 723.039
b01 | REPEATED | 40 | avg | 484 | 3084.416
b01 | REPEATED | 100 | avg | 180 | 8234.428
b01 | STAGGER | 600 | avg | 1954062 | 1.005
b01 | STAGGER | 3000 | avg | 312251 | 4.945
b01 | STAGGER | 9 | avg | 7305 | 133.126
b01 | STAGGER | 40 | avg | 1453 | 592.144
b01 | STAGGER | 100 | avg | 542 | 1493.876
b01 | SHUFFLE | 600 | avg | 325677 | 5.12
b01 | SHUFFLE | 3000 | avg | 52041 | 29.252
b01 | SHUFFLE | 9 | avg | 1217 | 1396.664
b01 | SHUFFLE | 40 | avg | 242 | 5743.489
b01 | SHUFFLE | 100 | avg | 90 | 15490.81
b01_ins | RANDOM | 600 | avg | 325677 | 7.594
b01_ins | RANDOM | 3000 | avg | 52041 | 78.631
b01_ins | RANDOM | 9 | avg | 1217 | 4312.511
b01_ins | RANDOM | 40 | avg | 242 | 22108.18
b01_ins | RANDOM | 100 | avg | 90 | 58467.16
b01_ins | REPEATED | 600 | avg | 651354 | 1.569
b01_ins | REPEATED | 3000 | avg | 104083 | 11.313
b01_ins | REPEATED | 9 | avg | 2435 | 720.838
b01_ins | REPEATED | 40 | avg | 484 | 3003.673
b01_ins | REPEATED | 100 | avg | 180 | 8144.944
b01_ins | STAGGER | 600 | avg | 1954062 | 0.98
b01_ins | STAGGER | 3000 | avg | 312251 | 4.948
b01_ins | STAGGER | 9 | avg | 7305 | 132.909
b01_ins | STAGGER | 40 | avg | 1453 | 592.572
b01_ins | STAGGER | 100 | avg | 542 | 1492.627
b01_ins | SHUFFLE | 600 | avg | 325677 | 4.092
b01_ins | SHUFFLE | 3000 | avg | 52041 | 27.138
b01_ins | SHUFFLE | 9 | avg | 1217 | 1304.326
b01_ins | SHUFFLE | 40 | avg | 242 | 5465.745
b01_ins | SHUFFLE | 100 | avg | 90 | 14585.08
b01_mrg | RANDOM | 600 | avg | 325677 | 7.139
b01_mrg | RANDOM | 3000 | avg | 52041 | 81.01
b01_mrg | RANDOM | 9 | avg | 1217 | 4266.084
b01_mrg | RANDOM | 40 | avg | 242 | 21937.77
b01_mrg | RANDOM | 100 | avg | 90 | 58177.72
b01_mrg | REPEATED | 600 | avg | 651354 | 1.36
b01_mrg | REPEATED | 3000 | avg | 104083 | 9.013
b01_mrg | REPEATED | 9 | avg | 2435 | 737.684
b01_mrg | REPEATED | 40 | avg | 484 | 3152.447
b01_mrg | REPEATED | 100 | avg | 180 | 8366.866
b01_mrg | STAGGER | 600 | avg | 1954062 | 0.73
b01_mrg | STAGGER | 3000 | avg | 312251 | 3.733
b01_mrg | STAGGER | 9 | avg | 7305 | 114.35
b01_mrg | STAGGER | 40 | avg | 1453 | 524.821
b01_mrg | STAGGER | 100 | avg | 542 | 1351.504
b01_mrg | SHUFFLE | 600 | avg | 325677 | 4.986
b01_mrg | SHUFF
Re: EnumeratedStream
On Sat, Apr 20, 2024 at 8:29 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote:

> Gatherers could be effectively terminal, but I don't think the Gatherer API designers intended them to be. In the related JEP, gatherers are described as a way to declare custom intermediate operations, and introducing "terminal" gatherers would be misleading.

I will show you an API that expands on collectors too.

> Talking about performance, not even considering the gather() method itself, creating an instance of an Indexed object for each value in the stream is costly and might turn into a nightmare for infinite streams.

We will have value objects soon (Project Valhalla); by then the object creation will be much more lightweight as it's effectively a tuple. This is not really a thing of concern.

> As for the internals of the Stream API right now, I am not aware of its current state, so not much to say here except that adding a new type of stream that just slightly extends existing functionality might not be that harmful, I guess.
>
> Regarding API exposure, I don't think that moving it from the stream directly to a Gatherers factory would be much of a deal, since to see index-aware methods the user must explicitly convert the stream to an enumerated one.

Having another set of EnumeratedStream APIs and implementations is going to be a nightmare to maintain. Also it interoperates poorly when converting itself back into regular streams for other usages, while Stream<Indexed<T>> is much better in that aspect.

> I also think I didn't express my point about index consistency clearly enough. Consider the following:

Notice that Stream.filter retains, rather than drops, the matching elements! Otherwise your results don't make sense, so I will assume you mean "filter" in the sense of "dropIf".

> List.of(1,2,3).stream().enumerated().filter((idx, i) -> i*idx < 2).map((idx, val) -> idx * val)
>
> Result : (4, 9)

Does idx start from 0 or 1? All Java indices start from 0; having it start from 1 in enumerated is really weird. I will assume you mean [2, 6]=[1*2, 2*3]. Your use case is even simpler; this case can be covered by a simple gatherer like this:

Gatherer.ofSequential(() -> new int[] {0}, Gatherer.Integrator.ofGreedy((counter, v, sink) -> sink.push(new Indexed<>(counter[0]++, v))))

Better than writing thousands of lines of new code. The functionalities are already implemented in my sample project in https://github.com/liachmodded/IndexedStream/tree/main/src/main/java/com/github/liachmodded/indexedstream

> With your approach
>
> List.of(1,2,3).stream().gather(Gatherers.filterIndexed((idx, val) -> idx*val < 2)).gather(Gatherers.mapIndexed((idx, val) -> idx * val))
>
> Result : (2, 6)

I will assume you mean [0, 3]=[0*2, 1*3].

> Not only is the second option much more verbose, but the indexes of the stream are also inconsistent between operations.

Then what does EnumeratedStream serve? We would assume the enumeration means the enumeration at the current stage, so we want the number immediately. If you want the enumeration at a particular stage, it's better to zip your stream items with an index, like my gatherer above, at an explicit step.

Also I have already made a showcase git repository with your two test cases at https://github.com/liachmodded/IndexedStream/blob/main/src/test/java/com/github/liachmodded/indexedstream/IndexedGatherersTest.java

Feel free to rant and tell me what you really want that isn't served by these few simple classes.

> PS: regarding findIndex, I did some benchmarking today, you might like to take a look. On average the list-based version outperforms the collector- and gatherer-based ones by more than 10 times. Also, lists in the stdlib don't have any hashcode-based implementations. I wrote 19 implementations in list subclasses and none of them had anything but simple traversal logic inside indexOf. But that's a topic for another thread.

Thanks for the prompt, I will look at it and reply in the other thread.
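For concreteness, here is a self-contained sketch of the indexing gatherer sketched above, assuming JDK 22 or later (the Gatherer API was preview in 22/23 and final in 24). The Indexed record is a hypothetical stand-in for the pair type used in the linked sample project, and the printed output comes from this sketch, not from that project:

```java
import java.util.List;
import java.util.stream.Gatherer;

// Hypothetical pair type standing in for the Indexed record from the sample project.
record Indexed<T>(int index, T value) {}

public class IndexedGatherExample {
    // Sequential gatherer pairing each element with a running 0-based index.
    static <T> Gatherer<T, ?, Indexed<T>> enumerate() {
        return Gatherer.<T, int[], Indexed<T>>ofSequential(
                () -> new int[] {0},                       // mutable counter as the gatherer state
                Gatherer.Integrator.ofGreedy((counter, element, downstream) ->
                        downstream.push(new Indexed<>(counter[0]++, element))));
    }

    public static void main(String[] args) {
        // Negate values at odd indices, mirroring the enumerated-stream example in this thread.
        List<Integer> result = List.of(1, 2, 3).stream()
                .gather(enumerate())
                .map(ix -> ix.index() % 2 == 0 ? ix.value() : -ix.value())
                .toList();
        System.out.println(result); // [1, -2, 3]
    }
}
```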
Re: EnumeratedStream
Gatherers could be effectively terminal, but I don't think the Gatherer API designers intended them to be. In the related JEP, gatherers are described as a way to declare custom intermediate operations, and introducing "terminal" gatherers would be misleading.

Talking about performance, not even considering the gather() method itself, creating an instance of an Indexed object for each value in the stream is costly and might turn into a nightmare for infinite streams.

As for the internals of the Stream API right now, I am not aware of its current state, so not much to say here except that adding a new type of stream that just slightly extends existing functionality might not be that harmful, I guess.

Regarding API exposure, I don't think that moving it from the stream directly to a Gatherers factory would be much of a deal, since to see index-aware methods the user must explicitly convert the stream to an enumerated one.

I also think I didn't express my point about index consistency clearly enough. Consider the following:

List.of(1,2,3).stream().enumerated().filter((idx, i) -> i*idx < 2).map((idx, val) -> idx * val)

Result : (4, 9)

With your approach

List.of(1,2,3).stream().gather(Gatherers.filterIndexed((idx, val) -> idx*val < 2)).gather(Gatherers.mapIndexed((idx, val) -> idx * val))

Result : (2, 6)

Not only is the second option much more verbose, but the indexes of the stream are also inconsistent between operations.

PS: regarding findIndex, I did some benchmarking today, you might like to take a look. On average the list-based version outperforms the collector- and gatherer-based ones by more than 10 times. Also, lists in the stdlib don't have any hashcode-based implementations. I wrote 19 implementations in list subclasses and none of them had anything but simple traversal logic inside indexOf. But that's a topic for another thread.

On Sun, Apr 21, 2024, 04:03 - wrote: > On Sat, Apr 20, 2024 at 7:44 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> Also enumerated stream should also support index-aware terminal >> operations, which getherers are incapable of, so it will also require to >> create index-aware collectors. I am not aware if this is even possible, but >> this looks like another separate functionality in different place, and some >> developers might just don't be aware of its existence. I think that we as >> language devs should think not only about what is possible in language, but >> also about is it comfortable and is it obvious for user >> > Gatherers can become many-to-one; notice it has a state A, it can totally > choose to only emit a single element R in its finisher (i.e. its integrator > only touches state and element and ignores the downstream), then you can > use findAny().orElseThrow() to access that single collector result. That > said, the factory I proposed can try to wrap Collectors the same way it > wraps Gatherers. See conclusion below. > >> On Sun, Apr 21, 2024, 03:36 ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com> wrote: >> >>> Yes, I think every possible intermediate operation could be made index >>> aware using gatherers. The point is: should it be turned? >>> >>> As a developers of jdk itself, we are not limited in a ways we could >>> provide tools for Java users, especially when it comes to adding completely >>> new features and not modifying existing apis. >>> >> Creating a new type of pipeline is only going to blow up the complexity > of Streams; having IntStream, LongStream, DoubleStream and Stream, together > with the 4-way spliterators (Spliterator.OfInt etc.)
and iterators > (PrimitiveIterator.OfInt etc.), is already an API nightmare. And the > indexing will multiply all the intermediate and terminal operations by 2, > further blowing up the API, which I don't believe is the right direction to > go. > >> >>> >> Gatherer-based approach looks like we are developers of third party >>> library that has to look for workarounds instead of directly adding >>> features we need. It's syntax is more wordy without any payoff in >>> flexibility, and obviously would be slower and memory-costy. >>> >> Gatherer is not a "third party hook", but an essential API that > represents all possible stream operations, including Collector. Gatherer > would not be slow; it already supports short-circuiting and should not add > extra overheads, as Gatherer is like an API for all possible stream > operations. > >> >>> For me seems that implementing this using gatherer would only introduce >>> unnecessary intermediate steps in operation internal pipeline without any >>> visible payoff. >>> >> Implementing with Gatherer would reduce useless API exposure, as indexed > operations are't that frequently used and are useless in parallel > scenarios. Especially that these indexed operations aren't optimizable by > stream subclasses, much like how findIndex is not helpful in Lists as > Predicates can't be easily decoded like Object equivalence/hashCode, which > some Lists can use to speed up indexOf. > >> >>> Also, indexes created this way
Re: EnumeratedStream
On Sat, Apr 20, 2024 at 7:44 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > Also enumerated stream should also support index-aware terminal > operations, which getherers are incapable of, so it will also require to > create index-aware collectors. I am not aware if this is even possible, but > this looks like another separate functionality in different place, and some > developers might just don't be aware of its existence. I think that we as > language devs should think not only about what is possible in language, but > also about is it comfortable and is it obvious for user > Gatherers can become many-to-one; notice it has a state A, it can totally choose to only emit a single element R in its finisher (i.e. its integrator only touches state and element and ignores the downstream), then you can use findAny().orElseThrow() to access that single collector result. That said, the factory I proposed can try to wrap Collectors the same way it wraps Gatherers. See conclusion below. > On Sun, Apr 21, 2024, 03:36 ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> Yes, I think every possible intermediate operation could be made index >> aware using gatherers. The point is: should it be turned? >> >> As a developers of jdk itself, we are not limited in a ways we could >> provide tools for Java users, especially when it comes to adding completely >> new features and not modifying existing apis. >> > Creating a new type of pipeline is only going to blow up the complexity of Streams; having IntStream, LongStream, DoubleStream and Stream, together with the 4-way spliterators (Spliterator.OfInt etc.) and iterators (PrimitiveIterator.OfInt etc.), is already an API nightmare. And the indexing will multiply all the intermediate and terminal operations by 2, further blowing up the API, which I don't believe is the right direction to go. > >> > Gatherer-based approach looks like we are developers of third party >> library that has to look for workarounds instead of directly adding >> features we need. It's syntax is more wordy without any payoff in >> flexibility, and obviously would be slower and memory-costy. >> > Gatherer is not a "third party hook", but an essential API that represents all possible stream operations, including Collector. Gatherer would not be slow; it already supports short-circuiting and should not add extra overheads, as Gatherer is like an API for all possible stream operations. > >> For me seems that implementing this using gatherer would only introduce >> unnecessary intermediate steps in operation internal pipeline without any >> visible payoff. >> > Implementing with Gatherer would reduce useless API exposure, as indexed operations are't that frequently used and are useless in parallel scenarios. Especially that these indexed operations aren't optimizable by stream subclasses, much like how findIndex is not helpful in Lists as Predicates can't be easily decoded like Object equivalence/hashCode, which some Lists can use to speed up indexOf. > >> Also, indexes created this way will be inconsistent between operations, >> and I am not sure if that is what we are looking for. >> > We declare an index-aware gatherer and my said factory converts it to an index-unaware, sequential gatherer; the factory gatherer prepares indices before calling our index-aware gatherer. For my factory, if you think my 4-line syntax above is too verbose, we can encapsulate those to become public static Gatherer filter(Predicate> predicate) etc. 
And the primary methods will be:

public static <T, A, R> Gatherer<T, A, R> indexed(Gatherer<Indexed<T>, A, R> gatherer)
public static <T, A, R> Collector<T, A, R> indexed(Collector<Indexed<T>, A, R> collector)
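A rough, hypothetical sketch of the Gatherer flavour of such an indexed(...) factory: it composes an enumerating gatherer with the supplied index-aware one via Gatherer::andThen. All names here (Indexed, IndexedFactorySketch, the example gatherer) are illustrative, not the proposed API or the IndexedStream project code, and the sketch assumes the JDK 22+ Gatherer API:

```java
import java.util.List;
import java.util.stream.Gatherer;

// Hypothetical pair type, as in the earlier sketch.
record Indexed<T>(int index, T value) {}

public class IndexedFactorySketch {
    // Wraps an index-aware gatherer: elements are enumerated first, then the
    // Indexed<T> values are fed into the supplied gatherer via composition.
    static <T, R> Gatherer<T, ?, R> indexed(Gatherer<? super Indexed<T>, ?, R> indexAware) {
        Gatherer<T, ?, Indexed<T>> enumerate = Gatherer.<T, int[], Indexed<T>>ofSequential(
                () -> new int[] {0},
                Gatherer.Integrator.ofGreedy((counter, element, downstream) ->
                        downstream.push(new Indexed<>(counter[0]++, element))));
        return enumerate.andThen(indexAware);
    }

    public static void main(String[] args) {
        // An index-aware "filter" that keeps elements at even positions and unwraps the value.
        Gatherer<Indexed<String>, ?, String> evenPositions = Gatherer.of(
                (unused, ix, downstream) ->
                        ix.index() % 2 != 0 || downstream.push(ix.value()));

        List<String> kept = List.of("a", "b", "c", "d").stream()
                .gather(indexed(evenPositions))
                .toList();
        System.out.println(kept); // [a, c]
    }
}
```

The design point is that the index state stays an implementation detail of the wrapper, so the wrapped gatherer is an ordinary Gatherer<T, ?, R> usable on any Stream<T>.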
Re: EnumeratedStream
Also enumerated stream should also support index-aware terminal operations, which getherers are incapable of, so it will also require to create index-aware collectors. I am not aware if this is even possible, but this looks like another separate functionality in different place, and some developers might just don't be aware of its existence. I think that we as language devs should think not only about what is possible in language, but also about is it comfortable and is it obvious for user On Sun, Apr 21, 2024, 03:36 ІП-24 Олександр Ротань wrote: > Yes, I think every possible intermediate operation could be made index > aware using gatherers. The point is: should it be turned? > > As a developers of jdk itself, we are not limited in a ways we could > provide tools for Java users, especially when it comes to adding completely > new features and not modifying existing apis. > > Gatherer-based approach looks like we are developers of third party > library that has to look for workarounds instead of directly adding > features we need. It's syntax is more wordy without any payoff in > flexibility, and obviously would be slower and memory-costy. > > For me seems that implementing this using gatherer would only introduce > unnecessary intermediate steps in operation internal pipeline without any > visible payoff. > > Also, indexes created this way will be inconsistent between operations, > and I am not sure if that is what we are looking for. > > On Sun, Apr 21, 2024, 03:25 - wrote: > >> My point is that we can create Gatherers that gather indexed elements >> (indexed gatherer), and another factory wraps the Gatherer so the index >> boxing and automatically performed before sending to the indexed gatherer. >> All other stream operations, like indexed filter, indexed collect, etc. can >> all be done through the indexed gatherer. >> >> On Sat, Apr 20, 2024 at 7:13 PM ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com> wrote: >> >>> I am sorry, but I feel like I am missing the point of your response and >>> how is it related to what I have said. >>> >>> Regarding wrapping and unwrapping indexed pairs, one of advantages of >>> the approach I have suggested is that EnumeratedStream is still a stream >>> and all index-unaware ops could still be applied to it >>> >>> On Sun, Apr 21, 2024, 03:08 - wrote: >>> We must convert index-processing operations to a `gather(Gatherers.scan(/* index gathering */))` immediate before the operation that uses the index, and immediately unwrap the indices afterwards. Syntactically writing such 3 lines for every index-aware operation would be weird; I actually wonder if we can find a way to convert a Gatherer, ?, R> into a Gatherer; the nested gatherer itself will receive the enumerated values. I don't know if such a way to compose Gatherer has been studied, but this would allow us to define all indexed operations as gatherers instead of having to make new BiConsumers for indices. This would be interesting to look into. So the ideal form would be: BiPredicate filter = ... 
Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed, sink) -> return (!filter.test(indexed.index(), indexed.value())) || sink.push(indexed.value())); Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a factory method performing wrapping return stream.gather(wrapped) // and more On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > I would like to correct myself a bit: indexes should be assigned at > the moment an element is received from upstream, not when it is passed to > processing. Not sure if it will effectively change order of execution in > actual implementation, but to put it this way would be more precise > > вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com>: > >> Hello again. >> >> I have imagined implementation of enumerated streams in a slightly >> different way. I think enumerated streams should be a separate kind of >> stream like parallel, sequential or some primitive stream. This way >> enumeration could be deferred to the time that index is being consumed, >> This would eliminate issue with stream merging and also kind of smoothen >> the issue with parallel streams, as the indexed will follow the order >> elements have been passed to processing. >> >> This could potentially lead to another issue with consistency of >> indexes through stream pipeline, but there could be a workaround like >> IdentityHashMaps that is received from previous enumerated streams. >> >> This way we could make streams more efficient and also introduce a >> more user-friendly API, which is essentially a win-win situation. >> >> Regarding concatenation specifically, we could either introduce >> separate methods f
Re: Bag (Multiset) Collection
> 1. Batch tasks processing, especially in concurrent environments, when tasks could have different priorities. Sometimes tasks that are stored in a queue are processed in batches. Using today's stdlib of Java, acquiring such a batch of tasks from a collection would be O(n * log n) complexity, because each heapification is log n and is called after each task is popped. Using TreeMultiset, this could be achieved in log(n). This also applies to messaging.

This is a good point. There's no way for users to attach tree node metadata that is maintained when a BST is rebalanced, so having a rank operation for a Tree Multiset is quite helpful. Unfortunately, not even Google Guava supports this operation, but this is technically easy to implement. I think this is the main advantage of a Multiset over regular Map<T, Collection<T>> implementations. However, given Java's TreeMap doesn't even support getting an element by rank (its RBTree nodes don't record size), we might have to start with adding size to the RBTree nodes first.
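To make the "record size in the nodes" point concrete, here is a minimal sketch of rank-based selection over size-augmented nodes. It is deliberately not a red-black tree and not TreeMap's actual node class; it only illustrates why storing subtree sizes makes select-by-rank proportional to the tree height:

```java
// Illustrative only: a size-augmented BST node (no balancing shown). A real
// TreeMultiset/TreeMap change would have to keep `size` correct through
// red-black insertions, deletions and rotations.
final class SizedNode<T> {
    T value;
    SizedNode<T> left, right;
    int size = 1;                                   // nodes in this subtree, including this one

    SizedNode(T value) { this.value = value; }

    static int sizeOf(SizedNode<?> n) { return n == null ? 0 : n.size; }

    // Returns the element at the given zero-based in-order rank;
    // runs in O(height) because each step discards one subtree.
    static <T> T select(SizedNode<T> node, int rank) {
        while (node != null) {
            int leftSize = sizeOf(node.left);
            if (rank < leftSize) {
                node = node.left;                   // answer lies in the left subtree
            } else if (rank == leftSize) {
                return node.value;                  // this node is the rank-th element
            } else {
                rank -= leftSize + 1;               // skip the left subtree and this node
                node = node.right;
            }
        }
        throw new IndexOutOfBoundsException("rank out of range");
    }
}
```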
Re: EnumeratedStream
Yes, I think every possible intermediate operation could be made index aware using gatherers. The point is: should it be turned? As a developers of jdk itself, we are not limited in a ways we could provide tools for Java users, especially when it comes to adding completely new features and not modifying existing apis. Gatherer-based approach looks like we are developers of third party library that has to look for workarounds instead of directly adding features we need. It's syntax is more wordy without any payoff in flexibility, and obviously would be slower and memory-costy. For me seems that implementing this using gatherer would only introduce unnecessary intermediate steps in operation internal pipeline without any visible payoff. Also, indexes created this way will be inconsistent between operations, and I am not sure if that is what we are looking for. On Sun, Apr 21, 2024, 03:25 - wrote: > My point is that we can create Gatherers that gather indexed elements > (indexed gatherer), and another factory wraps the Gatherer so the index > boxing and automatically performed before sending to the indexed gatherer. > All other stream operations, like indexed filter, indexed collect, etc. can > all be done through the indexed gatherer. > > On Sat, Apr 20, 2024 at 7:13 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> I am sorry, but I feel like I am missing the point of your response and >> how is it related to what I have said. >> >> Regarding wrapping and unwrapping indexed pairs, one of advantages of the >> approach I have suggested is that EnumeratedStream is still a stream and >> all index-unaware ops could still be applied to it >> >> On Sun, Apr 21, 2024, 03:08 - wrote: >> >>> We must convert index-processing operations to a >>> `gather(Gatherers.scan(/* index gathering */))` immediate before the >>> operation that uses the index, and immediately unwrap the indices >>> afterwards. >>> >>> Syntactically writing such 3 lines for every index-aware operation would >>> be weird; I actually wonder if we can find a way to convert a >>> Gatherer, ?, R> into a Gatherer; the nested gatherer >>> itself will receive the enumerated values. I don't know if such a way to >>> compose Gatherer has been studied, but this would allow us to define all >>> indexed operations as gatherers instead of having to make new >>> BiConsumers for indices. This would be interesting to look into. So the >>> ideal form would be: >>> >>> BiPredicate filter = ... >>> Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed, >>> sink) -> return (!filter.test(indexed.index(), indexed.value())) || >>> sink.push(indexed.value())); >>> Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a >>> factory method performing wrapping >>> return stream.gather(wrapped) // and more >>> >>> On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань < >>> rotan.olexa...@gmail.com> wrote: >>> I would like to correct myself a bit: indexes should be assigned at the moment an element is received from upstream, not when it is passed to processing. Not sure if it will effectively change order of execution in actual implementation, but to put it this way would be more precise вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань < rotan.olexa...@gmail.com>: > Hello again. > > I have imagined implementation of enumerated streams in a slightly > different way. I think enumerated streams should be a separate kind of > stream like parallel, sequential or some primitive stream. 
This way > enumeration could be deferred to the time that index is being consumed, > This would eliminate issue with stream merging and also kind of smoothen > the issue with parallel streams, as the indexed will follow the order > elements have been passed to processing. > > This could potentially lead to another issue with consistency of > indexes through stream pipeline, but there could be a workaround like > IdentityHashMaps that is received from previous enumerated streams. > > This way we could make streams more efficient and also introduce a > more user-friendly API, which is essentially a win-win situation. > > Regarding concatenation specifically, we could either introduce > separate methods for merging enumerated streams, where we define how > conflicts are resolved, or just treat them as usual streams, so the result > of concatenation of enumerated streams will result in a normal stream that > should be enumerated once again. Also i don't really see how gathering > elements into index-value pairs solves issue with conflicting indexes > > вс, 21 апр. 2024 г. в 02:06, - : > >> Hi Oleksandr, >> I fear that enumeration might not be applicable to all streams, >> especially parallel ones. If we have a parallel stream, we might not >> always >> process all elements in order, and even index generatio
Re: EnumeratedStream
My point is that we can create Gatherers that gather indexed elements (indexed gatherer), and another factory wraps the Gatherer so the index boxing and automatically performed before sending to the indexed gatherer. All other stream operations, like indexed filter, indexed collect, etc. can all be done through the indexed gatherer. On Sat, Apr 20, 2024 at 7:13 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > I am sorry, but I feel like I am missing the point of your response and > how is it related to what I have said. > > Regarding wrapping and unwrapping indexed pairs, one of advantages of the > approach I have suggested is that EnumeratedStream is still a stream and > all index-unaware ops could still be applied to it > > On Sun, Apr 21, 2024, 03:08 - wrote: > >> We must convert index-processing operations to a >> `gather(Gatherers.scan(/* index gathering */))` immediate before the >> operation that uses the index, and immediately unwrap the indices >> afterwards. >> >> Syntactically writing such 3 lines for every index-aware operation would >> be weird; I actually wonder if we can find a way to convert a >> Gatherer, ?, R> into a Gatherer; the nested gatherer >> itself will receive the enumerated values. I don't know if such a way to >> compose Gatherer has been studied, but this would allow us to define all >> indexed operations as gatherers instead of having to make new >> BiConsumers for indices. This would be interesting to look into. So the >> ideal form would be: >> >> BiPredicate filter = ... >> Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed, >> sink) -> return (!filter.test(indexed.index(), indexed.value())) || >> sink.push(indexed.value())); >> Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a >> factory method performing wrapping >> return stream.gather(wrapped) // and more >> >> On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com> wrote: >> >>> I would like to correct myself a bit: indexes should be assigned at the >>> moment an element is received from upstream, not when it is passed to >>> processing. Not sure if it will effectively change order of execution in >>> actual implementation, but to put it this way would be more precise >>> >>> вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань < >>> rotan.olexa...@gmail.com>: >>> Hello again. I have imagined implementation of enumerated streams in a slightly different way. I think enumerated streams should be a separate kind of stream like parallel, sequential or some primitive stream. This way enumeration could be deferred to the time that index is being consumed, This would eliminate issue with stream merging and also kind of smoothen the issue with parallel streams, as the indexed will follow the order elements have been passed to processing. This could potentially lead to another issue with consistency of indexes through stream pipeline, but there could be a workaround like IdentityHashMaps that is received from previous enumerated streams. This way we could make streams more efficient and also introduce a more user-friendly API, which is essentially a win-win situation. Regarding concatenation specifically, we could either introduce separate methods for merging enumerated streams, where we define how conflicts are resolved, or just treat them as usual streams, so the result of concatenation of enumerated streams will result in a normal stream that should be enumerated once again. 
Also i don't really see how gathering elements into index-value pairs solves issue with conflicting indexes вс, 21 апр. 2024 г. в 02:06, - : > Hi Oleksandr, > I fear that enumeration might not be applicable to all streams, > especially parallel ones. If we have a parallel stream, we might not > always > process all elements in order, and even index generation can be > unreliable. > In addition, stream merging will become a headache. I think > Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new > Indexed<>(indexed.index() + 1, value)) can create a Stream> > which should serve your purpose. > > And for expanded operations for enumerated streams, there are similar > features for primitive streams already, where they have extra methods like > summaryStatistics() compared to object streams. We most likely will have > Gatherers that explicitly work on Stream> to support > type-specific operations, partly due to Java's type limits and that Stream > hierarchy is effectively closed. > > Best, > Chen Liang > > On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> My proposal regarding findIndex brought up a topic that, as I see, >> has been brought up here multiple times. >> >> My idea is to extend th
Re: EnumeratedStream
I am sorry, but I feel like I am missing the point of your response and how is it related to what I have said. Regarding wrapping and unwrapping indexed pairs, one of advantages of the approach I have suggested is that EnumeratedStream is still a stream and all index-unaware ops could still be applied to it On Sun, Apr 21, 2024, 03:08 - wrote: > We must convert index-processing operations to a `gather(Gatherers.scan(/* > index gathering */))` immediate before the operation that uses the index, > and immediately unwrap the indices afterwards. > > Syntactically writing such 3 lines for every index-aware operation would > be weird; I actually wonder if we can find a way to convert a > Gatherer, ?, R> into a Gatherer; the nested gatherer > itself will receive the enumerated values. I don't know if such a way to > compose Gatherer has been studied, but this would allow us to define all > indexed operations as gatherers instead of having to make new > BiConsumers for indices. This would be interesting to look into. So the > ideal form would be: > > BiPredicate filter = ... > Gatherer, ?, R> filterGatherer = Gatherer.of((_, indexed, sink) > -> return (!filter.test(indexed.index(), indexed.value())) || > sink.push(indexed.value())); > Gatherer wrapped = IndexedGatherer.wrap(filterGatherer); // a > factory method performing wrapping > return stream.gather(wrapped) // and more > > On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> I would like to correct myself a bit: indexes should be assigned at the >> moment an element is received from upstream, not when it is passed to >> processing. Not sure if it will effectively change order of execution in >> actual implementation, but to put it this way would be more precise >> >> вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com>: >> >>> Hello again. >>> >>> I have imagined implementation of enumerated streams in a slightly >>> different way. I think enumerated streams should be a separate kind of >>> stream like parallel, sequential or some primitive stream. This way >>> enumeration could be deferred to the time that index is being consumed, >>> This would eliminate issue with stream merging and also kind of smoothen >>> the issue with parallel streams, as the indexed will follow the order >>> elements have been passed to processing. >>> >>> This could potentially lead to another issue with consistency of indexes >>> through stream pipeline, but there could be a workaround like >>> IdentityHashMaps that is received from previous enumerated streams. >>> >>> This way we could make streams more efficient and also introduce a more >>> user-friendly API, which is essentially a win-win situation. >>> >>> Regarding concatenation specifically, we could either introduce separate >>> methods for merging enumerated streams, where we define how conflicts are >>> resolved, or just treat them as usual streams, so the result of >>> concatenation of enumerated streams will result in a normal stream that >>> should be enumerated once again. Also i don't really see how gathering >>> elements into index-value pairs solves issue with conflicting indexes >>> >>> вс, 21 апр. 2024 г. в 02:06, - : >>> Hi Oleksandr, I fear that enumeration might not be applicable to all streams, especially parallel ones. If we have a parallel stream, we might not always process all elements in order, and even index generation can be unreliable. In addition, stream merging will become a headache. 
I think Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new Indexed<>(indexed.index() + 1, value)) can create a Stream> which should serve your purpose. And for expanded operations for enumerated streams, there are similar features for primitive streams already, where they have extra methods like summaryStatistics() compared to object streams. We most likely will have Gatherers that explicitly work on Stream> to support type-specific operations, partly due to Java's type limits and that Stream hierarchy is effectively closed. Best, Chen Liang On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > My proposal regarding findIndex brought up a topic that, as I see, has > been brought up here multiple times. > > My idea is to extend the existing stream interface in the new > EnumeratedStream interface. Such an approach has a number of advantages. > > 1. Some methods will be able to explicitly specify that they require > an enumerated stream. > > 2. This would allow us to use an enumerated stream just as the default > value stream, if the index is redundant at a certain processing step. > > 3. This could help introduce a more user-friendly interface: instead > of passing pairs of index and value, they could be passed as separate
Re: EnumeratedStream
We must convert index-processing operations to a `gather(Gatherers.scan(/* index gathering */))` immediately before the operation that uses the index, and immediately unwrap the indices afterwards.

Syntactically writing such 3 lines for every index-aware operation would be weird; I actually wonder if we can find a way to convert a Gatherer<Indexed<T>, ?, R> into a Gatherer<T, ?, R>; the nested gatherer itself will receive the enumerated values. I don't know if such a way to compose Gatherer has been studied, but this would allow us to define all indexed operations as gatherers instead of having to make new BiConsumers for indices. This would be interesting to look into. So the ideal form would be:

BiPredicate<Integer, T> filter = ...
Gatherer<Indexed<T>, ?, T> filterGatherer = Gatherer.of((_, indexed, sink) -> !filter.test(indexed.index(), indexed.value()) || sink.push(indexed.value()));
Gatherer<T, ?, T> wrapped = IndexedGatherer.wrap(filterGatherer); // a factory method performing wrapping
return stream.gather(wrapped); // and more

On Sat, Apr 20, 2024 at 6:47 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > I would like to correct myself a bit: indexes should be assigned at the > moment an element is received from upstream, not when it is passed to > processing. Not sure if it will effectively change order of execution in > actual implementation, but to put it this way would be more precise > > вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com>: > >> Hello again. >> >> I have imagined implementation of enumerated streams in a slightly >> different way. I think enumerated streams should be a separate kind of >> stream like parallel, sequential or some primitive stream. This way >> enumeration could be deferred to the time that index is being consumed, >> This would eliminate issue with stream merging and also kind of smoothen >> the issue with parallel streams, as the indexed will follow the order >> elements have been passed to processing. >> >> This could potentially lead to another issue with consistency of indexes >> through stream pipeline, but there could be a workaround like >> IdentityHashMaps that is received from previous enumerated streams. >> >> This way we could make streams more efficient and also introduce a more >> user-friendly API, which is essentially a win-win situation. >> >> Regarding concatenation specifically, we could either introduce separate >> methods for merging enumerated streams, where we define how conflicts are >> resolved, or just treat them as usual streams, so the result of >> concatenation of enumerated streams will result in a normal stream that >> should be enumerated once again. Also i don't really see how gathering >> elements into index-value pairs solves issue with conflicting indexes >> >> вс, 21 апр. 2024 г. в 02:06, - : >> >>> Hi Oleksandr, >>> I fear that enumeration might not be applicable to all streams, >>> especially parallel ones. If we have a parallel stream, we might not always >>> process all elements in order, and even index generation can be unreliable. >>> In addition, stream merging will become a headache. I think >>> Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new >>> Indexed<>(indexed.index() + 1, value)) can create a Stream<Indexed<T>> >>> which should serve your purpose. >>> >>> And for expanded operations for enumerated streams, there are similar >>> features for primitive streams already, where they have extra methods like >>> summaryStatistics() compared to object streams.
We most likely will have >>> Gatherers that explicitly work on Stream> to support >>> type-specific operations, partly due to Java's type limits and that Stream >>> hierarchy is effectively closed. >>> >>> Best, >>> Chen Liang >>> >>> On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань < >>> rotan.olexa...@gmail.com> wrote: >>> My proposal regarding findIndex brought up a topic that, as I see, has been brought up here multiple times. My idea is to extend the existing stream interface in the new EnumeratedStream interface. Such an approach has a number of advantages. 1. Some methods will be able to explicitly specify that they require an enumerated stream. 2. This would allow us to use an enumerated stream just as the default value stream, if the index is redundant at a certain processing step. 3. This could help introduce a more user-friendly interface: instead of passing pairs of index and value, they could be passed as separate variables, which will make code more concise. Consider following example: List.of(1, 2, 3).stream() .enumerated() .map(idx, value -> idx % 2 == 0 ? value : -value); looks much better than List.of(1, 2, 3).stream() .enumerated() .map(pair -> pair.idx % 2 == 0 ? pair.value : -pair.value); However, I see some caveats with this appr
Re: EnumeratedStream
I would like to correct myself a bit: indexes should be assigned at the moment an element is received from upstream, not when it is passed to processing. Not sure if it will effectively change order of execution in actual implementation, but to put it this way would be more precise вс, 21 апр. 2024 г. в 02:33, ІП-24 Олександр Ротань < rotan.olexa...@gmail.com>: > Hello again. > > I have imagined implementation of enumerated streams in a slightly > different way. I think enumerated streams should be a separate kind of > stream like parallel, sequential or some primitive stream. This way > enumeration could be deferred to the time that index is being consumed, > This would eliminate issue with stream merging and also kind of smoothen > the issue with parallel streams, as the indexed will follow the order > elements have been passed to processing. > > This could potentially lead to another issue with consistency of indexes > through stream pipeline, but there could be a workaround like > IdentityHashMaps that is received from previous enumerated streams. > > This way we could make streams more efficient and also introduce a more > user-friendly API, which is essentially a win-win situation. > > Regarding concatenation specifically, we could either introduce separate > methods for merging enumerated streams, where we define how conflicts are > resolved, or just treat them as usual streams, so the result of > concatenation of enumerated streams will result in a normal stream that > should be enumerated once again. Also i don't really see how gathering > elements into index-value pairs solves issue with conflicting indexes > > вс, 21 апр. 2024 г. в 02:06, - : > >> Hi Oleksandr, >> I fear that enumeration might not be applicable to all streams, >> especially parallel ones. If we have a parallel stream, we might not always >> process all elements in order, and even index generation can be unreliable. >> In addition, stream merging will become a headache. I think >> Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new >> Indexed<>(indexed.index() + 1, value)) can create a Stream> >> which should serve your purpose. >> >> And for expanded operations for enumerated streams, there are similar >> features for primitive streams already, where they have extra methods like >> summaryStatistics() compared to object streams. We most likely will have >> Gatherers that explicitly work on Stream> to support >> type-specific operations, partly due to Java's type limits and that Stream >> hierarchy is effectively closed. >> >> Best, >> Chen Liang >> >> On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com> wrote: >> >>> My proposal regarding findIndex brought up a topic that, as I see, has >>> been brought up here multiple times. >>> >>> My idea is to extend the existing stream interface in the new >>> EnumeratedStream interface. Such an approach has a number of advantages. >>> >>> 1. Some methods will be able to explicitly specify that they require an >>> enumerated stream. >>> >>> 2. This would allow us to use an enumerated stream just as the default >>> value stream, if the index is redundant at a certain processing step. >>> >>> 3. This could help introduce a more user-friendly interface: instead of >>> passing pairs of index and value, they could be passed as separate >>> variables, which will make code more concise. >>> >>> Consider following example: >>> List.of(1, 2, 3).stream() >>> .enumerated() >>> .map(idx, value -> idx % 2 == 0 ? 
value : -value); >>> >>> looks much better than >>> List.of(1, 2, 3).stream() >>> .enumerated() >>> .map(pair -> pair.idx % 2 == 0 ? pair.value : >>> -pair.value); >>> >>> However, I see some caveats with this approach, which relate to parallel >>> streams: >>> when a stream is backed by a collection, you might expect assigned >>> indexes to represent order of iteration through collection, especially when >>> talking about sequential collections. Assigning indexes in such streams >>> could be a difficult task in a parallel environment. It should either >>> assign index to a stream elements at the moment when stream becomes >>> parallelized, which is also impossible if already parallelized stream is >>> being enumerated, and also wont work for infinite streams, or make >>> enumerated stream at least partially sequential by passing elements to >>> processing in order they were passed to stream at the first place, which is >>> also hard or maybe even impossible to achieve. >>> >>> Also, side note: methods that already accept 2 parameters might become >>> kinda verbose, like reduce that will have to accept for params (idx1, val1, >>> idx2, val2), but i think that is not a big deal >>> >>> >>>
Re: EnumeratedStream
Hello again. I have imagined implementation of enumerated streams in a slightly different way. I think enumerated streams should be a separate kind of stream like parallel, sequential or some primitive stream. This way enumeration could be deferred to the time that index is being consumed, This would eliminate issue with stream merging and also kind of smoothen the issue with parallel streams, as the indexed will follow the order elements have been passed to processing. This could potentially lead to another issue with consistency of indexes through stream pipeline, but there could be a workaround like IdentityHashMaps that is received from previous enumerated streams. This way we could make streams more efficient and also introduce a more user-friendly API, which is essentially a win-win situation. Regarding concatenation specifically, we could either introduce separate methods for merging enumerated streams, where we define how conflicts are resolved, or just treat them as usual streams, so the result of concatenation of enumerated streams will result in a normal stream that should be enumerated once again. Also i don't really see how gathering elements into index-value pairs solves issue with conflicting indexes вс, 21 апр. 2024 г. в 02:06, - : > Hi Oleksandr, > I fear that enumeration might not be applicable to all streams, especially > parallel ones. If we have a parallel stream, we might not always process > all elements in order, and even index generation can be unreliable. In > addition, stream merging will become a headache. I think Gatherers.scan(() > -> new Indexed<>(0, dummy), (indexed, value) -> new > Indexed<>(indexed.index() + 1, value)) can create a Stream> > which should serve your purpose. > > And for expanded operations for enumerated streams, there are similar > features for primitive streams already, where they have extra methods like > summaryStatistics() compared to object streams. We most likely will have > Gatherers that explicitly work on Stream> to support > type-specific operations, partly due to Java's type limits and that Stream > hierarchy is effectively closed. > > Best, > Chen Liang > > On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> My proposal regarding findIndex brought up a topic that, as I see, has >> been brought up here multiple times. >> >> My idea is to extend the existing stream interface in the new >> EnumeratedStream interface. Such an approach has a number of advantages. >> >> 1. Some methods will be able to explicitly specify that they require an >> enumerated stream. >> >> 2. This would allow us to use an enumerated stream just as the default >> value stream, if the index is redundant at a certain processing step. >> >> 3. This could help introduce a more user-friendly interface: instead of >> passing pairs of index and value, they could be passed as separate >> variables, which will make code more concise. >> >> Consider following example: >> List.of(1, 2, 3).stream() >> .enumerated() >> .map(idx, value -> idx % 2 == 0 ? value : -value); >> >> looks much better than >> List.of(1, 2, 3).stream() >> .enumerated() >> .map(pair -> pair.idx % 2 == 0 ? pair.value : >> -pair.value); >> >> However, I see some caveats with this approach, which relate to parallel >> streams: >> when a stream is backed by a collection, you might expect assigned >> indexes to represent order of iteration through collection, especially when >> talking about sequential collections. 
Assigning indexes in such streams >> could be a difficult task in a parallel environment. It should either >> assign index to a stream elements at the moment when stream becomes >> parallelized, which is also impossible if already parallelized stream is >> being enumerated, and also wont work for infinite streams, or make >> enumerated stream at least partially sequential by passing elements to >> processing in order they were passed to stream at the first place, which is >> also hard or maybe even impossible to achieve. >> >> Also, side note: methods that already accept 2 parameters might become >> kinda verbose, like reduce that will have to accept for params (idx1, val1, >> idx2, val2), but i think that is not a big deal >> >> >>
Re: Bag (Multiset) Collection
Hello! The example you have provided is indeed a workaround. Such a setup indeed covers most of the problems, nothing I could get from the top of my head that can't be accomplished using this. However, there are a few downsides: 1. Obvious performance and memory overhead. I think no elaboration here is needed, maybe jsut point out that, from my experience, we are talking about multiples of 10 when explaining gap in performance. 2. Extremely uncomfortable API. Most of the operations with multisets, at the end of the day, converge to iteration over elements. Every data processing pipeline requires additional flattening before processing in most of the cases, or using nested loops. If, for example, you have a web API that works with data processing and visualization, every IO operation would require mapping from list or array to map back and forth. This could (and will) introduce some errors and make code less maintainable. Also, which is important for me, this approach is very frustrating. My goal will always be developer experience improvement in the first place. I know that is not a top-level priority for Java maintainers, but I will always try my best to make the life of Java users more comfortable. вс, 21 апр. 2024 г. в 01:57, - : > Hi IP-24 Oleksandr Rotan', > You can use a Map>. This map stores T as the object > equivalence key, and each Collection in this map stores the values known > to be equal to the key. Since map compares keys by equals, any key in the > collection can be used to find the collection in the Map>. > > If you want a Tree/Navigable multiset, you can just make it a > NavigableMap>. If you use this concurrently, you can > replace the value Collection with something like CopyOnWriteArrayList. > > Is there any Multiset functionality not covered by such a Map Collection> setup? > > P.S. For the other versions of tree collections, I think JDK uses Red > Black tree due to its superior performance when there's frequent removal. > AVL is known for faster additions but involves a lot more balance > operations during removal. If we are looking at immutable navigable/sorted > collections, we can use an array and perform binary search, which works > just like a tree. > > Regards, > Chen Liang > > > On Sat, Apr 20, 2024 at 3:25 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> In this letter I would like to express some of my thoughts regarding the >> potential Multiset interface. >> >> I, personally, have encountered a few situations where such an interface >> could come in handy, mostly when I needed an ordered collection that >> permits duplicates. That time I used guava`s TreeMultiset, but I think Java >> itself should have such a collection in its std library. While it is not a >> very common problem, there are still a bunch of use cases where such things >> could come in handy. >> >> I am willing to take on development of such thing, but there are a few >> concerns about Multiset: >> >> 1. Is there any other use for this besides ordered collection that >> permits duplicates? I can't remember anything else from the top of my head. >> >> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet >> class hierarchy, while not being directly connected. I think introducing >> any other ordered collection will require some refactoring of existing >> collection interfaces (for example extract SortedCollection from SortedSet, >> Navigable Collection from NavigableSet etc.). 
From the perspective of clean >> code, this would be the right decision, but I feel like this will be a very >> complex task to accomplish. >> >> 3. Maybe there should be few versions of Tree collection (for example, >> for regular tasks red-black tree and B-Tree for large amounts of data). I >> have some expirience implementing both, but is it really needed in standard >> library? >> >
Re: EnumeratedStream
Hi Oleksandr, I fear that enumeration might not be applicable to all streams, especially parallel ones. If we have a parallel stream, we might not always process all elements in order, and even index generation can be unreliable. In addition, stream merging will become a headache. I think Gatherers.scan(() -> new Indexed<>(0, dummy), (indexed, value) -> new Indexed<>(indexed.index() + 1, value)) can create a Stream> which should serve your purpose. And for expanded operations for enumerated streams, there are similar features for primitive streams already, where they have extra methods like summaryStatistics() compared to object streams. We most likely will have Gatherers that explicitly work on Stream> to support type-specific operations, partly due to Java's type limits and that Stream hierarchy is effectively closed. Best, Chen Liang On Sat, Apr 20, 2024 at 4:07 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > My proposal regarding findIndex brought up a topic that, as I see, has > been brought up here multiple times. > > My idea is to extend the existing stream interface in the new > EnumeratedStream interface. Such an approach has a number of advantages. > > 1. Some methods will be able to explicitly specify that they require an > enumerated stream. > > 2. This would allow us to use an enumerated stream just as the default > value stream, if the index is redundant at a certain processing step. > > 3. This could help introduce a more user-friendly interface: instead of > passing pairs of index and value, they could be passed as separate > variables, which will make code more concise. > > Consider following example: > List.of(1, 2, 3).stream() > .enumerated() > .map(idx, value -> idx % 2 == 0 ? value : -value); > > looks much better than > List.of(1, 2, 3).stream() > .enumerated() > .map(pair -> pair.idx % 2 == 0 ? pair.value : -pair.value); > > However, I see some caveats with this approach, which relate to parallel > streams: > when a stream is backed by a collection, you might expect assigned indexes > to represent order of iteration through collection, especially when talking > about sequential collections. Assigning indexes in such streams could be a > difficult task in a parallel environment. It should either assign index to > a stream elements at the moment when stream becomes parallelized, which is > also impossible if already parallelized stream is being enumerated, and > also wont work for infinite streams, or make enumerated stream at least > partially sequential by passing elements to processing in order they were > passed to stream at the first place, which is also hard or maybe even > impossible to achieve. > > Also, side note: methods that already accept 2 parameters might become > kinda verbose, like reduce that will have to accept for params (idx1, val1, > idx2, val2), but i think that is not a big deal > > >
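As a runnable illustration of the Gatherers.scan idea above (assuming JDK 22+ and a hypothetical Indexed record standing in for the type in the message): note that scan does not emit the seed itself, so seeding the index with -1 yields 0-based indices, whereas a seed of 0 as written above would start at 1:

```java
import java.util.List;
import java.util.stream.Gatherers;

// Hypothetical pair type standing in for Indexed in the message above.
record Indexed<T>(int index, T value) {}

public class ScanIndexExample {
    public static void main(String[] args) {
        // scan carries a running state and emits one result per element;
        // the seed pair (index -1, null value) is never emitted itself.
        List<Indexed<String>> indexed = List.of("a", "b", "c").stream()
                .gather(Gatherers.<String, Indexed<String>>scan(
                        () -> new Indexed<>(-1, null),
                        (prev, value) -> new Indexed<>(prev.index() + 1, value)))
                .toList();
        System.out.println(indexed);
        // [Indexed[index=0, value=a], Indexed[index=1, value=b], Indexed[index=2, value=c]]
    }
}
```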
Re: Bag (Multiset) Collection
Hi IP-24 Oleksandr Rotan', You can use a Map<T, Collection<T>>. This map stores T as the object equivalence key, and each Collection in this map stores the values known to be equal to the key. Since map compares keys by equals, any key in the collection can be used to find the collection in the Map<T, Collection<T>>. If you want a Tree/Navigable multiset, you can just make it a NavigableMap<T, Collection<T>>. If you use this concurrently, you can replace the value Collection with something like CopyOnWriteArrayList. Is there any Multiset functionality not covered by such a Map<T, Collection<T>> setup? P.S. For the other versions of tree collections, I think JDK uses Red Black tree due to its superior performance when there's frequent removal. AVL is known for faster additions but involves a lot more balance operations during removal. If we are looking at immutable navigable/sorted collections, we can use an array and perform binary search, which works just like a tree. Regards, Chen Liang On Sat, Apr 20, 2024 at 3:25 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > In this letter I would like to express some of my thoughts regarding the > potential Multiset interface. > > I, personally, have encountered a few situations where such an interface > could come in handy, mostly when I needed an ordered collection that > permits duplicates. That time I used guava`s TreeMultiset, but I think Java > itself should have such a collection in its std library. While it is not a > very common problem, there are still a bunch of use cases where such things > could come in handy. > > I am willing to take on development of such thing, but there are a few > concerns about Multiset: > > 1. Is there any other use for this besides ordered collection that permits > duplicates? I can't remember anything else from the top of my head. > > 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class > hierarchy, while not being directly connected. I think introducing any > other ordered collection will require some refactoring of existing > collection interfaces (for example extract SortedCollection from SortedSet, > Navigable Collection from NavigableSet etc.). From the perspective of clean > code, this would be the right decision, but I feel like this will be a very > complex task to accomplish. > > 3. Maybe there should be few versions of Tree collection (for example, for > regular tasks red-black tree and B-Tree for large amounts of data). I have > some expirience implementing both, but is it really needed in standard > library? >
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v25]
On Sat, 20 Apr 2024 19:09:43 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall performance of >> `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and >> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Fix UnsafeCopyMemoryMark scope issue Merge done. - PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067803696
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v26]
> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See > [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around > this change. > > Overall, making this an intrinsic improves overall performance of > `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after > numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). > > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge branch 'openjdk:master' into setMemory - Fix UnsafeCopyMemoryMark scope issue - Long to short jmp; other cleanup - Review comments - Address review comments; update copyright years - Add enter() and leave(); remove Windows-specific register stuff - Fix memory mark after sync to upstream - Merge branch 'openjdk:master' into setMemory - Set memory test (#23) * Even more review comments * Re-write of atomic copy loops * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} * Only add a memory mark for byte unaligned fill * Remove MUSL_LIBC ifdef * Remove MUSL_LIBC ifdef - Set memory test (#22) * Even more review comments * Re-write of atomic copy loops * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} * Only add a memory mark for byte unaligned fill - ... and 27 more: https://git.openjdk.org/jdk/compare/6d569961...1122b500 - Changes: https://git.openjdk.org/jdk/pull/18555/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=25 Stats: 507 lines in 36 files changed: 420 ins; 5 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555
Re: Bag (Multiset) Collection
Also, after some research, I found that HashMultisets also have some area of application in image detection and event simulations вс, 21 апр. 2024 г. в 01:19, ІП-24 Олександр Ротань < rotan.olexa...@gmail.com>: > Sorry for duplicate, I accidentally sent last message only to you > > вс, 21 апр. 2024 г. в 01:19, ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com>: > >> Firstly, about queues. As far as I know, priority queue is implemented as >> heap, not tree, so things like subQueue doesn't make much sense in this >> context. >> >> About popularity: this is indeed not very popular, might even be >> negligible outside of sorted multiset (lets adopt java naming conventions >> for those from now on). Although, I could provide at least few examples: >> >> 1. Batch tasks processing, especially in concurrent environments, when >> tasks could have different priorities. Sometimes tasks that are stored in a >> queue are processed in batches. Using today's stdlib of Java, >> acquiring such a batch of tasks from collection would be a O(n * log n) >> complexity, because each heapification is log n and called after each task >> is popped. Using TreeMultiset, this could be achieved in log (n). This also >> applies to messaging. >> >> 2. Implementing custom data structures. Not a task most of commercial >> developers encounter, although for specific types of tasks, like big data >> processing, custom datastructs could be a thing. Some collections, >> expeciall ordered, like B-tree, could require sorted multiset as part of >> internal implementation. >> >> 3. Again, for sorted multisets, counting frequency is a common task for >> commercial developers. Counting element frequencies using multisets is the >> thing that simply couldn't be achieved by set and is inefficient in list. >> This could have various use cases, from data displaying in histograms in >> generated reports to data validation is stateful systems (checking if user >> has not exceeded permitted amount of attempts, for example). >> >> 4. Storage for the result of db queries. Another common task is to fetch >> some ordered data from the database, and then modify it to preserve order. >> This category of tasks could be optimized using ordered multisets. >> >> 5. Concurent bag, as you just mentioned. This could be used in concurrent >> environment as I mentioned above, or just as less restricting >> concurrent collection as preserving sequence of elements could be a complex >> and resource consuming task in mutable concurrent collections >> >> As you might see, almost all of those indeed converge to a sorted >> collection of duplicating elements. I am not sure If there is much more to >> it, but what i would like to say is that feature like this would definitely >> not be redundant for "enterprise" language like Java, where optimizing some >> operations from O(n) or even O(n * log n) to O (log n) could introduce >> significant performance improvement, as well as reducing amount of possible >> bugs by taking on some of responsibilities related to preserving order of >> elements sorted. >> >> вс, 21 апр. 2024 г. в 00:33, Holo The Sage Wolf : >> >> вс, 21 апр. 2024 г. в 01:06, ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com>: >> >>> Firstly, about queues. As far as I know, priority queue is implemented >>> as heap, not tree, so things like subQueue doesn't make much sense in this >>> context. 
>>> >>> About popularity: this is indeed not very popular, might even be >>> negligible outside of sorted multiset (lets adopt java naming conventions >>> for those from now on). Although, I could provide at least few examples: >>> >>> 1. Batch tasks processing, especially in concurrent environments, when >>> tasks could have different priorities. Sometimes tasks that are stored in a >>> queue are processed in batches. Using today's stdlib of Java, >>> acquiring such a batch of tasks from collection would be a O(n * log n) >>> complexity, because each heapification is log n and called after each task >>> is popped. Using TreeMultiset, this could be achieved in log (n). This also >>> applies to messaging. >>> >>> 2. Implementing custom data structures. Not a task most of commercial >>> developers encounter, although for specific types of tasks, like big data >>> processing, custom datastructs could be a thing. Some collections, >>> expeciall ordered, like B-tree, could require sorted multiset as part of >>> internal implementation. >>> >>> 3. Again, for sorted multisets, counting frequency is a common task for >>> commercial developers. Counting element frequencies using multisets is the >>> thing that simply couldn't be achieved by set and is inefficient in list. >>> This could have various use cases, from data displaying in histograms in >>> generated reports to data validation is stateful systems (checking if user >>> has not exceeded permitted amount of attempts, for example). >>> >>> 4. Storage for the result of db queries. Another common task is to fetch >>> some
Re: Bag (Multiset) Collection
Firstly, about queues. As far as I know, priority queue is implemented as heap, not tree, so things like subQueue doesn't make much sense in this context. About popularity: this is indeed not very popular, might even be negligible outside of sorted multiset (lets adopt java naming conventions for those from now on). Although, I could provide at least few examples: 1. Batch tasks processing, especially in concurrent environments, when tasks could have different priorities. Sometimes tasks that are stored in a queue are processed in batches. Using today's stdlib of Java, acquiring such a batch of tasks from collection would be a O(n * log n) complexity, because each heapification is log n and called after each task is popped. Using TreeMultiset, this could be achieved in log (n). This also applies to messaging. 2. Implementing custom data structures. Not a task most of commercial developers encounter, although for specific types of tasks, like big data processing, custom datastructs could be a thing. Some collections, expeciall ordered, like B-tree, could require sorted multiset as part of internal implementation. 3. Again, for sorted multisets, counting frequency is a common task for commercial developers. Counting element frequencies using multisets is the thing that simply couldn't be achieved by set and is inefficient in list. This could have various use cases, from data displaying in histograms in generated reports to data validation is stateful systems (checking if user has not exceeded permitted amount of attempts, for example). 4. Storage for the result of db queries. Another common task is to fetch some ordered data from the database, and then modify it to preserve order. This category of tasks could be optimized using ordered multisets. 5. Concurent bag, as you just mentioned. This could be used in concurrent environment as I mentioned above, or just as less restricting concurrent collection as preserving sequence of elements could be a complex and resource consuming task in mutable concurrent collections As you might see, almost all of those indeed converge to a sorted collection of duplicating elements. I am not sure If there is much more to it, but what i would like to say is that feature like this would definitely not be redundant for "enterprise" language like Java, where optimizing some operations from O(n) or even O(n * log n) to O (log n) could introduce significant performance improvement, as well as reducing amount of possible bugs by taking on some of responsibilities related to preserving order of elements sorted. вс, 21 апр. 2024 г. в 00:33, Holo The Sage Wolf : вс, 21 апр. 2024 г. в 01:06, ІП-24 Олександр Ротань < rotan.olexa...@gmail.com>: > Firstly, about queues. As far as I know, priority queue is implemented as > heap, not tree, so things like subQueue doesn't make much sense in this > context. > > About popularity: this is indeed not very popular, might even be > negligible outside of sorted multiset (lets adopt java naming conventions > for those from now on). Although, I could provide at least few examples: > > 1. Batch tasks processing, especially in concurrent environments, when > tasks could have different priorities. Sometimes tasks that are stored in a > queue are processed in batches. Using today's stdlib of Java, > acquiring such a batch of tasks from collection would be a O(n * log n) > complexity, because each heapification is log n and called after each task > is popped. Using TreeMultiset, this could be achieved in log (n). 
This also > applies to messaging. > > 2. Implementing custom data structures. Not a task most of commercial > developers encounter, although for specific types of tasks, like big data > processing, custom datastructs could be a thing. Some collections, > expeciall ordered, like B-tree, could require sorted multiset as part of > internal implementation. > > 3. Again, for sorted multisets, counting frequency is a common task for > commercial developers. Counting element frequencies using multisets is the > thing that simply couldn't be achieved by set and is inefficient in list. > This could have various use cases, from data displaying in histograms in > generated reports to data validation is stateful systems (checking if user > has not exceeded permitted amount of attempts, for example). > > 4. Storage for the result of db queries. Another common task is to fetch > some ordered data from the database, and then modify it to preserve order. > This category of tasks could be optimized using ordered multisets. > > 5. Concurent bag, as you just mentioned. This could be used in concurrent > environment as I mentioned above, or just as less restricting > concurrent collection as preserving sequence of elements could be a complex > and resource consuming task in mutable concurrent collections > > As you might see, almost all of those indeed converge to a sorted > collection of duplicating elements. I am not sure If there is much more to > it, bu
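As a point of comparison for the frequency-counting use case above (item 3), here is how it is typically done with today's API, going through a Map; a Multiset would expose the same information directly as a count(element) query. The names below are illustrative only.

import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FrequencyDemo {
    public static void main(String[] args) {
        // e.g. counting login attempts per user before validating a limit
        List<String> attempts = List.of("alice", "bob", "alice", "alice");

        Map<String, Long> frequency = attempts.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        System.out.println(frequency.getOrDefault("alice", 0L)); // 3
    }
}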
Re: EnumeratedStream
Yes, your point about enumeration is the best way to do it, i guess, I am also voting for this. This makes the most sense considering the way that method invocation chains should be handled вс, 21 апр. 2024 г. в 00:55, David Alayachew : > I am in full support of this idea. I do also appreciate the functionality > of using a BiFunction on the map method instead of a normal > Function, R>. > > As for the actual enumeration logic, my vote is that it should simply > enumerate as it arrives, with no context or care given to what came before > it. Consider the following examples. > > final List list = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); > > System.out.println(list.stream().enumerated().filter((idx, val) -> val % 2 > == 0).toList()); > //0:0, 2:2, 4:4, 6:6, 8:8 > > System.out.println(list.stream().filter(val -> val % 2 == > 0).enumerated().toList()); > //0:0, 1:2, 2:4, 3:6, 4:8 > > Enumeration is applied as it arrives, and no sooner. That, I feel, will > keep the implementation simplest, and allow the streaming logic to make the > most sense. And I added helpful reinterpretations of filter() method (like > you did with map()), but I don't think either is necessary. > > Of course, the big question looming over all of this is -- does this > feature carry its weight? You and I vote yes, but let's see what the > community says. > > What do you all think? Is this feature worth it -- with or without some > changes? > > On Sat, Apr 20, 2024 at 5:07 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> My proposal regarding findIndex brought up a topic that, as I see, has >> been brought up here multiple times. >> >> My idea is to extend the existing stream interface in the new >> EnumeratedStream interface. Such an approach has a number of advantages. >> >> 1. Some methods will be able to explicitly specify that they require an >> enumerated stream. >> >> 2. This would allow us to use an enumerated stream just as the default >> value stream, if the index is redundant at a certain processing step. >> >> 3. This could help introduce a more user-friendly interface: instead of >> passing pairs of index and value, they could be passed as separate >> variables, which will make code more concise. >> >> Consider following example: >> List.of(1, 2, 3).stream() >> .enumerated() >> .map(idx, value -> idx % 2 == 0 ? value : -value); >> >> looks much better than >> List.of(1, 2, 3).stream() >> .enumerated() >> .map(pair -> pair.idx % 2 == 0 ? pair.value : >> -pair.value); >> >> However, I see some caveats with this approach, which relate to parallel >> streams: >> when a stream is backed by a collection, you might expect assigned >> indexes to represent order of iteration through collection, especially when >> talking about sequential collections. Assigning indexes in such streams >> could be a difficult task in a parallel environment. It should either >> assign index to a stream elements at the moment when stream becomes >> parallelized, which is also impossible if already parallelized stream is >> being enumerated, and also wont work for infinite streams, or make >> enumerated stream at least partially sequential by passing elements to >> processing in order they were passed to stream at the first place, which is >> also hard or maybe even impossible to achieve. >> >> Also, side note: methods that already accept 2 parameters might become >> kinda verbose, like reduce that will have to accept for params (idx1, val1, >> idx2, val2), but i think that is not a big deal >> >> >>
Re: Bag (Multiset) Collection
Biggest use case is definitely when creating different histograms from the same dataset. Our friends in sci-py land spend A LOT of time doing this. Our R friends also use this frequently. I can imagine this bag implementation would not just be good for Collections, but would play well with the Collectors framework, as well as Gatherers. On Sat, Apr 20, 2024 at 5:57 PM Holo The Sage Wolf wrote: > By "ordered collection" you mean "unordered collection"? > > How common is it to actually use a multiset in general? Of course there > are use cases, and there are areas in which it is used a lot, but I don't > believe it is common enough to be part of the std with the one exception of > concurrent bag. > > On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, < > rotan.olexa...@gmail.com> wrote: > >> In this letter I would like to express some of my thoughts regarding the >> potential Multiset interface. >> >> I, personally, have encountered a few situations where such an interface >> could come in handy, mostly when I needed an ordered collection that >> permits duplicates. That time I used guava`s TreeMultiset, but I think Java >> itself should have such a collection in its std library. While it is not a >> very common problem, there are still a bunch of use cases where such things >> could come in handy. >> >> I am willing to take on development of such thing, but there are a few >> concerns about Multiset: >> >> 1. Is there any other use for this besides ordered collection that >> permits duplicates? I can't remember anything else from the top of my head. >> >> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet >> class hierarchy, while not being directly connected. I think introducing >> any other ordered collection will require some refactoring of existing >> collection interfaces (for example extract SortedCollection from SortedSet, >> Navigable Collection from NavigableSet etc.). From the perspective of clean >> code, this would be the right decision, but I feel like this will be a very >> complex task to accomplish. >> >> 3. Maybe there should be few versions of Tree collection (for example, >> for regular tasks red-black tree and B-Tree for large amounts of data). I >> have some expirience implementing both, but is it really needed in standard >> library? >> >
Re: EnumeratedStream
I am in full support of this idea. I do also appreciate the functionality of using a BiFunction on the map method instead of a normal Function, R>. As for the actual enumeration logic, my vote is that it should simply enumerate as it arrives, with no context or care given to what came before it. Consider the following examples. final List list = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9); System.out.println(list.stream().enumerated().filter((idx, val) -> val % 2 == 0).toList()); //0:0, 2:2, 4:4, 6:6, 8:8 System.out.println(list.stream().filter(val -> val % 2 == 0).enumerated().toList()); //0:0, 1:2, 2:4, 3:6, 4:8 Enumeration is applied as it arrives, and no sooner. That, I feel, will keep the implementation simplest, and allow the streaming logic to make the most sense. And I added helpful reinterpretations of filter() method (like you did with map()), but I don't think either is necessary. Of course, the big question looming over all of this is -- does this feature carry its weight? You and I vote yes, but let's see what the community says. What do you all think? Is this feature worth it -- with or without some changes? On Sat, Apr 20, 2024 at 5:07 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > My proposal regarding findIndex brought up a topic that, as I see, has > been brought up here multiple times. > > My idea is to extend the existing stream interface in the new > EnumeratedStream interface. Such an approach has a number of advantages. > > 1. Some methods will be able to explicitly specify that they require an > enumerated stream. > > 2. This would allow us to use an enumerated stream just as the default > value stream, if the index is redundant at a certain processing step. > > 3. This could help introduce a more user-friendly interface: instead of > passing pairs of index and value, they could be passed as separate > variables, which will make code more concise. > > Consider following example: > List.of(1, 2, 3).stream() > .enumerated() > .map(idx, value -> idx % 2 == 0 ? value : -value); > > looks much better than > List.of(1, 2, 3).stream() > .enumerated() > .map(pair -> pair.idx % 2 == 0 ? pair.value : -pair.value); > > However, I see some caveats with this approach, which relate to parallel > streams: > when a stream is backed by a collection, you might expect assigned indexes > to represent order of iteration through collection, especially when talking > about sequential collections. Assigning indexes in such streams could be a > difficult task in a parallel environment. It should either assign index to > a stream elements at the moment when stream becomes parallelized, which is > also impossible if already parallelized stream is being enumerated, and > also wont work for infinite streams, or make enumerated stream at least > partially sequential by passing elements to processing in order they were > passed to stream at the first place, which is also hard or maybe even > impossible to achieve. > > Also, side note: methods that already accept 2 parameters might become > kinda verbose, like reduce that will have to accept for params (idx1, val1, > idx2, val2), but i think that is not a big deal > > >
Re: Bag (Multiset) Collection
Let me prefix this by saying that I'm a mathematician, so maybe the lingo I use is a bit different. A multi set has no structure apart from "counting duplication", just like a set has no structure at all, ordered set **is not a set** and ordered multi set **is not a multi set**. With the clarification you gave I understand what you mean and retract my first question (although I believe that "sortedX" or "priorityX" are the naming Java has for what you called "orderedX"). > THe whole point of the bag is to store duplicate elements without preserving order, which allows things like ordered collections. No, the whole point of the bag is to store elements. The difference between a bag and a list is that a bag has less, anything you can do with a bag you can do with a list. The benefit of having a bag is to have a more restrictive interface for the times you don't need the extra structure, as well as the possible performant implementation that doesn't need to respect a stricted contract. CopyOnWriteArratList is more than just a bag. > then that just adds one more use case to potential Bag interface Indeed, I gave concurrent bag as an example of a possible implementation of a bag interface, but you didn't answer my main question, just how common is a use if a bag to change the std for it? To me it sounds like you are interested only on the sorted/navigable variations, so an alternative solution will be to improve PriorityQueue to have a "subQueue" methods (and friends) as well as extending it to a Navigable variation. On Sun, 21 Apr 2024, 00:01 ІП-24 Олександр Ротань, wrote: > Concurrent Bag is something like CopyOnWriteArrayList or Vector, don't we > have it already? (I know vectors aren't optimized). THe whole point of the > bag is to store duplicate elements without preserving order, which allows > things like ordered collections. There could be mutable collections for > concurrency optimized with read-write locks though, if that is what you are > referring to as "concurrent bag", and I am missing the point why this > can't be a part of list interface, then that just adds one more use case to > potential Bag interface. > > PS: Ordered means it preserves order based on comparator, what you are > referring to as ordered essentially means sequential, or am I missing a > point? > > сб, 20 апр. 2024 г. в 23:52, Holo The Sage Wolf : > >> By "ordered collection" you mean "unordered collection"? >> >> How common is it to actually use a multiset in general? Of course there >> are use cases, and there are areas in which it is used a lot, but I don't >> believe it is common enough to be part of the std with the one exception of >> concurrent bag. >> >> On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, < >> rotan.olexa...@gmail.com> wrote: >> >>> In this letter I would like to express some of my thoughts regarding the >>> potential Multiset interface. >>> >>> I, personally, have encountered a few situations where such an interface >>> could come in handy, mostly when I needed an ordered collection that >>> permits duplicates. That time I used guava`s TreeMultiset, but I think Java >>> itself should have such a collection in its std library. While it is not a >>> very common problem, there are still a bunch of use cases where such things >>> could come in handy. >>> >>> I am willing to take on development of such thing, but there are a few >>> concerns about Multiset: >>> >>> 1. Is there any other use for this besides ordered collection that >>> permits duplicates? 
I can't remember anything else from the top of my head. >>> >>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet >>> class hierarchy, while not being directly connected. I think introducing >>> any other ordered collection will require some refactoring of existing >>> collection interfaces (for example extract SortedCollection from SortedSet, >>> Navigable Collection from NavigableSet etc.). From the perspective of clean >>> code, this would be the right decision, but I feel like this will be a very >>> complex task to accomplish. >>> >>> 3. Maybe there should be few versions of Tree collection (for example, >>> for regular tasks red-black tree and B-Tree for large amounts of data). I >>> have some expirience implementing both, but is it really needed in standard >>> library? >>> >>
EnumeratedStream
My proposal regarding findIndex brought up a topic that, as I see, has been brought up here multiple times. My idea is to extend the existing Stream interface with a new EnumeratedStream interface. Such an approach has a number of advantages. 1. Some methods will be able to explicitly specify that they require an enumerated stream. 2. This would allow us to use an enumerated stream just as the default value stream, if the index is redundant at a certain processing step. 3. This could help introduce a more user-friendly interface: instead of passing pairs of index and value, they could be passed as separate variables, which will make code more concise. Consider the following example:

List.of(1, 2, 3).stream()
    .enumerated()
    .map((idx, value) -> idx % 2 == 0 ? value : -value);

looks much better than

List.of(1, 2, 3).stream()
    .enumerated()
    .map(pair -> pair.idx % 2 == 0 ? pair.value : -pair.value);

However, I see some caveats with this approach, which relate to parallel streams: when a stream is backed by a collection, you might expect the assigned indexes to represent the order of iteration through the collection, especially for sequential collections. Assigning indexes in such streams could be a difficult task in a parallel environment. The implementation would either have to assign indexes to stream elements at the moment the stream becomes parallel, which is impossible if an already-parallel stream is being enumerated and also won't work for infinite streams, or make the enumerated stream at least partially sequential by feeding elements to processing in the order they entered the stream in the first place, which is also hard or maybe even impossible to achieve. Also, a side note: methods that already accept two parameters might become kind of verbose, like reduce, which would have to accept four params (idx1, val1, idx2, val2), but I think that is not a big deal. (A rough sketch of one possible shape for such an interface follows below.)
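To make the shape of the proposal a bit more concrete, here is a purely hypothetical sketch of what such an interface could look like. None of these types or methods exist in the JDK, and the signatures are only one possible reading of the idea above, ignoring questions such as primitive specializations and boxing of the index.

import java.util.function.BiFunction;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

// Hypothetical API shape only; not an existing or proposed JDK signature.
interface EnumeratedStream<T> extends Stream<T> {

    // map that receives the element together with its index
    <R> EnumeratedStream<R> map(BiFunction<Integer, ? super T, ? extends R> mapper);

    // filter that can look at the index as well as the value
    EnumeratedStream<T> filter(BiPredicate<Integer, ? super T> predicate);

    // drop the indexes and continue as a plain value stream
    Stream<T> values();
}

With this shape, a two-argument lambda such as (idx, value) -> ... binds to the BiFunction overload, while existing one-argument lambdas keep resolving to the methods inherited from Stream.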
Re: Bag (Multiset) Collection
I won't stop you, but remember, you are not the one maintaining the large number of collection implementations in the JDK. The ones who are may feel differently. This FAQ gives a good hint of their opinion, even to your retorts. On Sat, Apr 20, 2024, 4:44 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > I agree with the point made in the FAQ about the popularity of such > problems. That said, I don't think that it is that unpopular to be ignored. > > Regarding Map.values(), this is the case, but, In my experience, one of > the main advantages of using TreeMultiset was O(long n) modification > complexity. Collection views returned by map, for obvious reasons, forbid > modification. Also, correct me if I`m wrong, but while TreeMultiset is an > unpopular requirement, it is at least as popular as TreeSet. > > Also, I may not have a full view of how such Bag collections can be > implemented and used, however, I feel like there could be more to it. > > I am not insisting on anything, I just feel that if there is someone (like > me lol) who is willing to take on full development and integration cycle, > there aren't much reason to reject such enhancements. > > сб, 20 апр. 2024 г. в 23:43, ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com>: > >> I agree with the point made in the FAQ about the popularity of such >> problems. That said, I don't think that it is that unpopular to be ignored. >> >> Regarding Map.values(), this is the case, but, In my experience, one of >> the main advantages of using TreeMultiset was O(long n) modification >> complexity. Collection views returned by map, for obvious reasons, forbid >> modification. Also, correct me if I`m wrong, but while TreeMultiset is an >> unpopular requirement, it is at least as popular as TreeSet. >> >> Also, I may not have a full view of how such Bag collections can be >> implemented and used, however, I feel like there could be more to it. >> >> I am not insisting on anything, I just feel that if there is someone >> (like me lol) who is willing to take on full development and integration >> cycle, there aren't much reason to reject such enhancements. >> >> сб, 20 апр. 2024 г. в 23:31, David Alayachew : >> >>> Your Bag suggestion has been asked so frequently that there is an FAQ >>> entry in the official Java Docs. >>> >>> >>> https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a3 >>> >>> On Sat, Apr 20, 2024 at 4:25 PM ІП-24 Олександр Ротань < >>> rotan.olexa...@gmail.com> wrote: >>> In this letter I would like to express some of my thoughts regarding the potential Multiset interface. I, personally, have encountered a few situations where such an interface could come in handy, mostly when I needed an ordered collection that permits duplicates. That time I used guava`s TreeMultiset, but I think Java itself should have such a collection in its std library. While it is not a very common problem, there are still a bunch of use cases where such things could come in handy. I am willing to take on development of such thing, but there are a few concerns about Multiset: 1. Is there any other use for this besides ordered collection that permits duplicates? I can't remember anything else from the top of my head. 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class hierarchy, while not being directly connected. 
I think introducing any other ordered collection will require some refactoring of existing collection interfaces (for example extract SortedCollection from SortedSet, Navigable Collection from NavigableSet etc.). From the perspective of clean code, this would be the right decision, but I feel like this will be a very complex task to accomplish. 3. Maybe there should be few versions of Tree collection (for example, for regular tasks red-black tree and B-Tree for large amounts of data). I have some expirience implementing both, but is it really needed in standard library? >>>
Re: Bag (Multiset) Collection
Concurrent Bag is something like CopyOnWriteArrayList or Vector, don't we have it already? (I know vectors aren't optimized). THe whole point of the bag is to store duplicate elements without preserving order, which allows things like ordered collections. There could be mutable collections for concurrency optimized with read-write locks though, if that is what you are referring to as "concurrent bag", and I am missing the point why this can't be a part of list interface, then that just adds one more use case to potential Bag interface. PS: Ordered means it preserves order based on comparator, what you are referring to as ordered essentially means sequential, or am I missing a point? сб, 20 апр. 2024 г. в 23:52, Holo The Sage Wolf : > By "ordered collection" you mean "unordered collection"? > > How common is it to actually use a multiset in general? Of course there > are use cases, and there are areas in which it is used a lot, but I don't > believe it is common enough to be part of the std with the one exception of > concurrent bag. > > On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, < > rotan.olexa...@gmail.com> wrote: > >> In this letter I would like to express some of my thoughts regarding the >> potential Multiset interface. >> >> I, personally, have encountered a few situations where such an interface >> could come in handy, mostly when I needed an ordered collection that >> permits duplicates. That time I used guava`s TreeMultiset, but I think Java >> itself should have such a collection in its std library. While it is not a >> very common problem, there are still a bunch of use cases where such things >> could come in handy. >> >> I am willing to take on development of such thing, but there are a few >> concerns about Multiset: >> >> 1. Is there any other use for this besides ordered collection that >> permits duplicates? I can't remember anything else from the top of my head. >> >> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet >> class hierarchy, while not being directly connected. I think introducing >> any other ordered collection will require some refactoring of existing >> collection interfaces (for example extract SortedCollection from SortedSet, >> Navigable Collection from NavigableSet etc.). From the perspective of clean >> code, this would be the right decision, but I feel like this will be a very >> complex task to accomplish. >> >> 3. Maybe there should be few versions of Tree collection (for example, >> for regular tasks red-black tree and B-Tree for large amounts of data). I >> have some expirience implementing both, but is it really needed in standard >> library? >> >
Re: Bag (Multiset) Collection
By "ordered collection" you mean "unordered collection"? How common is it to actually use a multiset in general? Of course there are use cases, and there are areas in which it is used a lot, but I don't believe it is common enough to be part of the std with the one exception of concurrent bag. On Sat, 20 Apr 2024, 23:25 ІП-24 Олександр Ротань, wrote: > In this letter I would like to express some of my thoughts regarding the > potential Multiset interface. > > I, personally, have encountered a few situations where such an interface > could come in handy, mostly when I needed an ordered collection that > permits duplicates. That time I used guava`s TreeMultiset, but I think Java > itself should have such a collection in its std library. While it is not a > very common problem, there are still a bunch of use cases where such things > could come in handy. > > I am willing to take on development of such thing, but there are a few > concerns about Multiset: > > 1. Is there any other use for this besides ordered collection that permits > duplicates? I can't remember anything else from the top of my head. > > 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class > hierarchy, while not being directly connected. I think introducing any > other ordered collection will require some refactoring of existing > collection interfaces (for example extract SortedCollection from SortedSet, > Navigable Collection from NavigableSet etc.). From the perspective of clean > code, this would be the right decision, but I feel like this will be a very > complex task to accomplish. > > 3. Maybe there should be few versions of Tree collection (for example, for > regular tasks red-black tree and B-Tree for large amounts of data). I have > some expirience implementing both, but is it really needed in standard > library? >
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v25]
On Sat, 20 Apr 2024 19:09:43 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall performance of >> `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and >> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Fix UnsafeCopyMemoryMark scope issue Before I do testing, please sync with mainline. - PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-206569
Re: Bag (Multiset) Collection
I agree with the point made in the FAQ about the popularity of such problems. That said, I don't think that it is that unpopular to be ignored. Regarding Map.values(), this is the case, but, In my experience, one of the main advantages of using TreeMultiset was O(long n) modification complexity. Collection views returned by map, for obvious reasons, forbid modification. Also, correct me if I`m wrong, but while TreeMultiset is an unpopular requirement, it is at least as popular as TreeSet. Also, I may not have a full view of how such Bag collections can be implemented and used, however, I feel like there could be more to it. I am not insisting on anything, I just feel that if there is someone (like me lol) who is willing to take on full development and integration cycle, there aren't much reason to reject such enhancements. сб, 20 апр. 2024 г. в 23:43, ІП-24 Олександр Ротань < rotan.olexa...@gmail.com>: > I agree with the point made in the FAQ about the popularity of such > problems. That said, I don't think that it is that unpopular to be ignored. > > Regarding Map.values(), this is the case, but, In my experience, one of > the main advantages of using TreeMultiset was O(long n) modification > complexity. Collection views returned by map, for obvious reasons, forbid > modification. Also, correct me if I`m wrong, but while TreeMultiset is an > unpopular requirement, it is at least as popular as TreeSet. > > Also, I may not have a full view of how such Bag collections can be > implemented and used, however, I feel like there could be more to it. > > I am not insisting on anything, I just feel that if there is someone (like > me lol) who is willing to take on full development and integration cycle, > there aren't much reason to reject such enhancements. > > сб, 20 апр. 2024 г. в 23:31, David Alayachew : > >> Your Bag suggestion has been asked so frequently that there is an FAQ >> entry in the official Java Docs. >> >> >> https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a3 >> >> On Sat, Apr 20, 2024 at 4:25 PM ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com> wrote: >> >>> In this letter I would like to express some of my thoughts regarding the >>> potential Multiset interface. >>> >>> I, personally, have encountered a few situations where such an interface >>> could come in handy, mostly when I needed an ordered collection that >>> permits duplicates. That time I used guava`s TreeMultiset, but I think Java >>> itself should have such a collection in its std library. While it is not a >>> very common problem, there are still a bunch of use cases where such things >>> could come in handy. >>> >>> I am willing to take on development of such thing, but there are a few >>> concerns about Multiset: >>> >>> 1. Is there any other use for this besides ordered collection that >>> permits duplicates? I can't remember anything else from the top of my head. >>> >>> 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet >>> class hierarchy, while not being directly connected. I think introducing >>> any other ordered collection will require some refactoring of existing >>> collection interfaces (for example extract SortedCollection from SortedSet, >>> Navigable Collection from NavigableSet etc.). From the perspective of clean >>> code, this would be the right decision, but I feel like this will be a very >>> complex task to accomplish. >>> >>> 3. 
Maybe there should be few versions of Tree collection (for example, >>> for regular tasks red-black tree and B-Tree for large amounts of data). I >>> have some expirience implementing both, but is it really needed in standard >>> library? >>> >>
Re: Bag (Multiset) Collection
Your Bag suggestion has been asked so frequently that there is an FAQ entry in the official Java Docs. https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a3 On Sat, Apr 20, 2024 at 4:25 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > In this letter I would like to express some of my thoughts regarding the > potential Multiset interface. > > I, personally, have encountered a few situations where such an interface > could come in handy, mostly when I needed an ordered collection that > permits duplicates. That time I used guava`s TreeMultiset, but I think Java > itself should have such a collection in its std library. While it is not a > very common problem, there are still a bunch of use cases where such things > could come in handy. > > I am willing to take on development of such thing, but there are a few > concerns about Multiset: > > 1. Is there any other use for this besides ordered collection that permits > duplicates? I can't remember anything else from the top of my head. > > 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class > hierarchy, while not being directly connected. I think introducing any > other ordered collection will require some refactoring of existing > collection interfaces (for example extract SortedCollection from SortedSet, > Navigable Collection from NavigableSet etc.). From the perspective of clean > code, this would be the right decision, but I feel like this will be a very > complex task to accomplish. > > 3. Maybe there should be few versions of Tree collection (for example, for > regular tasks red-black tree and B-Tree for large amounts of data). I have > some expirience implementing both, but is it really needed in standard > library? >
Bag (Multiset) Collection
In this letter I would like to express some of my thoughts regarding the potential Multiset interface. I, personally, have encountered a few situations where such an interface could come in handy, mostly when I needed an ordered collection that permits duplicates. That time I used guava`s TreeMultiset, but I think Java itself should have such a collection in its std library. While it is not a very common problem, there are still a bunch of use cases where such things could come in handy. I am willing to take on development of such thing, but there are a few concerns about Multiset: 1. Is there any other use for this besides ordered collection that permits duplicates? I can't remember anything else from the top of my head. 2. Guava's TreeMultiset class hierarchy pretty much imitates TreeSet class hierarchy, while not being directly connected. I think introducing any other ordered collection will require some refactoring of existing collection interfaces (for example extract SortedCollection from SortedSet, Navigable Collection from NavigableSet etc.). From the perspective of clean code, this would be the right decision, but I feel like this will be a very complex task to accomplish. 3. Maybe there should be few versions of Tree collection (for example, for regular tasks red-black tree and B-Tree for large amounts of data). I have some expirience implementing both, but is it really needed in standard library?
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]
On Sat, 20 Apr 2024 04:28:43 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Long to short jmp; other cleanup > > `runtime/Unsafe/InternalErrorTest.java` test SIGBUS when run with `-Xcomp` > (and other flags in test's @run command): > > # SIGBUS (0xa) at pc=0x000119514760, pid=63021, tid=28163 > # > # JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build > 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug > 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2, compiled mode, > sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64) > # Problematic frame: > # v ~StubRoutines::jbyte_fill 0x000119514760 @vnkozlov Thanks for the feedback. Can you please start the testing again? I'd appreciate it. - PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067759300
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]
On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall performance of >> `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and >> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Long to short jmp; other cleanup The SIGBUS was due to improper scoping of the UnsafeCopyMemoryMark. The change is:

    {
      // Add set memory mark to protect against unsafe accesses faulting
-     UnsafeCopyMemoryMark(this, ((t == T_BYTE) && !aligned), true);
+     UnsafeCopyMemoryMark usmm(this, ((t == T_BYTE) && !aligned), true);
      __ generate_fill(t, aligned, to, value, r11, rax, xmm0);
    }

- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067758164
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]
On Sat, 20 Apr 2024 14:14:59 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Long to short jmp; other cleanup > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530: > >> 2528: switch (type) { >> 2529: case USM_SHORT: >> 2530: __ movw(Address(dest, (2 * i)), wide_value); > > MOVW emits an extra Operand Size Override prefix byte compared to 32 and 64 > bit stores, any specific reason for keeping same unroll factor for all the > stores. My understanding is the spec requires the appropriate-sized write based on alignment and size. This is why there's no 128-bit or 256-bit store loops. > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539: > >> 2537: break; >> 2538: } >> 2539: } > > I understand we want to be as accurate as possible in filling the tail in an > event of SIGBUS, but we are anyways creating a wide value for 8 packed bytes > if destination segment was quadword aligned, aligned quadword stores are > implicitly atomic on x86 targets, what's your thoughts on using a vector > instruction based loop. I believe the spec is specific on the size of the store required given alignment and size. I want to honor that spec even though wider stores could be done in many cases. - PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573373720 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573374108
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v25]
> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See > [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around > this change. > > Overall, making this an intrinsic improves overall performance of > `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after > numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). > > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix UnsafeCopyMemoryMark scope issue - Changes: - all: https://git.openjdk.org/jdk/pull/18555/files - new: https://git.openjdk.org/jdk/pull/18555/files/19616244..c1290169 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]
On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall performance of >> `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and >> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Long to short jmp; other cleanup src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530: > 2528: switch (type) { > 2529: case USM_SHORT: > 2530: __ movw(Address(dest, (2 * i)), wide_value); MOVW emits an extra Operand Size Override prefix byte compared to 32 and 64 bit stores, any specific reason for keeping same unroll factor for all the stores. src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539: > 2537: break; > 2538: } > 2539: } I understand we want to be as accurate as possible in filling the tail in an event of SIGBUS, but we are anyways creating a wide value for 8 packed bytes if destination segment was quadword aligned, aligned quadword stores are implicitly atomic on x86 targets, what's your thoughts on using a vector instruction based loop. - PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573297441 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573299069
Re: Addition of Predicate-based findIndex and findLastIndex methods to java.util.List
Regarding what Holo said, I think enumerated() would be a more suitable name. I have few ideas about implementation of such things, like separate EnumeratedStream interface, which extends stream and provides methods like forEach(BiConsumer), etc. This would eliminate unnecessary object instantiation and also make stream interface more comfortable as enumeration of steam won't require to consume index in every operation.\ Although it is an interesting and important topic, I have to ask you guys to move it to another thread if possible. This thread already shifted significantly from the original topic, which was potential compatibility and API issues that I need to fix before applying for CSR. сб, 20 апр. 2024 г. в 14:10, Holo The Sage Wolf : > Hello, > > I want to challenge the idea that every stream may be indexed for 3 > reasons: > > Any sensible type to use for an index (int/long) will be bounded, but > streams need not, e.g. > new Random().ints().withIndex().forEach(v -> ...); > May result (after a very long time) non sensical values. Using bigInteger > will solve this problem but would be silly as in practice 99.9% of the > times long is more than enough (and 90% int is probably enough) > > The two other problems have to do with the semantics of "index": > Index gives the impression that the "index" is a real position, but the > stream may be truly non deterministic (using e.g. Hardware Random Stream) > Index gives the impression that the "index" respects the canonical order > of the underlying collection (if exists), that is: > var x = myList.stream().parallel() > .withIndex().findAny().get(); > assert x.value() == myList.get(x.index()); // implementation dependent > > Which I don't think is possible without enormous performance problems. > > --- > > To be clear about my position: I like and want the idea. The first > problem, in my opinion, shouldn't block it, but we should document it. That > being said, I don't think we should be calling this "index". > > On Sat, 20 Apr 2024, 11:54 -, wrote: > >> Hi Rotan', >> Interface methods are both exposed to callers and to implementations. Is >> there any scenario where the implementation can be more efficient than the >> stream/gatherer scanning approach? >> >> indexOf or lastIndexOf may be faster if a list has some tricks like a >> hashmap to quickly look up the index of an item. In contrast, the predicate >> looks too generic and barely optimizable by list implementations; even if >> we can stream in parallel, this is already covered by the stream API, so we >> can simply declare a Gatherer to perform this findIndex operation on a >> stream of elements (also note that a list can be reversed with >> List::reversed() since JDK 21, so the same Gatherer can be reused for >> findLastIndex) >> >> I think if there is no scenario where implementation can do better with >> findIndex, the findIndex should become a Gatherer, which has better >> compatibility. >> >> Best, >> Chen Liang >> >> On Fri, Apr 19, 2024 at 8:51 PM ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com> wrote: >> >>> Turned out this whole time I was only replying to David. I have to thank >>> him for his patience explaining me how this whole mailing thing work. >>> >>> To summarize what has been missed out, besides what have you seen in >>> quotes. I have pointed out that, while flexibility and simplicity is >>> important, the main reason why I have decided to propose this feature is to >>> enhance developer experience. 
>>> >>> I know that conciseness is not a main goal of Java, and therefore my >>> proposal is kind of contradicts the general direction of development of >>> Java as language, but my desire to make Java more user friendly is based on >>> personal experience of introducing fresh devs or switchers to java and >>> their feedback. Therefore, while it is not a priority for community, I >>> really hope you guys don't mind me working at a backrank of Java making it >>> a bit prettier. >>> >>> On Sat, Apr 20, 2024, 02:56 David Alayachew >>> wrote: >>> > That was what i referred to as "Map-solvable" problems, > but if this way of doing it is a thing, then I agree with > the point. Doing it the Map way would not only be slower, but it would force you to include more concepts than you need. Your identifier is your index. Lists/arrays have index. Bringing a map in only complicates things. As for the rest of your response, I think I understand the trouble here. You put it best when you said that you are introducing this feature with the intent of improving the developer experience, specifically through making code more concise. I can understand that goal, and it is a good goal to strive for, but conciseness is one of the last things Java aims to provide as a language or through its standard library. I won't tell you not to do it, but I will tell you that there will be significant
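As a side note on the enumerated() idea above, here is a rough sketch of what it could boil down to for a sequential, random-access list. The names Indexed, enumerated and forEachIndexed are invented for illustration only and are not existing JDK API:

    import java.util.List;
    import java.util.function.BiConsumer;
    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    // Pair-per-element form: each element is wrapped together with its position.
    record Indexed<T>(int index, T value) {}

    final class EnumeratedDemo {

        static <T> Stream<Indexed<T>> enumerated(List<T> list) {
            return IntStream.range(0, list.size())
                            .mapToObj(i -> new Indexed<>(i, list.get(i)));
        }

        // BiConsumer form hinted at above: no per-element pair object is created,
        // the index is handed to the action directly.
        static <T> void forEachIndexed(List<T> list, BiConsumer<Integer, ? super T> action) {
            for (int i = 0; i < list.size(); i++) {
                action.accept(i, list.get(i));
            }
        }

        public static void main(String[] args) {
            List<String> names = List.of("ann", "bob", "cleo");
            enumerated(names).forEach(e -> System.out.println(e.index() + ": " + e.value()));
            forEachIndexed(names, (i, name) -> System.out.println(i + ": " + name));
        }
    }

A dedicated EnumeratedStream interface, as proposed above, would presumably make the second form available on any stream rather than only on lists.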
RFR: 8330624: Inconsistent use of semicolon in the code snippet of java.io.Serializable javadoc
Fix the description of java.io.Serializable. - Commit messages: - Inconsistent use of semicolon in the code snippet of java.io.Serializable javadoc Changes: https://git.openjdk.org/jdk/pull/18874/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18874&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330624 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18874/head:pull/18874 PR: https://git.openjdk.org/jdk/pull/18874
Re: Addition of Predicate-based findIndex and findLastIndex methods to java.util.List
Hello, I want to challenge the idea that every stream may be indexed for 3 reasons: Any sensible type to use for an index (int/long) will be bounded, but streams need not be, e.g. new Random().ints().withIndex().forEach(v -> ...); May result (after a very long time) in nonsensical values. Using BigInteger will solve this problem but would be silly as in practice 99.9% of the time long is more than enough (and 90% int is probably enough) The two other problems have to do with the semantics of "index": Index gives the impression that the "index" is a real position, but the stream may be truly non-deterministic (using e.g. Hardware Random Stream) Index gives the impression that the "index" respects the canonical order of the underlying collection (if it exists), that is: var x = myList.stream().parallel() .withIndex().findAny().get(); assert x.value() == myList.get(x.index()); // implementation dependent Which I don't think is possible without enormous performance problems. --- To be clear about my position: I like and want the idea. The first problem, in my opinion, shouldn't block it, but we should document it. That being said, I don't think we should be calling this "index". On Sat, 20 Apr 2024, 11:54 -, wrote: > Hi Rotan', > Interface methods are both exposed to callers and to implementations. Is > there any scenario where the implementation can be more efficient than the > stream/gatherer scanning approach? > > indexOf or lastIndexOf may be faster if a list has some tricks like a > hashmap to quickly look up the index of an item. In contrast, the predicate > looks too generic and barely optimizable by list implementations; even if > we can stream in parallel, this is already covered by the stream API, so we > can simply declare a Gatherer to perform this findIndex operation on a > stream of elements (also note that a list can be reversed with > List::reversed() since JDK 21, so the same Gatherer can be reused for > findLastIndex) > > I think if there is no scenario where implementation can do better with > findIndex, the findIndex should become a Gatherer, which has better > compatibility. > > Best, > Chen Liang > > On Fri, Apr 19, 2024 at 8:51 PM ІП-24 Олександр Ротань < > rotan.olexa...@gmail.com> wrote: > >> Turned out this whole time I was only replying to David. I have to thank >> him for his patience explaining me how this whole mailing thing work. >> >> To summarize what has been missed out, besides what have you seen in >> quotes. I have pointed out that, while flexibility and simplicity is >> important, the main reason why I have decided to propose this feature is to >> enhance developer experience. >> >> I know that conciseness is not a main goal of Java, and therefore my >> proposal is kind of contradicts the general direction of development of >> Java as language, but my desire to make Java more user friendly is based on >> personal experience of introducing fresh devs or switchers to java and >> their feedback. Therefore, while it is not a priority for community, I >> really hope you guys don't mind me working at a backrank of Java making it >> a bit prettier. >> >> On Sat, Apr 20, 2024, 02:56 David Alayachew >> wrote: >> >>> > That was what i referred to as "Map-solvable" problems, >>> > but if this way of doing it is a thing, then I agree with >>> > the point. >>> >>> Doing it the Map way would not only be slower, but it would force you to >>> include more concepts than you need. Your identifier is your index. >>> Lists/arrays have index. 
Bringing a map in only complicates things. >>> >>> As for the rest of your response, I think I understand the trouble here. >>> You put it best when you said that you are introducing this feature with >>> the intent of improving the developer experience, specifically through >>> making code more concise. >>> >>> I can understand that goal, and it is a good goal to strive for, but >>> conciseness is one of the last things Java aims to provide as a language or >>> through its standard library. I won't tell you not to do it, but I will >>> tell you that there will be significant push back from folks here, and for >>> good reason -- conciseness is not the priority here at Java. At Java, the >>> lack of conciseness is not a weakness. Sure, unneeded verbosity can be a >>> weakness, but the solution I suggested above is basically the tempo that >>> Java strives for -- compose simple concepts so that you can create what you >>> need. >>> >>> All of that is to say, you have made your point clear. You goal is >>> conciseness and ease of access. Understanding that, it's clear that my >>> solution is not concise enough to meet that goal, so I won't push it >>> further. >>> >>> On Fri, Apr 19, 2024 at 7:26 PM ІП-24 Олександр Ротань < >>> rotan.olexa...@gmail.com> wrote: >>> " 1. Fetching data from same-sized lists -- This is a trick from Data-Oriented Design (not to be confused with Data-Oriented Programming) where you filter
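To make the ordering concern raised at the top of this message concrete, here is a small self-contained demonstration that uses only existing stream API (no hypothetical withIndex()); the class and record names are just for this demo. A naive index attached by a side-effecting counter in a parallel pipeline need not match the element's position in the source list, so the asserted equality really is implementation dependent:

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.stream.IntStream;

    final class ParallelIndexCaveat {
        record Indexed<T>(int index, T value) {}

        public static void main(String[] args) {
            List<Integer> data = IntStream.range(0, 1_000).boxed().toList();
            AtomicInteger counter = new AtomicInteger();

            // Naive "withIndex": number elements in whatever order the parallel
            // pipeline happens to process them.
            var any = data.stream().parallel()
                          .map(v -> new Indexed<>(counter.getAndIncrement(), v))
                          .findAny()
                          .orElseThrow();

            // May print false: the attached counter value is not the list position.
            System.out.println(any.value().equals(data.get(any.index())));
        }
    }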
RFR: 8330686: Fix typos in the DatabaseMetaData javadoc
Fix typos within the `DatabaseMetaData.java`. - Commit messages: - Merge branch 'master' of github.com:onacit/jdk - Merge branch 'master' of github.com:onacit/jdk - Fix more typos - Fix typo - Fix wrongly typed constant name - Fix more typos - Fix typo - Fix wrongly typed constant name - Fix more typos - Fix typo Changes: https://git.openjdk.org/jdk/pull/18860/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18860&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330686 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18860/head:pull/18860 PR: https://git.openjdk.org/jdk/pull/18860
Re: RFR: 8330686: Fix typos in the DatabaseMetaData javadoc
On Fri, 19 Apr 2024 11:36:48 GMT, Jin Kwon wrote: >> Fix typos within the `DatabaseMetaData.java`. > > @openjdk with issue from where? Hello @onacit, it appears a JBS issue has been filed for this https://bugs.openjdk.org/browse/JDK-8330686. Please change the title of this PR to `8330686: Fix typos in the DatabaseMetaData javadoc` so that it triggers the official review process. - PR Comment: https://git.openjdk.org/jdk/pull/18860#issuecomment-2067548609
Re: RFR: 8330686: Fix typos in the DatabaseMetaData javadoc
On Fri, 19 Apr 2024 11:33:48 GMT, Jin Kwon wrote: > Fix typos within the `DatabaseMetaData.java`. @openjdk with issue from where? - PR Comment: https://git.openjdk.org/jdk/pull/18860#issuecomment-2066389888
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v22]
On Fri, 5 Apr 2024 23:13:39 GMT, Brent Christian wrote: >> Classes in the `java.lang.ref` package would benefit from an update to bring >> the spec in line with how the VM already behaves. The changes would focus on >> _happens-before_ edges at some key points during reference processing. >> >> A couple key things we want to be able to say are: >> - `Reference.reachabilityFence(x)` _happens-before_ reference processing >> occurs for 'x'. >> - `Cleaner.register()` _happens-before_ the Cleaner thread runs the >> registered cleaning action. >> >> This will bring Cleaner in line (or close) with the memory visibility >> guarantees made for finalizers in [JLS >> 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5): >> _"There is a happens-before edge from the end of a constructor of an object >> to the start of a finalizer (§12.6) for that object."_ > > Brent Christian has updated the pull request incrementally with one > additional commit since the last revision: > > Another update to reachabilityFence() @apiNote Thanks Brent, I have no objections. Good job! - PR Comment: https://git.openjdk.org/jdk/pull/16644#issuecomment-2067611311
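For readers skimming the thread, here is a minimal sketch of the usage pattern those happens-before edges are meant to support; it is illustrative code, not taken from the PR:

    import java.lang.ref.Cleaner;
    import java.lang.ref.Reference;

    class Resource implements AutoCloseable {
        private static final Cleaner CLEANER = Cleaner.create();

        private final long handle;                  // e.g. a native handle
        private final Cleaner.Cleanable cleanable;

        Resource(long handle) {
            this.handle = handle;
            long h = handle;                        // the action must not capture 'this'
            // Writes made before register() should be visible to the Cleaner
            // thread when it later runs the cleaning action.
            this.cleanable = CLEANER.register(this, () -> free(h));
        }

        void use() {
            try {
                nativeUse(handle);
            } finally {
                // Keeps 'this' strongly reachable until nativeUse() has finished,
                // so cleaning cannot start before the work above is done.
                Reference.reachabilityFence(this);
            }
        }

        @Override
        public void close() {
            cleanable.clean();                      // runs the action at most once
        }

        private static void free(long h)      { /* release the native resource */ }
        private static void nativeUse(long h) { /* use the native resource */ }
    }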
Re: RFR: 8330681: Explicit hashCode and equals for java.lang.runtime.SwitchBootstraps$TypePairs
On Sat, 20 Apr 2024 00:00:54 GMT, ExE Boss wrote: >> We can reduce overhead of first use of a switch bootstrap by moving >> `typePairToName` into `TypePairs` and by explicitly overriding `hashCode` >> and `equals`. The first change avoids loading and initializing the >> `TypePairs` class in actual cases, the second removes some excess code >> generation from happening on first use. > > src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 685: > >> 683: record TypePairs(Class<?> from, Class<?> to) { >> 684: >> 685: private static final Map<TypePairs, String> typePairToName = >> initialize(); > > If `TypePairs.typePairToName` is never modified after initialisation, then it > should probably be made immutable: > Suggestion: > > private static final Map<TypePairs, String> typePairToName = > Map.copyOf(initialize()); If you really think about it, the `initialize` method itself is somewhat problematic, as it's initializing with byte/short/char on the left, all of which are already converted to int in the of() factory. This should be done in a separate issue. - PR Review Comment: https://git.openjdk.org/jdk/pull/18865#discussion_r1573195851
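For context, the pattern under discussion looks roughly like the sketch below. This is not the actual SwitchBootstraps code and the map contents are made up; it only shows how explicit equals/hashCode on a record avoids the generated ObjectMethods-based implementations, and how Map.copyOf yields an immutable lookup table:

    import java.util.HashMap;
    import java.util.Map;

    record TypePair(Class<?> from, Class<?> to) {

        // Immutable copy of the lookup table built at class initialization.
        private static final Map<TypePair, String> NAMES = Map.copyOf(initialize());

        // Explicit overrides, so no record-generated implementations have to be
        // linked the first time equals/hashCode are used.
        @Override
        public boolean equals(Object o) {
            return o instanceof TypePair other
                    && from == other.from
                    && to == other.to;
        }

        @Override
        public int hashCode() {
            return 31 * from.hashCode() + to.hashCode();
        }

        static String nameOf(Class<?> from, Class<?> to) {
            return NAMES.get(new TypePair(from, to));
        }

        private static Map<TypePair, String> initialize() {
            Map<TypePair, String> m = new HashMap<>();
            m.put(new TypePair(int.class, long.class), "int_to_long");     // illustrative
            m.put(new TypePair(int.class, double.class), "int_to_double"); // illustrative
            return m;
        }
    }

Map.copyOf also gives a compact, truly immutable map, so a later accidental put() fails fast instead of silently mutating shared state.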
Re: Addition of Predicate-based findIndex and findLastIndex methods to java.util.List
Hi Rotan', Interface methods are both exposed to callers and to implementations. Is there any scenario where the implementation can be more efficient than the stream/gatherer scanning approach? indexOf or lastIndexOf may be faster if a list has some tricks like a hashmap to quickly look up the index of an item. In contrast, the predicate looks too generic and barely optimizable by list implementations; even if we can stream in parallel, this is already covered by the stream API, so we can simply declare a Gatherer to perform this findIndex operation on a stream of elements (also note that a list can be reversed with List::reversed() since JDK 21, so the same Gatherer can be reused for findLastIndex) I think if there is no scenario where implementation can do better with findIndex, the findIndex should become a Gatherer, which has better compatibility. Best, Chen Liang On Fri, Apr 19, 2024 at 8:51 PM ІП-24 Олександр Ротань < rotan.olexa...@gmail.com> wrote: > Turned out this whole time I was only replying to David. I have to thank > him for his patience explaining me how this whole mailing thing work. > > To summarize what has been missed out, besides what have you seen in > quotes. I have pointed out that, while flexibility and simplicity is > important, the main reason why I have decided to propose this feature is to > enhance developer experience. > > I know that conciseness is not a main goal of Java, and therefore my > proposal is kind of contradicts the general direction of development of > Java as language, but my desire to make Java more user friendly is based on > personal experience of introducing fresh devs or switchers to java and > their feedback. Therefore, while it is not a priority for community, I > really hope you guys don't mind me working at a backrank of Java making it > a bit prettier. > > On Sat, Apr 20, 2024, 02:56 David Alayachew > wrote: > >> > That was what i referred to as "Map-solvable" problems, >> > but if this way of doing it is a thing, then I agree with >> > the point. >> >> Doing it the Map way would not only be slower, but it would force you to >> include more concepts than you need. Your identifier is your index. >> Lists/arrays have index. Bringing a map in only complicates things. >> >> As for the rest of your response, I think I understand the trouble here. >> You put it best when you said that you are introducing this feature with >> the intent of improving the developer experience, specifically through >> making code more concise. >> >> I can understand that goal, and it is a good goal to strive for, but >> conciseness is one of the last things Java aims to provide as a language or >> through its standard library. I won't tell you not to do it, but I will >> tell you that there will be significant push back from folks here, and for >> good reason -- conciseness is not the priority here at Java. At Java, the >> lack of conciseness is not a weakness. Sure, unneeded verbosity can be a >> weakness, but the solution I suggested above is basically the tempo that >> Java strives for -- compose simple concepts so that you can create what you >> need. >> >> All of that is to say, you have made your point clear. You goal is >> conciseness and ease of access. Understanding that, it's clear that my >> solution is not concise enough to meet that goal, so I won't push it >> further. >> >> On Fri, Apr 19, 2024 at 7:26 PM ІП-24 Олександр Ротань < >> rotan.olexa...@gmail.com> wrote: >> >>> " 1. 
Fetching data from same-sized lists -- This is a trick from >>> Data-Oriented Design (not to be confused with Data-Oriented Programming) >>> where you filter down to the indexes that you want, and use the indexes to >>> fetch related values from the respective lists. So, I would filter down to >>> Player1, Player4, and Player6, then I would use their indexes to do lookups >>> in PlayerItemList, PlayerSkinList, etc. This tactic is especially useful >>> when working with structs (or value classes when Java gets them)." >>> >>> That was what i referred to as "Map-solvable" problems, but if this way >>> of doing it is a thing, then I agree with the point. >>> >>> "By this logic, many of the methods in the Stream library should >>> instead be in the Collection library. There are several emails on the >>> lambda mailing list that address why this wasn't done. I imagine that Rémi >>> himself could point them out to you." >>> >>> It may be just my opinion, but I don't see anything harmful in some >>> shortcuts that could be optimized, especially if they still converge to a >>> single point under the hood. I would love to read mails about maintainers' >>> opinions on why some features of Streams couldnt be also a part of >>> Collection library if it would enhance developer experience, so, if it >>> wouldn't be hard for you, could you attach some links? Also I guess a good >>> compromise would be extension methods, I am currently exploring >>> possibilities of their implementation. >>> >>> "Well
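Picking up the Gatherer suggestion from earlier in this thread, here is a minimal sketch of findIndex as a Gatherer, with findLastIndex obtained through List::reversed. Gatherers are a preview API in the JDKs current at the time of this thread, and the helper name is made up:

    import java.util.List;
    import java.util.Optional;
    import java.util.function.Predicate;
    import java.util.stream.Gatherer;

    final class FindIndexGatherer {

        // Emits the index of the first matching element, then stops consuming input.
        static <T> Gatherer<T, ?, Integer> findIndex(Predicate<? super T> predicate) {
            return Gatherer.ofSequential(
                    () -> new int[1],                        // running index
                    (counter, element, downstream) -> {
                        if (predicate.test(element)) {
                            downstream.push(counter[0]);
                            return false;                    // short-circuit
                        }
                        counter[0]++;
                        return true;                         // keep consuming
                    });
        }

        public static void main(String[] args) {
            List<String> names = List.of("ann", "bob", "cleo");

            Optional<Integer> first = names.stream()
                    .gather(findIndex(s -> s.startsWith("c")))
                    .findFirst();                            // Optional[2]

            // findLastIndex via the JDK 21 reversed view, mapped back to the
            // original position.
            Optional<Integer> last = names.reversed().stream()
                    .gather(findIndex(s -> s.length() == 3))
                    .findFirst()
                    .map(i -> names.size() - 1 - i);         // Optional[1] ("bob")

            System.out.println(first + " " + last);
        }
    }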