Re: Generalizing binary search

2024-05-15 Thread Louis Wasserman
be dispatch) overhead of the lambda. Some real-world benchmarks
>> would have to be written with various-sized data sets.
>> >
>> > It would also be possible to produce primitive variations which operate
>> on int, float, long, and double values, using existing functions if
>> capturing is deemed "OK". It is also possible to produce a variation which
>> uses a `long` for the index, for huge data sets (for example, very large
>> mapped files using `MemorySegment`).
>> >
>> > Also unclear is: where would it live? `Collections`? Somewhere else?
>> >
>> > Any thoughts/opinions would be appreciated (even if they are along the
>> lines of "it's not worth the effort"). Particularly, any insight would be
>> appreciated as to whether or not this kind of hypothetical enhancement
>> would warrant a JEP (I wouldn't think so, but I'm no expert at such
>> assessments).
>> >
>> > --
>> > - DML • he/him
>>
>> Have a look at this recently filed issue:
>> https://bugs.openjdk.org/browse/JDK-8326330
>>
>> -Pavel
>>
>>
>>
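As a concrete illustration of the generalized, long-indexed search sketched in the
quoted proposal, here is a minimal sketch (hypothetical name and signature, not
taken from JDK-8326330), assuming the caller supplies a function that compares the
element at a given index against the search target:

    import java.util.function.LongToIntFunction;

    // Hypothetical sketch: binary search over an abstract, long-indexed space.
    // `comparison` returns <0, 0, or >0 for "element at index is below, equal
    // to, or above the target", mirroring Arrays.binarySearch conventions.
    static long search(long fromIndex, long toIndex, LongToIntFunction comparison) {
        long low = fromIndex, high = toIndex - 1;
        while (low <= high) {
            long mid = (low + high) >>> 1;
            int cmp = comparison.applyAsInt(mid);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;          // exact match
            }
        }
        return -(low + 1);           // not found; encodes the insertion point
    }

Primitive value variants (int, long, float, double) and an int-indexed overload
would follow the same shape; only the comparison lambda changes.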

-- 
Louis Wasserman (he/they)


Re: In support of Instant.minus(Instant)

2024-05-02 Thread Louis Wasserman
That doesn't follow for me at all.

The structure formed by Instants and Durations is an affine space
<https://en.wikipedia.org/wiki/Affine_space#Definition>, with instants the
points and durations the vectors.  (An affine space is a vector space
without a distinguished origin, which of course Instants don't have.)  It
is 100% standard to use the minus sign for the operation "point - point =
vector," even when "point + point" is not defined, and to use all the other
standard idioms for subtraction; the Wikipedia article uses "subtraction"
and "difference" ubiquitously.

Personally, I'd be willing to live with a different name for the operation,
but consider "users keep getting it wrong" a strong enough argument all by
itself for a version with the swapped argument order; it's not obvious to
me that another API with the same argument order adds enough value over
Duration.between to bother with.
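To make the affine-space reading concrete, a small sketch using only the existing
java.time API; the proposed instant-minus-instant spelling appears only as a comment:

    import java.time.Duration;
    import java.time.Instant;

    class InstantArithmeticSketch {
        public static void main(String[] args) {
            Instant start = Instant.parse("2024-05-01T00:00:00Z");
            Instant end   = Instant.parse("2024-05-02T06:00:00Z");

            // point - point = vector: today spelled with Duration.between
            Duration elapsed = Duration.between(start, end);   // PT30H
            // (the proposed spelling would be end.minus(start))

            // point + vector = point, and point - vector = point
            Instant later   = start.plus(elapsed);   // equals end
            Instant earlier = end.minus(elapsed);    // equals start

            System.out.println(elapsed + " " + later + " " + earlier);
        }
    }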

On Thu, May 2, 2024 at 10:04 AM Stephen Colebourne 
wrote:

> On Thu, 2 May 2024 at 15:58, Kurt Alfred Kluever  wrote:
> > instant − instant = duration // what we're discussing
> > instant + duration = instant // satisfied by instant.plus(duration)
> > instant - duration = instant // satisfied by instant.minus(duration)
> > duration + duration = duration // satisfied by duration.plus(duration)
> > duration - duration = duration // satisfied by duration.minus(duration)
> > duration × real number = duration // satisfied by
> duration.multipliedBy(long)
> > duration ÷ real number = duration // satisfied by
> duration.dividedBy(long)
> >
> > All but the first operation have very clear translations from conceptual
> model to code. I'm hoping we can achieve the same clarity for instant -
> instant by using the obvious name: instant.minus(instant)
>
> But you can't have
>  instant + instant = ???
> It doesn't make sense.
>
> This is at the heart of why minus isn't right in this case.
> Stephen
>


-- 
Louis Wasserman (he/they)


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]

2024-04-24 Thread Louis Wasserman
On Sun, 14 Apr 2024 14:33:26 GMT, Claes Redestad  wrote:

>> What are the scenarios which had regressions? 
>> Given the conservative growth for StringBuilder, it surprises me a bit that 
>> any scenario would regress.
>
> I took a second look and it turns out that there were neither regressions nor 
> improvements in my test, only a few false positives: For C2 the SB chain is 
> recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into 
> a shape where the initial size in java bytecode - if any - is ignored.
> 
> With C1, having an initial size can have a significant effect. One way to 
> provoke a regression there is to have a huge constant suffix while 
> underestimating the size of the operands, which can lead to excessive 
> allocation:
> 
> 
> Name                               Cnt  Base                Test                 Unit     Change
> StringConcat.concat23StringConst     5    385,268 ±  5,238    341,251 ±   2,606  ns/op    1,13x (p = 0,000*)
>   :gc.alloc.rate                         6039,095 ± 82,309  12764,169 ±  97,146  MB/sec   2,11x (p = 0,000*)
>   :gc.alloc.rate.norm                    2440,003 ±  0,000   4568,002 ±   0,000  B/op     1,87x (p = 0,000*)
> 
> 
> Still a bit better on throughput from less actual copying.
> 
> *If* the `StringBuilder` is sized sufficiently (constants + args * N) things 
> look much better, of course: 
> 
> -XX:TieredStopAtLevel=1
> Name                               Cnt  Base                Test               Unit     Change
> StringConcat.concat23StringConst     5    385,268 ±  5,238   259,628 ±  2,515  ns/op    1,48x (p = 0,000*)
>   :gc.alloc.rate                         6039,095 ± 82,309  8902,803 ± 86,563  MB/sec   1,47x (p = 0,000*)
>   :gc.alloc.rate.norm                    2440,003 ±  0,000  2424,002 ±  0,000  B/op     0,99x (p = 0,000*)
> 
> 
> For most cases, having a size based on the sum of the constants plus some 
> small factor per argument seems to be a decent heuristic - for C1. Since this 
> adds just one bytecode to the generated method, I guess it wouldn't hurt.

Presizing this StringBuilder is a thing I looked into once upon a time, 
discussed here: 
https://mail.openjdk.org/pipermail/compiler-dev/2015-March/009356.html
This work, I understand, led to the indyfication of string concatenation in 
the first place.

Primitive values can be bounded at appropriate lengths for their types (see 
e.g. https://stackoverflow.com/a/21146952/869736). It sounds like you're 
trying to conserve bytecode length, though, so maybe you don't want to presize 
the StringBuilder with the exact Object.toString() lengths.
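As a rough illustration of that kind of bound (a hypothetical helper, not the
heuristic in this PR): the worst-case printed width of each primitive type is
fixed, so a capacity estimate can be built from the constants' total length plus a
per-argument maximum.

    // Hypothetical sketch: upper-bound a StringBuilder capacity from the
    // length of the constant parts plus worst-case widths per argument type.
    static int estimateCapacity(int constantChars, Class<?>... argTypes) {
        int size = constantChars;
        for (Class<?> t : argTypes) {
            if (t == int.class)          size += 11;  // "-2147483648"
            else if (t == long.class)    size += 20;  // "-9223372036854775808"
            else if (t == boolean.class) size += 5;   // "false"
            else if (t == char.class)    size += 1;
            else if (t == double.class || t == float.class) size += 26;  // conservative
            else                         size += 16;  // arbitrary guess for references
        }
        return size;
    }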

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1576819289


Re: RFR: 8302204: Optimize BigDecimal.divide

2023-02-12 Thread Louis Wasserman
Could you do that benchmark with e.g. JMH rather than taking the difference
of System.currentTimeMillis?  That would probably make it easier to read
and trust the results.
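For instance, a minimal JMH sketch (operand values made up here) that times the 
divide call in isolation:

    import java.math.BigDecimal;
    import java.math.MathContext;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    // Minimal JMH sketch (hypothetical operands) for timing BigDecimal.divide
    // directly rather than inferring it from whole-query wall-clock time.
    @State(Scope.Thread)
    public class BigDecimalDivideBench {
        BigDecimal dividend = new BigDecimal("12345678901234567890.12345678901234567890");
        BigDecimal divisor  = new BigDecimal("0.00000000012345");

        @Benchmark
        public BigDecimal divide() {
            return dividend.divide(divisor, MathContext.DECIMAL128);
        }
    }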

On Sun, Feb 12, 2023, 7:56 PM Sergey Kuksenko  wrote:

> On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu  wrote:
>
> > [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered 
> > the poor performance of BigDecimal.divide under certain circumstances.
> >
> > We ran into similar situations when benchmarking Spark3 on the TPC-DS test 
> > kit. According to the flame graph below, most of BigDecimal.divide's time 
> > is spent in StripZeros. Hence we propose this patch to optimize stripping 
> > zeros.
> > ![Image 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
> >
> > Currently, createAndStripZerosToMatchScale() works linearly. That is, the 
> > target value is parsed from back to front, stripping out a single ‘0’ at a 
> > time. To optimize, we can adopt binary search: each time we try to strip 
> > out scale/2 ‘0’s (see the sketch at the end of this post).
> >
> > The performance looks good. Theoretically, the time complexity of our 
> > method is O(log n), while the current one is O(n). In practice, benchmarks 
> > on Spark3 show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We 
> > also ran jtreg and JCK to check correctness, and it seems fine.
> >
> > More about the environment:
> > We run Spark 3.3.0 on OpenJDK 11, but the JDK version doesn't seem to have 
> > much impact on BigDecimal. The Spark cluster consists of a main node and 2 
> > core nodes, each with 4 cores, 16 GB of memory, and 4x500 GB of storage.
>
> As for TPC-DS,
> [AUTO-RESULT] QueryTotal=1968s  vs  [AUTO-RESULT] QueryTotal=1934s
> gives ~1.7% performance difference.
> Are you sure that this small diff is a real difference and not run-to-run
> variance?
>
> -
>
> PR: https://git.openjdk.org/jdk/pull/12509
>
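
A simplified sketch of the stripping-by-binary-search idea described in the quoted
patch summary, assuming a standalone helper on BigInteger rather than BigDecimal's
actual internals:

    import java.math.BigInteger;

    // Simplified sketch: count trailing decimal zeros of `value` (up to
    // maxZeros) by binary search on powers of ten, instead of repeatedly
    // dividing by ten one digit at a time.
    static int trailingDecimalZeros(BigInteger value, int maxZeros) {
        int low = 0, high = maxZeros;
        while (low < high) {
            int mid = (low + high + 1) >>> 1;  // candidate zero count
            BigInteger[] qr = value.divideAndRemainder(BigInteger.TEN.pow(mid));
            if (qr[1].signum() == 0) {
                low = mid;        // divisible by 10^mid: at least mid zeros
            } else {
                high = mid - 1;   // not divisible: fewer than mid zeros
            }
        }
        return low;
    }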