[PATCH] drm/i915/dpt: Make DPT object unshrinkable

2024-05-20 Thread Vidya Srinivas

In some scenarios, the DPT object gets shrunk while
the actual framebuffer does not, so the framebuffer is still
on the DPT's vm->bound_list. A later attempt to rewrite the
PTEs then goes through a stale CPU mapping, which causes a panic.

Credits-to: Ville Syrjala 
Shawn Lee 

Cc: sta...@vger.kernel.org
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Vidya Srinivas 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
 static inline bool
 i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
 {
-   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+   !obj->is_dpt;
 }
 
 static inline bool
-- 
2.34.1

RE: [PATCH] drm/i915/dpt: Make DPT object unshrinkable

2024-05-20 Thread Srinivas, Vidya




> -----Original Message-----
> From: Ville Syrjälä 
> Sent: Monday, May 20, 2024 10:10 PM
> To: Srinivas, Vidya 
> Cc: intel-gfx@lists.freedesktop.org; Syrjala, Ville 
> ; Lee,
> Shawn C ; srini...@freedesktop.org
> Subject: Re: [PATCH] drm/i915/dpt: Make DPT object unshrinkable
> 
> On Mon, May 20, 2024 at 08:54:10PM +0530, Srinivas, Vidya wrote:
> > In some scenarios, the DPT object gets shrunk while the actual
> > framebuffer does not, so the framebuffer is still on the DPT's
> > vm->bound_list. A later attempt to rewrite the PTEs then goes
> > through a stale CPU mapping, which causes a panic.
> >
> > Credits-to: Ville Syrjala 
> > Shawn Lee 
> >
> > Signed-off-by: Srinivas, Vidya 
> 
> The format should be "first_name last_name "

Apologies for the mistake. My gitconfig got messed up.

> 
> We also probably want
> Cc: sta...@vger.kernel.org
> Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for
> dpt")
> 
Thank you so much. I will send a new patch with these tags added.

> Although the patch won't actually build unless we also have commit
> 779cb5ba64ec ("drm/i915/dpt: Treat the DPT BO as a framebuffer"), that
> commit has the same Fixes tag, so it should be fine even if someone
> backports things that far back.
> 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > index 3560a062d287..e6b485fc54d4 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > @@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
> >  static inline bool
> >  i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
> >  {
> 
> Maybe toss something like this here:
> /* TODO: make DPT shrinkable when it has no bound vmas */
> 
> DPTs aren't necessarily so small that shrinking them wouldn't have any
> benefits. But actually implementing that would require some actual work, so
> not suitable for a quick fix.
> 
> I can add all that stuff when applying the patch, no need to resend for this.
> 
> > -   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
> > +   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
> > +   !obj->is_dpt;
> >  }
> >
> >  static inline bool
> > --
> > 2.34.1
> 
> --
> Ville Syrjälä
> Intel

[PATCH] drm/i915/dpt: Make DPT object unshrinkable

2024-05-20 Thread Srinivas, Vidya

In some scenarios, the DPT object gets shrunk while
the actual framebuffer does not, so the framebuffer is still
on the DPT's vm->bound_list. A later attempt to rewrite the
PTEs then goes through a stale CPU mapping, which causes a panic.

Credits-to: Ville Syrjala 
Shawn Lee 

Signed-off-by: Srinivas, Vidya 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
 static inline bool
 i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
 {
-   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+   !obj->is_dpt;
 }
 
 static inline bool
-- 
2.34.1

Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-05-10 Thread Srinivas Vamsi Parasa

On Wed, 8 May 2024 20:37:28 GMT, Vladimir Yaroslavskiy  wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> Please see the data below.
>> 
>> Thanks,
>> Vamsi
>> 
>> name | builder | size | mode | count | score
>> -- | -- | -- | -- | -- | --
>> b01 | RANDOM | 600 | avg | 325677 | 6.764
>> b01 | RANDOM | 3000 | avg | 52041 | 77.742
>> b01 | RANDOM | 9 | avg | 1217 | 4449.668
>> b01 | RANDOM | 40 | avg | 242 | 22764.05
>> b01 | RANDOM | 100 | avg | 90 | 60737.71
>> b01 | REPEATED | 600 | avg | 651354 | 1.723
>> b01 | REPEATED | 3000 | avg | 104083 | 12.383
>> b01 | REPEATED | 9 | avg | 2435 | 714.451
>> b01 | REPEATED | 40 | avg | 484 | 3039.447
>> b01 | REPEATED | 100 | avg | 180 | 8114.503
>> b01 | SAWTOOTH | 600 | avg | 1954062 | 1.009
>> b01 | SAWTOOTH | 3000 | avg | 312251 | 4.94
>> b01 | SAWTOOTH | 9 | avg | 7305 | 133.192
>> b01 | SAWTOOTH | 40 | avg | 1453 | 591.854
>> b01 | SAWTOOTH | 100 | avg | 542 | 1494.252
>> b01 | STAGGER | 600 | avg | 1954062 | 8.252
>> b01 | STAGGER | 3000 | avg | 312251 | 10.449
>> b01 | STAGGER | 9 | avg | 7305 | 287.811
>> b01 | STAGGER | 40 | avg | 1453 | 1288.92
>> b01 | STAGGER | 100 | avg | 542 | 3245.649
>> b01 | SHUFFLE | 600 | avg | 325677 | 5.199
>> b01 | SHUFFLE | 3000 | avg | 52041 | 29.734
>> b01 | SHUFFLE | 9 | avg | 1217 | 1392.125
>> b01 | SHUFFLE | 40 | avg | 242 | 5772.859
>> b01 | SHUFFLE | 100 | avg | 90 | 15483.65
>> r30 | RANDOM | 600 | avg | 325677 | 4.307
>> r30 | RANDOM | 3000 | avg | 52041 | 71.438
>> r30 | RANDOM | 9 | avg | 1217 | 3971.947
>> r30 | RANDOM | 40 | avg | 242 | 19924.32
>> r30 | RANDOM | 100 | avg | 90 | 53671.9
>> r30 | REPEATED | 600 | avg | 651354 | 1.36
>> r30 | REPEATED | 3000 | avg | 104083 | 6.415
>> r30 | REPEATED | 9 | avg | 2435 | 578.708
>> r30 | REPEATED | 40 | avg | 484 | 2488.414
>> r30 | REPEATED | 100 | avg | 180 | 6280.025
>> r30 | SAWTOOTH | 600 | avg | 1954062 | 0.488
>> r30 | SAWTOOTH | 3000 | avg | 312251 | 2.409
>> r30 | SAWTOOTH | 9 | avg | 7305 | 71.98
>> r30 | SAWTOOTH | 40 | avg | 1453 | 343.433
>> r30 | SAWTOOTH | 100 | avg | 542 | 954.287
>> r30 | STAGGER | 600 | avg | 1954062 | 1.064
>> r30 | STAGGER | 3000 | avg | 312251 | 4.559
>> r30 | STAGGER | 9 | avg | 7305 | 135.383
>> r30 | STAGGER | 40 | avg | 1453 | 626.657
>> r30 | STAGGER | 100 | avg | 542 | 1653.92
>> r30 | SHUFFLE | 600 | avg | 325677 | 2.924
>> r30 | SHUFFLE | 3000 | avg | 52041 | 18.819
>> r30 | SHUFFLE | 9 | avg | 1217 | 1019.036
>> r30 | SHUFFLE | 40 | avg | 242 | 4661.484
>> r30 | SHUFFLE | 100 ...
>
> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run the new benchmarking to finalize the best version?
> What you need is to compile and run JavaBenchmarkHarness:
> 
> javac --patch-module java.base=. -d classes *.java
> java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module java.base=classes -cp classes java.util.JavaBenchmarkHarness
> 
> Find the sources there:
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_11.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_11a.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_12.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_12a.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java
> 
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the data below.

Thanks,
Vamsi


name | builder | size | mode | count | score
-- | -- | -- | -- | -- | --
b01 | RANDOM | 600 | avg | 325677 | 6.861
b01 | RANDOM | 3000 | avg | 52041 | 77.313
b01 | RANDOM | 9 | avg | 1217 | 4315.41
b01 | RANDOM | 40 | avg | 242 | 22110.95
b01 | RANDOM | 100 | avg | 90 | 58613.45
b01 | REPEATED | 600 | avg | 651354 | 1.993
b01 | REPEATED | 3000 | avg | 104083 | 13.026
b01 | REPEATED | 9 | avg | 2435 | 741.97
b01 | REPEATED | 40 | avg | 484 | 3161.073
b01 | REPEATED | 100 | avg | 180 | 8363.671
b01 | STAGGER | 600 | avg | 1954062 | 9.124
b01 | STAGGER | 3000 | avg | 312251 | 10.026
b01 | STAGGER | 9 | avg | 7305 | 286.313
b01 | STAGGER | 40 | avg | 1453 | 1278.758
b01 | STAGGER | 100 | avg | 542 | 3242.849
b01 | SHUFFLE | 600 | avg | 325677 | 5.113
b01 | SHUFFLE | 3000 | avg | 52041 | 28.85
b01 | SHUFFLE | 9 | avg | 1217 | 1368.91
b01 | SHUFFLE | 40 | avg | 242 | 5718.052
b01 | SHUFFLE | 100 | avg | 90 | 15376.1
r31_11 | RANDOM | 600 | avg | 325677 | 4.305
r31_11 | RANDOM | 3000 | avg | 52041 | 73.399
r31_11 | RANDOM | 9 | avg | 1217 | 3963.515
r31_11 | RANDOM | 40 | avg | 242 | 19841.07
r31_11 | RANDOM | 100 | avg | 90 | 53372.63
r31_11 | REPEATED | 600 | avg | 651354 | 1.208
r31_11 | REPEATED | 3000 | avg | 104083 | 6.206
r31_11 | REPEATED |

RE: [go-nuts] Re: Slice conversion function not working on big endian platform

2024-05-08 Thread 'Pokala Srinivas' via golang-nuts

I am using generics here so that it works for other types as well. It is not 
only for converting a []int32 slice to []int64; it is a generic function meant 
to work for any conversion, such as BytestoInt64/32/16/8, BytestoFloat32/64, etc.
It also fails for other conversions, such as BytestoInt64 and BytestoFloat64, on 
big-endian machines, but works on little-endian ones.

From: 'Pokala Srinivas' via golang-nuts 
Sent: 08 May 2024 15:39
To: golang-nuts ; Brian Candler 

Subject: [EXTERNAL] Re: [go-nuts] Re: Slice conversion function not working on 
big endian platform

There was a typo in my earlier post; below is the corrected code snippet:
package main

import (
  "fmt"
  "unsafe"
)

type slice struct {
  ptr unsafe.Pointer
  len int
  cap int
}

func Slice[To, From any](data []From) []To {
  var zf From
  var zt To
  var s = (*slice)(unsafe.Pointer(&data))
  s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
  s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
  x := *(*[]To)(unsafe.Pointer(s))
  return x
}
func main() {
  a := make([]uint32, 4, 13)
  a[0] = 1
  a[1] = 0
  a[2] = 2
  a[3] = 0
  // 1 0 2 0
  b := Slice[int64](a)
  //Expecxted : []int64[]{0x 0001,  0x 0002}
  //Got: []int64{0x0001 , 0x0002 000}
  if b[0] != 1 {
    fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", b[0])
  }
  if b[1] != 2 {
    fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
  }

}

Please try this code

From: 'Brian Candler' via golang-nuts 
Sent: 08 May 2024 15:25
To: golang-nuts 
Subject: [EXTERNAL] [go-nuts] Re: Slice conversion function not working on big 
endian platform

That code doesn't even compile in Go 1.22 or Go 1.21:
https://go.dev/play/p/mPCBUQizSVo
./prog.go:20:14: cannot convert unsafe.Pointer(s) (value of type unsafe.Pointer) to type []To

What's the underlying requirement? In the test case it looks like you want to 
take a slice of int32's, in whatever their internal in-memory representation 
is, and re-represent them as a slice of half as many int64's?? Then of *course* 
each pair of int32's will become one int64, and the order of the hi/lo halves 
will depend entirely on the system's internal representation of int64's. It 
*is* working, in the sense that it's doing exactly what you told it to do. 
There's a reason why the "unsafe" package is called "unsafe"!

It would be straightforward to write a function which takes a slice containing 
pairs of int32's and assembles them into int64's in a consistent way. What 
you've not explained is:
- why you need to do this with generics (for example, what behaviour would you 
expect from other types?)
- why you need to do this in-place with "unsafe"

On Wednesday 8 May 2024 at 10:24:30 UTC+1 Srinivas Pokala wrote:
Hello gophers,

I have a simple Go program which converts a slice of one type to a slice of 
another type, using Go generics to handle all the supported types.
Below is the code snippet for this:
package main

import "fmt"
import "unsafe"

type slice struct {
ptr unsafe.Pointer
len int
cap int
}

func Slice[To, From any](data []From) []To {
var zf From
var zt To
var s = (*slice)(unsafe.Pointer(&data))
s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
x := ([]To)(unsafe.Pointer(s))
return x
}
func main() {
a := make([]uint32, 4, 13)

Re: [go-nuts] Re: Slice conversion function not working on big endian platform

2024-05-08 Thread 'Pokala Srinivas' via golang-nuts

There was a typo in my earlier post; below is the corrected code snippet:
package main

import (
  "fmt"
  "unsafe"
)

type slice struct {
  ptr unsafe.Pointer
  len int
  cap int
}

func Slice[To, From any](data []From) []To {
  var zf From
  var zt To
  var s = (*slice)(unsafe.Pointer(&data))
  s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
  s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
  x := *(*[]To)(unsafe.Pointer(s))
  return x
}
func main() {
  a := make([]uint32, 4, 13)
  a[0] = 1
  a[1] = 0
  a[2] = 2
  a[3] = 0
  // 1 0 2 0
  b := Slice[int64](a)
  //Expecxted : []int64[]{0x 0001,  0x 0002}
  //Got: []int64{0x0001 , 0x0002 000}
  if b[0] != 1 {
    fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", b[0])
  }
  if b[1] != 2 {
    fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
  }

}

Please try this code

From: 'Brian Candler' via golang-nuts 
Sent: 08 May 2024 15:25
To: golang-nuts 
Subject: [EXTERNAL] [go-nuts] Re: Slice conversion function not working on big 
endian platform

That code doesn't even compile in Go 1.22 or Go 1.21:
https://go.dev/play/p/mPCBUQizSVo
./prog.go:20:14: cannot convert unsafe.Pointer(s) (value of type unsafe.Pointer) to type []To

What's the underlying requirement? In the test case it looks like you want to 
take a slice of int32's, in whatever their internal in-memory representation 
is, and re-represent them as a slice of half as many int64's?? Then of *course* 
each pair of int32's will become one int64, and the order of the hi/lo halves 
will depend entirely on the system's internal representation of int64's. It 
*is* working, in the sense that it's doing exactly what you told it to do. 
There's a reason why the "unsafe" package is called "unsafe"!

It would be straightforward to write a function which takes a slice containing 
pairs of int32's and assembles them into int64's in a consistent way. What 
you've not explained is:
- why you need to do this with generics (for example, what behaviour would you 
expect from other types?)
- why you need to do this in-place with "unsafe"

On Wednesday 8 May 2024 at 10:24:30 UTC+1 Srinivas Pokala wrote:
Hello gophers,

I have a simple Go program which converts a slice of one type to a slice of 
another type, using Go generics to handle all the supported types.
Below is the code snippet for this:
package main

import "fmt"
import "unsafe"

type slice struct {
ptr unsafe.Pointer
len int
cap int
}

func Slice[To, From any](data []From) []To {
var zf From
var zt To
var s = (*slice)(unsafe.Pointer(&data))
s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
x := ([]To)(unsafe.Pointer(s))
return x
}
func main() {
a := make([]uint32, 4, 13)
a[0] = 1
a[1] = 0
a[2] = 2
a[3] = 0
// 1 0 2 0
b := Slice[int64](a)
//Expecxted : []int64[]{0x 0001,  0x 0002}
//Got: []int64{0x0001 , 0x0002 000}
if b[0] != 1 {
fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", b[0])
}
if b[1] != 2 {
fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
}

}

This works fine on little-endian architectures (amd64, arm64, etc.), but when 
I run it on a big-endian machine (s390x) it does not work and produces wrong 
data:
//Expected : []int64[]{0x 0001,  0x 0002}
 //Got: []int64{0x0001 , 0x0002 000}
Can someone point me to how to write this so that it works on both 
little-endian and big-endian platforms?
Any leads on this?

Thanks,
Srinivas

--
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
golang-nuts+unsubscr...@googlegroups.com<mailto:golang-nuts+unsubsc

[go-nuts] Slice conversion function not working on big endian platform

2024-05-08 Thread 'Srinivas Pokala' via golang-nuts

Hello gophers,

I have a simple Go program which converts a slice of one type to a slice of 
another type, using Go generics to handle all the supported types.
Below is the code snippet for this:
package main

import "fmt"
import "unsafe"

type slice struct {
ptr unsafe.Pointer
len int 
cap int 
}

func Slice[To, From any](data []From) []To {
var zf From
var zt To
var s = (*slice)(unsafe.Pointer(&data))
s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / 
unsafe.Sizeof(zt))
s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / 
unsafe.Sizeof(zt))
x := ([]To)(unsafe.Pointer(s))
return x
}
func main() {
a := make([]uint32, 4, 13) 
a[0] = 1 
a[1] = 0 
a[2] = 2 
a[3] = 0 
// 1 0 2 0
b := Slice[int64](a)
//Expecxted : []int64[]{0x 0001,  0x 0002}
//Got: []int64{0x0001 , 0x0002 000}
if b[0] != 1 { 
fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", 
b[0])
}
if b[1] != 2 { 
fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
}

}

This works fine on little-endian architectures (amd64, arm64, etc.), but when 
I run it on a big-endian machine (s390x) it does not work and produces wrong 
data:
//Expected : []int64[]{0x 0001,  0x 0002}
 //Got: []int64{0x0001 , 0x0002 000}
Can someone point me to how to write this so that it works on both 
little-endian and big-endian platforms?
Any leads on this?

Thanks,
Srinivas

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/4282aff6-0c61-4105-8032-f7c92ff341d1n%40googlegroups.com.

Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-05-06 Thread Srinivas Vamsi Parasa

On Tue, 30 Apr 2024 22:01:30 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> Please see the data below:
>> 
>> Thanks,
>> Vamsi
>> 
>> 
>> 
>> name | builder | size | mode | count | score
>> -- | -- | -- | -- | -- | --
>> b01 | RANDOM | 600 | avg | 325677 | 6.862
>> b01 | RANDOM | 3000 | avg | 52041 | 82.233
>> b01 | RANDOM | 9 | avg | 1217 | 4456.51
>> b01 | RANDOM | 40 | avg | 242 | 22923.28
>> b01 | RANDOM | 100 | avg | 90 | 60598.84
>> b01 | REPEATED | 600 | avg | 651354 | 1.933
>> b01 | REPEATED | 3000 | avg | 104083 | 13.753
>> b01 | REPEATED | 9 | avg | 2435 | 723.039
>> b01 | REPEATED | 40 | avg | 484 | 3084.416
>> b01 | REPEATED | 100 | avg | 180 | 8234.428
>> b01 | STAGGER | 600 | avg | 1954062 | 1.005
>> b01 | STAGGER | 3000 | avg | 312251 | 4.945
>> b01 | STAGGER | 9 | avg | 7305 | 133.126
>> b01 | STAGGER | 40 | avg | 1453 | 592.144
>> b01 | STAGGER | 100 | avg | 542 | 1493.876
>> b01 | SHUFFLE | 600 | avg | 325677 | 5.12
>> b01 | SHUFFLE | 3000 | avg | 52041 | 29.252
>> b01 | SHUFFLE | 9 | avg | 1217 | 1396.664
>> b01 | SHUFFLE | 40 | avg | 242 | 5743.489
>> b01 | SHUFFLE | 100 | avg | 90 | 15490.81
>> b01_ins | RANDOM | 600 | avg | 325677 | 7.594
>> b01_ins | RANDOM | 3000 | avg | 52041 | 78.631
>> b01_ins | RANDOM | 9 | avg | 1217 | 4312.511
>> b01_ins | RANDOM | 40 | avg | 242 | 22108.18
>> b01_ins | RANDOM | 100 | avg | 90 | 58467.16
>> b01_ins | REPEATED | 600 | avg | 651354 | 1.569
>> b01_ins | REPEATED | 3000 | avg | 104083 | 11.313
>> b01_ins | REPEATED | 9 | avg | 2435 | 720.838
>> b01_ins | REPEATED | 40 | avg | 484 | 3003.673
>> b01_ins | REPEATED | 100 | avg | 180 | 8144.944
>> b01_ins | STAGGER | 600 | avg | 1954062 | 0.98
>> b01_ins | STAGGER | 3000 | avg | 312251 | 4.948
>> b01_ins | STAGGER | 9 | avg | 7305 | 132.909
>> b01_ins | STAGGER | 40 | avg | 1453 | 592.572
>> b01_ins | STAGGER | 100 | avg | 542 | 1492.627
>> b01_ins | SHUFFLE | 600 | avg | 325677 | 4.092
>> b01_ins | SHUFFLE | 3000 | avg | 52041 | 27.138
>> b01_ins | SHUFFLE | 9 | avg | 1217 | 1304.326
>> b01_ins | SHUFFLE | 40 | avg | 242 | 5465.745
>> b01_ins | SHUFFLE | 100 | avg | 90 | 14585.08
>> b01_mrg | RANDOM | 600 | avg | 325677 | 7.139
>> b01_mrg | RANDOM | 3000 | avg | 52041 | 81.01
>> b01_mrg | RANDOM | 9 | avg | 1217 | 4266.084
>> b01_mrg | RANDOM | 40 | avg | 242 | 21937.77
>> b01_mrg | RANDOM | 100 | avg | 90 | 58177.72
>> b01_mrg | REPEATED | 600 | avg | 651354 | 1.36
>> b01_mrg | REPEATED | 3000 | avg | 104083 | 9.013
>> b01_mrg | REPEATED ...
>
> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run the new benchmarking to detect the best case
> for Radix sort and parallel sorting?
> 
> What you need is to compile and run JavaBenchmarkHarness:
> 
> javac --patch-module java.base=. -d classes *.java
> java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module java.base=classes -cp classes java.util.JavaBenchmarkHarness
> 
> Find the sources there:
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_a.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_5.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_11.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_12.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_13.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_14.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_21.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_23.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java
> 
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the data below.

Thanks,
Vamsi

name | builder | size | mode | count | score
-- | -- | -- | -- | -- | --
b01 | RANDOM | 600 | avg | 325677 | 6.764
b01 | RANDOM | 3000 | avg | 52041 | 77.742
b01 | RANDOM | 9 | avg | 1217 | 4449.668
b01 | RANDOM | 40 | avg | 242 | 22764.05
b01 | RANDOM | 100 | avg | 90 | 60737.71
b01 | REPEATED | 600 | avg | 651354 | 1.723
b01 | REPEATED | 3000 | avg | 104083 | 12.383
b01 | REPEATED | 9 | avg | 2435 | 714.451
b01 | REPEATED | 40 | avg | 484 | 3039.447
b01 | REPEATED | 100 | avg | 180 | 8114.503
b01 | SAWTOOTH | 600 | avg | 1954062 | 1.009
b01 | SAWTOOTH | 3000 | avg | 312251 | 4.94
b01 | SAWTOOTH | 9 | avg | 7305 | 133.192
b01 | SAWTOOTH | 40 | avg | 1453 | 591.854
b01 | SAWTOOTH | 100 | avg | 542 | 1494.252
b01 | STAGGER | 600 | avg | 1954062 | 8.252
b01 | STAGGER | 3000 |

Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-04-20 Thread Srinivas Vamsi Parasa

On Tue, 9 Apr 2024 21:36:46 GMT, Vladimir Yaroslavskiy  wrote:

>>> Hi Vamsi (@vamsi-parasa), few questions on your test environment:
>>> 
>>> * what are the hardware specs of your server ?
>>> * bare-metal or virtual ?
>>> * are other services or big processes running ?
>>> * os tuning ? CPU HT: off? Fixed CPU governor or frequency ?
>>> * isolation using taskset ?
>>> 
>>> Maybe C2 JIT (+ CDS archive) are given more performance on stock jdk sort 
>>> than same code running outside jdk...
>>> 
>>> Thanks, Laurent
>> 
>> Hi Laurent,
>> 
>> The benchmarks are run on Intel TigerLake Core i7 machine. It's bare-metal 
>> without any virtualization. HT is ON and there is no other specific OS 
>> tuning or isolation using taskset.
>> 
>> Thanks,
>> Vamsi
>
> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run the new benchmarking?
> To save time and avoid patching the JDK several times, I've created a
> JavaBenchmarkHarness class, which is in package java.util and compares
> several versions of DPQS. I also prepared several versions of the current
> sorting source from the JDK to detect what is going wrong.
> 
> What you need is to compile and run JavaBenchmarkHarness once:
> 
> javac --patch-module java.base=. -d classes *.java
> java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module java.base=classes -cp classes java.util.JavaBenchmarkHarness
> 
> Find the sources there:
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java
>   
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_ins.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_mrg.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_piv.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_prt.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p5.java
> 
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the data below:

Thanks,
Vamsi



name | builder | size | mode | count | score
-- | -- | -- | -- | -- | --
b01 | RANDOM | 600 | avg | 325677 | 6.862
b01 | RANDOM | 3000 | avg | 52041 | 82.233
b01 | RANDOM | 9 | avg | 1217 | 4456.51
b01 | RANDOM | 40 | avg | 242 | 22923.28
b01 | RANDOM | 100 | avg | 90 | 60598.84
b01 | REPEATED | 600 | avg | 651354 | 1.933
b01 | REPEATED | 3000 | avg | 104083 | 13.753
b01 | REPEATED | 9 | avg | 2435 | 723.039
b01 | REPEATED | 40 | avg | 484 | 3084.416
b01 | REPEATED | 100 | avg | 180 | 8234.428
b01 | STAGGER | 600 | avg | 1954062 | 1.005
b01 | STAGGER | 3000 | avg | 312251 | 4.945
b01 | STAGGER | 9 | avg | 7305 | 133.126
b01 | STAGGER | 40 | avg | 1453 | 592.144
b01 | STAGGER | 100 | avg | 542 | 1493.876
b01 | SHUFFLE | 600 | avg | 325677 | 5.12
b01 | SHUFFLE | 3000 | avg | 52041 | 29.252
b01 | SHUFFLE | 9 | avg | 1217 | 1396.664
b01 | SHUFFLE | 40 | avg | 242 | 5743.489
b01 | SHUFFLE | 100 | avg | 90 | 15490.81
b01_ins | RANDOM | 600 | avg | 325677 | 7.594
b01_ins | RANDOM | 3000 | avg | 52041 | 78.631
b01_ins | RANDOM | 9 | avg | 1217 | 4312.511
b01_ins | RANDOM | 40 | avg | 242 | 22108.18
b01_ins | RANDOM | 100 | avg | 90 | 58467.16
b01_ins | REPEATED | 600 | avg | 651354 | 1.569
b01_ins | REPEATED | 3000 | avg | 104083 | 11.313
b01_ins | REPEATED | 9 | avg | 2435 | 720.838
b01_ins | REPEATED | 40 | avg | 484 | 3003.673
b01_ins | REPEATED | 100 | avg | 180 | 8144.944
b01_ins | STAGGER | 600 | avg | 1954062 | 0.98
b01_ins | STAGGER | 3000 | avg | 312251 | 4.948
b01_ins | STAGGER | 9 | avg | 7305 | 132.909
b01_ins | STAGGER | 40 | avg | 1453 | 592.572
b01_ins | STAGGER | 100 | avg | 542 | 1492.627
b01_ins | SHUFFLE | 600 | avg | 325677 | 4.092
b01_ins | SHUFFLE | 3000 | avg | 52041 | 27.138
b01_ins | SHUFFLE | 9 | avg | 1217 | 1304.326
b01_ins | SHUFFLE | 40 | avg | 242 | 5465.745
b01_ins | SHUFFLE | 100 | avg | 90 | 14585.08
b01_mrg | RANDOM | 600 | avg | 325677 | 7.139
b01_mrg | RANDOM | 3000 | avg | 52041 | 81.01
b01_mrg | RANDOM | 9 | avg | 1217 | 4266.084
b01_mrg | RANDOM | 40 | avg | 242 | 21937.77
b01_mrg | RANDOM | 100 | avg | 90 | 58177.72
b01_mrg | REPEATED | 600 | avg | 651354 | 1.36
b01_mrg | REPEATED | 3000 | avg | 104083 | 9.013
b01_mrg | REPEATED | 9 | avg | 2435 | 737.684
b01_mrg | REPEATED | 40 | avg | 484 | 3152.447
b01_mrg | REPEATED | 100 | avg | 180 | 8366.866
b01_mrg | STAGGER | 600 | avg | 1954062 | 0.73
b01_mrg | STAGGER | 3000 | avg | 312251 | 3.733
b01_mrg | STAGGER | 9 | avg | 7305 | 114.35
b01_mrg | STAGGER | 40 | avg | 1453 | 524.821
b01_mrg | STAGGER | 100 | avg | 542 | 1351.504
b01_mrg | SHUFFLE | 600 | avg | 325677 | 4.986
b01_mrg |

[Kernel-packages] [Bug 2036135] Re: thermald assert failure: * stack smashing detected *: terminated

2024-04-18 Thread Srinivas Pandruvada

Is it possible to reproduce this using thermald built from
https://github.com/intel/thermal_daemon?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2036135

Title:
  thermald assert failure: *** stack smashing detected ***: terminated

Status in thermald package in Ubuntu:
  Confirmed

Bug description:
  suddenly occurred

  ProblemType: Crash
  DistroRelease: Ubuntu 23.10
  Package: thermald 2.5.4-2
  ProcVersionSignature: Ubuntu 6.5.0-5.5-generic 6.5.0
  Uname: Linux 6.5.0-5-generic x86_64
  ApportVersion: 2.27.0-0ubuntu2
  Architecture: amd64
  AssertionMessage: *** stack smashing detected ***: terminated
  CasperMD5CheckResult: pass
  Date: Thu Sep 14 12:41:34 2023
  ExecutablePath: /usr/sbin/thermald
  InstallationDate: Installed on 2023-09-14 (1 days ago)
  InstallationMedia: Ubuntu 23.10 "Mantic Minotaur" - Daily amd64 (20230908.2)
  ProcCmdline: /usr/sbin/thermald --systemd --dbus-enable --adaptive
  ProcEnviron:
   LANG=ja_JP.UTF-8
   PATH=(custom, no user)
  Signal: 6
  SourcePackage: thermald
  StacktraceTop:
   __libc_message (fmt=fmt@entry=0x7fad897c38d3 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:150
   __GI___fortify_fail (msg=msg@entry=0x7fad897c38eb "stack smashing detected") at ./debug/fortify_fail.c:24
   __stack_chk_fail () at ./debug/stack_chk_fail.c:24
   cthd_acpi_rel::read_psvt() ()
   ?? ()
  Title: thermald assert failure: *** stack smashing detected ***: terminated
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  modified.conffile..etc.init.thermald.conf: [deleted]
  mtime.conffile..etc.thermald.thermal-cpu-cdev-order.xml: 2023-08-25T19:29:11
  separator:

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2036135/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Bug 2036135] Re: thermald assert failure: *** stack smashing detected ***: terminated

2024-04-18 Thread Srinivas Pandruvada

Is this possible to reproduce using thermald built from
https://github.com/intel/thermal_daemon?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2036135

Title:
  thermald assert failure: *** stack smashing detected ***: terminated

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2036135/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

RE: [PATCH 22/22] drm/i915: Use debugfs_create_bool() for "i915_bigjoiner_force_enable"

2024-04-02 Thread Srinivas, Vidya

Hello Ville, Thank you very much for the series. 6K detects fine and works.
Tested-by: Vidya Srinivas 

> -Original Message-
> From: Intel-gfx  On Behalf Of Ville
> Syrjala
> Sent: Friday, March 29, 2024 6:43 AM
> To: intel-gfx@lists.freedesktop.org
> Subject: [PATCH 22/22] drm/i915: Use debugfs_create_bool() for
> "i915_bigjoiner_force_enable"
> 
> From: Ville Syrjälä 
> 
> There is no reason to make this debugfs file for a simple boolean so
> complicated. Just use debugfs_create_bool().
> 
> Signed-off-by: Ville Syrjälä 
> ---
>  .../drm/i915/display/intel_display_debugfs.c  | 44 +--
>  1 file changed, 2 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display_debugfs.c
> b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
> index b99c024b0934..3e364891dcd0 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_debugfs.c
> +++ b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
> @@ -1402,20 +1402,6 @@ out:   drm_modeset_unlock(
> >drm.mode_config.connection_mutex);
>   return ret;
>  }
> 
> -static int i915_bigjoiner_enable_show(struct seq_file *m, void *data)
> -{
> - struct intel_connector *connector = m->private;
> - struct drm_crtc *crtc;
> -
> - crtc = connector->base.state->crtc;
> - if (connector->base.status != connector_status_connected || !crtc)
> - return -ENODEV;
> -
> - seq_printf(m, "Bigjoiner enable: %d\n", connector->force_bigjoiner_enable);
> -
> - return 0;
> -}
> -
>  static ssize_t i915_dsc_output_format_write(struct file *file,
>   const char __user *ubuf,
>   size_t len, loff_t *offp)
> @@ -1437,30 +1423,6 @@ static ssize_t i915_dsc_output_format_write(struct
> file *file,
>   return len;
>  }
> 
> -static ssize_t i915_bigjoiner_enable_write(struct file *file,
> -const char __user *ubuf,
> -size_t len, loff_t *offp)
> -{
> - struct seq_file *m = file->private_data;
> - struct intel_connector *connector = m->private;
> - struct drm_crtc *crtc;
> - bool bigjoiner_en = 0;
> - int ret;
> -
> - crtc = connector->base.state->crtc;
> - if (connector->base.status != connector_status_connected || !crtc)
> - return -ENODEV;
> -
> - ret = kstrtobool_from_user(ubuf, len, &bigjoiner_en);
> - if (ret < 0)
> - return ret;
> -
> - connector->force_bigjoiner_enable = bigjoiner_en;
> - *offp += len;
> -
> - return len;
> -}
> -
>  static int i915_dsc_output_format_open(struct inode *inode,
>  struct file *file)
>  {
> @@ -1554,8 +1516,6 @@ static const struct file_operations
> i915_dsc_fractional_bpp_fops = {
>   .write = i915_dsc_fractional_bpp_write
>  };
> 
> -DEFINE_SHOW_STORE_ATTRIBUTE(i915_bigjoiner_enable);
> -
>  /*
>   * Returns the Current CRTC's bpc.
>   * Example usage: cat /sys/kernel/debug/dri/0/crtc-0/i915_current_bpc
> @@ -1640,8 +1600,8 @@ void intel_connector_debugfs_add(struct
> intel_connector *connector)
>   if (DISPLAY_VER(i915) >= 11 &&
>   (connector_type == DRM_MODE_CONNECTOR_DisplayPort ||
>connector_type == DRM_MODE_CONNECTOR_eDP)) {
> - debugfs_create_file("i915_bigjoiner_force_enable", 0644, root,
> - connector, &i915_bigjoiner_enable_fops);
> + debugfs_create_bool("i915_bigjoiner_force_enable", 0644, root,
> + &connector->force_bigjoiner_enable);
>   }
> 
>   if (connector_type == DRM_MODE_CONNECTOR_DSI ||
> --
> 2.43.2
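
The removed write handler called kstrtobool_from_user() to turn user input
into a bool. As a rough userspace illustration of that kind of parsing, here
is a minimal sketch. The name parse_bool() is hypothetical, and the accepted
spellings are only loosely modelled on the kernel's kstrtobool(); they may not
match every kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace helper, loosely modelled on kstrtobool().
 * Only the first one or two characters are inspected.
 * Returns 0 and stores the result in *res on success, -1 otherwise.
 */
static int parse_bool(const char *s, bool *res)
{
	assert(res != NULL);

	if (!s)
		return -1;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* "on" / "off": the second character decides */
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		default:
			break;
		}
		break;
	default:
		break;
	}

	return -1;
}
```

With a helper like this, a plain bool field needs no per-file show/store
callbacks, which is the simplification the patch is after.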

RE: [PATCH 5/6] drm/i915: Handle joined pipes inside hsw_crtc_enable()

2024-03-25 Thread Srinivas, Vidya

Thank you Stan. Rev 14 works.
Tested-by: Vidya Srinivas 

> -Original Message-
> From: Lisovskiy, Stanislav 
> Sent: Wednesday, March 20, 2024 8:45 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Lisovskiy, Stanislav ; Saarinen, Jani
> ; ville.syrj...@linux.intel.com; Srinivas, Vidya
> 
> Subject: [PATCH 5/6] drm/i915: Handle joined pipes inside hsw_crtc_enable()
> 
> Handle only bigjoiner masters in skl_commit_modeset_enables/disables,
> slave crtcs should be handled by master hooks. Same for encoders.
> That way we can also remove a bunch of checks like
> intel_crtc_is_bigjoiner_slave.
> 
> v2: - Moved skl_pfit_enable, intel_dsc_enable, intel_crtc_vblank_on to
>       intel_enable_ddi, so that it is now finally symmetrical with the
>       disable case, because currently for some weird reason we are calling
>       those from skl_commit_modeset_enables, while for the disable case
>       those are called from the ddi disable hooks.
> 
> v3: - Create intel_ddi_enable_hdmi_or_sst symmetrical to
>       intel_ddi_post_disable_hdmi_or_sst and move it also under non-mst
>       check.
> 
> v4: - Fix intel_enable_ddi sequence
>     - Call intel_crtc_update_active_timings for slave pipes as well
> 
> Signed-off-by: Stanislav Lisovskiy 
> ---
>  drivers/gpu/drm/i915/display/intel_ddi.c |  45 -
>  drivers/gpu/drm/i915/display/intel_display.c | 179 ++-
>  drivers/gpu/drm/i915/display/intel_display.h |   7 +
>  3 files changed, 137 insertions(+), 94 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c
> b/drivers/gpu/drm/i915/display/intel_ddi.c
> index 290ccab7c9ee8..9128b82a49c31 100644
> --- a/drivers/gpu/drm/i915/display/intel_ddi.c
> +++ b/drivers/gpu/drm/i915/display/intel_ddi.c
> @@ -3366,15 +3366,28 @@ static void intel_enable_ddi_hdmi(struct
> intel_atomic_state *state,
>   intel_wait_ddi_buf_active(dev_priv, port);
>  }
> 
> -static void intel_enable_ddi(struct intel_atomic_state *state,
> -  struct intel_encoder *encoder,
> -  const struct intel_crtc_state *crtc_state,
> -  const struct drm_connector_state *conn_state)
> +static void intel_ddi_enable_hdmi_or_sst(struct intel_atomic_state *state,
> +  struct intel_encoder *encoder,
> +  const struct intel_crtc_state *crtc_state,
> +  const struct drm_connector_state *conn_state)
>  {
> - drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder);
> + struct drm_i915_private *i915 = to_i915(encoder->base.dev);
> + u8 pipe_mask = intel_crtc_joined_pipe_mask(crtc_state);
> + struct intel_crtc *crtc;
> +
> + for_each_intel_crtc_in_pipe_mask_reverse(&i915->drm, crtc, pipe_mask) {
> + const struct intel_crtc_state *new_crtc_state =
> + intel_atomic_get_new_crtc_state(state, crtc);
> +
> + intel_dsc_enable(new_crtc_state);
> +
> + if (DISPLAY_VER(i915) >= 9)
> + skl_pfit_enable(new_crtc_state);
> + else
> + ilk_pfit_enable(new_crtc_state);
> + }
> 
> - if (!intel_crtc_is_bigjoiner_slave(crtc_state))
> - intel_ddi_enable_transcoder_func(encoder, crtc_state);
> + intel_ddi_enable_transcoder_func(encoder, crtc_state);
> 
>   /* Enable/Disable DP2.0 SDP split config before transcoder */
>   intel_audio_sdp_split_update(crtc_state);
> @@ -3383,7 +3396,22 @@ static void intel_enable_ddi(struct
> intel_atomic_state *state,
> 
>   intel_ddi_wait_for_fec_status(encoder, crtc_state, true);
> 
> - intel_crtc_vblank_on(crtc_state);
> + for_each_intel_crtc_in_pipe_mask_reverse(&i915->drm, crtc, pipe_mask) {
> + const struct intel_crtc_state *new_crtc_state =
> + intel_atomic_get_new_crtc_state(state, crtc);
> + intel_crtc_vblank_on(new_crtc_state);
> + }
> +}
> +
> +static void intel_enable_ddi(struct intel_atomic_state *state,
> +  struct intel_encoder *encoder,
> +  const struct intel_crtc_state *crtc_state,
> +  const struct drm_connector_state *conn_state)
> +{
> + drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder);
> +
> + if (!intel_crtc_has_type(crtc_state, INTEL_OUTPUT_DP_MST))
> + intel_ddi_enable_hdmi_or_sst(state, encoder, crtc_state, conn_state);
> 
>   if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI))
>   intel_enable_ddi_hdmi(state, encoder, crtc_state,
> conn_state); @
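
The patch walks every pipe in the joined pipe mask in reverse order from the
master's enable hook. The ordering idea can be sketched in plain C as below.
This is an illustration only: pipes_in_mask_reverse() and MAX_PIPES are
hypothetical names, and the sketch assumes that "reverse" means highest pipe
index first, which holds only if the CRTCs are ordered by pipe index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PIPES 8	/* assumption: a u8 mask with one bit per pipe */

/*
 * Collect the pipe indices set in pipe_mask, highest index first.
 * Returns the number of entries written to out[].
 */
static size_t pipes_in_mask_reverse(uint8_t pipe_mask, int out[MAX_PIPES])
{
	size_t n = 0;

	assert(out != NULL);

	for (int pipe = MAX_PIPES - 1; pipe >= 0; pipe--) {
		if (pipe_mask & (1u << pipe))
			out[n++] = pipe;
	}

	return n;
}
```

For a mask of 0x06 (pipes 1 and 2 joined), the helper yields pipe 2 first and
pipe 1 second, so under the stated assumption the higher-numbered pipe is
handled before the lower-numbered one.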

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]

2024-03-20 Thread Y . Srinivas Ramakrishna

On Tue, 19 Mar 2024 22:26:22 GMT, Stuart Marks  wrote:

>> I think you are overthinking this somewhat Ramki. I don't see a practical 
>> (non discrete-math) distinction between "some" and "any", so would not 
>> object to that single word change if it helps. But "potential" should remain 
>> as it covers branching in the program whereby if we proceed down one branch 
>> an object remains reachable, whereas if we proceed down another then it may 
>> not.
>
> I don't think changing "any" to "some" is helpful. I think "any" is ambiguous 
> regarding meaning universal or existential strength. The sense used here is, 
> considering the possible future execution paths of a thread, if any of them 
> accesses the object, that object is reachable. In other words, it means "any 
> one" and not "all".

OK, no worries; will let you decide what makes sense. Thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1531559810

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]

2024-03-19 Thread Y . Srinivas Ramakrishna

On Tue, 19 Mar 2024 16:20:55 GMT, Y. Srinivas Ramakrishna  
wrote:

>> https://docs.oracle.com/javase/specs/jls/se21/html/jls-12.html#jls-12.6.1
>> 
>>> A reachable object is any object that can be accessed in any potential 
>>> continuing computation from any live thread. 
>> 
>> It may be "loose" because the devil is in the details when it comes to 
>> reachability, but I disagree that it is "sloppy". This expresses 
>> reachability in simple terms, as a "first-order" or "Newtonian" model. There 
>> are of course "Quantum" effects that need to be dealt with in practice. The 
>> JLS alludes to this with:
>>> Optimizing transformations of a program can be designed that reduce the 
>>> number of objects that are reachable to be less than those which would 
>>> naively be considered reachable.
>
> Sorry, my use of words was sloppy here. I think I did mean loose or somewhat 
> informal and therefore slippery.
> 
> What I was saying is that using terms such as "any continuing computation" 
> doesn't make sense because this is referring to a current state of the 
> computation. I'm not sure what "any continuing computation" from a state is 
> because the concept of what constitutes the notion of "a continuing 
> computation" has not been defined before. To me it sounds like a computation 
> tree with nodes as state and transitions as edges and a continuing 
> computation as a path through that tree into the future. The way it is 
> written then, it sounds to the naive reader, or to me at least, as if the 
> object is perpetually reachable by every thread always. I assume I am 
> misinterpreting the intention of the writing, but it sounds too loose for a 
> definition being invoked here in the javadoc. Maybe it can be tightened up a 
> bit.
> 
> Could one state instead that "An object is reachable at a given state when 
> some thread is able to access the object through a sequence of steps starting 
> at that state without other threads taking any steps."  ? Or something along 
> those lines? Or at least something tighter than the current wording that is 
> somewhat too loose.

In fact, it appears as if the problem is with the use of "any", which is 
universal in strength, whereas the intention here is existential in strength 
(as suggested by my wording). Indeed, you might achieve the same effect by 
replacing "any" with "some" so that:

"An object is reachable if it can be accessed in some continuing computation 
from some live thread."

You needn't even say live because dead threads can neither take steps nor 
continue participating in the computation nor can they "access" objects for 
whatever informal notion of access. The "some continuing computation" subsumes 
"potential" (as in a possible future) so potential can be dropped.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1530731176

Re: [PATCH] thermal: intel: int340x_thermal: replace deprecated strncpy with strscpy

2024-03-19 Thread srinivas pandruvada

On Tue, 2024-03-19 at 12:39 +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 18, 2024 at 11:36 PM Justin Stitt
>  wrote:
> > 
> > strncpy() is deprecated for use on NUL-terminated destination
> > strings
> > [1] and as such we should prefer more robust and less ambiguous
> > string
> > interfaces.
> > 
> > psvt->limit.string can only be 8 bytes so let's use the appropriate
> > size
> > macro ACPI_LIMIT_STR_MAX_LEN.
> > 
> > > Neither psvt->limit.string nor psvt_user[i].limit.string requires
> > the
> > NUL-padding behavior that strncpy() provides as they have both been
> > filled with NUL-bytes prior to the string operation.
> > >   memset(>limit, 0, sizeof(u64));
> > and
> > >   psvt_user = kzalloc(psvt_len, GFP_KERNEL);
> > 
> > Let's use `strscpy` [2] due to the fact that it guarantees
> > NUL-termination on the destination buffer without unnecessarily
> > NUL-padding.
> > 
> > Link:
> > https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
> >  [1]
> > Link:
> > https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
> >  [2]
> > Link: https://github.com/KSPP/linux/issues/90
> > Cc: linux-hardening@vger.kernel.org
> > Signed-off-by: Justin Stitt 
> 
> Srinivas, any objections?
No

Reviewed-by: Srinivas Pandruvada 

> 
> > ---
> > Note: build-tested only.
> > 
> > Found with: $ rg "strncpy\("
> > ---
> >  drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git
> > a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> > b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> > index dc519a665c18..4b4a4d63e61f 100644
> > --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> > +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> > @@ -309,7 +309,7 @@ static int acpi_parse_psvt(acpi_handle handle,
> > int *psvt_count, struct psvt **ps
> > 
> >     if (knob->type == ACPI_TYPE_STRING) {
> >     memset(>limit, 0, sizeof(u64));
> > -   strncpy(psvt->limit.string, psvt_ptr-
> > >limit.str_ptr, knob->string.length);
> > +   strscpy(psvt->limit.string, psvt_ptr-
> > >limit.str_ptr, ACPI_LIMIT_STR_MAX_LEN);
> >     } else {
> >     psvt->limit.integer = psvt_ptr-
> > >limit.integer;
> >     }
> > @@ -468,7 +468,7 @@ static int fill_psvt(char __user *ubuf)
> >     psvt_user[i].unlimit_coeff =
> > psvts[i].unlimit_coeff;
> >     psvt_user[i].control_knob_type =
> > psvts[i].control_knob_type;
> >     if (psvt_user[i].control_knob_type ==
> > ACPI_TYPE_STRING)
> > -   strncpy(psvt_user[i].limit.string,
> > psvts[i].limit.string,
> > +   strscpy(psvt_user[i].limit.string,
> > psvts[i].limit.string,
> >     ACPI_LIMIT_STR_MAX_LEN);
> >     else
> >     psvt_user[i].limit.integer =
> > psvts[i].limit.integer;
> > 
> > ---
> > base-commit: bf3a69c6861ff4dc7892d895c87074af7bc1c400
> > change-id: 20240318-strncpy-drivers-thermal-intel-int340x_thermal-
> > acpi_thermal_rel-c-17070c1e42f3
> > 
> > Best regards,
> > --
> > Justin Stitt 
> >

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]

2024-03-19 Thread Y . Srinivas Ramakrishna

On Tue, 19 Mar 2024 02:53:37 GMT, David Holmes  wrote:

>> src/java.base/share/classes/java/lang/ref/package-info.java line 137:
>> 
>>> 135:  *
>>> 136:  * A reachable object is any object that can be accessed in 
>>> any potential
>>> 137:  * continuing computation from any live thread (as stated in {@jls 
>>> 12.6.1}).
>> 
>> This seems like somewhat loose and sloppy wording to me. "Any potential 
>> continuing computation"? "Any live thread"? Could you share a pointer to JLS 
>> 12.6.1 being referenced here?
>
> https://docs.oracle.com/javase/specs/jls/se21/html/jls-12.html#jls-12.6.1
> 
>> A reachable object is any object that can be accessed in any potential 
>> continuing computation from any live thread. 
> 
> It may be "loose" because the devil is in the details when it comes to 
> reachability, but I disagree that it is "sloppy". This expresses reachability 
> in simple terms, as a "first-order" or "Newtonian" model. There are of course 
> "Quantum" effects that need to be dealt with in practice. The JLS alludes to 
> this with:
>> Optimizing transformations of a program can be designed that reduce the 
>> number of objects that are reachable to be less than those which would 
>> naively be considered reachable.

Sorry, my use of words was sloppy here. I think I did mean loose or somewhat 
informal and therefore slippery.

What I was saying is that using terms such as "any continuing computation" 
doesn't make sense because this is referring to a current state of the 
computation. I'm not sure what "any continuing computation" from a state is 
because the concept of what constitutes the notion of "a continuing 
computation" has not been defined before. To me it sounds like a computation 
tree with nodes as state and transitions as edges and a continuing computation 
as a path through that tree into the future. The way it is written then, it 
sounds to the naive reader, or to me at least, as if the object is perpetually 
reachable by every thread always. I assume I am misinterpreting the intention 
of the writing, but it sounds too loose for a definition being invoked here in 
the javadoc. Maybe it can be tightened up a bit.

Could one state instead that "An object is reachable at a given state when some 
thread is able to access the object through a sequence of steps starting at 
that state without other threads taking any steps."  ? Or something along those 
lines? Or at least something tighter than the current wording that is somewhat 
too loose.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1530705355

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]

2024-03-18 Thread Y . Srinivas Ramakrishna

On Thu, 14 Mar 2024 23:23:07 GMT, Brent Christian  wrote:

>> Classes in the `java.lang.ref` package would benefit from an update to bring 
>> the spec in line with how the VM already behaves. The changes would focus on 
>> _happens-before_ edges at some key points during reference processing.
>> 
>> A couple key things we want to be able to say are:
>> - `Reference.reachabilityFence(x)` _happens-before_ reference processing 
>> occurs for 'x'.
>> - `Cleaner.register()` _happens-before_ the Cleaner thread runs the 
>> registered cleaning action.
>> 
>> This will bring Cleaner in line (or close) with the memory visibility 
>> guarantees made for finalizers in [JLS 
>> 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5):
>> _"There is a happens-before edge from the end of a constructor of an object 
>> to the start of a finalizer (§12.6) for that object."_
>
> Brent Christian has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   further tweaks to reachability

src/java.base/share/classes/java/lang/ref/package-info.java line 137:

> 135:  *
> 136:  * A reachable object is any object that can be accessed in any 
> potential
> 137:  * continuing computation from any live thread (as stated in {@jls 
> 12.6.1}).

This seems like somewhat loose and sloppy wording to me. "Any potential 
continuing computation"? "Any live thread"? Could you share a pointer to JLS 
12.6.1 being referenced here?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1529523835

[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-16370:
-
Issue Type: Bug  (was: Improvement)

> offline rollback procedure from kraft mode to zookeeper mode.
> -
>
> Key: KAFKA-16370
> URL: https://issues.apache.org/jira/browse/KAFKA-16370
> Project: Kafka
>  Issue Type: Bug
>    Reporter: kaushik srinivas
>Priority: Major
>
> From the KIP, 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]
>  
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will 
> still be running in migration mode and making dual writes to KRaft and ZK. 
> Since the data in ZK is still consistent with that of the KRaft metadata log, 
> it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but 
> still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to 
> restart the controller quorum and take it out of migration mode by setting 
> _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The 
> active controller will only finalize the migration once it detects that all 
> members of the quorum have signaled that they are finalizing the migration 
> (again, using the tagged field in ApiVersionsResponse). Once the controller 
> leaves migration mode, it will write a ZkMigrationStateRecord to the log and 
> no longer perform writes to ZK. It will also disable its special handling of 
> ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A 
> rollback to ZK is still possible after finalizing the migration, but it must 
> be done offline and it will cause metadata loss (which can also cause 
> partition data loss).*
>  
> We tried this in a Kafka cluster that was migrated from ZooKeeper to KRaft 
> mode and observed that the rollback is possible by deleting the "/controller" 
> node in ZooKeeper before the rollback from KRaft mode to ZooKeeper is done.
> The above snippet indicates that a rollback from KRaft to ZK is still 
> possible offline after the migration is finalized. Are there any known steps 
> to be done as part of this offline rollback?
> From our experience, we currently know of one step: "deletion of /controller 
> node in zookeeper to force zookeeper based brokers to be elected as new 
> controller after the rollback is done". Are there any additional 
> steps/actions apart from this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-16370:
-
Issue Type: Wish  (was: Improvement)

> offline rollback procedure from kraft mode to zookeeper mode.
> -
>
> Key: KAFKA-16370
> URL: https://issues.apache.org/jira/browse/KAFKA-16370
> Project: Kafka
>  Issue Type: Wish
>    Reporter: kaushik srinivas
>Priority: Major
>
> From the KIP, 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]
>  
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will 
> still be running in migration mode and making dual writes to KRaft and ZK. 
> Since the data in ZK is still consistent with that of the KRaft metadata log, 
> it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but 
> still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to 
> restart the controller quorum and take it out of migration mode by setting 
> _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The 
> active controller will only finalize the migration once it detects that all 
> members of the quorum have signaled that they are finalizing the migration 
> (again, using the tagged field in ApiVersionsResponse). Once the controller 
> leaves migration mode, it will write a ZkMigrationStateRecord to the log and 
> no longer perform writes to ZK. It will also disable its special handling of 
> ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A 
> rollback to ZK is still possible after finalizing the migration, but it must 
> be done offline and it will cause metadata loss (which can also cause 
> partition data loss).*
>  
> We tried this in a Kafka cluster that was migrated from ZooKeeper to KRaft 
> mode and observed that the rollback is possible by deleting the "/controller" 
> node in ZooKeeper before the rollback from KRaft mode to ZooKeeper is done.
> The above snippet indicates that a rollback from KRaft to ZK is still 
> possible offline after the migration is finalized. Are there any known steps 
> to be done as part of this offline rollback?
> From our experience, we currently know of one step: "deletion of /controller 
> node in zookeeper to force zookeeper based brokers to be elected as new 
> controller after the rollback is done". Are there any additional 
> steps/actions apart from this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas updated KAFKA-16370:
-
Issue Type: Improvement  (was: Wish)

> offline rollback procedure from kraft mode to zookeeper mode.
> -
>
> Key: KAFKA-16370
> URL: https://issues.apache.org/jira/browse/KAFKA-16370
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: kaushik srinivas
>Priority: Major
>
> From the KIP, 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]
>  
> h2. Finalizing the Migration
> Once the cluster has been fully upgraded to KRaft mode, the controller will 
> still be running in migration mode and making dual writes to KRaft and ZK. 
> Since the data in ZK is still consistent with that of the KRaft metadata log, 
> it is still possible to revert back to ZK.
> *_The time that the cluster is running all KRaft brokers/controllers, but 
> still running in migration mode, is effectively unbounded._*
> Once the operator has decided to commit to KRaft mode, the final step is to 
> restart the controller quorum and take it out of migration mode by setting 
> _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The 
> active controller will only finalize the migration once it detects that all 
> members of the quorum have signaled that they are finalizing the migration 
> (again, using the tagged field in ApiVersionsResponse). Once the controller 
> leaves migration mode, it will write a ZkMigrationStateRecord to the log and 
> no longer perform writes to ZK. It will also disable its special handling of 
> ZK RPCs.
> *At this point, the cluster is fully migrated and is running in KRaft mode. A 
> rollback to ZK is still possible after finalizing the migration, but it must 
> be done offline and it will cause metadata loss (which can also cause 
> partition data loss).*
>  
> We tried this in a Kafka cluster that was migrated from ZooKeeper to KRaft 
> mode and observed that the rollback is possible by deleting the "/controller" 
> node in ZooKeeper before the rollback from KRaft mode to ZooKeeper is done.
> The above snippet indicates that a rollback from KRaft to ZK is still 
> possible offline after the migration is finalized. Are there any known steps 
> to be done as part of this offline rollback?
> From our experience, we currently know of one step: "deletion of /controller 
> node in zookeeper to force zookeeper based brokers to be elected as new 
> controller after the rollback is done". Are there any additional 
> steps/actions apart from this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)

kaushik srinivas created KAFKA-16370:


 Summary: offline rollback procedure from kraft mode to zookeeper 
mode.
 Key: KAFKA-16370
 URL: https://issues.apache.org/jira/browse/KAFKA-16370
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


From the KIP, 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]

 
h2. Finalizing the Migration

Once the cluster has been fully upgraded to KRaft mode, the controller will 
still be running in migration mode and making dual writes to KRaft and ZK. 
Since the data in ZK is still consistent with that of the KRaft metadata log, 
it is still possible to revert back to ZK.

*_The time that the cluster is running all KRaft brokers/controllers, but still 
running in migration mode, is effectively unbounded._*

Once the operator has decided to commit to KRaft mode, the final step is to 
restart the controller quorum and take it out of migration mode by setting 
_zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active 
controller will only finalize the migration once it detects that all members of 
the quorum have signaled that they are finalizing the migration (again, using 
the tagged field in ApiVersionsResponse). Once the controller leaves migration 
mode, it will write a ZkMigrationStateRecord to the log and no longer perform 
writes to ZK. It will also disable its special handling of ZK RPCs.

*At this point, the cluster is fully migrated and is running in KRaft mode. A 
rollback to ZK is still possible after finalizing the migration, but it must be 
done offline and it will cause metadata loss (which can also cause partition 
data loss).*

 

We tried this in a Kafka cluster that was migrated from ZooKeeper to KRaft 
mode and observed that the rollback is possible by deleting the "/controller" 
node in ZooKeeper before the rollback from KRaft mode to ZooKeeper is done.

The above snippet indicates that a rollback from KRaft to ZK is still possible 
offline after the migration is finalized. Are there any known steps to be done 
as part of this offline rollback?

From our experience, we currently know of one step: "deletion of /controller 
node in zookeeper to force zookeeper based brokers to be elected as new 
controller after the rollback is done". Are there any additional steps/actions 
apart from this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-03-11 Thread Srinivas Vamsi Parasa

On Mon, 11 Mar 2024 19:29:59 GMT, Srinivas Vamsi Parasa  
wrote:

>> Hello Vamsi (@vamsi-parasa),
>> 
>> Could you please run benchmarking of 4 cases with **updated** test class 
>> **ArraysSortNew2**?
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
>> 
>> Put each DPQS class in the java.util package, recompile the JDK for each 
>> case as you did before, and run the new class **ArraysSortNew2**.
>> 
>> Find the sources there:
>> 
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27b.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27p.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27s.java
>> 
>> Thank you,
>> Vladimir
>
> Hi Vladimir (@iaroslavski),
> 
> Please see the data below.
> 
> Thanks,
> Vamsi
> 
> Builder | Size | Stock JDK | b01 | r27b | r27p | r27s
> -- | -- | -- | -- | -- | -- | --
> RANDOM | 600 | 1.615 | 1.59 | 2.316 | 1.805 | 1.77
> RANDOM | 2000 | 6.794 | 6.638 | 8.443 | 6.354 | 6.295
> RANDOM | 9 | 296.877 | 304.15 | 337.625 | 341.999 | 307.099
> RANDOM | 40 | 838.061 | 801.108 | 1136.688 | 1161.181 | 781.487
> RANDOM | 300 | 5468.214 | 5452.125 | 8522.698 | 8476.445 | 5368.777
> PERIOD | 600 | 0.877 | 0.875 | 0.663 | 0.663 | 0.685
> PERIOD | 2000 | 1.57 | 1.548 | 1.458 | 1.451 | 1.487
> PERIOD | 9 | 97.208 | 97.677 | 106.01 | 106.516 | 106.629
> PERIOD | 40 | 237.4 | 264.103 | 235.466 | 231.349 | 231.235
> PERIOD | 300 | 2604.56 | 2829.935 | 4867.668 | 4872.361 | 4888.391
> STAGGER | 600 | 1.052 | 1.064 | 0.774 | 0.78 | 0.791
> STAGGER | 2000 | 3.449 | 3.443 | 2.604 | 2.627 | 2.597
> STAGGER | 9 | 102.331 | 103.464 | 73.582 | 73.532 | 75.85
> STAGGER | 40 | 210.829 | 229.37 | 207.356 | 208.565 | 205.141
> STAGGER | 300 | 2205.565 | 2174.588 | 2086.885 | 2070.132 | 2373.443
> SHUFFLE | 600 | 1.885 | 1.892 | 1.934 | 1.36 | 1.386
> SHUFFLE | 2000 | 6.787 | 6.724 | 7.338 | 4.994 | 4.96
> SHUFFLE | 9 | 158.065 | 154.48 | 152.874 | 148.337 | 140.703
> SHUFFLE | 40 | 415.089 | 424.777 | 676.272 | 676.89 | 410.717
> SHUFFLE | 300 | 3999.006 | 4017.496 | 6861.872 | 6894.785 | 3880.883
> RANDOM | 600 | 1.614 | 1.588 | 2.329 | 1.789 | 1.847
> RANDOM | 2000 | 6.756 | 6.634 | 7.757 | 6.224 | 6.23
> RANDOM | 9 | 516.671 | 512.52 | 623.995 | 488.492 | 482.646
> RANDOM | 40 | 2400.818 | 2399.264 | 2903.654 | 2356.675 | 2358.409
> RANDOM | 300 | 20933.23 | 20822.49 | 24428.27 | 20847.57 | 20868.68
> PERIOD | 600 | 0.864 | 0.871 | 0.681 | 0.665 | 0.664
> PERIOD | 2000 | 1.583 | 1.547 | 1.451 | 1.46 | 1.483
> PERIOD | 9 | 63.436 | 63.148 | 63.617 | 64.391 | 65.865
> PERIOD | 40 | 209.807 | 209.234 | 228.7 | 232.854 | 235.667
> PERIOD | 3000...

> Hi Vamsi (@vamsi-parasa), few questions on your test environment:
> 
> * what are the hardware specs of your server ?
> * bare-metal or virtual ?
> * are other services or big processes running ?
> * os tuning ? CPU HT: off? Fixed CPU governor or frequency ?
> * isolation using taskset ?
> 
> Maybe C2 JIT (+ CDS archive) gives better performance to the stock JDK sort 
> than to the same code running outside the JDK...
> 
> Thanks, Laurent

Hi Laurent,

The benchmarks were run on an Intel Tiger Lake Core i7 machine. It is bare-metal, 
without any virtualization. HT is ON, and there is no other specific OS tuning 
or isolation using taskset.

Thanks,
Vamsi

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1989274286

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-03-11 Thread Srinivas Vamsi Parasa

On Tue, 27 Feb 2024 20:54:03 GMT, Vladimir Yaroslavskiy  
wrote:

>> Hello Vladimir (@iaroslavski),
>> 
>> Please see the data below. Each DPQS class was copied to java.util and the 
>> JDK was recompiled.
>> 
>> Thanks,
>> Vamsi
>> 
>> Benchmark | (builder) | (size) | Stock JDK | r20p | r20s | r25p | r25s
>> -- | -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.p_sort | RANDOM | 600 | 1.618 | 2.601 | 2.966 | 2.898 | 3.269
>> ArraysSort.Int.p_sort | RANDOM | 2000 | 7.433 | 8.438 | 8.463 | 8.414 | 8.65
>> ArraysSort.Int.p_sort | RANDOM | 9 | 258.853 | 355.261 | 326.378 | 347.65 | 321.894
>> ArraysSort.Int.p_sort | RANDOM | 40 | 842.085 | 1225.929 | 899.852 | 1278.681 | 932.627
>> ArraysSort.Int.p_sort | RANDOM | 300 | 5723.659 | 8711.108 | 6086.974 | 8948.101 | 6122.612
>> ArraysSort.Int.p_sort | REPEATED | 600 | 0.52 | 0.585 | 0.629 | 0.586 | 0.579
>> ArraysSort.Int.p_sort | REPEATED | 2000 | 1.18 | 1.225 | 1.21 | 1.225 | 1.238
>> ArraysSort.Int.p_sort | REPEATED | 9 | 102.142 | 85.79 | 86.131 | 87.954 | 86.036
>> ArraysSort.Int.p_sort | REPEATED | 40 | 244.508 | 229.142 | 227.613 | 228.608 | 228.367
>> ArraysSort.Int.p_sort | REPEATED | 300 | 2752.745 | 2584.103 | 2544.192 | 2576.803 | 2609.833
>> ArraysSort.Int.p_sort | STAGGER | 600 | 1.146 | 0.894 | 0.898 | 0.904 | 0.912
>> ArraysSort.Int.p_sort | STAGGER | 2000 | 3.712 | 3.096 | 3.121 | 3.03 | 3.049
>> ArraysSort.Int.p_sort | STAGGER | 9 | 72.763 | 77.575 | 78.366 | 79.158 | 77.199
>> ArraysSort.Int.p_sort | STAGGER | 40 | 212.455 | 228.331 | 225.888 | 224.686 | 225.728
>> ArraysSort.Int.p_sort | STAGGER | 300 | 2290.327 | 2216.741 | 2196.138 | 2236.658 | 2262.472
>> ArraysSort.Int.p_sort | SHUFFLE | 600 | 2.01 | 2.92 | 2.907 | 2.91 | 2.926
>> ArraysSort.Int.p_sort | SHUFFLE | 2000 | 7.06 | 7.759 | 7.776 | 7.688 | 8.062
>> ArraysSort.Int.p_sort | SHUFFLE | 9 | 157.728 | 151.871 | 151.101 | 154.03 | 151.2
>> ArraysSort.Int.p_sort | SHUFFLE | 40 | 441.166 | 715.243 | 449...
>
> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run benchmarking of 4 cases with **updated** test class 
> **ArraysSortNew2**?
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
> 
> Put each DPQS class in the java.util package, recompile the JDK for each 
> case as you did before, and run the new class **ArraysSortNew2**.
> 
> Find the sources there:
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27b.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27s.java
> 
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the data below.

Thanks,
Vamsi

Builder | Size | Stock JDK | b01 | r27b | r27p | r27s
-- | -- | -- | -- | -- | -- | --
RANDOM | 600 | 1.615 | 1.59 | 2.316 | 1.805 | 1.77
RANDOM | 2000 | 6.794 | 6.638 | 8.443 | 6.354 | 6.295
RANDOM | 9 | 296.877 | 304.15 | 337.625 | 341.999 | 307.099
RANDOM | 40 | 838.061 | 801.108 | 1136.688 | 1161.181 | 781.487
RANDOM | 300 | 5468.214 | 5452.125 | 8522.698 | 8476.445 | 5368.777
PERIOD | 600 | 0.877 | 0.875 | 0.663 | 0.663 | 0.685
PERIOD | 2000 | 1.57 | 1.548 | 1.458 | 1.451 | 1.487
PERIOD | 9 | 97.208 | 97.677 | 106.01 | 106.516 | 106.629
PERIOD | 40 | 237.4 | 264.103 | 235.466 | 231.349 | 231.235
PERIOD | 300 | 2604.56 | 2829.935 | 4867.668 | 4872.361 | 4888.391
STAGGER | 600 | 1.052 | 1.064 | 0.774 | 0.78 | 0.791
STAGGER | 2000 | 3.449 | 3.443 | 2.604 | 2.627 | 2.597
STAGGER | 9 | 102.331 | 103.464 | 73.582 | 73.532 | 75.85
STAGGER | 40 | 210.829 | 229.37 | 207.356 | 208.565 | 205.141
STAGGER | 300 | 2205.565 | 2174.588 | 2086.885 | 2070.132 | 2373.443
SHUFFLE | 600 | 1.885 | 1.892 | 1.934 | 1.36 | 1.386
SHUFFLE | 2000 | 6.787 | 6.724 | 7.338 | 4.994 | 4.96
SHUFFLE | 9 | 158.065 | 154.48 | 152.874 | 148.337 | 140.703
SHUFFLE | 40 | 415.089 | 424.777 | 676.272 | 676.89 | 410.717
SHUFFLE | 300 | 3999.006 | 4017.496 | 6861.872 | 6894.785 | 3880.883
RANDOM | 600 | 1.614 | 1.588 | 2.329 | 1.789 | 1.847
RANDOM | 2000 | 6.756 | 6.634 | 7.757 | 6.224 | 6.23
RANDOM | 9 | 516.671 | 512.52 | 623.995 | 488.492 | 482.646
RANDOM | 40 | 2400.818 | 2399.264 | 2903.654 | 2356.675 | 2358.409
RANDOM | 300 |

[jira] [Created] (KAFKA-16360) Release plan of 3.x kafka releases.

2024-03-11 Thread kaushik srinivas (Jira)

kaushik srinivas created KAFKA-16360:


 Summary: Release plan of 3.x kafka releases.
 Key: KAFKA-16360
 URL: https://issues.apache.org/jira/browse/KAFKA-16360
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


KIP-833 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-ReleaseTimeline]
 states:
h2. Kafka 3.7
 * January 2024
 * Final release with ZK mode

However, some tickets in Jira are marked for the 3.8 release. Will Apache 
continue to make 3.x releases that support both ZooKeeper and KRaft, 
independently of the KRaft-only 4.x releases?

If yes, how many more releases can be expected on the 3.x release line?

 




Performance Issue at Ignite Level

2024-03-09 Thread Kalwakuntla, Srinivas

Dear Ignite Team,

I hope this email finds you well.

We are reaching out because we are currently facing a performance issue with 
Apache Ignite. Our application runs on Azure Kubernetes (v1.27.1) with 24 nodes 
of size "Standard_F64s_v2", handling roughly 13 million records.
We frequently get the warning "[WARNING][jvm-pause-detector-worker][IgniteKernal] 
Possible too long JVM pause:  milliseconds" while performing operations such 
as reading and encoding.

The system information is below:

System Information (Apache Ignite Clusters)
Heap size: 32GB out of 64GB of memory
CPUs:  Standard_F64s_v2
The number of server instances: 24
IOWait: within normal range
IOPS: 10k-600K per hr
JDK: Oracle JDK8, 1.8.0_281
Apache Ignite: 2.14.0

GC configuration in ignite.sh:

-XX:-UseContainerSupport
-XX:+AlwaysPreTouch
-XX:+UseG1GC
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC

Please assist us in identifying the underlying source of this problem.
Thank you,
Srinivas




[PATCH 1/1] drm/i915: Allow bigjoiner for MST

2024-03-06 Thread Vidya Srinivas

We need bigjoiner support with MST functionality
for MST monitor resolutions > 5K to work.
Adding support for the same.

v2: Addressed review comments from Jani.
Revert rejection of MST bigjoiner modes and add
functionality

v3: Fixed pipe_mismatch WARN for mst_master_transcoder
Credits-to: Manasi Navare 

Signed-off-by: Vidya Srinivas 
---
 drivers/gpu/drm/i915/display/intel_ddi.c|  6 --
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
index c587a8efeafc..41998022ed07 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -3902,9 +3902,11 @@ static void intel_ddi_read_func_ctl(struct intel_encoder *encoder,
pipe_config->lane_count =
((temp & DDI_PORT_WIDTH_MASK) >> DDI_PORT_WIDTH_SHIFT) + 1;
 
-   if (DISPLAY_VER(dev_priv) >= 12)
-   pipe_config->mst_master_transcoder =
+   if (DISPLAY_VER(dev_priv) >= 12) {
+   if (!intel_crtc_is_bigjoiner_slave(pipe_config))
+   pipe_config->mst_master_transcoder =
REG_FIELD_GET(TRANS_DDI_MST_TRANSPORT_SELECT_MASK, temp);
+   }
 
intel_cpu_transcoder_get_m1_n1(crtc, cpu_transcoder,
   &pipe_config->dp_m_n);
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+   struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
struct intel_dp *intel_dp = &intel_mst->primary->dp;
const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
return -EINVAL;
 
+   if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+   adjusted_mode->crtc_clock))
+   pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 *   corresponding link capabilities of the sink) in case the
 *   stream is uncompressed for it by the last branch device.
 */
-   if (mode_rate > max_rate || mode->clock > max_dotclk ||
-   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-   *status = MODE_CLOCK_HIGH;
-   return 0;
-   }
-
if (mode->clock < 1) {
*status = MODE_CLOCK_LOW;
return 0;
@@ -1349,8 +1348,10 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
bigjoiner = true;
max_dotclk *= 2;
+   }
 
-   /* TODO: add support for bigjoiner */
+   if (mode_rate > max_rate || mode->clock > max_dotclk ||
+   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
*status = MODE_CLOCK_HIGH;
return 0;
}
@@ -1397,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
return 0;
}
 
-   *status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+   *status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
return 0;
 }
 
-- 
2.33.0

[PATCH 0/1] Enable MST bigjoiner

2024-03-06 Thread Vidya Srinivas

Support resolutions > 5k on MST monitors that need bigjoiner
by adding MST bigjoiner functionality

Vidya Srinivas (1):
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_ddi.c|  6 --
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 2 files changed, 13 insertions(+), 10 deletions(-)

-- 
2.33.0

RE: [PATCH v2 8/8] drm/i915: Handle joined pipes inside hsw_crtc_disable()

2024-03-03 Thread Srinivas, Vidya

Thank you very much, Ville and Stan.
With https://patchwork.freedesktop.org/series/130619/ and 
https://patchwork.freedesktop.org/series/130449/ applied, I tested that 6K works.
Tested-by: Vidya Srinivas 

> -Original Message-
> From: Intel-gfx  On Behalf Of Ville
> Syrjala
> Sent: Friday, March 1, 2024 10:54 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Lisovskiy, Stanislav 
> Subject: [PATCH v2 8/8] drm/i915: Handle joined pipes inside
> hsw_crtc_disable()
> 
> From: Ville Syrjälä 
> 
> Reorganize the crtc disable path to only deal with the master
> pipes/transcoders in intel_old_crtc_state_disables() and offload the handling
> of joined pipes to hsw_crtc_disable().
> This makes the whole thing much more sensible since we can actually control
> the order in which we do the per-pipe vs.
> per-transcoder modeset steps.
> 
> v2: Pass the correct crtc pointer to .crtc_disable()
> 
> Signed-off-by: Ville Syrjälä 
> ---
>  drivers/gpu/drm/i915/display/intel_display.c | 66 
>  1 file changed, 39 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 1df3923cc30d..e01536983303 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -1793,29 +1793,27 @@ static void hsw_crtc_disable(struct intel_atomic_state *state,
>   const struct intel_crtc_state *old_master_crtc_state =
>   intel_atomic_get_old_crtc_state(state, master_crtc);
>   struct drm_i915_private *i915 = to_i915(master_crtc->base.dev);
> + u8 pipe_mask = intel_crtc_joined_pipe_mask(old_master_crtc_state);
> + struct intel_crtc *crtc;
> 
>   /*
>* FIXME collapse everything to one hook.
>* Need care with mst->ddi interactions.
>*/
> - if (!intel_crtc_is_bigjoiner_slave(old_master_crtc_state)) {
> - intel_encoders_disable(state, master_crtc);
> - intel_encoders_post_disable(state, master_crtc);
> - }
> -
> - intel_disable_shared_dpll(old_master_crtc_state);
> + intel_encoders_disable(state, master_crtc);
> + intel_encoders_post_disable(state, master_crtc);
> 
> - if (!intel_crtc_is_bigjoiner_slave(old_master_crtc_state)) {
> - struct intel_crtc *slave_crtc;
> + for_each_intel_crtc_in_pipe_mask(&i915->drm, crtc, pipe_mask) {
> + const struct intel_crtc_state *old_crtc_state =
> + intel_atomic_get_old_crtc_state(state, crtc);
> 
> - intel_encoders_post_pll_disable(state, master_crtc);
> + intel_disable_shared_dpll(old_crtc_state);
> + }
> 
> - intel_dmc_disable_pipe(i915, master_crtc->pipe);
> + intel_encoders_post_pll_disable(state, master_crtc);
> 
> - for_each_intel_crtc_in_pipe_mask(&i915->drm, slave_crtc,
> - intel_crtc_bigjoiner_slave_pipes(old_master_crtc_state))
> - intel_dmc_disable_pipe(i915, slave_crtc->pipe);
> - }
> + for_each_intel_crtc_in_pipe_mask(&i915->drm, crtc, pipe_mask)
> + intel_dmc_disable_pipe(i915, crtc->pipe);
>  }
> 
>  static void i9xx_pfit_enable(const struct intel_crtc_state *crtc_state)
> @@ -6753,24 +6751,33 @@ static void intel_update_crtc(struct intel_atomic_state *state,
>  }
> 
>  static void intel_old_crtc_state_disables(struct intel_atomic_state *state,
> -   struct intel_crtc *crtc)
> +   struct intel_crtc *master_crtc)
>  {
>   struct drm_i915_private *dev_priv = to_i915(state->base.dev);
> - const struct intel_crtc_state *new_crtc_state =
> - intel_atomic_get_new_crtc_state(state, crtc);
> + const struct intel_crtc_state *old_master_crtc_state =
> + intel_atomic_get_old_crtc_state(state, master_crtc);
> + u8 pipe_mask = intel_crtc_joined_pipe_mask(old_master_crtc_state);
> + struct intel_crtc *crtc;
> 
>   /*
>* We need to disable pipe CRC before disabling the pipe,
>* or we race against vblank off.
>*/
> - intel_crtc_disable_pipe_crc(crtc);
> + for_each_intel_crtc_in_pipe_mask(&dev_priv->drm, crtc, pipe_mask)
> + intel_crtc_disable_pipe_crc(crtc);
> 
> - dev_priv->display.funcs.display->crtc_disable(state, crtc);
> - crtc->active = false;
> - intel_fbc_disable(crtc);
> + dev_priv->display.funcs.display->crtc_disable(state, master_crtc);
> 
> - if (!new_crtc_state->hw.active)
> - intel_initial_watermarks(state, crtc);
> + for_each_intel_crtc_in_pipe_mask

[PATCH 1/1] drm/i915: Allow bigjoiner for MST

2024-02-28 Thread Vidya Srinivas

We need bigjoiner support with MST functionality
for MST monitor resolutions > 5K to work.
Adding support for the same.

v2: Addressed review comments from Jani.
Revert rejection of MST bigjoiner modes and add
functionality

Signed-off-by: Vidya Srinivas 
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+   struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
struct intel_dp *intel_dp = &intel_mst->primary->dp;
const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
return -EINVAL;
 
+   if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+   adjusted_mode->crtc_clock))
+   pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 *   corresponding link capabilities of the sink) in case the
 *   stream is uncompressed for it by the last branch device.
 */
-   if (mode_rate > max_rate || mode->clock > max_dotclk ||
-   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-   *status = MODE_CLOCK_HIGH;
-   return 0;
-   }
-
if (mode->clock < 1) {
*status = MODE_CLOCK_LOW;
return 0;
@@ -1349,8 +1348,10 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
bigjoiner = true;
max_dotclk *= 2;
+   }
 
-   /* TODO: add support for bigjoiner */
+   if (mode_rate > max_rate || mode->clock > max_dotclk ||
+   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
*status = MODE_CLOCK_HIGH;
return 0;
}
@@ -1397,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
return 0;
}
 
-   *status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+   *status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
return 0;
 }
 
-- 
2.33.0

[PATCH 0/1] Enable MST bigjoiner

2024-02-28 Thread Vidya Srinivas

Support resolutions > 5k on MST monitors that need bigjoiner
by adding MST bigjoiner functionality

Vidya Srinivas (1):
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.33.0

[PATCH 1/1] drm/i915: Allow bigjoiner for MST

2024-02-28 Thread Vidya Srinivas

We need bigjoiner support with MST functionality
for MST monitor resolutions > 5K to work.
Adding support for the same.

Signed-off-by: Vidya Srinivas 
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+   struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
struct intel_dp *intel_dp = &intel_mst->primary->dp;
const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
return -EINVAL;
 
+   if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+   adjusted_mode->crtc_clock))
+   pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 *   corresponding link capabilities of the sink) in case the
 *   stream is uncompressed for it by the last branch device.
 */
-   if (mode_rate > max_rate || mode->clock > max_dotclk ||
-   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-   *status = MODE_CLOCK_HIGH;
-   return 0;
-   }
-
if (mode->clock < 1) {
*status = MODE_CLOCK_LOW;
return 0;
@@ -1349,8 +1348,10 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
bigjoiner = true;
max_dotclk *= 2;
+   }
 
-   /* TODO: add support for bigjoiner */
+   if (mode_rate > max_rate || mode->clock > max_dotclk ||
+   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
*status = MODE_CLOCK_HIGH;
return 0;
}
@@ -1397,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
return 0;
}
 
-   *status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+   *status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
return 0;
 }
 
-- 
2.33.0

[PATCH 0/1] Enable MST bigjoiner

2024-02-28 Thread Vidya Srinivas

Support resolutions > 5k on MST monitors that need bigjoiner
by adding MST bigjoiner functionality

Vidya Srinivas (1):
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.33.0

RE: [PATCH 1/2] Revert "drm/i915/mst: Reject modes that require the bigjoiner"

2024-02-28 Thread Srinivas, Vidya




> -Original Message-
> From: Jani Nikula 
> Sent: Wednesday, February 28, 2024 2:39 PM
> To: Srinivas, Vidya ; 
> intel-gfx@lists.freedesktop.org
> Cc: Almahallawy, Khaled ; Srinivas, Vidya
> 
> Subject: Re: [PATCH 1/2] Revert "drm/i915/mst: Reject modes that require the
> bigjoiner"
> 
> On Tue, 27 Feb 2024, Vidya Srinivas  wrote:
> > This reverts commit 9c058492b16f90bb772cb0dad567e8acc68e155d.
> >
> > Reverting for adding MST bigjoiner functionality.
> 
> Please squash this together with the fix. Someone might think a revert is a
> fix that needs to be backported. Besides, for bisection this creates a
> non-working commit.

Hello Jani

Thank you very much. Sure, I will squash it together with the fix and submit.

Regards
Vidya

> 
> BR,
> Jani.
> 
> 
> >
> > Signed-off-by: Vidya Srinivas 
> > ---
> >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 4 
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > index db1254b036f1..b062f4ee6c8b 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > @@ -1349,10 +1349,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
> > if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
> > bigjoiner = true;
> > max_dotclk *= 2;
> > -
> > -   /* TODO: add support for bigjoiner */
> > -   *status = MODE_CLOCK_HIGH;
> > -   return 0;
> > }
> >
> > if (DISPLAY_VER(dev_priv) >= 10 &&
> 
> --
> Jani Nikula, Intel

RE: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0

2024-02-27 Thread Srinivas, Vidya




> -Original Message-
> From: Lisovskiy, Stanislav 
> Sent: Tuesday, February 27, 2024 2:44 PM
> To: Srinivas, Vidya 
> Cc: Jani Nikula ; 
> intel-gfx@lists.freedesktop.org;
> Saarinen, Jani ; ville.syrj...@linux.intel.com
> Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> 
> On Tue, Feb 27, 2024 at 11:06:16AM +0200, Srinivas, Vidya wrote:
> >
> >
> > > -Original Message-
> > > From: Lisovskiy, Stanislav 
> > > Sent: Tuesday, February 27, 2024 2:34 PM
> > > To: Jani Nikula 
> > > Cc: intel-gfx@lists.freedesktop.org; Saarinen, Jani
> > > ; ville.syrj...@linux.intel.com; Srinivas,
> > > Vidya 
> > > Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> > >
> > > On Mon, Feb 26, 2024 at 09:56:10PM +0200, Jani Nikula wrote:
> > > > On Wed, 21 Feb 2024, Stanislav Lisovskiy
> > > > 
> > > wrote:
> > > > > Patch calculates bigjoiner pipes in mst compute.
> > > > > Patch also passes bigjoiner bool to validate plane max size.
> > > >
> > > > Please use the imperative mood in commit messages, e.g. "calculate"
> > > > > instead of "calculates".
> > > >
> > > > Please do not refer to "patch". We know it's a patch, until it
> > > > isn't, and then it's a commit.
> > > >
> > > > Please explain *why* the changes are being done, not just *what*
> > > > is being done.
> > > >
> > > > In the subject, what is "bigjoiner case for DP2.0"? DP 2.0 is a
> > > > spec version, and as such irrelevant for the changes being done.
> > > >
> > > > > Signed-off-by: vsrini4 
> > > >
> > > > ?
> > >
> > > Hi Jani, I just added that patch from Vidya to my series, to be
> > > honest, didn't have time at all to look much into it.
> > > Looks like it's me who is going to fix that.
> >
> > Hello Stan
> > My sincere apologies. I didn't want to disturb your series, so I did not
> > fix it.
> > Please let me know if I should fix it. Sorry again.
> > Thank you Jani for the comments.
> >
> > Regards
> > Vidya
> 
> Hi Vidya,
> 
> it is a bit unclear to me as well now how we should proceed. Since your
> patch is part of my series (I was explicitly asked to add it), does it mean
> you are fixing it now, or am I?
> Well, if you address Jani's comments, I definitely don't mind :)

Hello Stan
Thank you so much. Just so that I don't disturb your series,
I have pushed this series after addressing the comments from Jani Nikula:
https://patchwork.freedesktop.org/series/130449/

Many thanks Jani for the review
and apologies for the commit message errors. Kindly help check if this series
is okay. Thank you.

Regards
Vidya
 
> 
> > >
> > > >
> > > > > Signed-off-by: Stanislav Lisovskiy
> > > > > 
> > > > > ---
> > > > >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 19
> > > > > ---
> > > > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > index 5307ddd4edcf5..fd27d9976c050 100644
> > > > > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > @@ -523,6 +523,7 @@ static int
> > > > > intel_dp_mst_compute_config(struct
> > > intel_encoder *encoder,
> > > > >  struct drm_connector_state 
> > > > > *conn_state)
> > > {
> > > > >   struct drm_i915_private *dev_priv =
> > > > > to_i915(encoder->base.dev);
> > > > > + struct intel_crtc *crtc =
> > > > > +to_intel_crtc(pipe_config->uapi.crtc);
> > > > >   struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
> > > > > >   struct intel_dp *intel_dp = &intel_mst->primary->dp;
> > > > >   const struct intel_connector *connector = @@ -540,6 +541,10 @@
> > > > > static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > > > >   if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
> > > > >   return -EINVAL;
> > > > >
> > > > > + if (intel_dp_need_bi

RE: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0

2024-02-27 Thread Srinivas, Vidya



> -Original Message-
> From: Manasi Navare 
> Sent: Tuesday, February 27, 2024 11:37 PM
> To: Jani Nikula 
> Cc: Lisovskiy, Stanislav ; intel-
> g...@lists.freedesktop.org; Saarinen, Jani ;
> ville.syrj...@linux.intel.com; Srinivas, Vidya 
> Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> 
> Thanks Jani for your review.
> Thanks @Lisovskiy, Stanislav  and @vidya.srini...@intel.com for taking this
> patch forward.
> 
> @Jani Nikula , @Ville Syrjälä : MST bigjoiner as a feature needs to be enabled
> upstream and this patch enables that feature.
> If you agree that bigjoiner refactoring patches 1 and 2 have no impact on
> enabling bigjoiner on MST, could we decouple this patch from bigjoiner
> refactoring and land this separately?

Hello Manasi

Thank you.
I have submitted this series as suggested after addressing comments
from Jani Nikula about the commit message errors.
https://patchwork.freedesktop.org/series/130449/

Regards
Vidya

> 
> We need the bigjoiner-on-MST feature landed ASAP; the bigjoiner
> refactoring can follow.
> 
> Regards
> Manasi
> 
> On Tue, Feb 27, 2024 at 1:15 AM Jani Nikula 
> wrote:
> >
> > On Tue, 27 Feb 2024, "Lisovskiy, Stanislav" 
> wrote:
> > > On Mon, Feb 26, 2024 at 09:56:10PM +0200, Jani Nikula wrote:
> > >> On Wed, 21 Feb 2024, Stanislav Lisovskiy 
> wrote:
> > >> > Patch calculates bigjoiner pipes in mst compute.
> > >> > Patch also passes bigjoiner bool to validate plane max size.
> > >>
> > >> Please use the imperative mood in commit messages, e.g. "calculate"
> > >> instead of "calculates".
> > >>
> > >> Please do not refer to "patch". We know it's a patch, until it
> > >> isn't, and then it's a commit.
> > >>
> > >> Please explain *why* the changes are being done, not just *what* is
> > >> being done.
> > >>
> > >> In the subject, what is "bigjoiner case for DP2.0"? DP 2.0 is a
> > >> spec version, and as such irrelevant for the changes being done.
> > >>
> > >> > Signed-off-by: vsrini4 
> > >>
> > >> ?
> > >
> > > Hi Jani, I just added that patch from Vidya to my series, to be
> > > honest, didn't have time at all to look much into it.
> > > Looks like it's me who is going to fix that.
> >
> > Should the original authorship be preserved? If not, please add
> > Co-developed-by. Just having the Signed-off-by is not enough.
> >
> > BR,
> > Jani.
> >
> >
> > >
> > >>
> > >> > Signed-off-by: Stanislav Lisovskiy
> > >> > 
> > >> > ---
> > >> >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 19
> > >> > ---
> > >> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > index 5307ddd4edcf5..fd27d9976c050 100644
> > >> > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > @@ -523,6 +523,7 @@ static int intel_dp_mst_compute_config(struct
> intel_encoder *encoder,
> > >> >   struct drm_connector_state
> > >> > *conn_state)  {
> > >> >struct drm_i915_private *dev_priv =
> > >> > to_i915(encoder->base.dev);
> > >> > +  struct intel_crtc *crtc =
> > >> > + to_intel_crtc(pipe_config->uapi.crtc);
> > >> >struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
> > >> >struct intel_dp *intel_dp = &intel_mst->primary->dp;
> > >> >const struct intel_connector *connector = @@ -540,6 +541,10 @@
> > >> > static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > >> >if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
> > >> >return -EINVAL;
> > >> >
> > >> > +  if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
> > >> > +  adjusted_mode->crtc_clock))
> > >> > +  pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1,
> > >> > + crtc->pipe);
> > >> > +
> > >> >pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB

[PATCH 2/2] drm/i915: Allow bigjoiner for MST

2024-02-27 Thread Vidya Srinivas

MST monitor resolutions above 5K need bigjoiner support in the MST
code paths to work. Add that support.

Signed-off-by: Vidya Srinivas 
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index b062f4ee6c8b..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+   struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
struct intel_dp *intel_dp = &intel_mst->primary->dp;
const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
return -EINVAL;
 
+   if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+   adjusted_mode->crtc_clock))
+   pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 *   corresponding link capabilities of the sink) in case the
 *   stream is uncompressed for it by the last branch device.
 */
-   if (mode_rate > max_rate || mode->clock > max_dotclk ||
-   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-   *status = MODE_CLOCK_HIGH;
-   return 0;
-   }
-
if (mode->clock < 1) {
*status = MODE_CLOCK_LOW;
return 0;
@@ -1351,6 +1350,12 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
max_dotclk *= 2;
}
 
+   if (mode_rate > max_rate || mode->clock > max_dotclk ||
+   drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
+   *status = MODE_CLOCK_HIGH;
+   return 0;
+   }
+
if (DISPLAY_VER(dev_priv) >= 10 &&
drm_dp_sink_supports_dsc(intel_connector->dp.dsc_dpcd)) {
/*
@@ -1393,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
return 0;
}
 
-   *status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+   *status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
return 0;
 }
 
-- 
2.33.0
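
For readers unfamiliar with the mask construction in the compute_config hunk
above, here is a minimal user-space sketch of what
GENMASK(crtc->pipe + 1, crtc->pipe) evaluates to. SKETCH_GENMASK and
sketch_bigjoiner_pipes() are hypothetical stand-ins written for this note,
not the kernel macro or an i915 helper, and the sketch assumes pipes are
numbered from 0 (pipe A = 0) and that the pipe index is at most 30.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-in for the kernel's GENMASK(h, l): a 32-bit value with
 * bits l..h (inclusive) set. Only valid for 0 <= l <= h <= 31.
 */
#define SKETCH_GENMASK(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/*
 * Hypothetical helper: the two-bit mask covering the given pipe and the
 * pipe right after it, which is what the patch stores in bigjoiner_pipes.
 */
uint32_t sketch_bigjoiner_pipes(unsigned int pipe)
{
	return SKETCH_GENMASK(pipe + 1, pipe);
}

/* Self-check: the mask always covers exactly two adjacent pipes. */
void sketch_bigjoiner_pipes_check(void)
{
	assert(sketch_bigjoiner_pipes(0) == 0x3); /* pipes 0 and 1 */
	assert(sketch_bigjoiner_pipes(1) == 0x6); /* pipes 1 and 2 */
	assert(sketch_bigjoiner_pipes(2) == 0xc); /* pipes 2 and 3 */
}
```

Calling sketch_bigjoiner_pipes_check() from a test program exercises the
three cases. The point of the sketch is only that the stored value is a
bitmask of the primary pipe plus its immediate neighbour, not a pipe index.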

[PATCH 1/2] Revert "drm/i915/mst: Reject modes that require the bigjoiner"

2024-02-27 Thread Vidya Srinivas

This reverts commit 9c058492b16f90bb772cb0dad567e8acc68e155d.

Revert in preparation for adding MST bigjoiner functionality.

Signed-off-by: Vidya Srinivas 
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..b062f4ee6c8b 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1349,10 +1349,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
bigjoiner = true;
max_dotclk *= 2;
-
-   /* TODO: add support for bigjoiner */
-   *status = MODE_CLOCK_HIGH;
-   return 0;
}
 
if (DISPLAY_VER(dev_priv) >= 10 &&
-- 
2.33.0

[PATCH 0/2] Enable MST bigjoiner

2024-02-27 Thread Vidya Srinivas

This series reverts the rejection of modes on MST monitors that need
bigjoiner and adds MST bigjoiner functionality.

Vidya Srinivas (2):
  Revert "drm/i915/mst: Reject modes that require the bigjoiner"
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.33.0
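
The effect of the two patches on mode validation can be sketched in plain C.
This is a simplified model, not the driver code: it keeps only the dot clock
comparison (the real check also compares mode_rate and the PBN), and the rule
in sketch_need_bigjoiner() (wider than 5120 pixels, or faster than one pipe's
dot clock limit) is an assumption made for illustration. All sketch_* names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_status { SKETCH_MODE_OK, SKETCH_MODE_CLOCK_HIGH };

/* Assumed rule for when a mode needs two joined pipes. */
bool sketch_need_bigjoiner(int hdisplay, int clock_khz, int max_dotclk_khz)
{
	return hdisplay > 5120 || clock_khz > max_dotclk_khz;
}

/* Old ordering: the limit check runs against the single-pipe limit. */
enum sketch_status sketch_valid_check_first(int hdisplay, int clock_khz,
					    int max_dotclk_khz)
{
	(void)hdisplay; /* never reached for the limit in this ordering */
	if (clock_khz > max_dotclk_khz)
		return SKETCH_MODE_CLOCK_HIGH;
	return SKETCH_MODE_OK;
}

/* New ordering: double the limit for bigjoiner modes, then check. */
enum sketch_status sketch_valid_double_first(int hdisplay, int clock_khz,
					     int max_dotclk_khz)
{
	if (sketch_need_bigjoiner(hdisplay, clock_khz, max_dotclk_khz))
		max_dotclk_khz *= 2;
	if (clock_khz > max_dotclk_khz)
		return SKETCH_MODE_CLOCK_HIGH;
	return SKETCH_MODE_OK;
}

/* Self-check with made-up numbers (650 MHz single-pipe limit). */
void sketch_mode_valid_check(void)
{
	/* A mode above the single-pipe limit but within twice that limit. */
	assert(sketch_valid_check_first(7680, 1000000, 650000) ==
	       SKETCH_MODE_CLOCK_HIGH);
	assert(sketch_valid_double_first(7680, 1000000, 650000) ==
	       SKETCH_MODE_OK);
	/* A mode above even the doubled limit is still rejected. */
	assert(sketch_valid_double_first(7680, 1400000, 650000) ==
	       SKETCH_MODE_CLOCK_HIGH);
	/* An ordinary mode passes either way. */
	assert(sketch_valid_double_first(1920, 148500, 650000) ==
	       SKETCH_MODE_OK);
}
```

The clock values are invented for the example. The sketch only shows why
the MODE_CLOCK_HIGH check has to come after max_dotclk is doubled for a mode
that needs bigjoiner to be accepted at all.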

RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for bigjoiner during modeset

2024-02-27 Thread Srinivas, Vidya




> -Original Message-
> From: Lisovskiy, Stanislav 
> Sent: Tuesday, February 27, 2024 2:41 PM
> To: Srinivas, Vidya 
> Cc: intel-gfx@lists.freedesktop.org; Saarinen, Jani ;
> ville.syrj...@linux.intel.com
> Subject: Re: [PATCH 2/3] Start separating pipe vs transcoder set logic for
> bigjoiner during modeset
> 
> On Tue, Feb 27, 2024 at 06:40:23AM +0200, Srinivas, Vidya wrote:
> >
> >
> > > -Original Message-
> > > From: Lisovskiy, Stanislav 
> > > Sent: Thursday, February 22, 2024 12:50 AM
> > > To: intel-gfx@lists.freedesktop.org
> > > Cc: Lisovskiy, Stanislav ; Saarinen,
> > > Jani ; ville.syrj...@linux.intel.com;
> > > Srinivas, Vidya 
> > > Subject: [PATCH 2/3] Start separating pipe vs transcoder set logic
> > > for bigjoiner during modeset
> > >
> > > Handle only bigjoiner masters in
> > > skl_commit_modeset_enables/disables,
> > > slave crtcs should be handled by master hooks. Same for encoders.
> > > That way we can also remove a bunch of checks like
> > > intel_crtc_is_bigjoiner_slave.
> > >
> > > v2: Get rid of master vs slave checks and separation in crtc
> > > enable/disable hooks.
> > > Use unified iteration cycle for all of those, while enabling/disabling
> > > transcoder only for those pipes where its needed(Ville Syrjälä)
> > >
> > > v3: Move all the intel_encoder_* calls under transcoder code
> > > path(Ville
> > > Syrjälä)
> > >
> > > v4:  - Call intel_crtc_vblank_on from hsw_crtc_enable only for
> > > non-transcoder path
> > >(for master pipe that will be called from
> > > intel_encoders_enable/intel_enable_ddi)
> > >  - Fix stupid mistake with using crtc->pipe for the mask,
> > > instead of BIT(crtc-
> > > >pipe)
> > >
> > > Signed-off-by: Stanislav Lisovskiy 
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_ddi.c |  21 +--
> > >  drivers/gpu/drm/i915/display/intel_display.c | 183 ---
> > >  drivers/gpu/drm/i915/display/intel_display.h |   6 +
> > >  3 files changed, 121 insertions(+), 89 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c
> > > b/drivers/gpu/drm/i915/display/intel_ddi.c
> > > index bea4415902044..6071e9f500871 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_ddi.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_ddi.c
> > > @@ -3100,7 +3100,6 @@ static void intel_ddi_post_disable(struct
> > > intel_atomic_state *state,
> > >  const struct drm_connector_state
> > > *old_conn_state)  {
> > >   struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
> > > - struct intel_crtc *slave_crtc;
> > >
> > >   if (!intel_crtc_has_type(old_crtc_state, INTEL_OUTPUT_DP_MST)) {
> > >   intel_crtc_vblank_off(old_crtc_state);
> > > @@ -3117,17 +3116,6 @@ static void intel_ddi_post_disable(struct
> > > intel_atomic_state *state,
> > >   ilk_pfit_disable(old_crtc_state);
> > >   }
> > >
> > > - for_each_intel_crtc_in_pipe_mask(&dev_priv->drm, slave_crtc,
> > > -
> > > intel_crtc_bigjoiner_slave_pipes(old_crtc_state)) {
> > > - const struct intel_crtc_state *old_slave_crtc_state =
> > > - intel_atomic_get_old_crtc_state(state, slave_crtc);
> > > -
> > > - intel_crtc_vblank_off(old_slave_crtc_state);
> > > -
> > > - intel_dsc_disable(old_slave_crtc_state);
> > > - skl_scaler_disable(old_slave_crtc_state);
> > > - }
> > > -
> > >   /*
> > >* When called from DP MST code:
> > >* - old_conn_state will be NULL
> > > @@ -3363,8 +3351,7 @@ static void intel_enable_ddi(struct
> > > intel_atomic_state *state,  {
> > >   drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder);
> > >
> > > - if (!intel_crtc_is_bigjoiner_slave(crtc_state))
> > > - intel_ddi_enable_transcoder_func(encoder, crtc_state);
> > > + intel_ddi_enable_transcoder_func(encoder, crtc_state);
> > >
> > >   /* Enable/Disable DP2.0 SDP split config before transcoder */
> > >   intel_audio_sdp_split_update(crtc_state);
> > > @@ -3469,9 +3456,6 @@ void intel_ddi_update_active_dpll(struct
> > > intel_atomic_state *state,
> > > struct intel_crtc *crtc)
> > >

RE: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0

2024-02-27 Thread Srinivas, Vidya




> -Original Message-
> From: Lisovskiy, Stanislav 
> Sent: Tuesday, February 27, 2024 2:34 PM
> To: Jani Nikula 
> Cc: intel-gfx@lists.freedesktop.org; Saarinen, Jani ;
> ville.syrj...@linux.intel.com; Srinivas, Vidya 
> Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> 
> On Mon, Feb 26, 2024 at 09:56:10PM +0200, Jani Nikula wrote:
> > On Wed, 21 Feb 2024, Stanislav Lisovskiy 
> wrote:
> > > Patch calculates bigjoiner pipes in mst compute.
> > > Patch also passes bigjoiner bool to validate plane max size.
> >
> > Please use the imperative mood in commit messages, e.g. "calculate"
> > instead of "calculates".
> >
> > Please do not refer to "patch". We know it's a patch, until it isn't,
> > and then it's a commit.
> >
> > Please explain *why* the changes are being done, not just *what* is
> > being done.
> >
> > In the subject, what is "bigjoiner case for DP2.0"? DP 2.0 is a spec
> > version, and as such irrelevant for the changes being done.
> >
> > > Signed-off-by: vsrini4 
> >
> > ?
> 
> Hi Jani, I just added that patch from Vidya to my series, to be honest, didn't
> have time at all to look much into it.
> Looks like it's me who is going to fix that.

Hello Stan
My sincere apologies. I didn't want to disturb your series, so I did not fix it.
Please let me know if I should fix it. Sorry again.
Thank you Jani for the comments.

Regards
Vidya
> 
> >
> > > Signed-off-by: Stanislav Lisovskiy 
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 19
> > > ---
> > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > index 5307ddd4edcf5..fd27d9976c050 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > @@ -523,6 +523,7 @@ static int intel_dp_mst_compute_config(struct
> intel_encoder *encoder,
> > >  struct drm_connector_state *conn_state)
> {
> > >   struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
> > > + struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
> > >   struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
> > >   struct intel_dp *intel_dp = &intel_mst->primary->dp;
> > >   const struct intel_connector *connector = @@ -540,6 +541,10 @@
> > > static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > >   if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
> > >   return -EINVAL;
> > >
> > > + if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
> > > + adjusted_mode->crtc_clock))
> > > + pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1,
> > > +crtc->pipe);
> > > +
> > >   pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
> > >   pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
> > >   pipe_config->has_pch_encoder = false; @@ -1318,12 +1323,6 @@
> > > intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
> > >*   corresponding link capabilities of the sink) in case the
> > >*   stream is uncompressed for it by the last branch device.
> > >*/
> > > - if (mode_rate > max_rate || mode->clock > max_dotclk ||
> > > - drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port-
> >full_pbn) {
> > > - *status = MODE_CLOCK_HIGH;
> > > - return 0;
> > > - }
> > > -
> > >   if (mode->clock < 1) {
> > >   *status = MODE_CLOCK_LOW;
> > >   return 0;
> > > @@ -1343,6 +1342,12 @@ intel_dp_mst_mode_valid_ctx(struct
> drm_connector *connector,
> > >   return 0;
> > >   }
> > >
> > > + if (mode_rate > max_rate || mode->clock > max_dotclk ||
> > > + drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port-
> >full_pbn) {
> > > + *status = MODE_CLOCK_HIGH;
> > > + return 0;
> > > + }
> > > +
> > >   if (DISPLAY_VER(dev_priv) >= 10 &&
> > >   drm_dp_sink_supports_dsc(intel_connector->dp.dsc_dpcd)) {
> > >   /*
> > > @@ -1385,7 +1390,7 @@ intel_dp_mst_mode_valid_ctx(struct
> drm_connector *connector,
> > >   return 0;
> > >   }
> > >
> > > - *status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
> > > + *status = intel_mode_valid_max_plane_size(dev_priv, mode,
> > > +bigjoiner);
> > >   return 0;
> > >  }
> >
> > --
> > Jani Nikula, Intel

RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for bigjoiner during modeset

2024-02-26 Thread Srinivas, Vidya



> -Original Message-
> From: Intel-gfx  On Behalf Of
> Srinivas, Vidya
> Sent: Tuesday, February 27, 2024 10:10 AM
> To: Lisovskiy, Stanislav ; intel-
> g...@lists.freedesktop.org
> Cc: Saarinen, Jani ; ville.syrj...@linux.intel.com
> Subject: RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for
> bigjoiner during modeset
> 
> 
> 
> > -Original Message-
> > From: Lisovskiy, Stanislav 
> > Sent: Thursday, February 22, 2024 12:50 AM
> > To: intel-gfx@lists.freedesktop.org
> > Cc: Lisovskiy, Stanislav ; Saarinen,
> > Jani ; ville.syrj...@linux.intel.com;
> > Srinivas, Vidya 
> > Subject: [PATCH 2/3] Start separating pipe vs transcoder set logic for
> > bigjoiner during modeset
> >
> > Handle only bigjoiner masters in skl_commit_modeset_enables/disables,
> > slave crtcs should be handled by master hooks. Same for encoders.
> > That way we can also remove a bunch of checks like
> > intel_crtc_is_bigjoiner_slave.
> >
> > v2: Get rid of master vs slave checks and separation in crtc
> > enable/disable hooks.
> > Use unified iteration cycle for all of those, while enabling/disabling
> > transcoder only for those pipes where its needed(Ville Syrjälä)
> >
> > v3: Move all the intel_encoder_* calls under transcoder code
> > path(Ville
> > Syrjälä)
> >
> > v4:  - Call intel_crtc_vblank_on from hsw_crtc_enable only for
> > non-transcoder path
> >(for master pipe that will be called from
> > intel_encoders_enable/intel_enable_ddi)
> >  - Fix stupid mistake with using crtc->pipe for the mask, instead
> > of BIT(crtc-
> > >pipe)
> >
> > Signed-off-by: Stanislav Lisovskiy 
> > ---
> >  drivers/gpu/drm/i915/display/intel_ddi.c |  21 +--
> >  drivers/gpu/drm/i915/display/intel_display.c | 183 ---
> >  drivers/gpu/drm/i915/display/intel_display.h |   6 +
> >  3 files changed, 121 insertions(+), 89 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c
> > b/drivers/gpu/drm/i915/display/intel_ddi.c
> > index bea4415902044..6071e9f500871 100644
> > --- a/drivers/gpu/drm/i915/display/intel_ddi.c
> > +++ b/drivers/gpu/drm/i915/display/intel_ddi.c
> > @@ -3100,7 +3100,6 @@ static void intel_ddi_post_disable(struct
> > intel_atomic_state *state,
> >const struct drm_connector_state
> > *old_conn_state)  {
> > struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
> > -   struct intel_crtc *slave_crtc;
> >
> > if (!intel_crtc_has_type(old_crtc_state, INTEL_OUTPUT_DP_MST)) {
> > intel_crtc_vblank_off(old_crtc_state);
> > @@ -3117,17 +3116,6 @@ static void intel_ddi_post_disable(struct
> > intel_atomic_state *state,
> > ilk_pfit_disable(old_crtc_state);
> > }
> >
> > -   for_each_intel_crtc_in_pipe_mask(&dev_priv->drm, slave_crtc,
> > -
> > intel_crtc_bigjoiner_slave_pipes(old_crtc_state)) {
> > -   const struct intel_crtc_state *old_slave_crtc_state =
> > -   intel_atomic_get_old_crtc_state(state, slave_crtc);
> > -
> > -   intel_crtc_vblank_off(old_slave_crtc_state);
> > -
> > -   intel_dsc_disable(old_slave_crtc_state);
> > -   skl_scaler_disable(old_slave_crtc_state);
> > -   }
> > -
> > /*
> >  * When called from DP MST code:
> >  * - old_conn_state will be NULL
> > @@ -3363,8 +3351,7 @@ static void intel_enable_ddi(struct
> > intel_atomic_state *state,  {
> > drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder);
> >
> > -   if (!intel_crtc_is_bigjoiner_slave(crtc_state))
> > -   intel_ddi_enable_transcoder_func(encoder, crtc_state);
> > +   intel_ddi_enable_transcoder_func(encoder, crtc_state);
> >
> > /* Enable/Disable DP2.0 SDP split config before transcoder */
> > intel_audio_sdp_split_update(crtc_state);
> > @@ -3469,9 +3456,6 @@ void intel_ddi_update_active_dpll(struct
> > intel_atomic_state *state,
> >   struct intel_crtc *crtc)
> >  {
> > struct drm_i915_private *i915 = to_i915(encoder->base.dev);
> > -   struct intel_crtc_state *crtc_state =
> > -   intel_atomic_get_new_crtc_state(state, crtc);
> > -   struct intel_crtc *slave_crtc;
> > enum phy phy = intel_port_to_phy(i915, encoder->port);
> >
> > /* FIXME: Add MTL pll_mgr */
> > @@ -3479,9 +3463,6 @@ void intel_ddi_update_active_dpll(struct
> &

RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for bigjoiner during modeset

2024-02-26 Thread Srinivas, Vidya



> -Original Message-
> From: Lisovskiy, Stanislav 
> Sent: Thursday, February 22, 2024 12:50 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Lisovskiy, Stanislav ; Saarinen, Jani
> ; ville.syrj...@linux.intel.com; Srinivas, Vidya
> 
> Subject: [PATCH 2/3] Start separating pipe vs transcoder set logic for 
> bigjoiner
> during modeset
> 
> Handle only bigjoiner masters in skl_commit_modeset_enables/disables,
> slave crtcs should be handled by master hooks. Same for encoders.
> That way we can also remove a bunch of checks like
> intel_crtc_is_bigjoiner_slave.
> 
> v2: Get rid of master vs slave checks and separation in crtc enable/disable
> hooks.
> Use unified iteration cycle for all of those, while enabling/disabling
> transcoder only for those pipes where its needed(Ville Syrjälä)
> 
> v3: Move all the intel_encoder_* calls under transcoder code path(Ville
> Syrjälä)
> 
> v4:  - Call intel_crtc_vblank_on from hsw_crtc_enable only for non-transcoder
> path
>(for master pipe that will be called from
> intel_encoders_enable/intel_enable_ddi)
>  - Fix stupid mistake with using crtc->pipe for the mask, instead of 
> BIT(crtc-
> >pipe)
> 
> Signed-off-by: Stanislav Lisovskiy 
> ---
>  drivers/gpu/drm/i915/display/intel_ddi.c |  21 +--
>  drivers/gpu/drm/i915/display/intel_display.c | 183 ---
>  drivers/gpu/drm/i915/display/intel_display.h |   6 +
>  3 files changed, 121 insertions(+), 89 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c
> b/drivers/gpu/drm/i915/display/intel_ddi.c
> index bea4415902044..6071e9f500871 100644
> --- a/drivers/gpu/drm/i915/display/intel_ddi.c
> +++ b/drivers/gpu/drm/i915/display/intel_ddi.c
> @@ -3100,7 +3100,6 @@ static void intel_ddi_post_disable(struct
> intel_atomic_state *state,
>  const struct drm_connector_state
> *old_conn_state)  {
>   struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
> - struct intel_crtc *slave_crtc;
> 
>   if (!intel_crtc_has_type(old_crtc_state, INTEL_OUTPUT_DP_MST)) {
>   intel_crtc_vblank_off(old_crtc_state);
> @@ -3117,17 +3116,6 @@ static void intel_ddi_post_disable(struct
> intel_atomic_state *state,
>   ilk_pfit_disable(old_crtc_state);
>   }
> 
> - for_each_intel_crtc_in_pipe_mask(&dev_priv->drm, slave_crtc,
> -
> intel_crtc_bigjoiner_slave_pipes(old_crtc_state)) {
> - const struct intel_crtc_state *old_slave_crtc_state =
> - intel_atomic_get_old_crtc_state(state, slave_crtc);
> -
> - intel_crtc_vblank_off(old_slave_crtc_state);
> -
> - intel_dsc_disable(old_slave_crtc_state);
> - skl_scaler_disable(old_slave_crtc_state);
> - }
> -
>   /*
>* When called from DP MST code:
>* - old_conn_state will be NULL
> @@ -3363,8 +3351,7 @@ static void intel_enable_ddi(struct
> intel_atomic_state *state,  {
>   drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder);
> 
> - if (!intel_crtc_is_bigjoiner_slave(crtc_state))
> - intel_ddi_enable_transcoder_func(encoder, crtc_state);
> + intel_ddi_enable_transcoder_func(encoder, crtc_state);
> 
>   /* Enable/Disable DP2.0 SDP split config before transcoder */
>   intel_audio_sdp_split_update(crtc_state);
> @@ -3469,9 +3456,6 @@ void intel_ddi_update_active_dpll(struct
> intel_atomic_state *state,
> struct intel_crtc *crtc)
>  {
>   struct drm_i915_private *i915 = to_i915(encoder->base.dev);
> - struct intel_crtc_state *crtc_state =
> - intel_atomic_get_new_crtc_state(state, crtc);
> - struct intel_crtc *slave_crtc;
>   enum phy phy = intel_port_to_phy(i915, encoder->port);
> 
>   /* FIXME: Add MTL pll_mgr */
> @@ -3479,9 +3463,6 @@ void intel_ddi_update_active_dpll(struct
> intel_atomic_state *state,
>   return;
> 
>   intel_update_active_dpll(state, crtc, encoder);
> - for_each_intel_crtc_in_pipe_mask(&i915->drm, slave_crtc,
> -
> intel_crtc_bigjoiner_slave_pipes(crtc_state))
> - intel_update_active_dpll(state, slave_crtc, encoder);
>  }
> 
>  static void
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c
> b/drivers/gpu/drm/i915/display/intel_display.c
> index 916c13a149fd5..e1ea53fd6a288 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -1631,31 +1631,12 @@ static void hsw_configure_cpu_transcoder(const
> struct intel_crtc_state *crtc_sta
>   hsw_set_transconf(crtc_state);
>  }
> 
>
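
The v4 changelog entry quoted above (using crtc->pipe for the mask instead of
BIT(crtc->pipe)) describes a general pitfall that a small user-space sketch
can illustrate. The code where the mistake occurred is not shown in this
message, so the sketch below is not a reconstruction of it; SKETCH_BIT and
the sketch_* functions are hypothetical names, and pipes are assumed to be
numbered from 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT(n). */
#define SKETCH_BIT(n) (UINT32_C(1) << (n))

enum sketch_pipe { SKETCH_PIPE_A, SKETCH_PIPE_B, SKETCH_PIPE_C, SKETCH_PIPE_D };

/* True if the given pipe's bit is set in the mask. */
bool sketch_mask_has_pipe(uint32_t pipe_mask, enum sketch_pipe pipe)
{
	return (pipe_mask & SKETCH_BIT(pipe)) != 0;
}

/* The mistaken form: the pipe index itself is passed as if it were a mask. */
uint32_t sketch_wrong_mask(enum sketch_pipe pipe)
{
	return (uint32_t)pipe;
}

/* The intended form: a mask with only that pipe's bit set. */
uint32_t sketch_right_mask(enum sketch_pipe pipe)
{
	return SKETCH_BIT(pipe);
}

/* Self-check showing how the two forms diverge. */
void sketch_pipe_mask_check(void)
{
	/* Index 0 used as a mask selects no pipe at all. */
	assert(!sketch_mask_has_pipe(sketch_wrong_mask(SKETCH_PIPE_A),
				     SKETCH_PIPE_A));
	/* Index 2 used as a mask selects pipe B, not pipe C. */
	assert(sketch_mask_has_pipe(sketch_wrong_mask(SKETCH_PIPE_C),
				    SKETCH_PIPE_B));
	assert(!sketch_mask_has_pipe(sketch_wrong_mask(SKETCH_PIPE_C),
				     SKETCH_PIPE_C));
	/* BIT(index) selects exactly the intended pipe. */
	assert(sketch_mask_has_pipe(sketch_right_mask(SKETCH_PIPE_C),
				    SKETCH_PIPE_C));
}
```

Because an index and a one-bit mask are both small integers, the compiler
accepts either, which is why this kind of slip tends to surface only in
testing or review.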

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]

2024-02-23 Thread Y . Srinivas Ramakrishna

On Thu, 22 Feb 2024 23:43:41 GMT, Brent Christian  wrote:

>> Thanks for finding my misspelling, djelinski. 
>
> The use of "(un)successful(ly)" in relation to `Reference.enqueue()` is quite 
> deliberate (and builds on the previous wording, "successful").
> 
> The intention was to use it consistently (is that not the case somewhere?). 
> For example, it's also used in the new **Memory Consistency Properties** 
> section of the `java.lang.ref` package docs ("The enqueueing of a 
> reference...by a successful call to `Reference.enqueue()`...").
> 
> A "successful call to `enqueue()`" is meant to be shorthand for:
> "the reference has been enqueued, and the enqueuing was performed by the 
> `enqueue()` method (rather than by the garbage collector). Therefore there is 
> a _happens-before_ edge between the `enqueue()` method call and the dequeuing 
> of the Reference (whereas there would not be this _happens-before_ if the GC 
> had already enqueued the Reference at the time of the `enqueue()` call)."
> 
> The text emphasis with italics is to indicate this added significance of the 
> result of the `enqueue()` call -- ala `happens-before`.
> 
> I'm not aware of a similar scenario covered in the JLS, so AFAIK there is no 
> precedent to be consistent with in that regard.

Sounds good, thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1501127639

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]

2024-02-22 Thread Y . Srinivas Ramakrishna

On Mon, 27 Nov 2023 22:41:25 GMT, Hans Boehm  wrote:

>> Brent Christian has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Cleaner thread dequeue happens-before running cleaning action
>
> src/java.base/share/classes/java/lang/ref/Reference.java line 568:
> 
>> 566:  *
>> 567:  * @apiNote
>> 568:  * Reference processing or finalization may occur whenever the 
>> virtual machine detects that no
> 
> How about "detects that all needed data from the object is available 
> elsewhere, and no reference to that object will ever be stored ..." Otherwise 
> this seems needlessly mysterious to me.

I find the suggested addition, "detects that all needed data from the object 
is available elsewhere", more mysterious and confusing. The current wording 
seems clearer, as it sets the scene for, and motivates, when and why 
`reachabilityFence()` might be needed or used.

I may be missing the significance of the suggested "all needed data from the 
object is available elsewhere" at this point in the description.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1499668692

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]

2024-02-22 Thread Y . Srinivas Ramakrishna

On Thu, 22 Feb 2024 01:42:17 GMT, Brent Christian  wrote:

>> Classes in the `java.lang.ref` package would benefit from an update to bring 
>> the spec in line with how the VM already behaves. The changes would focus on 
>> _happens-before_ edges at some key points during reference processing.
>> 
>> A couple key things we want to be able to say are:
>> - `Reference.reachabilityFence(x)` _happens-before_ reference processing 
>> occurs for 'x'.
>> - `Cleaner.register()` _happens-before_ the Cleaner thread runs the 
>> registered cleaning action.
>> 
>> This will bring Cleaner in line (or close) with the memory visibility 
>> guarantees made for finalizers in [JLS 
>> 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5):
>> _"There is a happens-before edge from the end of a constructor of an object 
>> to the start of a finalizer (§12.6) for that object."_
>
> Brent Christian has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Cleaner thread dequeue happens-before running cleaning action

Looks good; just some casual remarks on verbiage & font at a couple of places.
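
For readers following the thread, here is a minimal sketch of the two edges listed in the quoted description: `Cleaner.register()` before the cleaning action runs, and `Reference.reachabilityFence(x)` before reference processing for `x`. The class names `CleanerSketch`, `State` and `Resource` are made up for illustration and are not part of the PR. The sketch calls `clean()` explicitly so that it behaves deterministically instead of waiting for the garbage collector.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

public class CleanerSketch {
    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical cleaning state; it must not hold a reference to the Resource,
    // otherwise the Resource could never become unreachable.
    static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);

        @Override
        public void run() {
            released.set(true); // the "cleaning action"
        }
    }

    // Hypothetical resource whose cleanup is registered with the Cleaner.
    static final class Resource implements AutoCloseable {
        final State state = new State();
        private final Cleaner.Cleanable cleanable;

        Resource() {
            // Registration is ordered before the cleaning action runs.
            cleanable = CLEANER.register(this, state);
        }

        boolean use() {
            try {
                // Work that touches only 'state'; without the fence below, the VM
                // may treat 'this' as unreachable while this method is still running.
                return !state.released.get();
            } finally {
                // Keeps 'this' strongly reachable until this point.
                Reference.reachabilityFence(this);
            }
        }

        @Override
        public void close() {
            cleanable.clean(); // runs the action at most once
        }
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        System.out.println("usable before close: " + r.use());
        r.close();
        System.out.println("released after close: " + r.state.released.get());
    }
}
```

The try/finally placement of the fence follows the pattern shown in the `reachabilityFence` javadoc; whether a given method needs it depends on what the cleaning action invalidates.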

-

PR Review: https://git.openjdk.org/jdk/pull/16644#pullrequestreview-1896527410

Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]

2024-02-22 Thread Y . Srinivas Ramakrishna

On Thu, 22 Feb 2024 12:05:31 GMT, Daniel Jeliński  wrote:

>> src/java.base/share/classes/java/lang/ref/Reference.java line 491:
>> 
>>> 489:  * If this reference is not registered with a queue, or was 
>>> already enqueued
>>> 490:  * (by the garbage collector, or a previous call to {@code 
>>> enqueue}), this
>>> 491:  * method is unnsuccessful and returns false.
>> 
>> Suggestion:
>> 
>>  * method is unsuccessful and returns false.
>
> or, better yet, `fails`

I note that the adjective(s) (un)successful and the adverb(s) (un)successfully 
are used in several places in these comments, so it might make sense to use those 
terms here as well, so that the documentation is internally consistent in its 
use of success or failure of actions, particularly if this terminology is 
consistent with precedent in the official JLS spec.

However, I note that there are places where these terms are italicized and 
places where they aren't. I am not sure I follow the convention for 
italicization. In general, the first use (i.e. introduction) of a term that the 
reader might want to pay attention to calls for italicization when documents 
are read sequentially, such as in research papers. These javadoc specs will 
usually not be read sequentially. But considering that someone might read 
them in order, I'd suggest italicizing only the first use of the term or, if 
not, then perhaps none. Alternatively, you might want to italicize all uses 
(but why?).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1499714011

RE: GBM as standalone buffer allocator

2024-02-22 Thread Srinivas Pullakavi (QUIC)

+ Abhinav

From: Srinivas Pullakavi (QUIC)
Sent: Monday, January 22, 2024 10:44 PM
To: 'Yiwei Zhang' 
Cc: Rob Clark ; mesa-dev@lists.freedesktop.org
Subject: RE: GBM as standalone buffer allocator

Hi Yiwei,

Looks like this thread is closed.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038#note_2243187

Can we collaborate on this?

Thanks,
Srinivas

From: Yiwei Zhang <zzyi...@chromium.org>
Sent: Monday, November 20, 2023 4:38 AM
To: Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com>
Cc: Rob Clark <robdcl...@gmail.com>; mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

There’s
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038. It is quite 
appealing to me considering a VK only scenario.

On Thu, Nov 2, 2023 at 5:50 AM Srinivas Pullakavi (QUIC) 
<quic_spull...@quicinc.com> wrote:
Hi Rob,

Thanks for your inputs.

We are planning to use DMA-BUF for the GBM backend. The supported DMA-BUF heaps 
are listed in /dev/dma_heap/.
The GBM backend selects the best heap based on usage. For example, secure buffers 
will be allocated from the secure heap.

Sample output:
 # ls /dev/dma_heap
 reserved  system

Sample code to allocate a buffer from system heap:
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
struct dma_heap_allocation_data heap_data = {
  .len = size,                     // length of data to be allocated in bytes
  .fd_flags = O_RDWR | O_CLOEXEC,  // permissions for the memory to be allocated
};
int status = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &heap_data);
if (status == 0) {
  int buffer_fd = heap_data.fd;    // dma-buf fd returned by the kernel
}

In this case, there is no dependency on the display or graphics driver, but 
GBM device creation still expects a device fd to be passed.

Can we make passing the device fd optional?

Thanks,
Srinivas

-Original Message-
From: Rob Clark <robdcl...@gmail.com>
Sent: Tuesday, October 24, 2023 1:06 AM
To: Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com>
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

On Mon, Oct 23, 2023 at 6:22 AM Srinivas Pullakavi (QUIC) 
<quic_spull...@quicinc.com> wrote:
>
> Hi,
>
>
>
> We are planning to enhance GBM as a standalone buffer allocator, which
> can be used by all multimedia clients, e.g. video, camera, display,
> etc.
>
>
>
> GBM create device expects a file descriptor to be passed, which points to drm 
> node. This brings in a dependency on display for buffer allocation. On 
> headless devices where display driver is not present, GBM cannot be used for 
> buffer allocations. E.g. Recording cases where pipeline is setup between 
> Camera, Video, Graphics.
>

Note that you need some sort of device to allocate buffers from.  With mesa and 
upstream kernel, that would be the drm device.  (However, as Adam points out, a 
drm device does not necessarily need a display; for example, several vendors 
have compute-only GPUs (PCI) which have no display outputs.)

You might want to look at ChromeOS's minigbm [1].  It already handles these cases 
(buffer sharing across display/gpu/video/camera).

BR,
-R

[1] https://chromium.googlesource.com/chromiumos/platform/minigbm/

>
> Could you please share your comments on what will be a good design to make 
> GBM flexible for above?
>
>
>
> Thanks,
>
> Srinivas
>
>

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-02-16 Thread Srinivas Vamsi Parasa

On Thu, 8 Feb 2024 20:04:20 GMT, Vladimir Yaroslavskiy  wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> The new ArraysSortNew.Java has compilation issues:
>> 
>> 
>> error: DualPivotQuicksort is not public in java.util; cannot be accessed 
>> from outside package
>>  java.util.DualPivotQuicksort.sort(b, PARALLELISM, 0, b.length);
>> 
>> Have you run into this issue? 
>> 
>> Thanks,
>> Vamsi
>
> Hi Vamsi (@vamsi-parasa),
> 
> My fault, there was an incorrect version of ArraysSortNew.java. Methods, of 
> course, should be
> 
> @Benchmark
> public void sort() {
> Arrays.sort(b);
> }
> 
> @Benchmark
> public void p_sort() {
> Arrays.parallelSort(b);
> }
> 
> I uploaded correct version, see
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew.java
> 
> I also comment that pom.xml contains additional options (I guess you have the 
> same)
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
> --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED
> full text is there 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/pom.xml
> 
> and command to run test is
> java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED 
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -jar 
> target/benchmarks.jar
> 
> I assume that each variant of DPQS (DualPivotQuicksort_jdk, 
> DualPivotQuicksort_r20p, DualPivotQuicksort_r20s, DualPivotQuicksort_r25p, 
> DualPivotQuicksort_r25s) is renamed to DualPivotQuicksort and put into
> package java.util. Then benchmarking for a given variant with patched JDK is 
> executed.
> 
> Thank you,
> Vladimir

Hello Vladimir (@iaroslavski),

Please see the data below. Each DPQS class was copied to java.util and the JDK 
was recompiled.

Thanks,
Vamsi

Benchmark | (builder) | (size) | Stock JDK | r20p | r20s | r25p | r25s
-- | -- | -- | -- | -- | -- | -- | --
ArraysSort.Int.p_sort | RANDOM | 600 | 1.618 | 2.601 | 2.966 | 2.898 | 3.269
ArraysSort.Int.p_sort | RANDOM | 2000 | 7.433 | 8.438 | 8.463 | 8.414 | 8.65
ArraysSort.Int.p_sort | RANDOM | 9 | 258.853 | 355.261 | 326.378 | 347.65 | 321.894
ArraysSort.Int.p_sort | RANDOM | 40 | 842.085 | 1225.929 | 899.852 | 1278.681 | 932.627
ArraysSort.Int.p_sort | RANDOM | 300 | 5723.659 | 8711.108 | 6086.974 | 8948.101 | 6122.612
ArraysSort.Int.p_sort | REPEATED | 600 | 0.52 | 0.585 | 0.629 | 0.586 | 0.579
ArraysSort.Int.p_sort | REPEATED | 2000 | 1.18 | 1.225 | 1.21 | 1.225 | 1.238
ArraysSort.Int.p_sort | REPEATED | 9 | 102.142 | 85.79 | 86.131 | 87.954 | 86.036
ArraysSort.Int.p_sort | REPEATED | 40 | 244.508 | 229.142 | 227.613 | 228.608 | 228.367
ArraysSort.Int.p_sort | REPEATED | 300 | 2752.745 | 2584.103 | 2544.192 | 2576.803 | 2609.833
ArraysSort.Int.p_sort | STAGGER | 600 | 1.146 | 0.894 | 0.898 | 0.904 | 0.912
ArraysSort.Int.p_sort | STAGGER | 2000 | 3.712 | 3.096 | 3.121 | 3.03 | 3.049
ArraysSort.Int.p_sort | STAGGER | 9 | 72.763 | 77.575 | 78.366 | 79.158 | 77.199
ArraysSort.Int.p_sort | STAGGER | 40 | 212.455 | 228.331 | 225.888 | 224.686 | 225.728
ArraysSort.Int.p_sort | STAGGER | 300 | 2290.327 | 2216.741 | 2196.138 | 2236.658 | 2262.472
ArraysSort.Int.p_sort | SHUFFLE | 600 | 2.01 | 2.92 | 2.907 | 2.91 | 2.926
ArraysSort.Int.p_sort | SHUFFLE | 2000 | 7.06 | 7.759 | 7.776 | 7.688 | 8.062
ArraysSort.Int.p_sort | SHUFFLE | 9 | 157.728 | 151.871 | 151.101 | 154.03 | 151.2
ArraysSort.Int.p_sort | SHUFFLE | 40 | 441.166 | 715.243 | 449.698 | 699.75 | 447.069
ArraysSort.Int.p_sort | SHUFFLE | 300 | 4326.88 | 7133.045 | 4205.47 | 7161.862 | 4337.321
ArraysSort.Int.sort | RANDOM | 600 | 1.671 | 2.707 | 2.741 | 2.698 | 2.779
ArraysSort.Int.sort | RANDOM | 2000 | 7.265 | 8.226 | 8.942 | 8.193 | 8.339
ArraysSort.Int.sort | RANDOM | 9 | 529.054 | 559.499 | 554.29 | 566.009 | 559.131
ArraysSort.Int.sort | RANDOM | 40 | 2448.226 | 2654.71 | 2622.964 | 2629.673 | 2619.051
ArraysSort.Int.sort | RANDOM | 300 | 21471.133 | 22670.45 | 22654.94 | 22811.7 | 22957.97
ArraysSort.Int.sort | REPEATED | 600 | 0.517 | 0.578 | 0.578 | 0.587 | 0.568
ArraysSort.Int.sort | REPEATED | 2000 | 1.136 | 1.228 | 1.215 | 1.377 | 1.222
ArraysSort.Int.sort | REPEATED | 9 | 57.575 | 56.406 | 56.542 | 56.068 | 56.77
ArraysSort.Int.sort | REPEATED | 40 | 178.874 | 173.883 | 176.098 | 171.975 | 172.067
ArraysSort.Int.sort | REPEATED | 300 | 1856.71 | 1588.104 | 1489.842 | 1480.34 | 1522.399
ArraysSort.Int.sort | STAGGER | 600 | 1.143 | 0.893 | 0.901 | 0.896 | 0.906
ArraysSort.Int.sort | STAGGER | 2000 | 3.726 | 3.062 | 3.18 | 3.061 | 3.169
ArraysSort.Int.sort | STAGGER | 9 | 138.503 | 135.008 | 134.023 | 136.328 | 136.026
ArraysSort.Int.sort | STAGGER | 40 | 615.732 | 608.269 | 609.348 | 606.986 | 603.287
ArraysSort.Int.sort | STAGGER | 300 | 4914.443 | 4578.733 | 4584.407 | 4591.832 | 4613.16
ArraysSort.Int.sort | SHUFFLE | 600 | 2.137 | 2.886 | 2.948 |

[PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd

2024-02-09 Thread Srinivas Yadav Singanaboina

Hi,

Thanks for the review, @Richard! I have tried to address most of your comments in 
this patch.
The major updates include optimizing operator[] for masks, find_first_set and 
find_last_set.

My further comments on some of the issues raised:
a. Regarding the coverage of types supported for SVE: yes, all the types are 
covered by mapping any type using two simple rules: the size of the type and 
its signedness.
b. All the operator overloads now use infix operators. For division and 
remainder, the inactive elements are padded with 1 to avoid undefined behavior.
c. isnan is optimized to have only two cases, i.e. the finite_math_only case or 
the case where svcmpuo is used.
d. _S_load for masks (bool) now uses svld1 by reinterpret_casting the pointer 
to a uint8_t pointer and then performing an svunpklo.
The same optimization is not done for masked_load and stores, as conversion of a 
mask from a larger type to a smaller type is not optimal (sequential).
e. _S_unary_minus could not use svneg_x because it does not support unsigned 
types.
f. Added specializations for reductions.
g. find_first_set and find_last_set are optimized using svclastb.
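
To make point b concrete, here is a scalar sketch of the padding idea, written in plain Java rather than with SVE intrinsics. It is an illustration only and is not code from the patch; the names `PaddedDivide`, `divide` and `activeLanes` are hypothetical. In Java an integer division by zero throws `ArithmeticException` instead of being undefined behavior, but the reasoning is the same: lanes beyond the logical simd width must not carry a zero divisor when the full register is divided.

```java
import java.util.Arrays;

public class PaddedDivide {
    // Divides every lane of a fixed-width "register". Lanes at index >= activeLanes
    // are padding and may hold arbitrary divisors, including zero.
    static int[] divide(int[] num, int[] den, int activeLanes) {
        int width = num.length;
        int[] safeDen = Arrays.copyOf(den, width);
        for (int i = activeLanes; i < width; i++) {
            safeDen[i] = 1; // pad inactive divisors with 1
        }
        int[] out = new int[width];
        for (int i = 0; i < width; i++) {
            out[i] = num[i] / safeDen[i]; // padding lanes can no longer divide by zero
        }
        return out;
    }

    public static void main(String[] args) {
        int[] q = divide(new int[] {8, 9, 7, 5}, new int[] {2, 3, 0, 0}, 2);
        System.out.println(Arrays.toString(q));
    }
}
```

Only the first `activeLanes` results are meaningful; the padding lanes simply pass their numerator through.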


libstdc++-v3/ChangeLog:

* include/Makefile.am: Add simd_sve.h.
* include/Makefile.in: Add simd_sve.h.
* include/experimental/bits/simd.h: Add new SveAbi.
* include/experimental/bits/simd_builtin.h: Use
  __no_sve_deduce_t to support existing Neon Abi.
* include/experimental/bits/simd_converter.h: Convert
  sequentially when sve is available.
* include/experimental/bits/simd_detail.h: Define sve
  specific macro.
* include/experimental/bits/simd_math.h: Fallback frexp
  to execute sequentially when sve is available, to handle
  fixed_size_simd return type that always uses sve.
* include/experimental/simd: Include bits/simd_sve.h.
* testsuite/experimental/simd/tests/bits/main.h: Enable
  testing for sve128, sve256, sve512.
* include/experimental/bits/simd_sve.h: New file.

 Signed-off-by: Srinivas Yadav Singanaboina
 vasu.srinivasvasu...@gmail.com
---
 libstdc++-v3/include/Makefile.am  |1 +
 libstdc++-v3/include/Makefile.in  |1 +
 libstdc++-v3/include/experimental/bits/simd.h |  131 +-
 .../include/experimental/bits/simd_builtin.h  |   35 +-
 .../experimental/bits/simd_converter.h|   57 +-
 .../include/experimental/bits/simd_detail.h   |7 +-
 .../include/experimental/bits/simd_math.h |   14 +-
 .../include/experimental/bits/simd_sve.h  | 1863 +
 libstdc++-v3/include/experimental/simd|3 +
 .../experimental/simd/tests/bits/main.h   |3 +
 10 files changed, 2084 insertions(+), 31 deletions(-)
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_sve.h

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 6209f390e08..1170cb047a6 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -826,6 +826,7 @@ experimental_bits_headers = \
${experimental_bits_srcdir}/simd_neon.h \
${experimental_bits_srcdir}/simd_ppc.h \
${experimental_bits_srcdir}/simd_scalar.h \
+   ${experimental_bits_srcdir}/simd_sve.h \
${experimental_bits_srcdir}/simd_x86.h \
${experimental_bits_srcdir}/simd_x86_conversions.h \
${experimental_bits_srcdir}/string_view.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 596fa0d2390..bc44582a2da 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1172,6 +1172,7 @@ experimental_bits_headers = \
${experimental_bits_srcdir}/simd_neon.h \
${experimental_bits_srcdir}/simd_ppc.h \
${experimental_bits_srcdir}/simd_scalar.h \
+   ${experimental_bits_srcdir}/simd_sve.h \
${experimental_bits_srcdir}/simd_x86.h \
${experimental_bits_srcdir}/simd_x86_conversions.h \
${experimental_bits_srcdir}/string_view.tcc \
diff --git a/libstdc++-v3/include/experimental/bits/simd.h 
b/libstdc++-v3/include/experimental/bits/simd.h
index 90523ea57dc..d274cd740fe 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -39,12 +39,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #if _GLIBCXX_SIMD_X86INTRIN
 #include 
 #elif _GLIBCXX_SIMD_HAVE_NEON
 #include 
 #endif
+#if _GLIBCXX_SIMD_HAVE_SVE
+#include 
+#endif
 
 /** @ingroup ts_simd
  * @{
@@ -83,6 +87,12 @@ using __m512d [[__gnu__::__vector_size__(64)]] = double;
 using __m512i [[__gnu__::__vector_size__(64)]] = long long;
 #endif
 
+#if _GLIBCXX_SIMD_HAVE_SVE
+constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS / 8;
+#else
+constexpr inline int __sve_vectorized_size_bytes = 0;
+#endif 
+
 namespace simd_abi {
 // simd_abi forward declarations {{{
 // implementation details:
@@ -108,6

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-02-07 Thread Srinivas Vamsi Parasa

On Mon, 5 Feb 2024 21:31:36 GMT, Vladimir Yaroslavskiy  wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> Please see the data below. All tests were run after putting the DPQS code in 
>> java.util package and recompiling the JDK for each case.
>> 
>> Benchmark (us/op) | (builder) | (size) | Stock JDK | a15 | r20p | r20s
>> -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testParallelSort | RANDOM | 600 | 2.24 | 2.201 | 2.423 | 2.389
>> ArraysSort.Int.testParallelSort | RANDOM | 9000 | 35.318 | 35.961 | 79.028 | 83.774
>> ArraysSort.Int.testParallelSort | RANDOM | 2 | 118.729 | 120.872 | 134.829 | 138.349
>> ArraysSort.Int.testParallelSort | RANDOM | 40 | 822.676 | 822.44 | 1200.858 | 872.264
>> ArraysSort.Int.testParallelSort | RANDOM | 300 | 5864.514 | 5948.82 | 8800.391 | 6020.616
>> ArraysSort.Int.testParallelSort | REPEATED | 600 | 0.924 | 0.936 | 0.752 | 0.733
>> ArraysSort.Int.testParallelSort | REPEATED | 9000 | 9.896 | 9.317 | 31.409 | 24.896
>> ArraysSort.Int.testParallelSort | REPEATED | 2 | 58.265 | 42.189 | 40.92 | 40.101
>> ArraysSort.Int.testParallelSort | REPEATED | 40 | 256.952 | 253.217 | 236.568 | 239.163
>> ArraysSort.Int.testParallelSort | REPEATED | 300 | 2844.107 | 2851.088 | 2752.939 | 3040.423
>> ArraysSort.Int.testParallelSort | STAGGER | 600 | 2.245 | 2.296 | 2.15 | 2.219
>> ArraysSort.Int.testParallelSort | STAGGER | 9000 | 29.278 | 29.119 | 28.288 | 28.141
>> ArraysSort.Int.testParallelSort | STAGGER | 2 | 50.129 | 50.442 | 49.746 | 49.686
>> ArraysSort.Int.testParallelSort | STAGGER | 40 | 463.309 | 413.619 | 418.077 | 407.519
>> ArraysSort.Int.testParallelSort | STAGGER | 300 | 3687.198 | 4363.242 | 3732.777 | 3769.898
>> ArraysSort.Int.testParallelSort | SHUFFLE | 600 | 1.715 | 1.698 | 2.799 | 2.733
>> ArraysSort.Int.testParallelSort | SHUFFLE | 9000 | 27.69 | 27.183 | 32.883 | 32.373
>> ArraysSort.Int.testParallelSort | SHUFFLE | 2 | 62.067 | 60.987 | 63.281 | 52.89
>> ArraysSort.Int.testParalle...
>
> Hello Vamsi (@vamsi-parasa),
> 
> Many thanks for the results! Now we can see that intrinsics are applied in 
> all cases,
> but there are big differences between the same code.
> 
> For example,
> parallelSort REPEATED 2:  58.265(Stock JDK) and 42.189(a15) with speedup 
> x1.38
> parallelSort STAGGER 300:  3687.198(Stock JDK) 4363.242(a15) with speedup 
>  x0.85
> 
> Case a15 is the current source code from JDK, but in one benchmark run it is 
> faster,
> in another it is slower (~15-30%).
> 
> Other strange behaviour with new sorting: r20p and r20s have the same code for
> sequential sorting (no radix sort at all), but we can see that one case works 
> much slower
>  
> sort STAGGER 300: 34406.74(r20p) and 10467.03(r20s) - 3.3 times slower,
> whereas other sizes show more or less equal values.
> 
> Vamsi (@vamsi-parasa),
> Could you please run benchmarking of 5 cases with **updated** test class 
> **ArraysSortNew**?
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew.java
> 
> Put the DPQS code in the java.util package and recompile the JDK for each case 
> as you
> did before, but run the new **ArraysSortNew**.
> 
> Find the sources there:
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_jdk.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r25p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r25s.java
> 
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

The new ArraysSortNew.Java has compilation issues:


error: DualPivotQuicksort is not public in java.util; cannot be accessed from 
outside package
 java.util.DualPivotQuicksort.sort(b, PARALLELISM, 0, b.length);

Have you run into this issue? 

Thanks,
Vamsi
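
A small reflection check, in case it helps confirm the cause of the error above. It assumes a stock JDK, where `java.util.DualPivotQuicksort` is declared without the `public` modifier; the class name `VisibilityCheck` and the helper `isPublic` are made up for this sketch. A benchmark class outside `java.util` cannot call a package-private class directly, regardless of `--add-exports`, because the package is already exported and the restriction comes from the class's own access modifier.

```java
import java.lang.reflect.Modifier;

public class VisibilityCheck {
    // Loads the class without initializing it and reports whether it is public.
    static boolean isPublic(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className, false, ClassLoader.getSystemClassLoader());
        return Modifier.isPublic(c.getModifiers());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("java.util.Arrays public: " + isPublic("java.util.Arrays"));
        System.out.println("java.util.DualPivotQuicksort public: "
                + isPublic("java.util.DualPivotQuicksort"));
    }
}
```

If the second check reports false, the benchmark either has to go through the public `Arrays.sort` / `Arrays.parallelSort` entry points or be compiled into `java.util` itself.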

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1933243711

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-02-02 Thread Srinivas Vamsi Parasa

On Sun, 28 Jan 2024 22:23:38 GMT, Vladimir Yaroslavskiy  
wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> Please see the JMH data below.
>> 
>> Thanks,
>> Vamsi
>> 
>> Benchmark           (builder)  (size)  Mode  Cnt       Score      Error  Units
>> ArraysSort.Int.a15     RANDOM     600  avgt    4       7.096 ±    0.081  us/op
>> ArraysSort.Int.a15     RANDOM    2000  avgt    4      44.014 ±    1.717  us/op
>> ArraysSort.Int.a15     RANDOM       9  avgt    4    4451.444 ±   71.168  us/op
>> ArraysSort.Int.a15     RANDOM      40  avgt    4   22751.966 ±  683.597  us/op
>> ArraysSort.Int.a15     RANDOM     300  avgt    4  190326.306 ± 8008.512  us/op
>> ArraysSort.Int.a15   REPEATED     600  avgt    4       1.044 ±    0.016  us/op
>> ArraysSort.Int.a15   REPEATED    2000  avgt    4       2.272 ±    0.287  us/op
>> ArraysSort.Int.a15   REPEATED       9  avgt    4     412.331 ±   11.656  us/op
>> ArraysSort.Int.a15   REPEATED      40  avgt    4    1908.978 ±   30.241  us/op
>> ArraysSort.Int.a15   REPEATED     300  avgt    4   15163.443 ±  100.425  us/op
>> ArraysSort.Int.a15    STAGGER     600  avgt    4       1.055 ±    0.057  us/op
>> ArraysSort.Int.a15    STAGGER    2000  avgt    4       3.408 ±    0.096  us/op
>> ArraysSort.Int.a15    STAGGER       9  avgt    4     149.220 ±    4.022  us/op
>> ArraysSort.Int.a15    STAGGER      40  avgt    4     663.096 ±   30.252  us/op
>> ArraysSort.Int.a15    STAGGER     300  avgt    4    5206.890 ±  234.857  us/op
>> ArraysSort.Int.a15    SHUFFLE     600  avgt    4       4.611 ±    0.118  us/op
>> ArraysSort.Int.a15    SHUFFLE    2000  avgt    4      17.955 ±    0.356  us/op
>> ArraysSort.Int.a15    SHUFFLE       9  avgt    4    1410.357 ±   41.128  us/op
>> ArraysSort.Int.a15    SHUFFLE      40  avgt    4    5739.311 ±  128.270  us/op
>> ArraysSort.Int.a15    SHUFFLE     300  avgt    4   41501.980 ±  829.443  us/op
>> ArraysSort.Int.jdk     RANDOM     600  avgt    4       1.612 ±    0.088  us/op
>> ArraysSort.Int.jdk     RANDOM    2000  avgt    4       6.893 ±    0.375  us/op
>> ArraysSort.Int.jdk     RANDOM       9  avgt    4     522.749 ±   19.386  us/op
>> ArraysSort.Int.jdk     RANDOM      40  avgt    4    2424.204 ±   63.844  us/op
>> ArraysSort.Int.jdk     RANDOM     300  avgt    4   21000.434 ±  801.315  us/op
>> ArraysSort.Int.jdk   REPEATED     600  avgt    4       0.496 ±    0.030  us/op
>> ArraysSort.Int.jdk   REPEATED    2000  avgt    4       1.037 ±    0.083  us/op
>> ArraysSort.Int.jdk  REPE...
>
> Hi Vamsi (@vamsi-parasa), Laurent(@bourgesl),
> 
> The latest benchmarking compares the following versions:
> jdk - direct call of Arrays.sort();
> a15 - the current source of DualPivotQuicksort from the latest build (except 
> renaming)
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/DualPivotQuicksort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> r20s - new version without Radix sort
> r20p - new version with Radix sort in parallel case only
> 
> It is expected that timing of jdk and a15 should be more or less the same, 
> but please look at the results:
> 
> Benchmark | Data Type | Array Size | Arrays.sort() from jdk | Current source (a15)
> -- | -- | -- | -- | --
> ArraysSort.Int.testSort | RANDOM   |     600 | 1.612     | 7.096
> ArraysSort.Int.testSort | RANDOM   |    2000 | 6.893     | 44.014
> ArraysSort.Int.testSort | RANDOM   |   9 | 522.749   | 4451.444
> ArraysSort.Int.testSort | RANDOM   |  40 | 2424.204  | 22751.966
> ArraysSort.Int.testSort | RANDOM   | 300 | 21000.434 | 190326.306
> ArraysSort.Int.testSort | REPEATED |     600 | 0.496     | 1.044
> ArraysSort.Int.testSort | REPEATED |    2000 | 1.037     | 2.272
> ArraysSort.Int.testSort | REPEATED |   9 | 57.763    | 412.331
> ArraysSort.Int.testSort | REPEATED |  40 | 182.238   | 1908.978
> ArraysSort.Int.testSort | REPEATED | 300 | 1708.082  | 15163.443
> ArraysSort.Int.testSort | STAGGER  |     600 | 1.038     | 1.055
> ArraysSort.Int.testSort | STAGGER  |    2000 | 3.434     | 3.408
> ArraysSort.Int.testSort | STAGGER  |   9 | 148.638   | 149.220
> ArraysSort.Int.testSort | STAGGER  |  40 | 663.076   | 663.096
> ArraysSort.Int.testSort | STAGGER  | 300 | 5212.821  | 5206.890
> ArraysSort.Int.testSort | SHUFFLE  |     600 | 1.926     | 4.611
> ArraysSort.Int.testSort | SHUFFLE  |    2000 | 6.858     | 17.955
> ArraysSort.Int.testSort | SHUFFLE  |   9 | 473.441   | 1410.357
> ArraysSort.Int.testSort | SHUFFLE  |  40 | 2153.779  | 5739.311
> ArraysSort.Int.testSort | SHUFFLE  | 300 | 18180.141 | 41501.980
> 
> You can see that a15 (current source) works much slower than 
> Arrays.sort(), but the code is the same
> with

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-02-02 Thread Srinivas Vamsi Parasa

On Sun, 28 Jan 2024 22:23:38 GMT, Vladimir Yaroslavskiy  
wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> Please see the JMH data below.
>> 
>> Thanks,
>> Vamsi
>> 
>> Benchmark  (builder)   (size)  Mode  Cnt   Score  Error  
>> Units
>> ArraysSort.Int.a15RANDOM  600  avgt4   7.096 ±0.081  
>> us/op
>> ArraysSort.Int.a15RANDOM 2000  avgt4  44.014 ±1.717  
>> us/op
>> ArraysSort.Int.a15RANDOM9  avgt44451.444 ±   71.168  
>> us/op
>> ArraysSort.Int.a15RANDOM   40  avgt4   22751.966 ±  683.597  
>> us/op
>> ArraysSort.Int.a15RANDOM  300  avgt4  190326.306 ± 8008.512  
>> us/op
>> ArraysSort.Int.a15  REPEATED  600  avgt4   1.044 ±0.016  
>> us/op
>> ArraysSort.Int.a15  REPEATED 2000  avgt4   2.272 ±0.287  
>> us/op
>> ArraysSort.Int.a15  REPEATED9  avgt4 412.331 ±   11.656  
>> us/op
>> ArraysSort.Int.a15  REPEATED   40  avgt41908.978 ±   30.241  
>> us/op
>> ArraysSort.Int.a15  REPEATED  300  avgt4   15163.443 ±  100.425  
>> us/op
>> ArraysSort.Int.a15   STAGGER  600  avgt4   1.055 ±0.057  
>> us/op
>> ArraysSort.Int.a15   STAGGER 2000  avgt4   3.408 ±0.096  
>> us/op
>> ArraysSort.Int.a15   STAGGER9  avgt4 149.220 ±4.022  
>> us/op
>> ArraysSort.Int.a15   STAGGER   40  avgt4 663.096 ±   30.252  
>> us/op
>> ArraysSort.Int.a15   STAGGER  300  avgt45206.890 ±  234.857  
>> us/op
>> ArraysSort.Int.a15   SHUFFLE  600  avgt4   4.611 ±0.118  
>> us/op
>> ArraysSort.Int.a15   SHUFFLE 2000  avgt4  17.955 ±0.356  
>> us/op
>> ArraysSort.Int.a15   SHUFFLE9  avgt41410.357 ±   41.128  
>> us/op
>> ArraysSort.Int.a15   SHUFFLE   40  avgt45739.311 ±  128.270  
>> us/op
>> ArraysSort.Int.a15   SHUFFLE  300  avgt4   41501.980 ±  829.443  
>> us/op
>> ArraysSort.Int.jdkRANDOM  600  avgt4   1.612 ±0.088  
>> us/op
>> ArraysSort.Int.jdkRANDOM 2000  avgt4   6.893 ±0.375  
>> us/op
>> ArraysSort.Int.jdkRANDOM9  avgt4 522.749 ±   19.386  
>> us/op
>> ArraysSort.Int.jdkRANDOM   40  avgt42424.204 ±   63.844  
>> us/op
>> ArraysSort.Int.jdkRANDOM  300  avgt4   21000.434 ±  801.315  
>> us/op
>> ArraysSort.Int.jdk  REPEATED  600  avgt4   0.496 ±0.030  
>> us/op
>> ArraysSort.Int.jdk  REPEATED 2000  avgt4   1.037 ±0.083  
>> us/op
>> ArraysSort.Int.jdk  REPE...
>
> Hi Vamsi (@vamsi-parasa), Laurent(@bourgesl),
> 
> The latest benchmarking compares compares the following versions:
> jdk - direct call of Arrays.sort();
> a15 - the current source of DualPivotQuicksort from the latest build (except 
> renaming)
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/DualPivotQuicksort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> r20s - new version without Radix sort
> r20p - new version with Radix sort in parallel case only
> 
> It is expected that timing of jdk and a15 should be more or less the same, 
> but please look at the results:
> 
> Benchmark | Data Type | Array Size | Arrays.sort() from jdk | Current source (a15)
> -- | -- | -- | -- | --
> ArraysSort.Int.testSort | RANDOM   |     600 | 1.612     | 7.096
> ArraysSort.Int.testSort | RANDOM   |    2000 | 6.893     | 44.014
> ArraysSort.Int.testSort | RANDOM   |   9 | 522.749   | 4451.444
> ArraysSort.Int.testSort | RANDOM   |  40 | 2424.204  | 22751.966
> ArraysSort.Int.testSort | RANDOM   | 300 | 21000.434 | 190326.306
> ArraysSort.Int.testSort | REPEATED |     600 | 0.496     | 1.044
> ArraysSort.Int.testSort | REPEATED |    2000 | 1.037     | 2.272
> ArraysSort.Int.testSort | REPEATED |   9 | 57.763    | 412.331
> ArraysSort.Int.testSort | REPEATED |  40 | 182.238   | 1908.978
> ArraysSort.Int.testSort | REPEATED | 300 | 1708.082  | 15163.443
> ArraysSort.Int.testSort | STAGGER  |     600 | 1.038     | 1.055
> ArraysSort.Int.testSort | STAGGER  |    2000 | 3.434     | 3.408
> ArraysSort.Int.testSort | STAGGER  |   9 | 148.638   | 149.220
> ArraysSort.Int.testSort | STAGGER  |  40 | 663.076   | 663.096
> ArraysSort.Int.testSort | STAGGER  | 300 | 5212.821  | 5206.890
> ArraysSort.Int.testSort | SHUFFLE  |     600 | 1.926     | 4.611
> ArraysSort.Int.testSort | SHUFFLE  |    2000 | 6.858     | 17.955
> ArraysSort.Int.testSort | SHUFFLE  |   9 | 473.441   | 1410.357
> ArraysSort.Int.testSort | SHUFFLE  |  40 | 2153.779  | 5739.311
> ArraysSort.Int.testSort | SHUFFLE  | 300 | 18180.141 | 41501.980
> 
> You can see that a15 (current source) is much slower than 
> Arrays.sort(), although the code is the same
> with
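
To put numbers on the slowdown, the a15/jdk ratio can be computed directly from the table. The sketch below is illustrative only: the class and method names are hypothetical, and it uses just the rows for sizes 600 and 2000 exactly as they appear in the table.

```java
public class SlowdownRatios {
    /** Ratio of the a15 time to the jdk time; values above 1 mean a15 is slower. */
    static double slowdown(double jdkMicros, double a15Micros) {
        return a15Micros / jdkMicros;
    }

    public static void main(String[] args) {
        // Each row is {jdk us/op, a15 us/op}, copied from the comparison table.
        String[] labels = {
            "RANDOM 600", "RANDOM 2000", "REPEATED 600", "STAGGER 600", "SHUFFLE 600"
        };
        double[][] rows = {
            {1.612, 7.096}, {6.893, 44.014}, {0.496, 1.044}, {1.038, 1.055}, {1.926, 4.611}
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-13s a15/jdk = %.2fx%n",
                    labels[i], slowdown(rows[i][0], rows[i][1]));
        }
    }
}
```

A ratio close to 1 (as for STAGGER) means the two variants behave alike on that input, while the RANDOM rows show a ratio of several times.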

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-01-26 Thread Srinivas Vamsi Parasa

On Thu, 18 Jan 2024 21:36:22 GMT, Vladimir Yaroslavskiy  
wrote:

>> Hi Vladimir (@iaroslavski)
>> 
>> Please see the data below using the latest version of AVX512 sort that got 
>> integrated into OpenJDK.
>> 
>> Benchmark   (us/op) | (builder) | Stock JDK | a10 | r14 | r17 | r18
>> -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 2.202 | 2.226 | 1.535 | 1.556 | 1.546
>> ArraysSort.Int.testSort | RANDOM | 35.128 | 34.804 | 30.808 | 30.914 | 31.284
>> ArraysSort.Int.testSort | RANDOM | 78.571 | 77.224 | 72.567 | 73.098 | 73.337
>> ArraysSort.Int.testSort | RANDOM | 2466.487 | 2470.66 | 2504.654 | 2494.051 | 2499.746
>> ArraysSort.Int.testSort | RANDOM | 20704.14 | 20668.19 | 21377.73 | 21362.63 | 21278.94
>> ArraysSort.Int.testSort | REPEATED | 0.877 | 0.892 | 0.74 | 0.724 | 0.718
>> ArraysSort.Int.testSort | REPEATED | 4.789 | 4.788 | 4.92 | 4.721 | 4.891
>> ArraysSort.Int.testSort | REPEATED | 11.172 | 11.778 | 11.53 | 11.467 | 11.406
>> ArraysSort.Int.testSort | REPEATED | 207.212 | 207.292 | 255.46 | 258.832 | 254.44
>> ArraysSort.Int.testSort | REPEATED | 1862.544 | 1876.759 | 1952.646 | 1957.978 | 1981.906
>> ArraysSort.Int.testSort | STAGGER | 2.092 | 2.137 | 1.999 | 2.031 | 2.015
>> ArraysSort.Int.testSort | STAGGER | 29.891 | 30.321 | 25.626 | 26.318 | 26.396
>> ArraysSort.Int.testSort | STAGGER | 60.979 | 83.439 | 57.864 | 57.213 | 79.762
>> ArraysSort.Int.testSort | STAGGER | 1227.933 | 1224.495 | 1236.133 | 1229.773 | 1228.877
>> ArraysSort.Int.testSort | STAGGER | 9514.873 | 9565.599 | 9491.509 | 9481.147 | 9481.905
>> ArraysSort.Int.testSort | SHUFFLE | 1.608 | 1.595 | 1.419 | 1.442 | 1.491
>> ArraysSort.Int.testSort | SHUFFLE | 31.566 | 32.789 | 28.718 | 28.768 | 28.671
>> ArraysSort.Int.testSort | SHUFFLE | 82.157 | 83.741 | 70.889 | 69.951 | 71.196
>> ArraysSort.Int.testSort | SHUFFLE | 2251.219 | 2248.496 | 2184.459 | 2163.969 | 2156.239
>> ArraysSort.Int.testSort | SHUFFLE | 18211.05 | 18223.24 | 17987.4 | 18114.26 | 17994.98
> ...
>
> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run the benchmarking of the new DPQS in your environment with 
> AVX?
> 
> Take all the classes below and put them in the package 
> org.openjdk.bench.java.util.
> The ArraysSort class contains all tests for the new versions and is ready to use
> (it will run all tests in one execution).
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
> 
> Many thanks,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the JMH data below.

Thanks,
Vamsi

Benchmark           (builder)  (size)  Mode  Cnt       Score      Error  Units
ArraysSort.Int.a15     RANDOM     600  avgt    4       7.096 ±    0.081  us/op
ArraysSort.Int.a15     RANDOM    2000  avgt    4      44.014 ±    1.717  us/op
ArraysSort.Int.a15     RANDOM       9  avgt    4    4451.444 ±   71.168  us/op
ArraysSort.Int.a15     RANDOM      40  avgt    4   22751.966 ±  683.597  us/op
ArraysSort.Int.a15     RANDOM     300  avgt    4  190326.306 ± 8008.512  us/op
ArraysSort.Int.a15   REPEATED     600  avgt    4       1.044 ±    0.016  us/op
ArraysSort.Int.a15   REPEATED    2000  avgt    4       2.272 ±    0.287  us/op
ArraysSort.Int.a15   REPEATED       9  avgt    4     412.331 ±   11.656  us/op
ArraysSort.Int.a15   REPEATED      40  avgt    4    1908.978 ±   30.241  us/op
ArraysSort.Int.a15   REPEATED     300  avgt    4   15163.443 ±  100.425  us/op
ArraysSort.Int.a15    STAGGER     600  avgt    4       1.055 ±    0.057  us/op
ArraysSort.Int.a15    STAGGER    2000  avgt    4       3.408 ±    0.096  us/op
ArraysSort.Int.a15    STAGGER       9  avgt    4     149.220 ±    4.022  us/op
ArraysSort.Int.a15    STAGGER      40  avgt    4     663.096 ±   30.252  us/op
ArraysSort.Int.a15    STAGGER     300  avgt    4    5206.890 ±  234.857  us/op
ArraysSort.Int.a15    SHUFFLE     600  avgt    4       4.611 ±    0.118  us/op
ArraysSort.Int.a15    SHUFFLE    2000  avgt    4      17.955 ±    0.356  us/op
ArraysSort.Int.a15    SHUFFLE       9  avgt    4    1410.357 ±   41.128  us/op
ArraysSort.Int.a15    SHUFFLE      40  avgt    4    5739.311 ±  128.270  us/op
ArraysSort.Int.a15    SHUFFLE     300  avgt    4   41501.980 ±  829.443  us/op

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-01-26 Thread Srinivas Vamsi Parasa

On Thu, 18 Jan 2024 21:36:22 GMT, Vladimir Yaroslavskiy  
wrote:

> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run the benchmarking of the new DPQS in your environment with 
> AVX?
> 
> Take all the classes below and put them in the package 
> org.openjdk.bench.java.util.
> The ArraysSort class contains all tests for the new versions and is ready to use
> (it will run all tests in one execution).
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
> 
> Many thanks,
> Vladimir

Hi Vladimir (@iaroslavski),

I was able to figure out the issue and have started the JMH benchmarking run. 
It's night time here; I will provide the data on Friday morning (US PST).

Thanks,
Vamsi

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1911741126

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2024-01-26 Thread Srinivas Vamsi Parasa

On Thu, 18 Jan 2024 21:36:22 GMT, Vladimir Yaroslavskiy  
wrote:

> Hello Vamsi (@vamsi-parasa),
> 
> Could you please run the benchmarking of the new DPQS in your environment with 
> AVX?
> 
> Take all the classes below and put them in the package 
> org.openjdk.bench.java.util.
> The ArraysSort class contains all tests for the new versions and is ready to use
> (it will run all tests in one execution).
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
> 
> Many thanks,
> Vladimir

Hello Vladimir (@iaroslavski),

Could you please share your pom.xml, as I am running into issues when the JMH 
benchmark is run:

`java.lang.IllegalAccessError: class 
org.openjdk.bench.java.util.DualPivotQuicksort_a15 (in unnamed module 
@0x520a3426) cannot access class jdk.internal.misc.Unsafe (in module java.base) 
because module java.base does not export jdk.internal.misc to unnamed module 
@0x520a3426`

Added the following add-exports in pom.xml, but it's still not working.



org.apache.maven.plugins
maven-compiler-plugin
3.8.1
  --add-exports
  java.base/jdk.internal.misc=ALL-UNNAMED
  --add-exports
  java.base/jdk.internal.vm.annotation=ALL-UNNAMED


   


Thanks,
Vamsi
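
A possible reading of this error (an assumption; the thread does not say how it was eventually resolved): arguments given to maven-compiler-plugin only affect compilation, while IllegalAccessError is thrown at run time, so the JVM that executes the benchmark would also need the --add-exports options. The small Java check below, with a hypothetical class name ExportCheck, reports whether java.base exports a package to the unnamed module of the running JVM, using the standard Module.isExported(String, Module) call.

```java
public class ExportCheck {
    /** True if java.base exports the given package to this class's module. */
    static boolean exportedToMe(String packageName) {
        Module javaBase = Object.class.getModule();
        // When run from the class path, this is the unnamed module.
        Module mine = ExportCheck.class.getModule();
        return javaBase.isExported(packageName, mine);
    }

    public static void main(String[] args) {
        String[] packages = {
            "java.util",                  // public API, always exported
            "jdk.internal.misc",          // needs --add-exports at run time
            "jdk.internal.vm.annotation"  // needs --add-exports at run time
        };
        for (String pkg : packages) {
            System.out.println(pkg + " exported to " + ExportCheck.class.getModule()
                    + ": " + exportedToMe(pkg));
        }
    }
}
```

Running it once as `java ExportCheck` and once as `java --add-exports java.base/jdk.internal.misc=ALL-UNNAMED ExportCheck` shows whether the option actually reached the JVM.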

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1911712234

RE: GBM as standalone buffer allocator

2024-01-22 Thread Srinivas Pullakavi (QUIC)

Hi Yiwei,

Looks like this thread is closed.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038#note_2243187

Can we collaborate on this?

Thanks,
Srinivas

From: Yiwei Zhang 
Sent: Monday, November 20, 2023 4:38 AM
To: Srinivas Pullakavi (QUIC) 
Cc: Rob Clark ; mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

There’s
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038. It is quite 
appealing to me considering a VK only scenario.

On Thu, Nov 2, 2023 at 5:50 AM Srinivas Pullakavi (QUIC) 
<quic_spull...@quicinc.com> wrote:
Hi Rob,

Thanks for your inputs.

We are planning to use DMA-BUF heaps for the GBM backend. The supported heaps 
are listed in /dev/dma_heap/.
The GBM backend selects the best heap based on usage. For example, secure 
buffers will be allocated from the secure heap.

Sample output:
 # ls /dev/dma_heap
 reserved  system

Sample code to allocate a buffer from system heap:
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
struct dma_heap_allocation_data heap_data = {
    .len = size,                    // length of data to be allocated, in bytes
    .fd_flags = O_RDWR | O_CLOEXEC, // flags for the fd of the allocated buffer
};
int status = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &heap_data);
if (status == 0) {
    int buffer_fd = heap_data.fd;   // dma-buf fd of the new buffer
}

In this case, there is no dependency on the display/graphics driver, but GBM 
create device still expects a device fd to be passed.

Can we make passing the device fd optional?

Thanks,
Srinivas

-Original Message-
From: Rob Clark <robdcl...@gmail.com>
Sent: Tuesday, October 24, 2023 1:06 AM
To: Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com>
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

On Mon, Oct 23, 2023 at 6:22 AM Srinivas Pullakavi (QUIC) 
<quic_spull...@quicinc.com> wrote:
>
> Hi,
>
>
>
> We are planning to enhance GBM as a standalone buffer allocator, which
> can be used for all multi-media clients. Ex: video, camera, display
> etc;
>
>
>
> GBM create device expects a file descriptor to be passed, which points to drm 
> node. This brings in a dependency on display for buffer allocation. On 
> headless devices where display driver is not present, GBM cannot be used for 
> buffer allocations. E.g. Recording cases where pipeline is setup between 
> Camera, Video, Graphics.
>

Note that you need some sort of device to allocate buffers from.  With mesa and 
upstream kernel, that would be the drm device.  (However, as Adam points out, a 
drm device does not necessarily need a display; for example, several vendors 
have compute-only GPUs (pci) which have no display outputs.)

You might want to look at ChromeOS's minigbm[1].  It already handles these cases 
(buffer sharing across display/gpu/video/camera).

BR,
-R

[1] https://chromium.googlesource.com/chromiumos/platform/minigbm/

>
> Could you please share your comments on what will be a good design to make 
> GBM flexible for above?
>
>
>
> Thanks,
>
> Srinivas
>
>

[sig-policy] Reminder: APNIC 57 Call for Policy Proposals

2024-01-07 Thread Srinivas (Sunny) Chendi


Dear SIG Members,

Happy New Year 2024!

This is a reminder that the deadline set by the Policy SIG Chair for proposals
to be discussed at APNIC 57 Open Policy Meeting (OPM) is *Friday, 12 January
2024 at 23:59 UTC +7.*

If you have any ideas to improve policy, or wish to make an informational
presentation about an aspect of resource management, please follow the
instructions below.

To propose a new policy or submit an informational presentation synopsis,
please visit

https://www.apnic.net/community/policy/proposals/submit-a-policy-proposal/

We look forward to and encourage your participation in the APNIC 57 Open
Policy Meeting (OPM), which will be held on Thursday, 29 February 2024
in Bangkok, Thailand.

https://conference.apnic.net/57/

Best Regards,
Sunny
APNIC Secretariat


On 6/11/2023 1:21 pm, Bertrand Cherrier wrote:

Dear Colleagues,

The APNIC 57 Open Policy Meeting (OPM) will be held on Thursday, 29 February 2024
in Bangkok, Thailand.

If you have any ideas to improve current policies, or propose new policy, or wish
to make an informational presentation about an aspect of resource management,
please follow the instructions below.

The submission deadline is Friday, 12 January 2024 at 23:59 UTC +7.

To propose a new policy or submit an informational presentation synopsis,
please visit:
https://www.apnic.net/community/policy/proposals/submit-a-policy-proposal/



We look forward to your participation in the APNIC 57 OPM.

Kind regards,

Bertrand, Shaila and Anupam
Policy SIG Chairs

___
SIG-policy - https://mailman.apnic.net/sig-policy@lists.apnic.net/
To unsubscribe send an email to sig-policy-le...@lists.apnic.net

Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-01-03 Thread Srinivas Yadav

> constexpr _MaskMember<_Tp>
> _S_broadcast(bool __x)
> {
>  constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
>  __sve_bool_type __tr = __sve_vector_type<_Tp,
> _Np>::__sve_active_mask();
>  __sve_bool_type __fl = svnot_z(__tr, __tr);
>
> This can just be svpfalse_b();
>

Got it! Thanks!


>   template 
> struct _MaskImplSve
> {
> ...
>   template 
> _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
> _S_load(const bool* __mem)
> {
>  _SveMaskWrapper> __r;
>
>  __execute_n_times>(
>[&](auto __i) _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA {
> __r._M_set(__i, __mem[__i]); });
>
>  return __r;
> }
>
>   template 
> static inline _SveMaskWrapper<_Bits, _Np>
> _S_masked_load(_SveMaskWrapper<_Bits, _Np> __merge,
> _SveMaskWrapper<_Bits, _Np> __mask,
>   const bool* __mem) noexcept
> {
>  _SveMaskWrapper<_Bits, _Np> __r;
>
>  __execute_n_times<_Np>([&](auto __i)
> _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA {
>if (__mask[__i])
>  __r._M_set(__i, __mem[__i]);
>else
>  __r._M_set(__i, __merge[__i]);
>  });
>
>  return __r;
> }
>
> If these are loading unpacked booleans, couldn't we just use svld1
> followed by a comparison?  Similarly the stores could use svdup_u8_z
> to load a vector of 1s and 0s and then use svst1 to store it.

Do you mean reinterpret-casting the input pointer (bool*) to (uint8*) and
then performing a comparison?


>   template 
> _GLIBCXX_SIMD_INTRINSIC static bool
> _S_all_of(simd_mask<_Tp, _Abi> __k)
> { return _S_popcount(__k) == simd_size_v<_Tp, _Abi>; }
>
> In principle, this should be better as !svptest_any(..., svnot_z (...,
> __k)),
> since we should then be able to use a single flag-setting predicate
> logic instruction.
>
> Incidentally, __k seems like a bit of an AVX-centric name :)
>
>   template 
> _GLIBCXX_SIMD_INTRINSIC static bool
> _S_any_of(simd_mask<_Tp, _Abi> __k)
> { return _S_popcount(__k) > 0; }
>
>   template 
> _GLIBCXX_SIMD_INTRINSIC static bool
> _S_none_of(simd_mask<_Tp, _Abi> __k)
> { return _S_popcount(__k) == 0; }
>
> These should map directly to svptest_any and !svptest_any respectively.
>

Got it! I will update with these changes.


>   template 
> _GLIBCXX_SIMD_INTRINSIC static int
> _S_find_first_set(simd_mask<_Tp, _Abi> __k)
> {
>  constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
>
>  auto __first_index =
> __sve_mask_type::__sve_mask_first_true();
>  for (int __idx = 0; __idx < _Np; __idx++)
>{
>  if (__sve_mask_type::__sve_mask_active_count(
>__sve_vector_type<_Tp, _Np>::__sve_active_mask(),
>svand_z(__sve_vector_type<_Tp,
> _Np>::__sve_active_mask(), __k._M_data,
>__first_index)))
>return __idx;
>  __first_index =
> __sve_mask_type::__sve_mask_next_true(
>__sve_vector_type<_Tp,
> _Np>::__sve_active_mask(), __first_index);
>}
>  return -1;
> }
>
>   template 
> _GLIBCXX_SIMD_INTRINSIC static int
> _S_find_last_set(simd_mask<_Tp, _Abi> __k)
> {
>  constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
>
>  int __ret = -1;
>  auto __first_index =
> __sve_mask_type::__sve_mask_first_true();
>  for (int __idx = 0; __idx < _Np; __idx++)
>{
>  if (__sve_mask_type::__sve_mask_active_count(
>__sve_vector_type<_Tp, _Np>::__sve_active_mask(),
>svand_z(__sve_vector_type<_Tp,
> _Np>::__sve_active_mask(), __k._M_data,
>__first_index)))
>__ret = __idx;
>  __first_index =
> __sve_mask_type::__sve_mask_next_true(
>__sve_vector_type<_Tp,
> _Np>::__sve_active_mask(), __first_index);
>}
>  return __ret;
> }
>
> _S_find_last_set should be able to use svclasta and an iota vector.
> _S_find_first_set could do the same with a leading svpfirst.
>
Thanks. This solution for find_last_set should significantly improve the
performance.
Could you please elaborate on the solution for find_first_set?
Another efficient solution for find_first_set that I have in mind is to use
svrev_b* and then perform a find_last_set.

Thank you,
Srinivas Yadav Singanaboina
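
The reverse-then-find-last idea can be checked on a scalar model before writing any SVE code. The Java sketch below is not SVE and not part of the patch: it treats a mask as the low `lanes` bits of an int (an assumption made for illustration), with Integer.reverse standing in for a predicate reverse, and all names are hypothetical.

```java
public class MaskScan {
    /** Index of the highest set lane, or -1 if no lane is set. */
    static int findLastSet(int mask) {
        // numberOfLeadingZeros(0) is 32, which yields -1 for an empty mask.
        return 31 - Integer.numberOfLeadingZeros(mask);
    }

    /** Reverses the order of the low `lanes` bits (1 <= lanes <= 32). */
    static int reverseLanes(int mask, int lanes) {
        // Integer.reverse maps bit i to bit 31 - i; the shift moves it to lanes - 1 - i.
        return Integer.reverse(mask) >>> (32 - lanes);
    }

    /** Index of the lowest set lane, computed as reverse followed by find-last. */
    static int findFirstSet(int mask, int lanes) {
        int last = findLastSet(reverseLanes(mask, lanes));
        return last < 0 ? -1 : lanes - 1 - last;
    }

    public static void main(String[] args) {
        int lanes = 8;
        int mask = 0b01101000; // lanes 3, 5 and 6 are active
        System.out.println("first set lane: " + findFirstSet(mask, lanes));
        System.out.println("last set lane:  " + findLastSet(mask));
    }
}
```

Both functions return -1 for an empty mask, matching the return value of the loop-based versions quoted in the thread.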

[sig-policy] APNIC EC Endorses Proposal from APNIC 56

2023-12-12 Thread Srinivas (Sunny) Chendi


Dear colleagues

The APNIC Executive Council endorsed the proposal, prop-155: IPv6 PI Assignment 
for Associate Members, at its meeting on 26-28 November 2023.

https://www.apnic.net/community/policy/proposals/prop-155/  


The EC has also decided to waive the fees on IPv6 PI assignments under this 
policy for a period of 12 months from the date of delegation. After the 12 
month period expires, the resources will become chargeable.

Next steps
--
The Secretariat will begin the implementation process and inform the community 
as soon as it is completed.

Regards,
Sunny

___

Srinivas (Sunny) Chendi (he/him)
Senior Advisor - Policy and Community Development

Asia Pacific Network Information Centre (APNIC) |  Tel: +61 7 3858 3100
PO Box 3646 South Brisbane, QLD 4101 Australia  |  Fax: +61 7 3858 3199
6 Cordelia Street, South Brisbane, QLD  |  http://www.apnic.net
___


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]

2023-12-12 Thread Srinivas Vamsi Parasa

On Tue, 12 Dec 2023 15:42:09 GMT, Magnus Ihse Bursie  wrote:

>> Thank you Magnus!
>
> @vamsi-parasa You said:
>> Made sure that OpenJDK builds without errors using both GCC 7.5 and GCC 6.4.
> 
> but now we have https://bugs.openjdk.org/browse/JDK-8321688. Did you 
> introduce any changes after you tested with GCC 7.5? It seems strange to me 
> that the code both works and does not work with GCC 7.5.

Hi Magnus (@magicus), I did a fresh pull of OpenJDK and was able to build it 
successfully (without any errors) using GCC 7.5.0 on an Ubuntu Linux machine. 
(I am on vacation until Jan 7th, 2024. Our team will look into this issue.)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1424352122

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2023-12-10 Thread Srinivas Vamsi Parasa

On Fri, 8 Dec 2023 20:08:22 GMT, Vladimir Yaroslavskiy  wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> Please see the data below.
>> 
>> Thanks,
>> Vamsi
>> 
>> Benchmark (us/op) | (builder) | (size) | Stock JDK | r_02 | r_03 | r_04 | r_05 | r_06 | r_07 | r_08 | r_98 | r_99
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.634 | 1.651 | 1.659 | 1.671 | 1.646 | 1.611 | 1.661 | 1.642 | 1.671
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 37.697 | 38.075 | 37.927 | 39.693 | 38.989 | 37.86 | 38.163 | 39.222 | 38.835
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 91.494 | 89.683 | 87.971 | 90.231 | 90.141 | 90.515 | 90.415 | 89.571 | 90.308
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2816.5 | 2811.334 | 2833.15 | 2802.958 | 2813.012 | 2815.24 | 2825.526 | 2801.497 | 2816.25
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23661.09 | 23778.15 | 23748.91 | 23802.62 | 23746.3 | 23778.16 | 23631.1 | 23651.78 | 23859.91
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.929 | 0.955 | 0.944 | 0.927 | 0.928 | 0.953 | 0.918 | 0.934 | 0.93
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.059 | 4.832 | 5.162 | 4.965 | 4.973 | 5.518 | 5.003 | 5.435 | 4.971
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 12.238 | 12.474 | 12.482 | 12.351 | 12.338 | 12.372 | 12.394 | 12.688 | 13.477
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 261.709 | 264.572 | 263.203 | 260.822 | 260.475 | 262.03 | 260.356 | 265.976 | 264.273
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 2062.235 | 2079.128 | 2065.445 | 2053.24 | 2076.278 | 2049.799 | 2059.1 | 2073.191 | 2075.65
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.001 | 2.023 | 2.021 | 2.001 | 2.018 | 2.011 | 2.017 | 2.005 | 2.011
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 26.169 | 26.093 | 25.562 | 26.385 | 26.109 | 26.485 | 26.375 | 26.412 | 25.712
>> ArraysSort.Int.te...
>
> Hello Vamsi (@vamsi-parasa),
> 
> I made the process simpler: added all variants to be compared into ArraysSort 
> class
> (set the same package org.openjdk.bench.java.util). It will run all sorts 
> incl. sort from jdk
> in the same environment. It should provide more accurate results, otherwise 
> we see some anomalies.
> 
> Could you please find time to run the benchmarking?
> Take all classes below and put them in the package 
> org.openjdk.bench.java.util.
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a10.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r14.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r17.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r18.java
> 
> Many thanks,
> Vladimir

Hi Vladimir (@iaroslavski)

Please see the data below using the latest version of AVX512 sort that got 
integrated into OpenJDK.

Benchmark (us/op) | (builder) | Stock JDK | a10 | r14 | r17 | r18
-- | -- | -- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM | 2.202 | 2.226 | 1.535 | 1.556 | 1.546
ArraysSort.Int.testSort | RANDOM | 35.128 | 34.804 | 30.808 | 30.914 | 31.284
ArraysSort.Int.testSort | RANDOM | 78.571 | 77.224 | 72.567 | 73.098 | 73.337
ArraysSort.Int.testSort | RANDOM | 2466.487 | 2470.66 | 2504.654 | 2494.051 | 2499.746
ArraysSort.Int.testSort | RANDOM | 20704.14 | 20668.19 | 21377.73 | 21362.63 | 21278.94
ArraysSort.Int.testSort | REPEATED | 0.877 | 0.892 | 0.74 | 0.724 | 0.718
ArraysSort.Int.testSort | REPEATED | 4.789 | 4.788 | 4.92 | 4.721 | 4.891
ArraysSort.Int.testSort | REPEATED | 11.172 | 11.778 | 11.53 | 11.467 | 11.406
ArraysSort.Int.testSort | REPEATED | 207.212 | 207.292 | 255.46 | 258.832 | 254.44
ArraysSort.Int.testSort | REPEATED | 1862.544 | 1876.759 | 1952.646 | 1957.978 | 1981.906
ArraysSort.Int.testSort | STAGGER | 2.092 | 2.137 | 1.999 | 2.031 | 2.015
ArraysSort.Int.testSort | STAGGER | 29.891 | 30.321 | 25.626 | 26.318 | 26.396
ArraysSort.Int.testSort | STAGGER | 60.979 | 83.439 | 57.864 | 57.213 | 79.762
ArraysSort.Int.testSort | STAGGER | 1227.933 | 1224.495 | 1236.133 | 1229.773 | 1228.877
ArraysSort.Int.testSort | STAGGER | 9514.873 | 9565.599 | 9491.509 | 9481.147 | 9481.905
ArraysSort.Int.testSort | SHUFFLE | 1.608 | 1.595 | 1.419 | 1.442 | 1.491
ArraysSort.Int.testSort | SHUFFLE | 31.566 | 32.789

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2023-12-08 Thread Srinivas Vamsi Parasa

On Fri, 8 Dec 2023 20:08:22 GMT, Vladimir Yaroslavskiy  wrote:

>> Hi Vladimir (@iaroslavski),
>> 
>> Please see the data below.
>> 
>> Thanks,
>> Vamsi
>> 
>> Benchmark (us/op) | (builder) | (size) | Stock JDK | r_02 | r_03 | r_04 | 
>> r_05 | r_06 | r_07 | r_08 | r_98 | r_99
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.634 | 1.651 | 1.659 | 
>> 1.671 | 1.646 | 1.611 | 1.661 | 1.642 | 1.671
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 37.697 | 38.075 | 37.927 
>> | 39.693 | 38.989 | 37.86 | 38.163 | 39.222 | 38.835
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 91.494 | 89.683 | 87.971 
>> | 90.231 | 90.141 | 90.515 | 90.415 | 89.571 | 90.308
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2816.5 | 2811.334 | 
>> 2833.15 | 2802.958 | 2813.012 | 2815.24 | 2825.526 | 2801.497 | 2816.25
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23661.09 | 23778.15 
>> | 23748.91 | 23802.62 | 23746.3 | 23778.16 | 23631.1 | 23651.78 | 23859.91
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.929 | 0.955 | 0.944 | 
>> 0.927 | 0.928 | 0.953 | 0.918 | 0.934 | 0.93
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.059 | 4.832 | 5.162 | 
>> 4.965 | 4.973 | 5.518 | 5.003 | 5.435 | 4.971
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 12.238 | 12.474 | 12.482 
>> | 12.351 | 12.338 | 12.372 | 12.394 | 12.688 | 13.477
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 261.709 | 264.572 | 
>> 263.203 | 260.822 | 260.475 | 262.03 | 260.356 | 265.976 | 264.273
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 2062.235 | 
>> 2079.128 | 2065.445 | 2053.24 | 2076.278 | 2049.799 | 2059.1 | 2073.191 | 
>> 2075.65
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.001 | 2.023 | 2.021 | 
>> 2.001 | 2.018 | 2.011 | 2.017 | 2.005 | 2.011
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 26.169 | 26.093 | 25.562 
>> | 26.385 | 26.109 | 26.485 | 26.375 | 26.412 | 25.712
>> ArraysSort.Int.te...
>
> Hello Vamsi (@vamsi-parasa),
> 
> I made the process simpler: added all variants to be compared into ArraysSort 
> class
> (set the same package org.openjdk.bench.java.util). It will run all sorts 
> incl. sort from jdk
> in the same environment. It should provide more accurate results, otherwise 
> we see some anomalies.
> 
> Could you please find time to run the benchmarking?
> Take all classes below and put them in the package 
> org.openjdk.bench.java.util.
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a10.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r14.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r17.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r18.java
> 
> Many thanks,
> Vladimir

Sure Vladimir (@iaroslavski),

Will run the tests.

Also, the baseline stock JDK has changed: a new PR that implements AVX2 sort 
(https://github.com/openjdk/jdk/pull/16534) has been integrated, and it also 
improves the performance of AVX512 sort by up to 35%.

Will use the new stock JDK for these measurements and provide the results by 
EOD Sunday (US Pacific time).

Thanks,
Vamsi

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1847956297
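
The request quoted above asks for all sort variants to be run in the same environment on the same inputs. A minimal sketch of that idea follows, assuming plain `System.nanoTime()` timing instead of the JMH harness used in the thread. The class name `SameInputComparison`, the method `timeAll`, and the two registered sorters are hypothetical stand-ins for the `DualPivotQuicksort_*` variants; a single-shot timing like this is only illustrative and is not a reliable benchmark.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Consumer;

final class SameInputComparison {

    // Runs every sorter on its own copy of the same input and records the
    // elapsed wall-clock time in nanoseconds. The input array is not modified.
    static Map<String, Long> timeAll(Map<String, Consumer<int[]>> sorters, int[] input) {
        int[] expected = input.clone();
        Arrays.sort(expected); // reference result used to validate each variant

        Map<String, Long> nanos = new LinkedHashMap<>();
        for (Map.Entry<String, Consumer<int[]>> e : sorters.entrySet()) {
            int[] copy = input.clone();
            long start = System.nanoTime();
            e.getValue().accept(copy);
            long elapsed = System.nanoTime() - start;
            if (!Arrays.equals(copy, expected)) {
                throw new IllegalStateException(e.getKey() + " produced a wrong result");
            }
            nanos.put(e.getKey(), elapsed);
        }
        return nanos;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(9000).toArray();

        // Hypothetical stand-ins; the thread compares DualPivotQuicksort variants.
        Map<String, Consumer<int[]>> sorters = new LinkedHashMap<>();
        sorters.put("jdk", Arrays::sort);
        sorters.put("parallel", Arrays::parallelSort);

        timeAll(sorters, data)
            .forEach((name, ns) -> System.out.println(name + ": " + ns + " ns"));
    }
}
```

Cloning the input for each variant keeps one sorter from handing an already sorted array to the next, which would otherwise skew the comparison.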

Integrated: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays)

2023-12-08 Thread Srinivas Vamsi Parasa

On Tue, 7 Nov 2023 00:12:41 GMT, Srinivas Vamsi Parasa  wrote:

> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX2 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> For serial sort on random data, this PR shows up to ~7.5x improvement for 
> 32-bit datatypes (int, float) on an Intel Tiger Lake machine, as shown in the 
> performance data below.
> 
> For parallel sort on random data, this PR shows up to ~3.4x improvement for 
> 32-bit datatypes (int, float), as shown below.
> 
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
> 
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6

This pull request has now been integrated.

Changeset: ce108446
Author:    vamsi-parasa
Committer: Sandhya Viswanathan
URL:       https://git.openjdk.org/jdk/commit/ce108446ca1fe604ecc24bbefb0bf1c6318271c7
Stats:     4026 lines in 24 files changed: 2311 ins; 1560 del; 155 mod

8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays)

Reviewed-by: sviswanathan, ihse, jbhateja, kvn

-

PR: https://git.openjdk.org/jdk/pull/16534
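
The Speedup column in the serial-sort table quoted above is consistent with baseline time divided by AVX2 time, rounded to one decimal place. A minimal sketch of that arithmetic follows, using three rows copied from the table; the class name `SpeedupRatio` and its helpers are hypothetical and only illustrate the ratio.

```java
import java.util.Locale;

final class SpeedupRatio {

    // Speedup as a ratio of times: values above 1.0 mean the candidate is faster.
    static double speedup(double baselineUsPerOp, double candidateUsPerOp) {
        if (candidateUsPerOp <= 0.0) {
            throw new IllegalArgumentException("candidate time must be positive");
        }
        return baselineUsPerOp / candidateUsPerOp;
    }

    // Formats with one decimal place, matching the precision used in the table.
    static String format(double ratio) {
        return String.format(Locale.ROOT, "%.1f", ratio);
    }

    public static void main(String[] args) {
        // {baseline us/op, AVX2 us/op} for three ArraysSort.intSort rows in the table.
        double[][] rows = {
            {0.572, 0.265},
            {10.098, 4.282},
            {330.065, 43.383},
        };
        for (double[] row : rows) {
            System.out.println(format(speedup(row[0], row[1])) + "x");
        }
    }
}
```

The same ratio read the other way explains values below 1.0 in the Dual-Pivot Quicksort tables: there the candidate took longer than the stock JDK.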

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]

2023-12-08 Thread Srinivas Vamsi Parasa

On Fri, 8 Dec 2023 22:37:26 GMT, Vladimir Kozlov  wrote:

> I pushed closed changes.

Thanks Vladimir!

-

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1847939767

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2023-12-07 Thread Srinivas Vamsi Parasa

On Thu, 7 Dec 2023 22:06:14 GMT, Vladimir Yaroslavskiy  wrote:

>> Benchmark (us/op) | Builder | (size) | Stock JDK (+ AVX512 sort) | DPQS_r01 (+ AVX512 sort) | Speedup
>> -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.713 | 1.32
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 38.316 | 1.08
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 86.376 | 1.14
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2792.333 | 1.01
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23711.885 | 0.99
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.859 | 1.20
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.014 | 1.02
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 9.532 | 1.08
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 235.281 | 0.90
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 1955.258 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.157 | 0.99
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 29.931 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 2 | 67.096 | 66.543 | 1.01
>> ArraysSort.Int.testSort | STAGGER | 40 | 1247.53 | 1224.999 | 1.02
>> ArraysSort.Int.testSort | STAGGER | 300 | 9435.404 | 9495.189 | 0.99
>> ArraysSort.Int.testSort | SHUFFLE | 600 | 2.701 | 1.64 | 1.65
>> ArraysSort.Int.testSort | SHUFFLE | 9000 | 38.976 | 34.201 | 1.14
>> ArraysSort.Int.testSort | SHUFFLE | 2 | 96.399 | 79.616 | 1.21
>> ArraysSort.Int.testSort | SHUFFLE | 40 | 2566.338 | 2436.271 | 1.05
>> ArraysSort.Int.testSort | SHUFFLE | 300 | 20835.935 | 20071.12 | 1.04
>
> Hello Vamsi (@vamsi-parasa),
> 
> Did you have a chance to run benchmarking?

Hi Vladimir (@iaroslavski),

Please see the data below.

Thanks,
Vamsi

Benchmark (us/op) | (builder) | (size) | Stock JDK | r_02 | r_03 | r_04 | r_05 | r_06 | r_07 | r_08 | r_98 | r_99
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.634 | 1.651 | 1.659 | 1.671 | 1.646 | 1.611 | 1.661 | 1.642 | 1.671
ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 37.697 | 38.075 | 37.927 | 39.693 | 38.989 | 37.86 | 38.163 | 39.222 | 38.835
ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 91.494 | 89.683 | 87.971 | 90.231 | 90.141 | 90.515 | 90.415 | 89.571 | 90.308
ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2816.5 | 2811.334 | 2833.15 | 2802.958 | 2813.012 | 2815.24 | 2825.526 | 2801.497 | 2816.25
ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23661.09 | 23778.15 | 23748.91 | 23802.62 | 23746.3 | 23778.16 | 23631.1 | 23651.78 | 23859.91
ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.929 | 0.955 | 0.944 | 0.927 | 0.928 | 0.953 | 0.918 | 0.934 | 0.93
ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.059 | 4.832 | 5.162 | 4.965 | 4.973 | 5.518 | 5.003 | 5.435 | 4.971
ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 12.238 | 12.474 | 12.482 | 12.351 | 12.338 | 12.372 | 12.394 | 12.688 | 13.477
ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 261.709 | 264.572 | 263.203 | 260.822 | 260.475 | 262.03 | 260.356 | 265.976 | 264.273
ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 2062.235 | 2079.128 | 2065.445 | 2053.24 | 2076.278 | 2049.799 | 2059.1 | 2073.191 | 2075.65
ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.001 | 2.023 | 2.021 | 2.001 | 2.018 | 2.011 | 2.017 | 2.005 | 2.011
ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 26.169 | 26.093 | 25.562 | 26.385 | 26.109 | 26.485 | 26.375 | 26.412 | 25.712
ArraysSort.Int.testSort | STAGGER | 2 | 67.096 | 77.157 | 63.636 | 64.479 | 58.697 | 59.728 | 58.913 | 59.482 | 58.633 | 76.904
ArraysSort.Int.testSort | STAGGER | 40 | 1247.53 | 1271.293 | 1236.158 | 1240.29 | 1261.469 | 1233.526 | 1153.822 | 1255.238 | 1224.071 | 1235.624
ArraysSort.Int.testSort | STAGGER | 300 | 9435.404 | 9612.98 | 9597.262 | 9590.393 | 9592.343 | 9616.005 | 9591.057 | 9637.881 | 9596.932 | 9570.482
ArraysSort.Int.testSort | SHUFFLE | 600 | 2.701 | 1.678 | 1.66 | 1.676 | 1.694 | 1.704 | 1.693 | 1.686 | 1.675 | 1.699
ArraysSort.Int.testSort | SHUFFLE | 9000 | 38.976 | 35.146 | 34.879 | 34.723 | 35.093 | 35.904 | 35.672 | 35.124 | 34.626 | 35.553
ArraysSort.Int.testSort | SHUFFLE | 2 | 96.399 | 81.651 | 83.113 | 81.186 | 80.802 | 82.464 | 81.473 | 83.511 | 82.289 | 81.794
ArraysSort.Int.testSort | SHUFFLE | 40 | 2566.338 | 2446.738 | 2424.526 | 2433.211 | 2459.019 | 2446.518 | 2450.989 | 2447.125 | 2449.441 | 2444.414

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]

2023-12-07 Thread Srinivas Vamsi Parasa

On Fri, 8 Dec 2023 00:31:26 GMT, Vladimir Kozlov  wrote:

> Testing has only one failure, in closed tests, and I need to fix it before 
> this can be pushed.

Thanks Vladimir for the update. Is the test failure because of this PR?

-

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1846317507

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

2023-12-07 Thread Srinivas Vamsi Parasa

On Thu, 7 Dec 2023 22:06:14 GMT, Vladimir Yaroslavskiy  wrote:

>> Benchmark (us/op) | Builder | (size) | Stock JDK (+ AVX512 sort) | DPQS_r01 (+ AVX512 sort) | Speedup
>> -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.713 | 1.32
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 38.316 | 1.08
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 86.376 | 1.14
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2792.333 | 1.01
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23711.885 | 0.99
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.859 | 1.20
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.014 | 1.02
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 9.532 | 1.08
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 235.281 | 0.90
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 1955.258 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.157 | 0.99
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 29.931 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 2 | 67.096 | 66.543 | 1.01
>> ArraysSort.Int.testSort | STAGGER | 40 | 1247.53 | 1224.999 | 1.02
>> ArraysSort.Int.testSort | STAGGER | 300 | 9435.404 | 9495.189 | 0.99
>> ArraysSort.Int.testSort | SHUFFLE | 600 | 2.701 | 1.64 | 1.65
>> ArraysSort.Int.testSort | SHUFFLE | 9000 | 38.976 | 34.201 | 1.14
>> ArraysSort.Int.testSort | SHUFFLE | 2 | 96.399 | 79.616 | 1.21
>> ArraysSort.Int.testSort | SHUFFLE | 40 | 2566.338 | 2436.271 | 1.05
>> ArraysSort.Int.testSort | SHUFFLE | 300 | 20835.935 | 20071.12 | 1.04
>
> Hello Vamsi (@vamsi-parasa),
> 
> Did you have a chance to run benchmarking?

Hello Vladimir (@iaroslavski),

Will provide the data by EOD Friday (US Pacific time). 
Had to wrap up some important things at work as I'll be going on a winter 
vacation for 4 weeks starting from Monday.
Thanks for understanding!

Thanks,
Vamsi

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1846189152

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]

2023-12-06 Thread Srinivas Vamsi Parasa

On Wed, 6 Dec 2023 23:09:01 GMT, Srinivas Vamsi Parasa  wrote:

>>> LGTM, thanks!
>> 
>> Thanks Jatin!
>
>> @vamsi-parasa, sorry, I was wrong. I missed that you need to check type 
>> `bt`. Latest change is more complicated than it was before. Please revert it 
>> back (undo last change). I will test previous version 09.
> @vnkozlov 
> Vladimir, please see the commit reverted in the updated changes pushed now.

> @vamsi-parasa, please, remind me which tests check that code in 
> `libsmdsort.so` is used?

@vnkozlov 
Please see the tests for the SIMD sort code in 
`test/jdk/java/util/Arrays/Sorting.java`

-

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1843963054
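
As an illustration of what a sort correctness test needs to establish, here is a minimal sketch. It is not the contents of `test/jdk/java/util/Arrays/Sorting.java`; the class `SortCheck` and its methods are hypothetical. It checks two properties for `int` arrays: the result is in non-decreasing order, and an order-independent checksum is unchanged, which catches most cases of lost or altered elements without being a full permutation proof.

```java
import java.util.Arrays;
import java.util.Random;

final class SortCheck {

    // True if the array is in non-decreasing order.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                return false;
            }
        }
        return true;
    }

    // Order-independent checksum combining the sum and the XOR of all elements.
    static long checksum(int[] a) {
        long sum = 0;
        int xor = 0;
        for (int v : a) {
            sum += v;
            xor ^= v;
        }
        return sum * 31 + xor;
    }

    // Sorts a copy with Arrays.sort and verifies order and checksum.
    static boolean sortsCorrectly(int[] input) {
        int[] work = input.clone();
        long before = checksum(work);
        Arrays.sort(work);
        return isSorted(work) && checksum(work) == before;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        // Sizes chosen arbitrarily to cover tiny, small, and larger arrays.
        for (int size : new int[] {0, 1, 10, 600, 9000}) {
            int[] data = rnd.ints(size).toArray();
            System.out.println(size + ": " + sortsCorrectly(data));
        }
    }
}
```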

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]

2023-12-06 Thread Srinivas Vamsi Parasa

On Wed, 6 Dec 2023 17:44:24 GMT, Srinivas Vamsi Parasa  wrote:

>> LGTM, thanks!
>
>> LGTM, thanks!
> 
> Thanks Jatin!

> @vamsi-parasa, sorry, I was wrong. I missed that you need to check type `bt`. 
> Latest change is more complicated than it was before. Please revert it back 
> (undo last change). I will test previous version 09.
@vnkozlov 
Vladimir, please see the commit reverted in the updated changes pushed now.

-

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1843834085

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]

2023-12-06 Thread Srinivas Vamsi Parasa

> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX2 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> For serial sort on random data, this PR shows up to ~7.5x improvement for 
> 32-bit datatypes (int, float) on an Intel Tiger Lake machine, as shown in the 
> performance data below.
> 
> For parallel sort on random data, this PR shows up to ~3.4x improvement for 
> 32-bit datatypes (int, float), as shown below.
> 
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
> 
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  Revert "Change supported intrinsic check"
  
  This reverts commit 9621eb045c2958582f81ec06b237789a07481ddd.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/9621eb04..eadba369

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=10-11

  Stats: 28 lines in 4 files changed: 0 ins; 20 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534
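
The PR description above reports results for both serial and parallel sort of 32-bit types. For reference, a minimal sketch of the two public entry points involved, `Arrays.sort(float[])` and `Arrays.parallelSort(float[])`, follows. The class `SerialVsParallelSort` is hypothetical, and whether an AVX2 or AVX512 intrinsic is actually used depends on the CPU and the JDK build, which this code cannot observe.

```java
import java.util.Arrays;
import java.util.Random;

final class SerialVsParallelSort {

    // Returns a sorted copy using the serial entry point.
    static float[] sortedSerial(float[] input) {
        float[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Returns a sorted copy using the parallel entry point.
    static float[] sortedParallel(float[] input) {
        float[] copy = input.clone();
        Arrays.parallelSort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        float[] data = new float[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextFloat();
        }
        // Both entry points must agree on the final order.
        System.out.println(Arrays.equals(sortedSerial(data), sortedParallel(data)));
    }
}
```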

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v11]

2023-12-06 Thread Srinivas Vamsi Parasa

> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX2 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> For serial sort on random data, this PR shows up to ~7.5x improvement for 
> 32-bit datatypes (int, float) on an Intel Tiger Lake machine, as shown in the 
> performance data below.
> 
> For parallel sort on random data, this PR shows up to ~3.4x improvement for 
> 32-bit datatypes (int, float), as shown below.
> 
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
> 
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
> 
> 
> 
> 
> 
>  xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:x="urn:schemas-microsoft-com:office:excel"
> xmlns="http://www.w3.org/TR/REC-html40;>
> 
> 
> 
> 
> 
>  href="file:///C:/Users/...

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  Change supported intrinsic check

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/7e124581..9621eb04

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=09-10

  Stats: 28 lines in 4 files changed: 20 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534
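
The Speedup column in the table quoted above appears to be the baseline time divided by the AVX2 time, rounded to one decimal place. A minimal C++ sketch of that arithmetic follows; the helper name `speedup` is hypothetical and is not part of the PR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the PR): speedup as baseline time over
// optimized time, rounded to one decimal place like the table's last column.
inline double speedup(double baseline_us, double optimized_us) {
    assert(optimized_us > 0.0);  // a zero or negative time would be meaningless
    return std::round((baseline_us / optimized_us) * 10.0) / 10.0;
}
```

For instance, calling `speedup(330.065, 43.383)` with the two timings from one of the intSort rows reproduces that row's reported 7.6.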

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]

2023-12-06 Thread Srinivas Vamsi Parasa

On Wed, 6 Dec 2023 18:41:26 GMT, Vladimir Kozlov  wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   add missing header files
>
> src/hotspot/share/opto/library_call.cpp line 5393:
> 
>> 5391: if (!Matcher::supports_simd_sort(bt)) {
>> 5392:   return false;
>> 5393: }
> 
> This check should be in `C2Compiler::is_intrinsic_supported()`

Hi Vladimir (@vnkozlov), please see the updated changes which use 
`C2Compiler::is_intrinsic_supported(id, bt)`

> src/hotspot/share/opto/library_call.cpp line 5450:
> 
>> 5448:   if (!Matcher::supports_simd_sort(bt)) {
>> 5449: return false;
>> 5450:   }
> 
> Same.

Please see the updated changes which use C2Compiler::is_intrinsic_supported(id, 
bt)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417946689
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417946968
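
The review point above is that the "is SIMD sort supported for this element type?" test belongs in one central predicate instead of being repeated at the top of each inlining routine. The sketch below only illustrates that shape. Every name in it (`ElemType`, `cpu_supports_simd_sort`, `is_sort_intrinsic_supported`, `inline_sort`, `try_inline_sort`) is a hypothetical stand-in and not HotSpot code, and the rule that only 32-bit types are supported is an assumption taken from the PR title.

```cpp
#include <cassert>

// Hypothetical element types; stand-ins for HotSpot's BasicType values.
enum class ElemType { Int, Long, Float, Double };

// Assumed capability query: in this toy model only 32-bit types are supported.
inline bool cpu_supports_simd_sort(ElemType t) {
    return t == ElemType::Int || t == ElemType::Float;
}

// The single place where support is decided.
inline bool is_sort_intrinsic_supported(ElemType t) {
    return cpu_supports_simd_sort(t);
}

// The inlining routine no longer repeats the check; it only asserts it.
inline bool inline_sort(ElemType t) {
    assert(is_sort_intrinsic_supported(t));
    return true;  // pretend the intrinsic was emitted
}

// Driver: check once, then dispatch.
inline bool try_inline_sort(ElemType t) {
    if (!is_sort_intrinsic_supported(t)) {
        return false;  // caller falls back to the regular implementation
    }
    return inline_sort(t);
}
```

With this layout a second routine (for example one for partitioning) could reuse `is_sort_intrinsic_supported` without carrying its own copy of the test.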

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]

2023-12-06 Thread Srinivas Vamsi Parasa

> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX2 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> For serial sort on random data, this PR shows up to ~7.5x improvement for 
> 32-bit datatypes (int, float) on an Intel TigerLake machine, as shown in the 
> performance data below.
> 
> For parallel sort on random data, this PR shows up to ~3.4x improvement for 
> 32-bit datatypes (int, float), as shown below.
> 
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
> 
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 10000 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 100000 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 1000000 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 10000 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 100000 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 1000000 | 50889.4 | 10972.19 | 4.6

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  add missing header files

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/c143e0b9..7e124581

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=08-09

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]

2023-12-06 Thread Srinivas Vamsi Parasa

On Wed, 6 Dec 2023 17:42:39 GMT, Jatin Bhateja  wrote:

> LGTM, thanks!

Thanks Jatin!

-

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1843372385

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]

2023-12-06 Thread Srinivas Vamsi Parasa

On Tue, 5 Dec 2023 19:19:23 GMT, Jatin Bhateja  wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   remove unused avx2 64 bit sort functions; add assertions
>
> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 50:
> 
>> 48: case JVM_T_DOUBLE:
>> 49: avx2_fast_sort((double*)array, from_index, to_index, 
>> INSERTION_SORT_THRESHOLD_64BIT);
>> 50: break;
> 
> Please add safe assertions for missing types.

This comment refers to an older, now outdated commit. The assertions have been 
added in the other cases.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417706670

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]

2023-12-06 Thread Srinivas Vamsi Parasa

> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX2 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> For serial sort on random data, this PR shows up to ~7.5x improvement for 
> 32-bit datatypes (int, float) on an Intel TigerLake machine, as shown in the 
> performance data below.
> 
> For parallel sort on random data, this PR shows up to ~3.4x improvement for 
> 32-bit datatypes (int, float), as shown below.
> 
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
> 
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 10000 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 100000 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 1000000 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 10000 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 100000 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 1000000 | 50889.4 | 10972.19 | 4.6

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  remove unused avx2 64 bit sort functions; add assertions

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/bc590d9f..c143e0b9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=07-08

  Stats: 128 lines in 4 files changed: 12 ins; 116 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-06 Thread Srinivas Vamsi Parasa

On Tue, 5 Dec 2023 19:37:34 GMT, Jatin Bhateja  wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base 
>> due to a merge or a rebase. The incremental webrev excludes the unrelated 
>> changes brought in by the merge/rebase. The pull request contains 17 
>> additional commits since the last revision:
>> 
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - add GCC version guards
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - Remove C++17 from C flags
>>  - add avoid masked stores operation
>>  - update the code to check for supported simd sort cpus
>>  - Disable AVX2 sort for 64-bit types
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - fix jcheck failures due to windows encoding
>>  - fix carriage return and change insertion sort thresholds
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/d4151e5b...bc590d9f
>
> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64:
> 
>> 62: }
>> 63: return lut;
>> 64: }();
> 
> Lut64 is needed for compress64 emulation, can be removed.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234:
> 
>> 232: 
>> 233: vtype::mask_storeu(leftStore, left, temp);
>> 234: }
> 
> Can be removed if not being used.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277:
> 
>> 275: 
>> 276: return _mm_popcnt_u32(shortMask);
>> 277: }
> 
> Can be removed if not being used.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44:
> 
>> 42: break;
>> 43: case JVM_T_FLOAT:
>> 44: avx2_fast_sort((float*)array, from_index, to_index, 
>> INSERTION_SORT_THRESHOLD_32BIT);
> 
> Assertions for unsupported types.

Added in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56:
> 
>> 54: case JVM_T_FLOAT:
>> 55: avx2_fast_partition((float*)array, from_index, to_index, 
>> pivot_indices, index_pivot1, index_pivot2);
>> 56: break;
> 
> Please add assertion for unsupported types.

Added in the latest commit...

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701182
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702999
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702251
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701469
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701705
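
The "assertions for unsupported types" requested above can be pictured as a `default:` branch in the type switch that fails loudly in debug builds. This is a minimal sketch under stated assumptions: `ElemType` and `sort_dispatch` are hypothetical names, `std::sort` stands in for the PR's `avx2_fast_sort`, and the actual PR switches on `JVM_T_*` constants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical type tags; the PR uses JVM_T_* constants instead.
enum class ElemType { Int, Float, Long, Double };

// Sorts array[from, to) for the 32-bit types only. std::sort stands in for
// the AVX2 routine; any other tag trips the assertion in debug builds.
inline void sort_dispatch(ElemType type, void* array, int32_t from, int32_t to) {
    switch (type) {
        case ElemType::Int: {
            int32_t* a = static_cast<int32_t*>(array);
            std::sort(a + from, a + to);
            break;
        }
        case ElemType::Float: {
            float* a = static_cast<float*>(array);
            std::sort(a + from, a + to);
            break;
        }
        default:
            assert(false && "Unexpected type");
    }
}
```

Note that `assert` compiles away when `NDEBUG` is defined, so in a release build an unsupported tag would fall through silently in this sketch.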

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]

2023-12-06 Thread Srinivas Vamsi Parasa

On Wed, 6 Dec 2023 11:59:19 GMT, Magnus Ihse Bursie  wrote:

>> Hi Magnus (@magicus),
>>  
>>> Are you saying that when compiling with GCC 6, it will just silently ignore 
>>> `-std=c++17`? I'd have assumed that it printed a warning or error about an 
>>> unknown or invalid option, if C++17 is not supported.
>> 
>> The GCC compiler for versions 6 (and even 5) silently ignores the flag 
>> `-std=c++17`. It does not print any warning or error. I tested it with a toy 
>> C++ program and also by building OpenJDK using GCC 6. 
>> 
>>> You can't check for if compiler options should be enabled or not inside 
>>> source code files.
>> 
>> What I meant was that there are #ifdef guards using predefined macros in the 
>> C++ source code that check the GCC version and make the simdsort code 
>> available for compilation, or not, based on that version.
>> 
>> 
>> // src/java.base/linux/native/libsimdsort/simdsort-support.hpp
>> #if defined(_LP64) && (defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 7) && (__GNUC_MINOR__ >= 5))))
>> #define __SIMDSORT_SUPPORTED_LINUX
>> #endif
>> 
>> 
>> 
>> //src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp
>> #include "simdsort-support.hpp"
>> #ifdef __SIMDSORT_SUPPORTED_LINUX
>> 
>> #endif
>
> Okay, then I guess I am fine with this.

Thank you Magnus!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417707661
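
The version guard discussed above can be tried out in isolation. The sketch below mirrors the quoted "GCC 7.5 or newer" comparison in a function so the logic can be checked for arbitrary version pairs. `DEMO_SIMDSORT_SUPPORTED`, `gcc_at_least_7_5` and `simdsort_compiled_in` are hypothetical names and not from the PR, and the `_LP64` part of the original condition is left out for brevity.

```cpp
// Hypothetical macro name; the PR defines __SIMDSORT_SUPPORTED_LINUX instead.
#if defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 7) && (__GNUC_MINOR__ >= 5)))
#define DEMO_SIMDSORT_SUPPORTED 1
#else
#define DEMO_SIMDSORT_SUPPORTED 0
#endif

// Same comparison as the preprocessor test, written as a function so it can
// be evaluated for any (major, minor) pair.
constexpr bool gcc_at_least_7_5(int major, int minor) {
    return (major > 7) || ((major == 7) && (minor >= 5));
}

// True when the guarded code path was compiled into this translation unit.
constexpr bool simdsort_compiled_in() {
    return DEMO_SIMDSORT_SUPPORTED != 0;
}
```

Compilers other than GCC may also define `__GNUC__` for compatibility, so the result of `simdsort_compiled_in()` depends on the values the compiler in use reports.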

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-06 Thread Srinivas Vamsi Parasa

On Tue, 5 Dec 2023 19:33:48 GMT, Jatin Bhateja  wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base 
>> due to a merge or a rebase. The incremental webrev excludes the unrelated 
>> changes brought in by the merge/rebase. The pull request contains 17 
>> additional commits since the last revision:
>> 
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - add GCC version guards
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - Remove C++17 from C flags
>>  - add avoid masked stores operation
>>  - update the code to check for supported simd sort cpus
>>  - Disable AVX2 sort for 64-bit types
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - fix jcheck failures due to windows encoding
>>  - fix carriage return and change insertion sort thresholds
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/d8b29378...bc590d9f
>
> src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235:
> 
>> 233: return avx512_double_compressstore>(
>> 234: left_addr, right_addr, k, reg);
>> 235: }
> 
> Can be removed.

This is needed for AVX512 sort...

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417690992
