[PATCH] drm/i915/dpt: Make DPT object unshrinkable
In some scenarios, the DPT object gets shrunk but the actual
framebuffer did not and thus its still there on the DPT's
vm->bound_list. Then it tries to rewrite the PTEs via a stale CPU
mapping. This causes panic.

Credits-to: Ville Syrjala
            Shawn Lee
Cc: sta...@vger.kernel.org
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Vidya Srinivas
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
 static inline bool
 i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
 {
-	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+		!obj->is_dpt;
 }

 static inline bool
--
2.34.1
RE: [PATCH] drm/i915/dpt: Make DPT object unshrinkable
> -----Original Message-----
> From: Ville Syrjälä
> Sent: Monday, May 20, 2024 10:10 PM
> To: Srinivas, Vidya
> Cc: intel-gfx@lists.freedesktop.org; Syrjala, Ville; Lee, Shawn C;
> srini...@freedesktop.org
> Subject: Re: [PATCH] drm/i915/dpt: Make DPT object unshrinkable
>
> On Mon, May 20, 2024 at 08:54:10PM +0530, Srinivas, Vidya wrote:
> > In some scenarios, the DPT object gets shrunk but the actual
> > framebuffer did not and thus its still there on the DPT's
> > vm->bound_list. Then it tries to rewrite the PTEs via a stale CPU
> > mapping. This causes panic.
> >
> > Credits-to: Ville Syrjala
> > Shawn Lee
> >
> > Signed-off-by: Srinivas, Vidya
>
> The format should be "first_name last_name "

Apologies for the mistake. My gitconfig got messed up.

>
> We also probably want
> Cc: sta...@vger.kernel.org
> Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
>

Thank you so much. Will float new patch with this added.

> Although the patch won't actually build unless we also have commit
> 779cb5ba64ec ("drm/i915/dpt: Treat the DPT BO as a framebuffer") but that
> has the same fixes tag, so should be fine even if someone backports things
> that far back.
>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > index 3560a062d287..e6b485fc54d4 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > @@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
> >  static inline bool
> >  i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
> >  {
>
> Maybe toss something like this here:
> /* TODO: make DPT shrinkable when it has no bound vmas */
>
> DPTs aren't necessarily so small that shrinking them wouldn't have any
> benefits. But actually implementing that would require some actual work, so
> not suitable for a quick fix.
>
> I can add all that stuff when applying the patch, no need to resend for this.
>
> > -	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
> > +	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
> > +		!obj->is_dpt;
> >  }
> >
> >  static inline bool
> > --
> > 2.34.1
>
> --
> Ville Syrjälä
> Intel
[PATCH] drm/i915/dpt: Make DPT object unshrinkable
In some scenarios, the DPT object gets shrunk but the actual
framebuffer did not and thus its still there on the DPT's
vm->bound_list. Then it tries to rewrite the PTEs via a stale CPU
mapping. This causes panic.

Credits-to: Ville Syrjala
            Shawn Lee

Signed-off-by: Srinivas, Vidya
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
 static inline bool
 i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
 {
-	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+		!obj->is_dpt;
 }

 static inline bool
--
2.34.1
Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Wed, 8 May 2024 20:37:28 GMT, Vladimir Yaroslavskiy wrote: >> Hi Vladimir (@iaroslavski), >> >> Please see the data below. >> >> Thanks, >> Vamsi >> >> name | builder | size | mode | count | score >> -- | -- | -- | -- | -- | -- >> b01 | RANDOM | 600 | avg | 325677 | 6.764 >> b01 | RANDOM | 3000 | avg | 52041 | 77.742 >> b01 | RANDOM | 9 | avg | 1217 | 4449.668 >> b01 | RANDOM | 40 | avg | 242 | 22764.05 >> b01 | RANDOM | 100 | avg | 90 | 60737.71 >> b01 | REPEATED | 600 | avg | 651354 | 1.723 >> b01 | REPEATED | 3000 | avg | 104083 | 12.383 >> b01 | REPEATED | 9 | avg | 2435 | 714.451 >> b01 | REPEATED | 40 | avg | 484 | 3039.447 >> b01 | REPEATED | 100 | avg | 180 | 8114.503 >> b01 | SAWTOOTH | 600 | avg | 1954062 | 1.009 >> b01 | SAWTOOTH | 3000 | avg | 312251 | 4.94 >> b01 | SAWTOOTH | 9 | avg | 7305 | 133.192 >> b01 | SAWTOOTH | 40 | avg | 1453 | 591.854 >> b01 | SAWTOOTH | 100 | avg | 542 | 1494.252 >> b01 | STAGGER | 600 | avg | 1954062 | 8.252 >> b01 | STAGGER | 3000 | avg | 312251 | 10.449 >> b01 | STAGGER | 9 | avg | 7305 | 287.811 >> b01 | STAGGER | 40 | avg | 1453 | 1288.92 >> b01 | STAGGER | 100 | avg | 542 | 3245.649 >> b01 | SHUFFLE | 600 | avg | 325677 | 5.199 >> b01 | SHUFFLE | 3000 | avg | 52041 | 29.734 >> b01 | SHUFFLE | 9 | avg | 1217 | 1392.125 >> b01 | SHUFFLE | 40 | avg | 242 | 5772.859 >> b01 | SHUFFLE | 100 | avg | 90 | 15483.65 >> r30 | RANDOM | 600 | avg | 325677 | 4.307 >> r30 | RANDOM | 3000 | avg | 52041 | 71.438 >> r30 | RANDOM | 9 | avg | 1217 | 3971.947 >> r30 | RANDOM | 40 | avg | 242 | 19924.32 >> r30 | RANDOM | 100 | avg | 90 | 53671.9 >> r30 | REPEATED | 600 | avg | 651354 | 1.36 >> r30 | REPEATED | 3000 | avg | 104083 | 6.415 >> r30 | REPEATED | 9 | avg | 2435 | 578.708 >> r30 | REPEATED | 40 | avg | 484 | 2488.414 >> r30 | REPEATED | 100 | avg | 180 | 6280.025 >> r30 | SAWTOOTH | 600 | avg | 1954062 | 0.488 >> r30 | SAWTOOTH | 3000 | avg | 312251 | 2.409 >> r30 | SAWTOOTH | 9 | avg | 7305 | 71.98 >> r30 | SAWTOOTH | 40 | 
avg | 1453 | 343.433 >> r30 | SAWTOOTH | 100 | avg | 542 | 954.287 >> r30 | STAGGER | 600 | avg | 1954062 | 1.064 >> r30 | STAGGER | 3000 | avg | 312251 | 4.559 >> r30 | STAGGER | 9 | avg | 7305 | 135.383 >> r30 | STAGGER | 40 | avg | 1453 | 626.657 >> r30 | STAGGER | 100 | avg | 542 | 1653.92 >> r30 | SHUFFLE | 600 | avg | 325677 | 2.924 >> r30 | SHUFFLE | 3000 | avg | 52041 | 18.819 >> r30 | SHUFFLE | 9 | avg | 1217 | 1019.036 >> r30 | SHUFFLE | 40 | avg | 242 | 4661.484 >> r30 | SHUFFLE | 100 ... > > Hello Vamsi (@vamsi-parasa), > > Could you please run the new benchmarking to finalize the best version? > What you need is to compile and run JavaBenchmarkHarness: > > javac --patch-module java.base=. -d classes *.java > java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module > java.base=classes -cp classes java.util.JavaBenchmarkHarness > > Find the sources there: > > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_11.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_11a.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_12.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r31_12a.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java > > Thank you, > Vladimir Hi Vladimir (@iaroslavski), Please see the data below. 
Thanks,
Vamsi

name | builder | size | mode | count | score
-- | -- | -- | -- | -- | --
b01 | RANDOM | 600 | avg | 325677 | 6.861
b01 | RANDOM | 3000 | avg | 52041 | 77.313
b01 | RANDOM | 9 | avg | 1217 | 4315.41
b01 | RANDOM | 40 | avg | 242 | 22110.95
b01 | RANDOM | 100 | avg | 90 | 58613.45
b01 | REPEATED | 600 | avg | 651354 | 1.993
b01 | REPEATED | 3000 | avg | 104083 | 13.026
b01 | REPEATED | 9 | avg | 2435 | 741.97
b01 | REPEATED | 40 | avg | 484 | 3161.073
b01 | REPEATED | 100 | avg | 180 | 8363.671
b01 | STAGGER | 600 | avg | 1954062 | 9.124
b01 | STAGGER | 3000 | avg | 312251 | 10.026
b01 | STAGGER | 9 | avg | 7305 | 286.313
b01 | STAGGER | 40 | avg | 1453 | 1278.758
b01 | STAGGER | 100 | avg | 542 | 3242.849
b01 | SHUFFLE | 600 | avg | 325677 | 5.113
b01 | SHUFFLE | 3000 | avg | 52041 | 28.85
b01 | SHUFFLE | 9 | avg | 1217 | 1368.91
b01 | SHUFFLE | 40 | avg | 242 | 5718.052
b01 | SHUFFLE | 100 | avg | 90 | 15376.1
r31_11 | RANDOM | 600 | avg | 325677 | 4.305
r31_11 | RANDOM | 3000 | avg | 52041 | 73.399
r31_11 | RANDOM | 9 | avg | 1217 | 3963.515
r31_11 | RANDOM | 40 | avg | 242 | 19841.07
r31_11 | RANDOM | 100 | avg | 90 | 53372.63
r31_11 | REPEATED | 600 | avg | 651354 | 1.208
r31_11 | REPEATED | 3000 | avg | 104083 | 6.206
r31_11 | REPEATED |
RE: [go-nuts] Re: Slice conversion function not working on big endian platform
I am using generics here so that it works for other types as well. It is not only for converting a []int32 slice to []int64; it is a generic function for any conversion, such as BytestoInt64/32/16/8, BytestoFloat32/64, etc. Conversions such as BytestoInt64 and BytestoFloat64 fail on a big-endian machine but work on little-endian.

From: 'Pokala Srinivas' via golang-nuts
Sent: 08 May 2024 15:39
To: golang-nuts; Brian Candler
Subject: [EXTERNAL] Re: [go-nuts] Re: Slice conversion function not working on big endian platform

There was a typo when I entered the code; below is the corrected code snippet:

package main

import (
	"fmt"
	"unsafe"
)

type slice struct {
	ptr unsafe.Pointer
	len int
	cap int
}

func Slice[To, From any](data []From) []To {
	var zf From
	var zt To
	var s = (*slice)(unsafe.Pointer(&data))
	s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	x := *(*[]To)(unsafe.Pointer(s))
	return x
}

func main() {
	a := make([]uint32, 4, 13)
	a[0] = 1
	a[1] = 0
	a[2] = 2
	a[3] = 0
	// 1 0 2 0
	b := Slice[int64](a)
	//Expected : []int64[]{0x 0001, 0x 0002}
	//Got: []int64{0x0001 , 0x0002 000}
	if b[0] != 1 {
		fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", b[0])
	}
	if b[1] != 2 {
		fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
	}
}

Please try this code

From: 'Brian Candler' via golang-nuts
Sent: 08 May 2024 15:25
To: golang-nuts
Subject: [EXTERNAL] [go-nuts] Re: Slice conversion function not working on big endian platform

That code doesn't even compile in go 1.22 or go 1.21:
https://go.dev/play/p/mPCBUQizSVo

./prog.go:20:14: cannot convert unsafe.Pointer(s) (value of type unsafe.Pointer) to type []To

What's the underlying requirement? In the test case it looks like you want to take a slice of int32's, in whatever their internal in-memory representation is, and re-represent them as a slice of half as many int64's?? Then of *course* each pair of int32's will become one int64, and the order of the hi/lo halves will depend entirely on the system's internal representation of int64's. It *is* working, in the sense that it's doing exactly what you told it to do. There's a reason why the "unsafe" package is called "unsafe"!

It would be straightforward to write a function which takes a slice containing pairs of int32's and assembles them into int64's in a consistent way. What you've not explained is:
- why you need to do this with generics (for example, what behaviour would you expect from other types?)
- why you need to do this in-place with "unsafe"

On Wednesday 8 May 2024 at 10:24:30 UTC+1 Srinivas Pokala wrote:

Hello gophers,

I have a simple Go program which converts a slice of one type to a slice of another type, using Go generics to handle all the supported types. Below is the code snippet for this:

package main

import "fmt"
import "unsafe"

type slice struct {
	ptr unsafe.Pointer
	len int
	cap int
}

func Slice[To, From any](data []From) []To {
	var zf From
	var zt To
	var s = (*slice)(unsafe.Pointer(&data))
	s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	x := ([]To)(unsafe.Pointer(s))
	return x
}

func main() {
	a := make([]uint32, 4, 13)
Re: [go-nuts] Re: Slice conversion function not working on big endian platform
There was a typo when I entered the code; below is the corrected code snippet:

package main

import (
	"fmt"
	"unsafe"
)

type slice struct {
	ptr unsafe.Pointer
	len int
	cap int
}

func Slice[To, From any](data []From) []To {
	var zf From
	var zt To
	var s = (*slice)(unsafe.Pointer(&data))
	s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	x := *(*[]To)(unsafe.Pointer(s))
	return x
}

func main() {
	a := make([]uint32, 4, 13)
	a[0] = 1
	a[1] = 0
	a[2] = 2
	a[3] = 0
	// 1 0 2 0
	b := Slice[int64](a)
	//Expected : []int64[]{0x 0001, 0x 0002}
	//Got: []int64{0x0001 , 0x0002 000}
	if b[0] != 1 {
		fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", b[0])
	}
	if b[1] != 2 {
		fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
	}
}

Please try this code

From: 'Brian Candler' via golang-nuts
Sent: 08 May 2024 15:25
To: golang-nuts
Subject: [EXTERNAL] [go-nuts] Re: Slice conversion function not working on big endian platform

That code doesn't even compile in go 1.22 or go 1.21:
https://go.dev/play/p/mPCBUQizSVo

./prog.go:20:14: cannot convert unsafe.Pointer(s) (value of type unsafe.Pointer) to type []To

What's the underlying requirement? In the test case it looks like you want to take a slice of int32's, in whatever their internal in-memory representation is, and re-represent them as a slice of half as many int64's?? Then of *course* each pair of int32's will become one int64, and the order of the hi/lo halves will depend entirely on the system's internal representation of int64's. It *is* working, in the sense that it's doing exactly what you told it to do. There's a reason why the "unsafe" package is called "unsafe"!

It would be straightforward to write a function which takes a slice containing pairs of int32's and assembles them into int64's in a consistent way. What you've not explained is:
- why you need to do this with generics (for example, what behaviour would you expect from other types?)
- why you need to do this in-place with "unsafe"

On Wednesday 8 May 2024 at 10:24:30 UTC+1 Srinivas Pokala wrote:

Hello gophers,

I have a simple Go program which converts a slice of one type to a slice of another type, using Go generics to handle all the supported types. Below is the code snippet for this:

package main

import "fmt"
import "unsafe"

type slice struct {
	ptr unsafe.Pointer
	len int
	cap int
}

func Slice[To, From any](data []From) []To {
	var zf From
	var zt To
	var s = (*slice)(unsafe.Pointer(&data))
	s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	x := ([]To)(unsafe.Pointer(s))
	return x
}

func main() {
	a := make([]uint32, 4, 13)
	a[0] = 1
	a[1] = 0
	a[2] = 2
	a[3] = 0
	// 1 0 2 0
	b := Slice[int64](a)
	//Expected : []int64[]{0x 0001, 0x 0002}
	//Got: []int64{0x0001 , 0x0002 000}
	if b[0] != 1 {
		fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", b[0])
	}
	if b[1] != 2 {
		fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
	}
}

This works fine on little-endian architectures (amd64, arm64, etc.), but when I run it on a big-endian machine (s390x) it does not work; it produces wrong data:

//Expected : []int64[]{0x 0001, 0x 0002}
//Got: []int64{0x0001 , 0x0002 000}

Can someone point me to how we can write this so that it works on both little-endian and big-endian platforms? Any leads on this?

Thanks,
Srinivas

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com
[go-nuts] Slice conversion function not working on big endian platform
Hello gophers,

I have a simple Go program which converts a slice of one type to a slice of another type, using Go generics to handle all the supported types. Below is the code snippet for this:

package main

import "fmt"
import "unsafe"

type slice struct {
	ptr unsafe.Pointer
	len int
	cap int
}

func Slice[To, From any](data []From) []To {
	var zf From
	var zt To
	var s = (*slice)(unsafe.Pointer(&data))
	s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt))
	x := ([]To)(unsafe.Pointer(s))
	return x
}

func main() {
	a := make([]uint32, 4, 13)
	a[0] = 1
	a[1] = 0
	a[2] = 2
	a[3] = 0
	// 1 0 2 0
	b := Slice[int64](a)
	//Expected : []int64[]{0x 0001, 0x 0002}
	//Got: []int64{0x0001 , 0x0002 000}
	if b[0] != 1 {
		fmt.Printf("wrong value at index 0: want=1 got=0x%x \n", b[0])
	}
	if b[1] != 2 {
		fmt.Printf("wrong value at index 1: want=2 got=0x%x\n", b[1])
	}
}

This works fine on little-endian architectures (amd64, arm64, etc.), but when I run it on a big-endian machine (s390x) it does not work; it produces wrong data:

//Expected : []int64[]{0x 0001, 0x 0002}
//Got: []int64{0x0001 , 0x0002 000}

Can someone point me to how we can write this so that it works on both little-endian and big-endian platforms? Any leads on this?

Thanks,
Srinivas

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/4282aff6-0c61-4105-8032-f7c92ff341d1n%40googlegroups.com.
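
One way to get the same result on both kinds of platform is to combine the words arithmetically instead of reinterpreting memory. This is a minimal sketch, not a drop-in replacement for the generic Slice function: it assumes the first uint32 of each pair is meant to be the low half of the int64 (which is what the test expects), it copies rather than converting in place, and PairsToInt64 is a hypothetical name.

```go
package main

import "fmt"

// PairsToInt64 combines consecutive uint32 pairs into int64 values.
// Assumption: the first word of each pair is the low 32 bits and the second
// word is the high 32 bits. Shifts operate on values, not on memory layout,
// so the result does not depend on the host's byte order.
// A trailing unpaired word, if any, is ignored.
func PairsToInt64(data []uint32) []int64 {
	out := make([]int64, 0, len(data)/2)
	for i := 0; i+1 < len(data); i += 2 {
		lo, hi := uint64(data[i]), uint64(data[i+1])
		out = append(out, int64(hi<<32|lo))
	}
	return out
}

func main() {
	a := []uint32{1, 0, 2, 0}
	fmt.Println(PairsToInt64(a)) // [1 2]
}
```

The trade-off is an allocation and a copy, in exchange for behaviour that is defined by the code rather than by the CPU. If the input is really a byte stream with a known byte order, the encoding/binary package is likely a better fit than unsafe.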
Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Tue, 30 Apr 2024 22:01:30 GMT, Vladimir Yaroslavskiy wrote: >> Hi Vladimir (@iaroslavski), >> >> Please see the data below: >> >> Thanks, >> Vamsi >> >> >> >> name | builder | size | mode | count | score >> -- | -- | -- | -- | -- | -- >> b01 | RANDOM | 600 | avg | 325677 | 6.862 >> b01 | RANDOM | 3000 | avg | 52041 | 82.233 >> b01 | RANDOM | 9 | avg | 1217 | 4456.51 >> b01 | RANDOM | 40 | avg | 242 | 22923.28 >> b01 | RANDOM | 100 | avg | 90 | 60598.84 >> b01 | REPEATED | 600 | avg | 651354 | 1.933 >> b01 | REPEATED | 3000 | avg | 104083 | 13.753 >> b01 | REPEATED | 9 | avg | 2435 | 723.039 >> b01 | REPEATED | 40 | avg | 484 | 3084.416 >> b01 | REPEATED | 100 | avg | 180 | 8234.428 >> b01 | STAGGER | 600 | avg | 1954062 | 1.005 >> b01 | STAGGER | 3000 | avg | 312251 | 4.945 >> b01 | STAGGER | 9 | avg | 7305 | 133.126 >> b01 | STAGGER | 40 | avg | 1453 | 592.144 >> b01 | STAGGER | 100 | avg | 542 | 1493.876 >> b01 | SHUFFLE | 600 | avg | 325677 | 5.12 >> b01 | SHUFFLE | 3000 | avg | 52041 | 29.252 >> b01 | SHUFFLE | 9 | avg | 1217 | 1396.664 >> b01 | SHUFFLE | 40 | avg | 242 | 5743.489 >> b01 | SHUFFLE | 100 | avg | 90 | 15490.81 >> b01_ins | RANDOM | 600 | avg | 325677 | 7.594 >> b01_ins | RANDOM | 3000 | avg | 52041 | 78.631 >> b01_ins | RANDOM | 9 | avg | 1217 | 4312.511 >> b01_ins | RANDOM | 40 | avg | 242 | 22108.18 >> b01_ins | RANDOM | 100 | avg | 90 | 58467.16 >> b01_ins | REPEATED | 600 | avg | 651354 | 1.569 >> b01_ins | REPEATED | 3000 | avg | 104083 | 11.313 >> b01_ins | REPEATED | 9 | avg | 2435 | 720.838 >> b01_ins | REPEATED | 40 | avg | 484 | 3003.673 >> b01_ins | REPEATED | 100 | avg | 180 | 8144.944 >> b01_ins | STAGGER | 600 | avg | 1954062 | 0.98 >> b01_ins | STAGGER | 3000 | avg | 312251 | 4.948 >> b01_ins | STAGGER | 9 | avg | 7305 | 132.909 >> b01_ins | STAGGER | 40 | avg | 1453 | 592.572 >> b01_ins | STAGGER | 100 | avg | 542 | 1492.627 >> b01_ins | SHUFFLE | 600 | avg | 325677 | 4.092 >> b01_ins | SHUFFLE | 3000 | avg | 52041 | 27.138 >> 
b01_ins | SHUFFLE | 9 | avg | 1217 | 1304.326 >> b01_ins | SHUFFLE | 40 | avg | 242 | 5465.745 >> b01_ins | SHUFFLE | 100 | avg | 90 | 14585.08 >> b01_mrg | RANDOM | 600 | avg | 325677 | 7.139 >> b01_mrg | RANDOM | 3000 | avg | 52041 | 81.01 >> b01_mrg | RANDOM | 9 | avg | 1217 | 4266.084 >> b01_mrg | RANDOM | 40 | avg | 242 | 21937.77 >> b01_mrg | RANDOM | 100 | avg | 90 | 58177.72 >> b01_mrg | REPEATED | 600 | avg | 651354 | 1.36 >> b01_mrg | REPEATED | 3000 | avg | 104083 | 9.013 >> b01_mrg | REPEATED ... > > Hello Vamsi (@vamsi-parasa), > > Could you please run the new benchmarking to detect the best case > for Radix sort and parallel sorting? > > What you need is to compile and run JavaBenchmarkHarness: > > javac --patch-module java.base=. -d classes *.java > java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module > java.base=classes -cp classes java.util.JavaBenchmarkHarness > > Find the sources there: > > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_a.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_5.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_11.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_12.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_13.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_14.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_21.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r30_23.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java > > Thank you, > Vladimir Hi Vladimir 
(@iaroslavski),

Please see the data below.

Thanks,
Vamsi

name | builder | size | mode | count | score
-- | -- | -- | -- | -- | --
b01 | RANDOM | 600 | avg | 325677 | 6.764
b01 | RANDOM | 3000 | avg | 52041 | 77.742
b01 | RANDOM | 9 | avg | 1217 | 4449.668
b01 | RANDOM | 40 | avg | 242 | 22764.05
b01 | RANDOM | 100 | avg | 90 | 60737.71
b01 | REPEATED | 600 | avg | 651354 | 1.723
b01 | REPEATED | 3000 | avg | 104083 | 12.383
b01 | REPEATED | 9 | avg | 2435 | 714.451
b01 | REPEATED | 40 | avg | 484 | 3039.447
b01 | REPEATED | 100 | avg | 180 | 8114.503
b01 | SAWTOOTH | 600 | avg | 1954062 | 1.009
b01 | SAWTOOTH | 3000 | avg | 312251 | 4.94
b01 | SAWTOOTH | 9 | avg | 7305 | 133.192
b01 | SAWTOOTH | 40 | avg | 1453 | 591.854
b01 | SAWTOOTH | 100 | avg | 542 | 1494.252
b01 | STAGGER | 600 | avg | 1954062 | 8.252
b01 | STAGGER | 3000 |
Re: RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Tue, 9 Apr 2024 21:36:46 GMT, Vladimir Yaroslavskiy wrote: >>> Hi Vamsi (@vamsi-parasa), few questions on your test environment: >>> >>> * what are the hardware specs of your server ? >>> * bare-metal or virtual ? >>> * are other services or big processes running ? >>> * os tuning ? CPU HT: off? Fixed CPU governor or frequency ? >>> * isolation using taskset ? >>> >>> Maybe C2 JIT (+ CDS archive) are given more performance on stock jdk sort >>> than same code running outside jdk... >>> >>> Thanks, Laurent >> >> Hi Laurent, >> >> The benchmarks are run on Intel TigerLake Core i7 machine. It's bare-metal >> without any virtualization. HT is ON and there is no other specific OS >> tuning or isolation using taskset. >> >> Thanks, >> Vamsi > > Hello Vamsi (@vamsi-parasa), > > Could you please run the new benchmarking? > To save time and don't patch JDK several times, I've created > JavaBenchmarkHarness > class which is under package java.util and compares several versions of DPQS. > Also I prepared several versions of current sorting source from JDK to detect > what is going wrong. > > What you need is to compile and run JavaBenchmarkHarness once: > > javac --patch-module java.base=. 
-d classes *.java > java -XX:CompileThreshold=1 -XX:-TieredCompilation --patch-module > java.base=classes -cp classes java.util.JavaBenchmarkHarness > > Find the sources there: > > https://github.com/iaroslavski/sorting/blob/master/radixsort/JavaBenchmarkHarness.java > > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_ins.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_mrg.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_piv.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01_prt.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p.java > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r29p5.java > > Thank you, > Vladimir Hi Vladimir (@iaroslavski), Please see the data below: Thanks, Vamsi name | builder | size | mode | count | score -- | -- | -- | -- | -- | -- b01 | RANDOM | 600 | avg | 325677 | 6.862 b01 | RANDOM | 3000 | avg | 52041 | 82.233 b01 | RANDOM | 9 | avg | 1217 | 4456.51 b01 | RANDOM | 40 | avg | 242 | 22923.28 b01 | RANDOM | 100 | avg | 90 | 60598.84 b01 | REPEATED | 600 | avg | 651354 | 1.933 b01 | REPEATED | 3000 | avg | 104083 | 13.753 b01 | REPEATED | 9 | avg | 2435 | 723.039 b01 | REPEATED | 40 | avg | 484 | 3084.416 b01 | REPEATED | 100 | avg | 180 | 8234.428 b01 | STAGGER | 600 | avg | 1954062 | 1.005 b01 | STAGGER | 3000 | avg | 312251 | 4.945 b01 | STAGGER | 9 | avg | 7305 | 133.126 b01 | STAGGER | 40 | avg | 1453 | 592.144 b01 | STAGGER | 100 | avg | 542 | 1493.876 b01 | SHUFFLE | 600 | avg | 325677 | 5.12 b01 | SHUFFLE | 3000 | avg | 52041 | 29.252 b01 | SHUFFLE | 9 | avg | 1217 | 1396.664 b01 | SHUFFLE | 40 | avg | 242 | 5743.489 b01 | SHUFFLE | 100 | avg | 90 | 15490.81 b01_ins | RANDOM | 600 | avg | 325677 | 7.594 b01_ins 
| RANDOM | 3000 | avg | 52041 | 78.631 b01_ins | RANDOM | 9 | avg | 1217 | 4312.511 b01_ins | RANDOM | 40 | avg | 242 | 22108.18 b01_ins | RANDOM | 100 | avg | 90 | 58467.16 b01_ins | REPEATED | 600 | avg | 651354 | 1.569 b01_ins | REPEATED | 3000 | avg | 104083 | 11.313 b01_ins | REPEATED | 9 | avg | 2435 | 720.838 b01_ins | REPEATED | 40 | avg | 484 | 3003.673 b01_ins | REPEATED | 100 | avg | 180 | 8144.944 b01_ins | STAGGER | 600 | avg | 1954062 | 0.98 b01_ins | STAGGER | 3000 | avg | 312251 | 4.948 b01_ins | STAGGER | 9 | avg | 7305 | 132.909 b01_ins | STAGGER | 40 | avg | 1453 | 592.572 b01_ins | STAGGER | 100 | avg | 542 | 1492.627 b01_ins | SHUFFLE | 600 | avg | 325677 | 4.092 b01_ins | SHUFFLE | 3000 | avg | 52041 | 27.138 b01_ins | SHUFFLE | 9 | avg | 1217 | 1304.326 b01_ins | SHUFFLE | 40 | avg | 242 | 5465.745 b01_ins | SHUFFLE | 100 | avg | 90 | 14585.08 b01_mrg | RANDOM | 600 | avg | 325677 | 7.139 b01_mrg | RANDOM | 3000 | avg | 52041 | 81.01 b01_mrg | RANDOM | 9 | avg | 1217 | 4266.084 b01_mrg | RANDOM | 40 | avg | 242 | 21937.77 b01_mrg | RANDOM | 100 | avg | 90 | 58177.72 b01_mrg | REPEATED | 600 | avg | 651354 | 1.36 b01_mrg | REPEATED | 3000 | avg | 104083 | 9.013 b01_mrg | REPEATED | 9 | avg | 2435 | 737.684 b01_mrg | REPEATED | 40 | avg | 484 | 3152.447 b01_mrg | REPEATED | 100 | avg | 180 | 8366.866 b01_mrg | STAGGER | 600 | avg | 1954062 | 0.73 b01_mrg | STAGGER | 3000 | avg | 312251 | 3.733 b01_mrg | STAGGER | 9 | avg | 7305 | 114.35 b01_mrg | STAGGER | 40 | avg | 1453 | 524.821 b01_mrg | STAGGER | 100 | avg | 542 | 1351.504 b01_mrg | SHUFFLE | 600 | avg | 325677 | 4.986 b01_mrg |
[Kernel-packages] [Bug 2036135] Re: thermald assert failure: *** stack smashing detected ***: terminated
Is this possible to reproduce using thermald built from
https://github.com/intel/thermal_daemon?

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2036135

Title:
  thermald assert failure: *** stack smashing detected ***: terminated

Status in thermald package in Ubuntu:
  Confirmed

Bug description:
  suddenly occurred

  ProblemType: Crash
  DistroRelease: Ubuntu 23.10
  Package: thermald 2.5.4-2
  ProcVersionSignature: Ubuntu 6.5.0-5.5-generic 6.5.0
  Uname: Linux 6.5.0-5-generic x86_64
  ApportVersion: 2.27.0-0ubuntu2
  Architecture: amd64
  AssertionMessage: *** stack smashing detected ***: terminated
  CasperMD5CheckResult: pass
  Date: Thu Sep 14 12:41:34 2023
  ExecutablePath: /usr/sbin/thermald
  InstallationDate: Installed on 2023-09-14 (1 days ago)
  InstallationMedia: Ubuntu 23.10 "Mantic Minotaur" - Daily amd64 (20230908.2)
  ProcCmdline: /usr/sbin/thermald --systemd --dbus-enable --adaptive
  ProcEnviron:
   LANG=ja_JP.UTF-8
   PATH=(custom, no user)
  Signal: 6
  SourcePackage: thermald
  StacktraceTop:
   __libc_message (fmt=fmt@entry=0x7fad897c38d3 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:150
   __GI___fortify_fail (msg=msg@entry=0x7fad897c38eb "stack smashing detected") at ./debug/fortify_fail.c:24
   __stack_chk_fail () at ./debug/stack_chk_fail.c:24
   cthd_acpi_rel::read_psvt() ()
   ?? ()
  Title: thermald assert failure: *** stack smashing detected ***: terminated
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  modified.conffile..etc.init.thermald.conf: [deleted]
  mtime.conffile..etc.thermald.thermal-cpu-cdev-order.xml: 2023-08-25T19:29:11
  separator:

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2036135/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
RE: [PATCH 22/22] drm/i915: Use debugfs_create_bool() for "i915_bigjoiner_force_enable"
Hello Ville,

Thank you very much for the series. 6K detects fine and works.

Tested-by: Vidya Srinivas

> -----Original Message-----
> From: Intel-gfx On Behalf Of Ville Syrjala
> Sent: Friday, March 29, 2024 6:43 AM
> To: intel-gfx@lists.freedesktop.org
> Subject: [PATCH 22/22] drm/i915: Use debugfs_create_bool() for "i915_bigjoiner_force_enable"
>
> From: Ville Syrjälä
>
> There is no reason to make this debugfs file for a simple boolean so
> complicated. Just use debugfs_create_bool().
>
> Signed-off-by: Ville Syrjälä
> ---
>  .../drm/i915/display/intel_display_debugfs.c | 44 +-
>  1 file changed, 2 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_display_debugfs.c b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
> index b99c024b0934..3e364891dcd0 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_debugfs.c
> +++ b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
> @@ -1402,20 +1402,6 @@ out:	drm_modeset_unlock(>drm.mode_config.connection_mutex);
>  	return ret;
>  }
>
> -static int i915_bigjoiner_enable_show(struct seq_file *m, void *data)
> -{
> -	struct intel_connector *connector = m->private;
> -	struct drm_crtc *crtc;
> -
> -	crtc = connector->base.state->crtc;
> -	if (connector->base.status != connector_status_connected || !crtc)
> -		return -ENODEV;
> -
> -	seq_printf(m, "Bigjoiner enable: %d\n", connector->force_bigjoiner_enable);
> -
> -	return 0;
> -}
> -
>  static ssize_t i915_dsc_output_format_write(struct file *file,
>  					    const char __user *ubuf,
>  					    size_t len, loff_t *offp)
> @@ -1437,30 +1423,6 @@ static ssize_t i915_dsc_output_format_write(struct file *file,
>  	return len;
>  }
>
> -static ssize_t i915_bigjoiner_enable_write(struct file *file,
> -					   const char __user *ubuf,
> -					   size_t len, loff_t *offp)
> -{
> -	struct seq_file *m = file->private_data;
> -	struct intel_connector *connector = m->private;
> -	struct drm_crtc *crtc;
> -	bool bigjoiner_en = 0;
> -	int ret;
> -
> -	crtc = connector->base.state->crtc;
> -	if (connector->base.status != connector_status_connected || !crtc)
> -		return -ENODEV;
> -
> -	ret = kstrtobool_from_user(ubuf, len, &bigjoiner_en);
> -	if (ret < 0)
> -		return ret;
> -
> -	connector->force_bigjoiner_enable = bigjoiner_en;
> -	*offp += len;
> -
> -	return len;
> -}
> -
>  static int i915_dsc_output_format_open(struct inode *inode,
>  				       struct file *file)
>  {
> @@ -1554,8 +1516,6 @@ static const struct file_operations i915_dsc_fractional_bpp_fops = {
>  	.write = i915_dsc_fractional_bpp_write
>  };
>
> -DEFINE_SHOW_STORE_ATTRIBUTE(i915_bigjoiner_enable);
> -
>  /*
>   * Returns the Current CRTC's bpc.
>   * Example usage: cat /sys/kernel/debug/dri/0/crtc-0/i915_current_bpc
> @@ -1640,8 +1600,8 @@ void intel_connector_debugfs_add(struct intel_connector *connector)
>  	if (DISPLAY_VER(i915) >= 11 &&
>  	    (connector_type == DRM_MODE_CONNECTOR_DisplayPort ||
>  	     connector_type == DRM_MODE_CONNECTOR_eDP)) {
> -		debugfs_create_file("i915_bigjoiner_force_enable", 0644, root,
> -				    connector, &i915_bigjoiner_enable_fops);
> +		debugfs_create_bool("i915_bigjoiner_force_enable", 0644, root,
> +				    &connector->force_bigjoiner_enable);
>  	}
>
>  	if (connector_type == DRM_MODE_CONNECTOR_DSI ||
> --
> 2.43.2
RE: [PATCH 5/6] drm/i915: Handle joined pipes inside hsw_crtc_enable()
Thank you Stan. Rev 14 works.

Tested-by: Vidya Srinivas

> -----Original Message-----
> From: Lisovskiy, Stanislav
> Sent: Wednesday, March 20, 2024 8:45 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Lisovskiy, Stanislav; Saarinen, Jani; ville.syrj...@linux.intel.com; Srinivas, Vidya
> Subject: [PATCH 5/6] drm/i915: Handle joined pipes inside hsw_crtc_enable()
>
> Handle only bigjoiner masters in skl_commit_modeset_enables/disables,
> slave crtcs should be handled by master hooks. Same for encoders.
> That way we can also remove a bunch of checks like
> intel_crtc_is_bigjoiner_slave.
>
> v2: - Moved skl_pfit_enable, intel_dsc_enable, intel_crtc_vblank_on to
>       intel_enable_ddi, so that it is now finally symmetrical with the
>       disable case, because currently for some weird reason we are
>       calling those from skl_commit_modeset_enables, while for the
>       disable case those are called from the ddi disable hooks.
>
> v3: - Create intel_ddi_enable_hdmi_or_sst symmetrical to
>       intel_ddi_post_disable_hdmi_or_sst and move it also under non-mst
>       check.
>
> v4: - Fix intel_enable_ddi sequence
>     - Call intel_crtc_update_active_timings for slave pipes as well
>
> Signed-off-by: Stanislav Lisovskiy
> ---
>  drivers/gpu/drm/i915/display/intel_ddi.c     |  45 -
>  drivers/gpu/drm/i915/display/intel_display.c | 179 ++-
>  drivers/gpu/drm/i915/display/intel_display.h |   7 +
>  3 files changed, 137 insertions(+), 94 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
> index 290ccab7c9ee8..9128b82a49c31 100644
> --- a/drivers/gpu/drm/i915/display/intel_ddi.c
> +++ b/drivers/gpu/drm/i915/display/intel_ddi.c
> @@ -3366,15 +3366,28 @@ static void intel_enable_ddi_hdmi(struct intel_atomic_state *state,
>  	intel_wait_ddi_buf_active(dev_priv, port);
>  }
>
> -static void intel_enable_ddi(struct intel_atomic_state *state,
> -			     struct intel_encoder *encoder,
> -			     const struct intel_crtc_state *crtc_state,
> -			     const struct drm_connector_state *conn_state)
> +static void intel_ddi_enable_hdmi_or_sst(struct intel_atomic_state *state,
> +					 struct intel_encoder *encoder,
> +					 const struct intel_crtc_state *crtc_state,
> +					 const struct drm_connector_state *conn_state)
>  {
> -	drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder);
> +	struct drm_i915_private *i915 = to_i915(encoder->base.dev);
> +	u8 pipe_mask = intel_crtc_joined_pipe_mask(crtc_state);
> +	struct intel_crtc *crtc;
> +
> +	for_each_intel_crtc_in_pipe_mask_reverse(&i915->drm, crtc, pipe_mask) {
> +		const struct intel_crtc_state *new_crtc_state =
> +			intel_atomic_get_new_crtc_state(state, crtc);
> +
> +		intel_dsc_enable(new_crtc_state);
> +
> +		if (DISPLAY_VER(i915) >= 9)
> +			skl_pfit_enable(new_crtc_state);
> +		else
> +			ilk_pfit_enable(new_crtc_state);
> +	}
>
> -	if (!intel_crtc_is_bigjoiner_slave(crtc_state))
> -		intel_ddi_enable_transcoder_func(encoder, crtc_state);
> +	intel_ddi_enable_transcoder_func(encoder, crtc_state);
>
>  	/* Enable/Disable DP2.0 SDP split config before transcoder */
>  	intel_audio_sdp_split_update(crtc_state);
> @@ -3383,7 +3396,22 @@ static void intel_enable_ddi(struct intel_atomic_state *state,
>
>  	intel_ddi_wait_for_fec_status(encoder, crtc_state, true);
>
> -	intel_crtc_vblank_on(crtc_state);
> +	for_each_intel_crtc_in_pipe_mask_reverse(&i915->drm, crtc, pipe_mask) {
> +		const struct intel_crtc_state *new_crtc_state =
> +			intel_atomic_get_new_crtc_state(state, crtc);
> +		intel_crtc_vblank_on(new_crtc_state);
> +	}
> +}
> +
> +static void intel_enable_ddi(struct intel_atomic_state *state,
> +			     struct intel_encoder *encoder,
> +			     const struct intel_crtc_state *crtc_state,
> +			     const struct drm_connector_state *conn_state)
> +{
> +	drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder);
> +
> +	if (!intel_crtc_has_type(crtc_state, INTEL_OUTPUT_DP_MST))
> +		intel_ddi_enable_hdmi_or_sst(state, encoder, crtc_state, conn_state);
>
>  	if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI))
>  		intel_enable_ddi_hdmi(state, encoder, crtc_state, conn_state);
> @
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]
On Tue, 19 Mar 2024 22:26:22 GMT, Stuart Marks wrote:

>> I think you are overthinking this somewhat Ramki. I don't see a practical (non discrete-math) distinction between "some" and "any", so would not object to that single word change if it helps. But "potential" should remain as it covers branching in the program whereby if we proceed down one branch an object remains reachable, whereas if we proceed down another then it may not.
>
> I don't think changing "any" to "some" is helpful. I think "any" is ambiguous regarding meaning universal or existential strength. The sense used here is, considering the possible future execution paths of a thread, if any of them accesses the object, that object is reachable. In other words, it means "any one" and not "all".

OK, no worries; will let you decide what makes sense. Thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1531559810
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]
On Tue, 19 Mar 2024 16:20:55 GMT, Y. Srinivas Ramakrishna wrote:

>> https://docs.oracle.com/javase/specs/jls/se21/html/jls-12.html#jls-12.6.1
>>
>>> A reachable object is any object that can be accessed in any potential continuing computation from any live thread.
>>
>> It may be "loose" because the devil is in the details when it comes to reachability, but I disagree that it is "sloppy". This expresses reachability in simple terms, as a "first-order" or "Newtonian" model. There are of course "Quantum" effects that need to be dealt with in practice. The JLS alludes to this with:
>>> Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable.
>
> Sorry, my use of words was sloppy here. I think I did mean loose or somewhat informal and therefore slippery.
>
> What I was saying is that using terms such as "any continuing computation" doesn't make sense because this is referring to a current state of the computation. I'm not sure what "any continuing computation" from a state is because the concept of what constitutes the notion of "a continuing computation" has not been defined before. To me it sounds like a computation tree with nodes as state and transitions as edges and a continuing computation as a path through that tree into the future. The way it is written then, it sounds to the naive reader, or to me at least, as if the object is perpetually reachable by every thread always. I assume I am misinterpreting the intention of the writing, but it sounds too loose for a definition being invoked here in the javadoc. Maybe it can be tightened up a bit.
>
> Could one state instead that "An object is reachable at a given state when some thread is able to access the object through a sequence of steps starting at that state without other threads taking any steps."? Or something along those lines? Or at least something tighter than the current wording that is somewhat too loose.

In fact, it appears as if the problem is with the use of "any", which is universal in strength, whereas the intention here is existential in strength (as suggested by my wording). Indeed, you might achieve the same effect by replacing "any" with "some" so that:

"An object is reachable if it can be accessed in some continuing computation from some live thread."

You needn't even say live because dead threads can neither take steps nor continue participating in the computation nor can they "access" objects for whatever informal notion of access. The "some continuing computation" subsumes "potential" (as in a possible future) so potential can be dropped.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1530731176
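The two readings of "any" contrasted in this comment can be written out explicitly. The notation is an assumption made here for illustration and does not come from the JLS or the PR: $\mathrm{Live}(s)$ is the set of live threads in program state $s$, $\mathrm{Cont}(s)$ is the set of potential continuing computations from $s$, and $\mathrm{acc}(t, c, o)$ means thread $t$ accesses object $o$ somewhere along $c$.

```latex
\begin{align*}
\text{existential:}\quad
  & \mathrm{Reachable}(o, s) \iff
    \exists\, c \in \mathrm{Cont}(s)\ \exists\, t \in \mathrm{Live}(s) :\ \mathrm{acc}(t, c, o) \\
\text{universal:}\quad
  & \mathrm{Reachable}(o, s) \iff
    \forall\, c \in \mathrm{Cont}(s)\ \forall\, t \in \mathrm{Live}(s) :\ \mathrm{acc}(t, c, o)
\end{align*}
```

Under the existential reading, which the comment argues is the intended one, a single continuation in which a single thread accesses the object is enough to make it reachable. The universal reading would make almost no object reachable, since it would require every thread to access it on every path.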
Re: [PATCH] thermal: intel: int340x_thermal: replace deprecated strncpy with strscpy
On Tue, 2024-03-19 at 12:39 +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 18, 2024 at 11:36 PM Justin Stitt wrote:
> >
> > strncpy() is deprecated for use on NUL-terminated destination strings
> > [1] and as such we should prefer more robust and less ambiguous string
> > interfaces.
> >
> > psvt->limit.string can only be 8 bytes so let's use the appropriate size
> > macro ACPI_LIMIT_STR_MAX_LEN.
> >
> > Neither psvt->limit.string or psvt_user[i].limit.string requires the
> > NUL-padding behavior that strncpy() provides as they have both been
> > filled with NUL-bytes prior to the string operation.
> > > memset(&psvt->limit, 0, sizeof(u64));
> > and
> > > psvt_user = kzalloc(psvt_len, GFP_KERNEL);
> >
> > Let's use `strscpy` [2] due to the fact that it guarantees
> > NUL-termination on the destination buffer without unnecessarily
> > NUL-padding.
> >
> > Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
> > Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
> > Link: https://github.com/KSPP/linux/issues/90
> > Cc: linux-hardening@vger.kernel.org
> > Signed-off-by: Justin Stitt
>
> Srinivas, any objections?

No

Reviewed-by: Srinivas Pandruvada

> > ---
> > Note: build-tested only.
> >
> > Found with: $ rg "strncpy\("
> > ---
> >  drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> > index dc519a665c18..4b4a4d63e61f 100644
> > --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> > +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> > @@ -309,7 +309,7 @@ static int acpi_parse_psvt(acpi_handle handle, int *psvt_count, struct psvt **ps
> >
> >  		if (knob->type == ACPI_TYPE_STRING) {
> >  			memset(&psvt->limit, 0, sizeof(u64));
> > -			strncpy(psvt->limit.string, psvt_ptr->limit.str_ptr, knob->string.length);
> > +			strscpy(psvt->limit.string, psvt_ptr->limit.str_ptr, ACPI_LIMIT_STR_MAX_LEN);
> >  		} else {
> >  			psvt->limit.integer = psvt_ptr->limit.integer;
> >  		}
> > @@ -468,7 +468,7 @@ static int fill_psvt(char __user *ubuf)
> >  		psvt_user[i].unlimit_coeff = psvts[i].unlimit_coeff;
> >  		psvt_user[i].control_knob_type = psvts[i].control_knob_type;
> >  		if (psvt_user[i].control_knob_type == ACPI_TYPE_STRING)
> > -			strncpy(psvt_user[i].limit.string, psvts[i].limit.string,
> > +			strscpy(psvt_user[i].limit.string, psvts[i].limit.string,
> >  				ACPI_LIMIT_STR_MAX_LEN);
> >  		else
> >  			psvt_user[i].limit.integer = psvts[i].limit.integer;
> >
> > ---
> > base-commit: bf3a69c6861ff4dc7892d895c87074af7bc1c400
> > change-id: 20240318-strncpy-drivers-thermal-intel-int340x_thermal-acpi_thermal_rel-c-17070c1e42f3
> >
> > Best regards,
> > --
> > Justin Stitt
> >
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]
On Tue, 19 Mar 2024 02:53:37 GMT, David Holmes wrote:

>> src/java.base/share/classes/java/lang/ref/package-info.java line 137:
>>
>>> 135:  *
>>> 136:  * A reachable object is any object that can be accessed in any potential
>>> 137:  * continuing computation from any live thread (as stated in {@jls 12.6.1}).
>>
>> This seems like somewhat loose and sloppy wording to me. "Any potential continuing computation"? "Any live thread"? Could you share a pointer to JLS 12.6.1 being referenced here?
>
> https://docs.oracle.com/javase/specs/jls/se21/html/jls-12.html#jls-12.6.1
>
>> A reachable object is any object that can be accessed in any potential continuing computation from any live thread.
>
> It may be "loose" because the devil is in the details when it comes to reachability, but I disagree that it is "sloppy". This expresses reachability in simple terms, as a "first-order" or "Newtonian" model. There are of course "Quantum" effects that need to be dealt with in practice. The JLS alludes to this with:
>> Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable.

Sorry, my use of words was sloppy here. I think I did mean loose or somewhat informal and therefore slippery.

What I was saying is that using terms such as "any continuing computation" doesn't make sense because this is referring to a current state of the computation. I'm not sure what "any continuing computation" from a state is because the concept of what constitutes the notion of "a continuing computation" has not been defined before. To me it sounds like a computation tree with nodes as state and transitions as edges and a continuing computation as a path through that tree into the future. The way it is written then, it sounds to the naive reader, or to me at least, as if the object is perpetually reachable by every thread always. I assume I am misinterpreting the intention of the writing, but it sounds too loose for a definition being invoked here in the javadoc. Maybe it can be tightened up a bit.

Could one state instead that "An object is reachable at a given state when some thread is able to access the object through a sequence of steps starting at that state without other threads taking any steps."? Or something along those lines? Or at least something tighter than the current wording that is somewhat too loose.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1530705355
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v13]
On Thu, 14 Mar 2024 23:23:07 GMT, Brent Christian wrote: >> Classes in the `java.lang.ref` package would benefit from an update to bring >> the spec in line with how the VM already behaves. The changes would focus on >> _happens-before_ edges at some key points during reference processing. >> >> A couple key things we want to be able to say are: >> - `Reference.reachabilityFence(x)` _happens-before_ reference processing >> occurs for 'x'. >> - `Cleaner.register()` _happens-before_ the Cleaner thread runs the >> registered cleaning action. >> >> This will bring Cleaner in line (or close) with the memory visibility >> guarantees made for finalizers in [JLS >> 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5): >> _"There is a happens-before edge from the end of a constructor of an object >> to the start of a finalizer (§12.6) for that object."_ > > Brent Christian has updated the pull request incrementally with one > additional commit since the last revision: > > further tweaks to reachability src/java.base/share/classes/java/lang/ref/package-info.java line 137: > 135: * > 136: * A reachable object is any object that can be accessed in any > potential > 137: * continuing computation from any live thread (as stated in {@jls > 12.6.1}). This seems like somewhat loose and sloppy wording to me. "Any potential continuing computation"? "Any live thread"? Could you share a pointer to JLS 12.6.1 being referenced here? - PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1529523835
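The two ordering points named in the PR description can be illustrated with a minimal sketch. It assumes a hypothetical `TrackedBuffer` class (the class, its fields and its methods are invented here and are not part of the PR); only `Cleaner.create()`, `Cleaner.register()`, `Cleanable.clean()` and `Reference.reachabilityFence()` are real JDK APIs. The sketch shows where the calls sit in ordinary code, not what the final spec wording guarantees.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical resource wrapper; every name here is illustrative.
final class TrackedBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleaning action lives in its own object and must not capture the
    // TrackedBuffer, otherwise the buffer could never become unreachable.
    private static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);
        int bytesWritten;

        @Override
        public void run() {
            released.set(true);
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable;

    TrackedBuffer() {
        // Writes made before register() are the ones that the proposed
        // register() happens-before edge would make visible to the action.
        cleanable = CLEANER.register(this, state);
    }

    int write(int n) {
        try {
            state.bytesWritten += n;
            return state.bytesWritten;
        } finally {
            // Keeps `this` reachable up to this point, so reference
            // processing for it cannot begin while write() is still running.
            Reference.reachabilityFence(this);
        }
    }

    boolean isReleased() {
        return state.released.get();
    }

    @Override
    public void close() {
        // Runs the cleaning action at most once, without waiting for the GC.
        cleanable.clean();
    }
}
```

Calling `close()` triggers the action deterministically; if the caller forgets, the Cleaner thread may run it at some point after the buffer becomes unreachable, which is the case the happens-before wording in the PR is concerned with.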
[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
[ https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-16370: - Issue Type: Bug (was: Improvement) > offline rollback procedure from kraft mode to zookeeper mode. > - > > Key: KAFKA-16370 > URL: https://issues.apache.org/jira/browse/KAFKA-16370 > Project: Kafka > Issue Type: Bug > Reporter: kaushik srinivas >Priority: Major > > From the KIP, > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,] > > h2. Finalizing the Migration > Once the cluster has been fully upgraded to KRaft mode, the controller will > still be running in migration mode and making dual writes to KRaft and ZK. > Since the data in ZK is still consistent with that of the KRaft metadata log, > it is still possible to revert back to ZK. > *_The time that the cluster is running all KRaft brokers/controllers, but > still running in migration mode, is effectively unbounded._* > Once the operator has decided to commit to KRaft mode, the final step is to > restart the controller quorum and take it out of migration mode by setting > _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The > active controller will only finalize the migration once it detects that all > members of the quorum have signaled that they are finalizing the migration > (again, using the tagged field in ApiVersionsResponse). Once the controller > leaves migration mode, it will write a ZkMigrationStateRecord to the log and > no longer perform writes to ZK. It will also disable its special handling of > ZK RPCs. > *At this point, the cluster is fully migrated and is running in KRaft mode. A > rollback to ZK is still possible after finalizing the migration, but it must > be done offline and it will cause metadata loss (which can also cause > partition data loss).* > > Trying out the same in a kafka cluster which is migrated from zookeeper into > kraft mode. 
We observe the rollback is possible by deleting the "/controller" > node in the zookeeper before the rollback from kraft mode to zookeeper is > done. > The above snippet indicates that the rollback from kraft to zk after > migration is finalized is still possible in offline method. Is there any > already known steps to be done as part of this offline method of rollback ? > From our experience, we currently know of the step "deletion of /controller > node in zookeeper to force zookeper based brokers to be elected as new > controller after the rollback is done". Are there any additional > steps/actions apart from this ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
[ https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-16370: - Issue Type: Wish (was: Improvement) > offline rollback procedure from kraft mode to zookeeper mode. > - > > Key: KAFKA-16370 > URL: https://issues.apache.org/jira/browse/KAFKA-16370 > Project: Kafka > Issue Type: Wish > Reporter: kaushik srinivas >Priority: Major > > From the KIP, > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,] > > h2. Finalizing the Migration > Once the cluster has been fully upgraded to KRaft mode, the controller will > still be running in migration mode and making dual writes to KRaft and ZK. > Since the data in ZK is still consistent with that of the KRaft metadata log, > it is still possible to revert back to ZK. > *_The time that the cluster is running all KRaft brokers/controllers, but > still running in migration mode, is effectively unbounded._* > Once the operator has decided to commit to KRaft mode, the final step is to > restart the controller quorum and take it out of migration mode by setting > _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The > active controller will only finalize the migration once it detects that all > members of the quorum have signaled that they are finalizing the migration > (again, using the tagged field in ApiVersionsResponse). Once the controller > leaves migration mode, it will write a ZkMigrationStateRecord to the log and > no longer perform writes to ZK. It will also disable its special handling of > ZK RPCs. > *At this point, the cluster is fully migrated and is running in KRaft mode. A > rollback to ZK is still possible after finalizing the migration, but it must > be done offline and it will cause metadata loss (which can also cause > partition data loss).* > > Trying out the same in a kafka cluster which is migrated from zookeeper into > kraft mode. 
We observe the rollback is possible by deleting the "/controller" > node in the zookeeper before the rollback from kraft mode to zookeeper is > done. > The above snippet indicates that the rollback from kraft to zk after > migration is finalized is still possible in offline method. Is there any > already known steps to be done as part of this offline method of rollback ? > From our experience, we currently know of the step "deletion of /controller > node in zookeeper to force zookeper based brokers to be elected as new > controller after the rollback is done". Are there any additional > steps/actions apart from this ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
[ https://issues.apache.org/jira/browse/KAFKA-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaushik srinivas updated KAFKA-16370: - Issue Type: Improvement (was: Wish) > offline rollback procedure from kraft mode to zookeeper mode. > - > > Key: KAFKA-16370 > URL: https://issues.apache.org/jira/browse/KAFKA-16370 > Project: Kafka > Issue Type: Improvement > Reporter: kaushik srinivas >Priority: Major > > From the KIP, > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,] > > h2. Finalizing the Migration > Once the cluster has been fully upgraded to KRaft mode, the controller will > still be running in migration mode and making dual writes to KRaft and ZK. > Since the data in ZK is still consistent with that of the KRaft metadata log, > it is still possible to revert back to ZK. > *_The time that the cluster is running all KRaft brokers/controllers, but > still running in migration mode, is effectively unbounded._* > Once the operator has decided to commit to KRaft mode, the final step is to > restart the controller quorum and take it out of migration mode by setting > _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The > active controller will only finalize the migration once it detects that all > members of the quorum have signaled that they are finalizing the migration > (again, using the tagged field in ApiVersionsResponse). Once the controller > leaves migration mode, it will write a ZkMigrationStateRecord to the log and > no longer perform writes to ZK. It will also disable its special handling of > ZK RPCs. > *At this point, the cluster is fully migrated and is running in KRaft mode. A > rollback to ZK is still possible after finalizing the migration, but it must > be done offline and it will cause metadata loss (which can also cause > partition data loss).* > > Trying out the same in a kafka cluster which is migrated from zookeeper into > kraft mode. 
We observe the rollback is possible by deleting the "/controller" > node in the zookeeper before the rollback from kraft mode to zookeeper is > done. > The above snippet indicates that the rollback from kraft to zk after > migration is finalized is still possible in offline method. Is there any > already known steps to be done as part of this offline method of rollback ? > From our experience, we currently know of the step "deletion of /controller > node in zookeeper to force zookeper based brokers to be elected as new > controller after the rollback is done". Are there any additional > steps/actions apart from this ? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.
kaushik srinivas created KAFKA-16370:

Summary: offline rollback procedure from kraft mode to zookeeper mode.
Key: KAFKA-16370
URL: https://issues.apache.org/jira/browse/KAFKA-16370
Project: Kafka
Issue Type: Improvement
Reporter: kaushik srinivas

From the KIP, [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration,]

h2. Finalizing the Migration

Once the cluster has been fully upgraded to KRaft mode, the controller will still be running in migration mode and making dual writes to KRaft and ZK. Since the data in ZK is still consistent with that of the KRaft metadata log, it is still possible to revert back to ZK.

*_The time that the cluster is running all KRaft brokers/controllers, but still running in migration mode, is effectively unbounded._*

Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum and take it out of migration mode by setting _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active controller will only finalize the migration once it detects that all members of the quorum have signaled that they are finalizing the migration (again, using the tagged field in ApiVersionsResponse). Once the controller leaves migration mode, it will write a ZkMigrationStateRecord to the log and no longer perform writes to ZK. It will also disable its special handling of ZK RPCs.

*At this point, the cluster is fully migrated and is running in KRaft mode. A rollback to ZK is still possible after finalizing the migration, but it must be done offline and it will cause metadata loss (which can also cause partition data loss).*

We tried this out in a Kafka cluster that was migrated from ZooKeeper to KRaft mode. We observe that the rollback is possible by deleting the "/controller" node in ZooKeeper before the rollback from KRaft mode to ZooKeeper is done.

The above snippet indicates that a rollback from KRaft to ZK after the migration is finalized is still possible using an offline method. Are there any already known steps to be done as part of this offline method of rollback? From our experience, we currently know of the step "deletion of /controller node in zookeeper to force zookeeper based brokers to be elected as new controller after the rollback is done". Are there any additional steps/actions apart from this?

--
This message was sent by Atlassian Jira (v8.20.10#820010)
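The quoted KIP text says a controller leaves migration mode when `zookeeper.metadata.migration.enable` is set to "false" or unset. A minimal sketch of checking a controller's properties text for that condition follows; `MigrationModeCheck` and `migrationEnabled` are hypothetical helper names invented here, and the check only inspects configuration text. It does not perform or validate any rollback step and says nothing about the offline procedure the issue asks about.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; only the property key is taken from the quoted KIP text.
final class MigrationModeCheck {
    static final String KEY = "zookeeper.metadata.migration.enable";

    // Returns true only when the key is present and set to "true";
    // an unset key is treated the same as "false", as the KIP text describes.
    static boolean migrationEnabled(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(props.getProperty(KEY, "false").trim());
    }
}
```

A check like this could be run against each quorum member's configuration before a restart to confirm which mode the operator has actually configured.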
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Mon, 11 Mar 2024 19:29:59 GMT, Srinivas Vamsi Parasa wrote:

>> Hello Vamsi (@vamsi-parasa),
>>
>> Could you please run benchmarking of 4 cases with **updated** test class **ArraysSortNew2**?
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
>>
>> Put each DPQS class in java.util package and recompiling the JDK for each case as you
>> did before, and run new class **ArraysSortNew2**.
>>
>> Find the sources there:
>>
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27b.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27p.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27s.java
>>
>> Thank you,
>> Vladimir
>
> Hi Vladimir (@iaroslavski),
>
> Please see the data below.
>
> Thanks,
> Vamsi
>
> Builder | Size | Stock JDK | b01 | r27b | r27p | r27s
> -- | -- | -- | -- | -- | -- | --
> RANDOM | 600 | 1.615 | 1.59 | 2.316 | 1.805 | 1.77
> RANDOM | 2000 | 6.794 | 6.638 | 8.443 | 6.354 | 6.295
> RANDOM | 9 | 296.877 | 304.15 | 337.625 | 341.999 | 307.099
> RANDOM | 40 | 838.061 | 801.108 | 1136.688 | 1161.181 | 781.487
> RANDOM | 300 | 5468.214 | 5452.125 | 8522.698 | 8476.445 | 5368.777
> PERIOD | 600 | 0.877 | 0.875 | 0.663 | 0.663 | 0.685
> PERIOD | 2000 | 1.57 | 1.548 | 1.458 | 1.451 | 1.487
> PERIOD | 9 | 97.208 | 97.677 | 106.01 | 106.516 | 106.629
> PERIOD | 40 | 237.4 | 264.103 | 235.466 | 231.349 | 231.235
> PERIOD | 300 | 2604.56 | 2829.935 | 4867.668 | 4872.361 | 4888.391
> STAGGER | 600 | 1.052 | 1.064 | 0.774 | 0.78 | 0.791
> STAGGER | 2000 | 3.449 | 3.443 | 2.604 | 2.627 | 2.597
> STAGGER | 9 | 102.331 | 103.464 | 73.582 | 73.532 | 75.85
> STAGGER | 40 | 210.829 | 229.37 | 207.356 | 208.565 | 205.141
> STAGGER | 300 | 2205.565 | 2174.588 | 2086.885 | 2070.132 | 2373.443
> SHUFFLE | 600 | 1.885 | 1.892 | 1.934 | 1.36 | 1.386
> SHUFFLE | 2000 | 6.787 | 6.724 | 7.338 | 4.994 | 4.96
> SHUFFLE | 9 | 158.065 | 154.48 | 152.874 | 148.337 | 140.703
> SHUFFLE | 40 | 415.089 | 424.777 | 676.272 | 676.89 | 410.717
> SHUFFLE | 300 | 3999.006 | 4017.496 | 6861.872 | 6894.785 | 3880.883
> RANDOM | 600 | 1.614 | 1.588 | 2.329 | 1.789 | 1.847
> RANDOM | 2000 | 6.756 | 6.634 | 7.757 | 6.224 | 6.23
> RANDOM | 9 | 516.671 | 512.52 | 623.995 | 488.492 | 482.646
> RANDOM | 40 | 2400.818 | 2399.264 | 2903.654 | 2356.675 | 2358.409
> RANDOM | 300 | 20933.23 | 20822.49 | 24428.27 | 20847.57 | 20868.68
> PERIOD | 600 | 0.864 | 0.871 | 0.681 | 0.665 | 0.664
> PERIOD | 2000 | 1.583 | 1.547 | 1.451 | 1.46 | 1.483
> PERIOD | 9 | 63.436 | 63.148 | 63.617 | 64.391 | 65.865
> PERIOD | 40 | 209.807 | 209.234 | 228.7 | 232.854 | 235.667
> PERIOD | 3000...

> Hi Vamsi (@vamsi-parasa), few questions on your test environment:
>
> * what are the hardware specs of your server ?
> * bare-metal or virtual ?
> * are other services or big processes running ?
> * os tuning ? CPU HT: off? Fixed CPU governor or frequency ?
> * isolation using taskset ?
>
> Maybe C2 JIT (+ CDS archive) are given more performance on stock jdk sort
> than same code running outside jdk...
>
> Thanks, Laurent

Hi Laurent,

The benchmarks are run on Intel TigerLake Core i7 machine. It's bare-metal without any virtualization. HT is ON and there is no other specific OS tuning or isolation using taskset.

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1989274286
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Tue, 27 Feb 2024 20:54:03 GMT, Vladimir Yaroslavskiy wrote:

>> Hello Vladimir (@iaroslavski),
>>
>> Please see the data below. Each DPQS class was copied to java.util and the JDK was recompiled.
>>
>> Thanks,
>> Vamsi
>>
>> Benchmark | (builder) | (size) | Stock JDK | r20p | r20s | r25p | r25s
>> -- | -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.p_sort | RANDOM | 600 | 1.618 | 2.601 | 2.966 | 2.898 | 3.269
>> ArraysSort.Int.p_sort | RANDOM | 2000 | 7.433 | 8.438 | 8.463 | 8.414 | 8.65
>> ArraysSort.Int.p_sort | RANDOM | 9 | 258.853 | 355.261 | 326.378 | 347.65 | 321.894
>> ArraysSort.Int.p_sort | RANDOM | 40 | 842.085 | 1225.929 | 899.852 | 1278.681 | 932.627
>> ArraysSort.Int.p_sort | RANDOM | 300 | 5723.659 | 8711.108 | 6086.974 | 8948.101 | 6122.612
>> ArraysSort.Int.p_sort | REPEATED | 600 | 0.52 | 0.585 | 0.629 | 0.586 | 0.579
>> ArraysSort.Int.p_sort | REPEATED | 2000 | 1.18 | 1.225 | 1.21 | 1.225 | 1.238
>> ArraysSort.Int.p_sort | REPEATED | 9 | 102.142 | 85.79 | 86.131 | 87.954 | 86.036
>> ArraysSort.Int.p_sort | REPEATED | 40 | 244.508 | 229.142 | 227.613 | 228.608 | 228.367
>> ArraysSort.Int.p_sort | REPEATED | 300 | 2752.745 | 2584.103 | 2544.192 | 2576.803 | 2609.833
>> ArraysSort.Int.p_sort | STAGGER | 600 | 1.146 | 0.894 | 0.898 | 0.904 | 0.912
>> ArraysSort.Int.p_sort | STAGGER | 2000 | 3.712 | 3.096 | 3.121 | 3.03 | 3.049
>> ArraysSort.Int.p_sort | STAGGER | 9 | 72.763 | 77.575 | 78.366 | 79.158 | 77.199
>> ArraysSort.Int.p_sort | STAGGER | 40 | 212.455 | 228.331 | 225.888 | 224.686 | 225.728
>> ArraysSort.Int.p_sort | STAGGER | 300 | 2290.327 | 2216.741 | 2196.138 | 2236.658 | 2262.472
>> ArraysSort.Int.p_sort | SHUFFLE | 600 | 2.01 | 2.92 | 2.907 | 2.91 | 2.926
>> ArraysSort.Int.p_sort | SHUFFLE | 2000 | 7.06 | 7.759 | 7.776 | 7.688 | 8.062
>> ArraysSort.Int.p_sort | SHUFFLE | 9 | 157.728 | 151.871 | 151.101 | 154.03 | 151.2
>> ArraysSort.Int.p_sort | SHUFFLE | 40 | 441.166 | 715.243 | 449...
>
> Hello Vamsi (@vamsi-parasa),
>
> Could you please run benchmarking of 4 cases with **updated** test class **ArraysSortNew2**?
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
>
> Put each DPQS class in java.util package and recompiling the JDK for each case as you
> did before, and run new class **ArraysSortNew2**.
>
> Find the sources there:
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew2.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_b01.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27b.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r27s.java
>
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the data below.

Thanks,
Vamsi

Builder | Size | Stock JDK | b01 | r27b | r27p | r27s
-- | -- | -- | -- | -- | -- | --
RANDOM | 600 | 1.615 | 1.59 | 2.316 | 1.805 | 1.77
RANDOM | 2000 | 6.794 | 6.638 | 8.443 | 6.354 | 6.295
RANDOM | 9 | 296.877 | 304.15 | 337.625 | 341.999 | 307.099
RANDOM | 40 | 838.061 | 801.108 | 1136.688 | 1161.181 | 781.487
RANDOM | 300 | 5468.214 | 5452.125 | 8522.698 | 8476.445 | 5368.777
PERIOD | 600 | 0.877 | 0.875 | 0.663 | 0.663 | 0.685
PERIOD | 2000 | 1.57 | 1.548 | 1.458 | 1.451 | 1.487
PERIOD | 9 | 97.208 | 97.677 | 106.01 | 106.516 | 106.629
PERIOD | 40 | 237.4 | 264.103 | 235.466 | 231.349 | 231.235
PERIOD | 300 | 2604.56 | 2829.935 | 4867.668 | 4872.361 | 4888.391
STAGGER | 600 | 1.052 | 1.064 | 0.774 | 0.78 | 0.791
STAGGER | 2000 | 3.449 | 3.443 | 2.604 | 2.627 | 2.597
STAGGER | 9 | 102.331 | 103.464 | 73.582 | 73.532 | 75.85
STAGGER | 40 | 210.829 | 229.37 | 207.356 | 208.565 | 205.141
STAGGER | 300 | 2205.565 | 2174.588 | 2086.885 | 2070.132 | 2373.443
SHUFFLE | 600 | 1.885 | 1.892 | 1.934 | 1.36 | 1.386
SHUFFLE | 2000 | 6.787 | 6.724 | 7.338 | 4.994 | 4.96
SHUFFLE | 9 | 158.065 | 154.48 | 152.874 | 148.337 | 140.703
SHUFFLE | 40 | 415.089 | 424.777 | 676.272 | 676.89 | 410.717
SHUFFLE | 300 | 3999.006 | 4017.496 | 6861.872 | 6894.785 | 3880.883
RANDOM | 600 | 1.614 | 1.588 | 2.329 | 1.789 | 1.847
RANDOM | 2000 | 6.756 | 6.634 | 7.757 | 6.224 | 6.23
RANDOM | 9 | 516.671 | 512.52 | 623.995 | 488.492 | 482.646
RANDOM | 40 | 2400.818 | 2399.264 | 2903.654 | 2356.675 | 2358.409
RANDOM | 300 |
[jira] [Created] (KAFKA-16360) Release plan of 3.x kafka releases.
kaushik srinivas created KAFKA-16360:

Summary: Release plan of 3.x kafka releases.
Key: KAFKA-16360
URL: https://issues.apache.org/jira/browse/KAFKA-16360
Project: Kafka
Issue Type: Improvement
Reporter: kaushik srinivas

KIP [https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-ReleaseTimeline] mentions:

h2. Kafka 3.7
* January 2024
* Final release with ZK mode

But we see in Jira that some tickets are marked for the 3.8 release. Will Apache continue to make 3.x releases, with both ZooKeeper and KRaft supported, independently of the pure-KRaft 4.x releases? If yes, how many more releases can be expected on the 3.x release line?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Performance Issue at Ignite Level
Dear Ignite Team,

I hope this email finds you well. We are reaching out because we are currently facing a performance issue with Apache Ignite.

Our application runs on Azure Kubernetes (v1.27.1) with 24 nodes of size "Standard_F64s_v2", handling roughly 13 million records. We frequently get "[WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: milliseconds" while performing operations such as reading and encoding.

System information (Apache Ignite clusters):

Heap size: 32GB out of 64GB of memory
CPUs: Standard_F64s_v2
Number of server instances: 24
IOWait: within normal range
IOPS: 10k-600K per hr
JDK: Oracle JDK8, 1.8.0_281
Apache Ignite: 2.14.0
GC configuration in ignite.sh: -XX:-UseContainerSupport -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC

Please assist us in identifying the underlying source of this problem.

Thank you,
Srinivas

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security, AI-powered support capabilities, and assessment of internal compliance with Accenture policy. Your privacy is important to us. Accenture uses your personal data only in compliance with data protection laws. For further information on how Accenture processes your personal data, please see our privacy statement at https://www.accenture.com/us-en/privacy-policy.

www.accenture.com
[PATCH 1/1] drm/i915: Allow bigjoiner for MST
We need bigjoiner support with MST functionality
for MST monitor resolutions > 5K to work.
Adding support for the same.

v2: Addressed review comments from Jani.
Revert rejection of MST bigjoiner modes and add
functionality

v3: Fixed pipe_mismatch WARN for mst_master_transcoder

Credits-to: Manasi Navare
Signed-off-by: Vidya Srinivas
---
 drivers/gpu/drm/i915/display/intel_ddi.c    |  6 --
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
index c587a8efeafc..41998022ed07 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -3902,9 +3902,11 @@ static void intel_ddi_read_func_ctl(struct intel_encoder *encoder,
 		pipe_config->lane_count =
 			((temp & DDI_PORT_WIDTH_MASK) >> DDI_PORT_WIDTH_SHIFT) + 1;

-		if (DISPLAY_VER(dev_priv) >= 12)
-			pipe_config->mst_master_transcoder =
+		if (DISPLAY_VER(dev_priv) >= 12) {
+			if (!intel_crtc_is_bigjoiner_slave(pipe_config))
+				pipe_config->mst_master_transcoder =
 				REG_FIELD_GET(TRANS_DDI_MST_TRANSPORT_SELECT_MASK, temp);
+		}

 		intel_cpu_transcoder_get_m1_n1(crtc, cpu_transcoder,
 					       &pipe_config->dp_m_n);
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
 	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
 	struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+	struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
 	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
 	struct intel_dp *intel_dp = &intel_mst->primary->dp;
 	const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
 		return -EINVAL;

+	if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+				    adjusted_mode->crtc_clock))
+		pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
 	pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	 * corresponding link capabilities of the sink) in case the
 	 * stream is uncompressed for it by the last branch device.
 	 */
-	if (mode_rate > max_rate || mode->clock > max_dotclk ||
-	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-		*status = MODE_CLOCK_HIGH;
-		return 0;
-	}
-
 	if (mode->clock < 1) {
 		*status = MODE_CLOCK_LOW;
 		return 0;
@@ -1349,8 +1348,10 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
 		bigjoiner = true;
 		max_dotclk *= 2;
+	}

-		/* TODO: add support for bigjoiner */
+	if (mode_rate > max_rate || mode->clock > max_dotclk ||
+	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
 		*status = MODE_CLOCK_HIGH;
 		return 0;
 	}
@@ -1397,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 		return 0;
 	}

-	*status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+	*status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
 	return 0;
 }

--
2.33.0
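As a rough illustration of the `GENMASK(crtc->pipe + 1, crtc->pipe)` expression in the compute_config hunk above: it builds a bitmask covering the master pipe and the pipe right after it. The userspace sketch below is not kernel code; `GENMASK_U8` is a simplified stand-in for the kernel's `GENMASK()` macro, and `bigjoiner_pipe_mask()` is a hypothetical helper name used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace stand-in for the kernel's GENMASK(h, l):
 * sets bits l..h inclusive. Only valid for 0 <= l <= h <= 7.
 */
#define GENMASK_U8(h, l) \
	((uint8_t)((0xffu >> (7 - (h))) & (0xffu << (l))))

/* Hypothetical helper: mask of the master pipe plus the next pipe. */
static uint8_t bigjoiner_pipe_mask(unsigned int master_pipe)
{
	assert(master_pipe < 7); /* the next pipe must still fit in 8 bits */
	return GENMASK_U8(master_pipe + 1, master_pipe);
}
```

With pipe A numbered 0, a master on pipe A yields a mask with bits 0 and 1 set (pipes A and B), and a master on pipe B yields bits 1 and 2 (pipes B and C).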
[PATCH 0/1] Enable MST bigjoiner
Support resolutions > 5k on MST monitors that need bigjoiner by
adding MST bigjoiner functionality

Vidya Srinivas (1):
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_ddi.c    |  6 --
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 2 files changed, 13 insertions(+), 10 deletions(-)

--
2.33.0
RE: [PATCH v2 8/8] drm/i915: Handle joined pipes inside hsw_crtc_disable()
Thank you very much Ville and Stan.
With https://patchwork.freedesktop.org/series/130619/ and
https://patchwork.freedesktop.org/series/130449/ tested that 6K works

Tested-by: Vidya Srinivas

> -----Original Message-----
> From: Intel-gfx On Behalf Of Ville Syrjala
> Sent: Friday, March 1, 2024 10:54 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Lisovskiy, Stanislav
> Subject: [PATCH v2 8/8] drm/i915: Handle joined pipes inside
> hsw_crtc_disable()
>
> From: Ville Syrjälä
>
> Reorganize the crtc disable path to only deal with the master
> pipes/transcoders in intel_old_crtc_state_disables() and offload the handling
> of joined pipes to hsw_crtc_disable().
> This makes the whole thing much more sensible since we can actually control
> the order in which we do the per-pipe vs.
> per-transcoder modeset steps.
>
> v2: Pass the correct crtc pointer to .crtc_disable()
>
> Signed-off-by: Ville Syrjälä
> ---
>  drivers/gpu/drm/i915/display/intel_display.c | 66
>  1 file changed, 39 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c
> b/drivers/gpu/drm/i915/display/intel_display.c
> index 1df3923cc30d..e01536983303 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -1793,29 +1793,27 @@ static void hsw_crtc_disable(struct intel_atomic_state *state,
>  	const struct intel_crtc_state *old_master_crtc_state =
>  		intel_atomic_get_old_crtc_state(state, master_crtc);
>  	struct drm_i915_private *i915 = to_i915(master_crtc->base.dev);
> +	u8 pipe_mask = intel_crtc_joined_pipe_mask(old_master_crtc_state);
> +	struct intel_crtc *crtc;
>
>  	/*
>  	 * FIXME collapse everything to one hook.
>  	 * Need care with mst->ddi interactions.
>  	 */
> -	if (!intel_crtc_is_bigjoiner_slave(old_master_crtc_state)) {
> -		intel_encoders_disable(state, master_crtc);
> -		intel_encoders_post_disable(state, master_crtc);
> -	}
> -
> -	intel_disable_shared_dpll(old_master_crtc_state);
> +	intel_encoders_disable(state, master_crtc);
> +	intel_encoders_post_disable(state, master_crtc);
>
> -	if (!intel_crtc_is_bigjoiner_slave(old_master_crtc_state)) {
> -		struct intel_crtc *slave_crtc;
> +	for_each_intel_crtc_in_pipe_mask(&i915->drm, crtc, pipe_mask) {
> +		const struct intel_crtc_state *old_crtc_state =
> +			intel_atomic_get_old_crtc_state(state, crtc);
>
> -		intel_encoders_post_pll_disable(state, master_crtc);
> +		intel_disable_shared_dpll(old_crtc_state);
> +	}
>
> -		intel_dmc_disable_pipe(i915, master_crtc->pipe);
> +	intel_encoders_post_pll_disable(state, master_crtc);
>
> -		for_each_intel_crtc_in_pipe_mask(&i915->drm, slave_crtc,
> -						 intel_crtc_bigjoiner_slave_pipes(old_master_crtc_state))
> -			intel_dmc_disable_pipe(i915, slave_crtc->pipe);
> -	}
> +	for_each_intel_crtc_in_pipe_mask(&i915->drm, crtc, pipe_mask)
> +		intel_dmc_disable_pipe(i915, crtc->pipe);
>  }
>
>  static void i9xx_pfit_enable(const struct intel_crtc_state *crtc_state)
> @@ -6753,24 +6751,33 @@ static void intel_update_crtc(struct intel_atomic_state *state,
>  }
>
>  static void intel_old_crtc_state_disables(struct intel_atomic_state *state,
> -					  struct intel_crtc *crtc)
> +					  struct intel_crtc *master_crtc)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(state->base.dev);
> -	const struct intel_crtc_state *new_crtc_state =
> -		intel_atomic_get_new_crtc_state(state, crtc);
> +	const struct intel_crtc_state *old_master_crtc_state =
> +		intel_atomic_get_old_crtc_state(state, master_crtc);
> +	u8 pipe_mask = intel_crtc_joined_pipe_mask(old_master_crtc_state);
> +	struct intel_crtc *crtc;
>
>  	/*
>  	 * We need to disable pipe CRC before disabling the pipe,
>  	 * or we race against vblank off.
>  	 */
> -	intel_crtc_disable_pipe_crc(crtc);
> +	for_each_intel_crtc_in_pipe_mask(&dev_priv->drm, crtc, pipe_mask)
> +		intel_crtc_disable_pipe_crc(crtc);
>
> -	dev_priv->display.funcs.display->crtc_disable(state, crtc);
> -	crtc->active = false;
> -	intel_fbc_disable(crtc);
> +	dev_priv->display.funcs.display->crtc_disable(state, master_crtc);
>
> -	if (!new_crtc_state->hw.active)
> -		intel_initial_watermarks(state, crtc);
> +	for_each_intel_crtc_in_pipe_mask
[PATCH 1/1] drm/i915: Allow bigjoiner for MST
We need bigjoiner support with MST functionality
for MST monitor resolutions > 5K to work.
Adding support for the same.

v2: Addressed review comments from Jani.
Revert rejection of MST bigjoiner modes and add
functionality

Signed-off-by: Vidya Srinivas
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
 	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
 	struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+	struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
 	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
 	struct intel_dp *intel_dp = &intel_mst->primary->dp;
 	const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
 		return -EINVAL;

+	if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+				    adjusted_mode->crtc_clock))
+		pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
 	pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	 * corresponding link capabilities of the sink) in case the
 	 * stream is uncompressed for it by the last branch device.
 	 */
-	if (mode_rate > max_rate || mode->clock > max_dotclk ||
-	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-		*status = MODE_CLOCK_HIGH;
-		return 0;
-	}
-
 	if (mode->clock < 1) {
 		*status = MODE_CLOCK_LOW;
 		return 0;
@@ -1349,8 +1348,10 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
 		bigjoiner = true;
 		max_dotclk *= 2;
+	}

-		/* TODO: add support for bigjoiner */
+	if (mode_rate > max_rate || mode->clock > max_dotclk ||
+	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
 		*status = MODE_CLOCK_HIGH;
 		return 0;
 	}
@@ -1397,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 		return 0;
 	}

-	*status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+	*status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
 	return 0;
 }

--
2.33.0
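The reordering in the `intel_dp_mst_mode_valid_ctx()` hunks above appears to matter because `max_dotclk` is doubled for bigjoiner modes, so the clock comparison has to run after that doubling rather than before it. A rough userspace sketch of just the dotclock part follows; the names `sketch_check_dotclk` and `enum sketch_status` are hypothetical, the clock values in the usage note are made up, and the link-rate and PBN checks from the real function are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MODE_OK / MODE_CLOCK_HIGH. */
enum sketch_status { SKETCH_OK, SKETCH_CLOCK_HIGH };

/*
 * Models only the dotclock comparison: when two pipes are joined,
 * the limit is doubled before the mode clock is tested against it.
 */
static enum sketch_status sketch_check_dotclk(int mode_clock_khz,
					      int max_dotclk_khz,
					      bool need_bigjoiner)
{
	assert(max_dotclk_khz > 0); /* sanity check for the sketch only */

	if (need_bigjoiner)
		max_dotclk_khz *= 2;

	if (mode_clock_khz > max_dotclk_khz)
		return SKETCH_CLOCK_HIGH;

	return SKETCH_OK;
}
```

For example, a 900000 kHz mode against a 600000 kHz single-pipe limit is rejected without bigjoiner but accepted once the limit is doubled, which is the case the old early check would have rejected.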
[PATCH 0/1] Enable MST bigjoiner
Support resolutions > 5k on MST monitors that need bigjoiner by
adding MST bigjoiner functionality

Vidya Srinivas (1):
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

--
2.33.0
[PATCH 1/1] drm/i915: Allow bigjoiner for MST
We need bigjoiner support with MST functionality
for MST monitor resolutions > 5K to work.
Adding support for the same.

Signed-off-by: Vidya Srinivas
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
 	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
 	struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+	struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
 	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
 	struct intel_dp *intel_dp = &intel_mst->primary->dp;
 	const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
 		return -EINVAL;

+	if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+				    adjusted_mode->crtc_clock))
+		pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
 	pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	 * corresponding link capabilities of the sink) in case the
 	 * stream is uncompressed for it by the last branch device.
 	 */
-	if (mode_rate > max_rate || mode->clock > max_dotclk ||
-	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-		*status = MODE_CLOCK_HIGH;
-		return 0;
-	}
-
 	if (mode->clock < 1) {
 		*status = MODE_CLOCK_LOW;
 		return 0;
@@ -1349,8 +1348,10 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
 		bigjoiner = true;
 		max_dotclk *= 2;
+	}

-		/* TODO: add support for bigjoiner */
+	if (mode_rate > max_rate || mode->clock > max_dotclk ||
+	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
 		*status = MODE_CLOCK_HIGH;
 		return 0;
 	}
@@ -1397,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 		return 0;
 	}

-	*status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+	*status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
 	return 0;
 }

--
2.33.0
[PATCH 0/1] Enable MST bigjoiner
Support resolutions > 5k on MST monitors that need bigjoiner by
adding MST bigjoiner functionality

Vidya Srinivas (1):
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

--
2.33.0
RE: [PATCH 1/2] Revert "drm/i915/mst: Reject modes that require the bigjoiner"
> -----Original Message-----
> From: Jani Nikula
> Sent: Wednesday, February 28, 2024 2:39 PM
> To: Srinivas, Vidya; intel-gfx@lists.freedesktop.org
> Cc: Almahallawy, Khaled; Srinivas, Vidya
> Subject: Re: [PATCH 1/2] Revert "drm/i915/mst: Reject modes that require the
> bigjoiner"
>
> On Tue, 27 Feb 2024, Vidya Srinivas wrote:
> > This reverts commit 9c058492b16f90bb772cb0dad567e8acc68e155d.
> >
> > Reverting for adding MST bigjoiner functionality.
>
> Please squash this together with the fix. Someone might think a revert is a fix
> that needs to be backported. Besides, for bisection this creates a non-working
> commit.

Hello Jani
Thank you very much. Sure, I will squash it together with the fix and submit.

Regards
Vidya

>
> BR,
> Jani.
>
> >
> > Signed-off-by: Vidya Srinivas
> > ---
> >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 4
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > index db1254b036f1..b062f4ee6c8b 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > @@ -1349,10 +1349,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
> >  	if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
> >  		bigjoiner = true;
> >  		max_dotclk *= 2;
> > -
> > -		/* TODO: add support for bigjoiner */
> > -		*status = MODE_CLOCK_HIGH;
> > -		return 0;
> >  	}
> >
> >  	if (DISPLAY_VER(dev_priv) >= 10 &&
>
> --
> Jani Nikula, Intel
RE: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> -----Original Message-----
> From: Lisovskiy, Stanislav
> Sent: Tuesday, February 27, 2024 2:44 PM
> To: Srinivas, Vidya
> Cc: Jani Nikula; intel-gfx@lists.freedesktop.org;
> Saarinen, Jani; ville.syrj...@linux.intel.com
> Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
>
> On Tue, Feb 27, 2024 at 11:06:16AM +0200, Srinivas, Vidya wrote:
> >
> > > -----Original Message-----
> > > From: Lisovskiy, Stanislav
> > > Sent: Tuesday, February 27, 2024 2:34 PM
> > > To: Jani Nikula
> > > Cc: intel-gfx@lists.freedesktop.org; Saarinen, Jani;
> > > ville.syrj...@linux.intel.com; Srinivas, Vidya
> > > Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> > >
> > > On Mon, Feb 26, 2024 at 09:56:10PM +0200, Jani Nikula wrote:
> > > > On Wed, 21 Feb 2024, Stanislav Lisovskiy wrote:
> > > > > Patch calculates bigjoiner pipes in mst compute.
> > > > > Patch also passes bigjoiner bool to validate plane max size.
> > > >
> > > > Please use the imperative mood in commit messages, e.g. "calculate"
> > > > intead of "calculates".
> > > >
> > > > Please do not refer to "patch". We know it's a patch, until it
> > > > isn't, and then it's a commit.
> > > >
> > > > Please explain *why* the changes are being done, not just *what*
> > > > is being done.
> > > >
> > > > In the subject, what is "bigjoiner case for DP2.0"? DP 2.0 is a
> > > > spec version, and as such irrelevant for the changes being done.
> > > >
> > > > > Signed-off-by: vsrini4
> > > >
> > > > ?
> > >
> > > Hi Jani, I just added that patch from Vidya to my series, to be
> > > honest, didn't have time at all to look much into it.
> > > Looks like its me who is going to fix that.
> >
> > Hello Stan
> > My sincere apologies. I dint want to disturb your series, so I did not fix it.
> > Please let me know if I should fix it. Sorry again.
> > Thank you Jani for the comments.
> >
> > Regards
> > Vidya
>
> Hi Vidya,
>
> it is a bit unclear for me as well now, how do we proceed, since your patch is
> part of my series, I was explicitly asked to add it, does it mean you are fixing it
> now or me?
> Well if you address Jani's comments, I definitely dont mind :)

Hello Stan
Thank you so much. So that I don't disturb your series, I have pushed this series
https://patchwork.freedesktop.org/series/130449/
after addressing the comments from Jani Nikula.
Many thanks, Jani, for the review, and apologies for the commit message errors.
Kindly help check whether this series is okay. Thank you.

Regards
Vidya

> > > > >
> > > > > Signed-off-by: Stanislav Lisovskiy
> > > > > ---
> > > > >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 19
> > > > > ---
> > > > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > index 5307ddd4edcf5..fd27d9976c050 100644
> > > > > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > > > @@ -523,6 +523,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > > > >  				       struct drm_connector_state *conn_state)
> > > > >  {
> > > > >  	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
> > > > > +	struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
> > > > >  	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
> > > > >  	struct intel_dp *intel_dp = &intel_mst->primary->dp;
> > > > >  	const struct intel_connector *connector =
> > > > > @@ -540,6 +541,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > > > >  	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
> > > > >  		return -EINVAL;
> > > > >
> > > > > +	if (intel_dp_need_bi
RE: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> -----Original Message-----
> From: Manasi Navare
> Sent: Tuesday, February 27, 2024 11:37 PM
> To: Jani Nikula
> Cc: Lisovskiy, Stanislav; intel-g...@lists.freedesktop.org; Saarinen, Jani;
> ville.syrj...@linux.intel.com; Srinivas, Vidya
> Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
>
> Thanks Jani for your review.
> Thanks @Lisovskiy, Stanislav and @vidya.srini...@intel.com for taking this
> patch forward.
>
> @Jani Nikula , @Ville Syrjälä : MST bigjoiner as a feature needs to be enabled
> upstream and this patch enables that feature.
> If you agree that bigjoiner refactoring patches 1 and 2 have no impact on
> enabling bigjoiner on MST, could we decouple this patch from bigjoiner
> refactoring and land this separately?

Hello Manasi
Thank you. I have submitted this series as suggested after addressing comments from
Jani Nikula about the commit message errors.
https://patchwork.freedesktop.org/series/130449/

Regards
Vidya

>
> We need the Bigjoiner to be enabled on MST feature landed asap and
> bigjoiner refactoring can follow.
>
> Regards
> Manasi
>
> On Tue, Feb 27, 2024 at 1:15 AM Jani Nikula wrote:
> >
> > On Tue, 27 Feb 2024, "Lisovskiy, Stanislav" wrote:
> > > On Mon, Feb 26, 2024 at 09:56:10PM +0200, Jani Nikula wrote:
> > >> On Wed, 21 Feb 2024, Stanislav Lisovskiy wrote:
> > >> > Patch calculates bigjoiner pipes in mst compute.
> > >> > Patch also passes bigjoiner bool to validate plane max size.
> > >>
> > >> Please use the imperative mood in commit messages, e.g. "calculate"
> > >> intead of "calculates".
> > >>
> > >> Please do not refer to "patch". We know it's a patch, until it
> > >> isn't, and then it's a commit.
> > >>
> > >> Please explain *why* the changes are being done, not just *what* is
> > >> being done.
> > >>
> > >> In the subject, what is "bigjoiner case for DP2.0"? DP 2.0 is a
> > >> spec version, and as such irrelevant for the changes being done.
> > >>
> > >> > Signed-off-by: vsrini4
> > >>
> > >> ?
> > >
> > > Hi Jani, I just added that patch from Vidya to my series, to be
> > > honest, didn't have time at all to look much into it.
> > > Looks like its me who is going to fix that.
> >
> > Should the original authorship be preserved? If not, please add
> > Co-developed-by. Just having the Signed-off-by is not enough.
> >
> > BR,
> > Jani.
> >
> > >
> > >>
> > >> > Signed-off-by: Stanislav Lisovskiy
> > >> >
> > >> > ---
> > >> >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 19
> > >> > ---
> > >> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > index 5307ddd4edcf5..fd27d9976c050 100644
> > >> > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > >> > @@ -523,6 +523,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > >> >  				       struct drm_connector_state *conn_state)
> > >> >  {
> > >> >  	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
> > >> > +	struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
> > >> >  	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
> > >> >  	struct intel_dp *intel_dp = &intel_mst->primary->dp;
> > >> >  	const struct intel_connector *connector =
> > >> > @@ -540,6 +541,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > >> >  	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
> > >> >  		return -EINVAL;
> > >> >
> > >> > +	if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
> > >> > +				    adjusted_mode->crtc_clock))
> > >> > +		pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
> > >> > +
> > >> >  	pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB
[PATCH 2/2] drm/i915: Allow bigjoiner for MST
We need bigjoiner support with MST functionality for MST monitor
resolutions > 5K to work. Adding support for the same.

Signed-off-by: Vidya Srinivas
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index b062f4ee6c8b..c5e7293c13eb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -525,6 +525,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 {
 	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
 	struct intel_atomic_state *state = to_intel_atomic_state(conn_state->state);
+	struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
 	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
 	struct intel_dp *intel_dp = &intel_mst->primary->dp;
 	const struct intel_connector *connector =
@@ -542,6 +543,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
 	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
 		return -EINVAL;

+	if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
+				    adjusted_mode->crtc_clock))
+		pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
+
 	pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
 	pipe_config->has_pch_encoder = false;
@@ -1330,12 +1335,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	 * corresponding link capabilities of the sink) in case the
 	 * stream is uncompressed for it by the last branch device.
 	 */
-	if (mode_rate > max_rate || mode->clock > max_dotclk ||
-	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
-		*status = MODE_CLOCK_HIGH;
-		return 0;
-	}
-
 	if (mode->clock < 1) {
 		*status = MODE_CLOCK_LOW;
 		return 0;
@@ -1351,6 +1350,12 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 		max_dotclk *= 2;
 	}

+	if (mode_rate > max_rate || mode->clock > max_dotclk ||
+	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
+		*status = MODE_CLOCK_HIGH;
+		return 0;
+	}
+
 	if (DISPLAY_VER(dev_priv) >= 10 &&
 	    drm_dp_sink_supports_dsc(intel_connector->dp.dsc_dpcd)) {
 		/*
@@ -1393,7 +1398,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 		return 0;
 	}

-	*status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
+	*status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
 	return 0;
 }

-- 
2.33.0
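For readers following the `GENMASK(crtc->pipe + 1, crtc->pipe)` expression in the patch above, here is a small userspace sketch of the mask it produces. `bit32()` and `genmask32()` are hypothetical stand-ins for the kernel's `BIT()` and `GENMASK()` macros, `bigjoiner_pipe_mask()` is a name made up for this illustration, and pipe A is assumed to be numbered 0. It only illustrates the bit arithmetic and is not driver code.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's BIT() macro. */
static inline uint32_t bit32(unsigned int n)
{
	return UINT32_C(1) << n;
}

/*
 * Hypothetical userspace stand-in for the kernel's GENMASK(high, low):
 * returns a mask with bits low..high (inclusive) set.
 * Assumes low <= high <= 31.
 */
static inline uint32_t genmask32(unsigned int high, unsigned int low)
{
	return (UINT32_MAX << low) & (UINT32_MAX >> (31 - high));
}

/*
 * Mask covering a pipe and the next higher-numbered pipe, which is
 * what GENMASK(pipe + 1, pipe) evaluates to.
 */
static inline uint32_t bigjoiner_pipe_mask(unsigned int pipe)
{
	return genmask32(pipe + 1, pipe);
}
```

Under these assumptions, pipe 0 gives 0x3 (pipes 0 and 1) and pipe 1 gives 0x6 (pipes 1 and 2), i.e. the same value as `bit32(pipe) | bit32(pipe + 1)`.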
[PATCH 1/2] Revert "drm/i915/mst: Reject modes that require the bigjoiner"
This reverts commit 9c058492b16f90bb772cb0dad567e8acc68e155d.

Reverting for adding MST bigjoiner functionality.

Signed-off-by: Vidya Srinivas
---
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index db1254b036f1..b062f4ee6c8b 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1349,10 +1349,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
 	if (intel_dp_need_bigjoiner(intel_dp, mode->hdisplay, target_clock)) {
 		bigjoiner = true;
 		max_dotclk *= 2;
-
-		/* TODO: add support for bigjoiner */
-		*status = MODE_CLOCK_HIGH;
-		return 0;
 	}

 	if (DISPLAY_VER(dev_priv) >= 10 &&
-- 
2.33.0
[PATCH 0/2] Enable MST bigjoiner
Series reverts rejection of modes on MST monitors that need bigjoiner
and adds MST bigjoiner functionality

Vidya Srinivas (2):
  Revert "drm/i915/mst: Reject modes that require the bigjoiner"
  drm/i915: Allow bigjoiner for MST

 drivers/gpu/drm/i915/display/intel_dp_mst.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.33.0
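One detail of the series that may be worth spelling out: in patch 2/2 the MODE_CLOCK_HIGH check moves below the point where max_dotclk is doubled for the bigjoiner case, so a mode that only fits when two pipes are joined is no longer rejected against the single-pipe limit. A minimal sketch of that ordering, assuming hypothetical helper names (`effective_max_dotclk`, `mode_clock_ok`) and no real hardware limits:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: the dot clock limit that applies to a mode.
 * When the mode needs the bigjoiner, two pipes share the load, so the
 * single-pipe limit is doubled (mirrors "max_dotclk *= 2" in the patch).
 */
static int effective_max_dotclk(int max_dotclk, bool need_bigjoiner)
{
	return need_bigjoiner ? max_dotclk * 2 : max_dotclk;
}

/*
 * Hypothetical helper: returns true when the mode clock fits.
 * The comparison is done only after the limit has been adjusted,
 * which is the ordering the patch establishes.
 */
static bool mode_clock_ok(int clock, int max_dotclk, bool need_bigjoiner)
{
	return clock <= effective_max_dotclk(max_dotclk, need_bigjoiner);
}
```

With an illustrative limit of 600000 kHz, a 900000 kHz mode fails the check as a single-pipe mode and passes once need_bigjoiner is true; checking before the doubling would reject it in both cases.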
RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for bigjoiner during modeset
> -Original Message- > From: Lisovskiy, Stanislav > Sent: Tuesday, February 27, 2024 2:41 PM > To: Srinivas, Vidya > Cc: intel-gfx@lists.freedesktop.org; Saarinen, Jani ; > ville.syrj...@linux.intel.com > Subject: Re: [PATCH 2/3] Start separating pipe vs transcoder set logic for > bigjoiner during modeset > > On Tue, Feb 27, 2024 at 06:40:23AM +0200, Srinivas, Vidya wrote: > > > > > > > -Original Message- > > > From: Lisovskiy, Stanislav > > > Sent: Thursday, February 22, 2024 12:50 AM > > > To: intel-gfx@lists.freedesktop.org > > > Cc: Lisovskiy, Stanislav ; Saarinen, > > > Jani ; ville.syrj...@linux.intel.com; > > > Srinivas, Vidya > > > Subject: [PATCH 2/3] Start separating pipe vs transcoder set logic > > > for bigjoiner during modeset > > > > > > Handle only bigjoiner masters in > > > skl_commit_modeset_enables/disables, > > > slave crtcs should be handled by master hooks. Same for encoders. > > > That way we can also remove a bunch of checks like > > > intel_crtc_is_bigjoiner_slave. > > > > > > v2: Get rid of master vs slave checks and separation in crtc > > > enable/disable hooks. 
> > > Use unified iteration cycle for all of those, while enabling/disabling > > > transcoder only for those pipes where its needed(Ville Syrjälä) > > > > > > v3: Move all the intel_encoder_* calls under transcoder code > > > path(Ville > > > Syrjälä) > > > > > > v4: - Call intel_crtc_vblank_on from hsw_crtc_enable only for > > > non-transcoder path > > >(for master pipe that will be called from > > > intel_encoders_enable/intel_enable_ddi) > > > - Fix stupid mistake with using crtc->pipe for the mask, > > > instead of BIT(crtc- > > > >pipe) > > > > > > Signed-off-by: Stanislav Lisovskiy > > > --- > > > drivers/gpu/drm/i915/display/intel_ddi.c | 21 +-- > > > drivers/gpu/drm/i915/display/intel_display.c | 183 --- > > > drivers/gpu/drm/i915/display/intel_display.h | 6 + > > > 3 files changed, 121 insertions(+), 89 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c > > > b/drivers/gpu/drm/i915/display/intel_ddi.c > > > index bea4415902044..6071e9f500871 100644 > > > --- a/drivers/gpu/drm/i915/display/intel_ddi.c > > > +++ b/drivers/gpu/drm/i915/display/intel_ddi.c > > > @@ -3100,7 +3100,6 @@ static void intel_ddi_post_disable(struct > > > intel_atomic_state *state, > > > const struct drm_connector_state > > > *old_conn_state) { > > > struct drm_i915_private *dev_priv = to_i915(encoder->base.dev); > > > - struct intel_crtc *slave_crtc; > > > > > > if (!intel_crtc_has_type(old_crtc_state, INTEL_OUTPUT_DP_MST)) { > > > intel_crtc_vblank_off(old_crtc_state); > > > @@ -3117,17 +3116,6 @@ static void intel_ddi_post_disable(struct > > > intel_atomic_state *state, > > > ilk_pfit_disable(old_crtc_state); > > > } > > > > > > - for_each_intel_crtc_in_pipe_mask(_priv->drm, slave_crtc, > > > - > > > intel_crtc_bigjoiner_slave_pipes(old_crtc_state)) { > > > - const struct intel_crtc_state *old_slave_crtc_state = > > > - intel_atomic_get_old_crtc_state(state, slave_crtc); > > > - > > > - intel_crtc_vblank_off(old_slave_crtc_state); > > > - > > > - 
intel_dsc_disable(old_slave_crtc_state); > > > - skl_scaler_disable(old_slave_crtc_state); > > > - } > > > - > > > /* > > >* When called from DP MST code: > > >* - old_conn_state will be NULL > > > @@ -3363,8 +3351,7 @@ static void intel_enable_ddi(struct > > > intel_atomic_state *state, { > > > drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder); > > > > > > - if (!intel_crtc_is_bigjoiner_slave(crtc_state)) > > > - intel_ddi_enable_transcoder_func(encoder, crtc_state); > > > + intel_ddi_enable_transcoder_func(encoder, crtc_state); > > > > > > /* Enable/Disable DP2.0 SDP split config before transcoder */ > > > intel_audio_sdp_split_update(crtc_state); > > > @@ -3469,9 +3456,6 @@ void intel_ddi_update_active_dpll(struct > > > intel_atomic_state *state, > > > struct intel_crtc *crtc) > > >
RE: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
> -----Original Message-----
> From: Lisovskiy, Stanislav
> Sent: Tuesday, February 27, 2024 2:34 PM
> To: Jani Nikula
> Cc: intel-gfx@lists.freedesktop.org; Saarinen, Jani;
> ville.syrj...@linux.intel.com; Srinivas, Vidya
> Subject: Re: [PATCH 3/3] drm/i915: Fix bigjoiner case for DP2.0
>
> On Mon, Feb 26, 2024 at 09:56:10PM +0200, Jani Nikula wrote:
> > On Wed, 21 Feb 2024, Stanislav Lisovskiy wrote:
> > > Patch calculates bigjoiner pipes in mst compute.
> > > Patch also passes bigjoiner bool to validate plane max size.
> >
> > Please use the imperative mood in commit messages, e.g. "calculate"
> > instead of "calculates".
> >
> > Please do not refer to "patch". We know it's a patch, until it isn't,
> > and then it's a commit.
> >
> > Please explain *why* the changes are being done, not just *what* is
> > being done.
> >
> > In the subject, what is "bigjoiner case for DP2.0"? DP 2.0 is a spec
> > version, and as such irrelevant for the changes being done.
> >
> > > Signed-off-by: vsrini4
> >
> > ?
>
> Hi Jani, I just added that patch from Vidya to my series, to be honest, didn't
> have time at all to look much into it.
> Looks like it's me who is going to fix that.

Hello Stan

My sincere apologies. I didn't want to disturb your series, so I did not fix it.
Please let me know if I should fix it. Sorry again.
Thank you Jani for the comments.

Regards
Vidya

> >
> > > Signed-off-by: Stanislav Lisovskiy
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_dp_mst.c | 19 ++++++++++++-------
> > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > index 5307ddd4edcf5..fd27d9976c050 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
> > > @@ -523,6 +523,7 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > >  				       struct drm_connector_state *conn_state)
> > >  {
> > >  	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
> > > +	struct intel_crtc *crtc = to_intel_crtc(pipe_config->uapi.crtc);
> > >  	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(encoder);
> > >  	struct intel_dp *intel_dp = &intel_mst->primary->dp;
> > >  	const struct intel_connector *connector =
> > > @@ -540,6 +541,10 @@ static int intel_dp_mst_compute_config(struct intel_encoder *encoder,
> > >  	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
> > >  		return -EINVAL;
> > >
> > > +	if (intel_dp_need_bigjoiner(intel_dp, adjusted_mode->crtc_hdisplay,
> > > +				    adjusted_mode->crtc_clock))
> > > +		pipe_config->bigjoiner_pipes = GENMASK(crtc->pipe + 1, crtc->pipe);
> > > +
> > >  	pipe_config->sink_format = INTEL_OUTPUT_FORMAT_RGB;
> > >  	pipe_config->output_format = INTEL_OUTPUT_FORMAT_RGB;
> > >  	pipe_config->has_pch_encoder = false;
> > > @@ -1318,12 +1323,6 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
> > >  	 * corresponding link capabilities of the sink) in case the
> > >  	 * stream is uncompressed for it by the last branch device.
> > >  	 */
> > > -	if (mode_rate > max_rate || mode->clock > max_dotclk ||
> > > -	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
> > > -		*status = MODE_CLOCK_HIGH;
> > > -		return 0;
> > > -	}
> > > -
> > >  	if (mode->clock < 1) {
> > >  		*status = MODE_CLOCK_LOW;
> > >  		return 0;
> > > @@ -1343,6 +1342,12 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
> > >  		return 0;
> > >  	}
> > >
> > > +	if (mode_rate > max_rate || mode->clock > max_dotclk ||
> > > +	    drm_dp_calc_pbn_mode(mode->clock, min_bpp << 4) > port->full_pbn) {
> > > +		*status = MODE_CLOCK_HIGH;
> > > +		return 0;
> > > +	}
> > > +
> > >  	if (DISPLAY_VER(dev_priv) >= 10 &&
> > >  	    drm_dp_sink_supports_dsc(intel_connector->dp.dsc_dpcd)) {
> > >  		/*
> > > @@ -1385,7 +1390,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
> > >  		return 0;
> > >  	}
> > >
> > > -	*status = intel_mode_valid_max_plane_size(dev_priv, mode, false);
> > > +	*status = intel_mode_valid_max_plane_size(dev_priv, mode, bigjoiner);
> > >  	return 0;
> > >  }
> >
> > --
> > Jani Nikula, Intel
RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for bigjoiner during modeset
> -Original Message- > From: Intel-gfx On Behalf Of > Srinivas, Vidya > Sent: Tuesday, February 27, 2024 10:10 AM > To: Lisovskiy, Stanislav ; intel- > g...@lists.freedesktop.org > Cc: Saarinen, Jani ; ville.syrj...@linux.intel.com > Subject: RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for > bigjoiner during modeset > > > > > -Original Message- > > From: Lisovskiy, Stanislav > > Sent: Thursday, February 22, 2024 12:50 AM > > To: intel-gfx@lists.freedesktop.org > > Cc: Lisovskiy, Stanislav ; Saarinen, > > Jani ; ville.syrj...@linux.intel.com; > > Srinivas, Vidya > > Subject: [PATCH 2/3] Start separating pipe vs transcoder set logic for > > bigjoiner during modeset > > > > Handle only bigjoiner masters in skl_commit_modeset_enables/disables, > > slave crtcs should be handled by master hooks. Same for encoders. > > That way we can also remove a bunch of checks like > > intel_crtc_is_bigjoiner_slave. > > > > v2: Get rid of master vs slave checks and separation in crtc > > enable/disable hooks. 
> > Use unified iteration cycle for all of those, while enabling/disabling > > transcoder only for those pipes where its needed(Ville Syrjälä) > > > > v3: Move all the intel_encoder_* calls under transcoder code > > path(Ville > > Syrjälä) > > > > v4: - Call intel_crtc_vblank_on from hsw_crtc_enable only for > > non-transcoder path > >(for master pipe that will be called from > > intel_encoders_enable/intel_enable_ddi) > > - Fix stupid mistake with using crtc->pipe for the mask, instead > > of BIT(crtc- > > >pipe) > > > > Signed-off-by: Stanislav Lisovskiy > > --- > > drivers/gpu/drm/i915/display/intel_ddi.c | 21 +-- > > drivers/gpu/drm/i915/display/intel_display.c | 183 --- > > drivers/gpu/drm/i915/display/intel_display.h | 6 + > > 3 files changed, 121 insertions(+), 89 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c > > b/drivers/gpu/drm/i915/display/intel_ddi.c > > index bea4415902044..6071e9f500871 100644 > > --- a/drivers/gpu/drm/i915/display/intel_ddi.c > > +++ b/drivers/gpu/drm/i915/display/intel_ddi.c > > @@ -3100,7 +3100,6 @@ static void intel_ddi_post_disable(struct > > intel_atomic_state *state, > >const struct drm_connector_state > > *old_conn_state) { > > struct drm_i915_private *dev_priv = to_i915(encoder->base.dev); > > - struct intel_crtc *slave_crtc; > > > > if (!intel_crtc_has_type(old_crtc_state, INTEL_OUTPUT_DP_MST)) { > > intel_crtc_vblank_off(old_crtc_state); > > @@ -3117,17 +3116,6 @@ static void intel_ddi_post_disable(struct > > intel_atomic_state *state, > > ilk_pfit_disable(old_crtc_state); > > } > > > > - for_each_intel_crtc_in_pipe_mask(_priv->drm, slave_crtc, > > - > > intel_crtc_bigjoiner_slave_pipes(old_crtc_state)) { > > - const struct intel_crtc_state *old_slave_crtc_state = > > - intel_atomic_get_old_crtc_state(state, slave_crtc); > > - > > - intel_crtc_vblank_off(old_slave_crtc_state); > > - > > - intel_dsc_disable(old_slave_crtc_state); > > - skl_scaler_disable(old_slave_crtc_state); > > - } > > - > > 
/* > > * When called from DP MST code: > > * - old_conn_state will be NULL > > @@ -3363,8 +3351,7 @@ static void intel_enable_ddi(struct > > intel_atomic_state *state, { > > drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder); > > > > - if (!intel_crtc_is_bigjoiner_slave(crtc_state)) > > - intel_ddi_enable_transcoder_func(encoder, crtc_state); > > + intel_ddi_enable_transcoder_func(encoder, crtc_state); > > > > /* Enable/Disable DP2.0 SDP split config before transcoder */ > > intel_audio_sdp_split_update(crtc_state); > > @@ -3469,9 +3456,6 @@ void intel_ddi_update_active_dpll(struct > > intel_atomic_state *state, > > struct intel_crtc *crtc) > > { > > struct drm_i915_private *i915 = to_i915(encoder->base.dev); > > - struct intel_crtc_state *crtc_state = > > - intel_atomic_get_new_crtc_state(state, crtc); > > - struct intel_crtc *slave_crtc; > > enum phy phy = intel_port_to_phy(i915, encoder->port); > > > > /* FIXME: Add MTL pll_mgr */ > > @@ -3479,9 +3463,6 @@ void intel_ddi_update_active_dpll(struct > &
RE: [PATCH 2/3] Start separating pipe vs transcoder set logic for bigjoiner during modeset
> -Original Message- > From: Lisovskiy, Stanislav > Sent: Thursday, February 22, 2024 12:50 AM > To: intel-gfx@lists.freedesktop.org > Cc: Lisovskiy, Stanislav ; Saarinen, Jani > ; ville.syrj...@linux.intel.com; Srinivas, Vidya > > Subject: [PATCH 2/3] Start separating pipe vs transcoder set logic for > bigjoiner > during modeset > > Handle only bigjoiner masters in skl_commit_modeset_enables/disables, > slave crtcs should be handled by master hooks. Same for encoders. > That way we can also remove a bunch of checks like > intel_crtc_is_bigjoiner_slave. > > v2: Get rid of master vs slave checks and separation in crtc enable/disable > hooks. > Use unified iteration cycle for all of those, while enabling/disabling > transcoder only for those pipes where its needed(Ville Syrjälä) > > v3: Move all the intel_encoder_* calls under transcoder code path(Ville > Syrjälä) > > v4: - Call intel_crtc_vblank_on from hsw_crtc_enable only for non-transcoder > path >(for master pipe that will be called from > intel_encoders_enable/intel_enable_ddi) > - Fix stupid mistake with using crtc->pipe for the mask, instead of > BIT(crtc- > >pipe) > > Signed-off-by: Stanislav Lisovskiy > --- > drivers/gpu/drm/i915/display/intel_ddi.c | 21 +-- > drivers/gpu/drm/i915/display/intel_display.c | 183 --- > drivers/gpu/drm/i915/display/intel_display.h | 6 + > 3 files changed, 121 insertions(+), 89 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c > b/drivers/gpu/drm/i915/display/intel_ddi.c > index bea4415902044..6071e9f500871 100644 > --- a/drivers/gpu/drm/i915/display/intel_ddi.c > +++ b/drivers/gpu/drm/i915/display/intel_ddi.c > @@ -3100,7 +3100,6 @@ static void intel_ddi_post_disable(struct > intel_atomic_state *state, > const struct drm_connector_state > *old_conn_state) { > struct drm_i915_private *dev_priv = to_i915(encoder->base.dev); > - struct intel_crtc *slave_crtc; > > if (!intel_crtc_has_type(old_crtc_state, INTEL_OUTPUT_DP_MST)) { > 
intel_crtc_vblank_off(old_crtc_state); > @@ -3117,17 +3116,6 @@ static void intel_ddi_post_disable(struct > intel_atomic_state *state, > ilk_pfit_disable(old_crtc_state); > } > > - for_each_intel_crtc_in_pipe_mask(_priv->drm, slave_crtc, > - > intel_crtc_bigjoiner_slave_pipes(old_crtc_state)) { > - const struct intel_crtc_state *old_slave_crtc_state = > - intel_atomic_get_old_crtc_state(state, slave_crtc); > - > - intel_crtc_vblank_off(old_slave_crtc_state); > - > - intel_dsc_disable(old_slave_crtc_state); > - skl_scaler_disable(old_slave_crtc_state); > - } > - > /* >* When called from DP MST code: >* - old_conn_state will be NULL > @@ -3363,8 +3351,7 @@ static void intel_enable_ddi(struct > intel_atomic_state *state, { > drm_WARN_ON(state->base.dev, crtc_state->has_pch_encoder); > > - if (!intel_crtc_is_bigjoiner_slave(crtc_state)) > - intel_ddi_enable_transcoder_func(encoder, crtc_state); > + intel_ddi_enable_transcoder_func(encoder, crtc_state); > > /* Enable/Disable DP2.0 SDP split config before transcoder */ > intel_audio_sdp_split_update(crtc_state); > @@ -3469,9 +3456,6 @@ void intel_ddi_update_active_dpll(struct > intel_atomic_state *state, > struct intel_crtc *crtc) > { > struct drm_i915_private *i915 = to_i915(encoder->base.dev); > - struct intel_crtc_state *crtc_state = > - intel_atomic_get_new_crtc_state(state, crtc); > - struct intel_crtc *slave_crtc; > enum phy phy = intel_port_to_phy(i915, encoder->port); > > /* FIXME: Add MTL pll_mgr */ > @@ -3479,9 +3463,6 @@ void intel_ddi_update_active_dpll(struct > intel_atomic_state *state, > return; > > intel_update_active_dpll(state, crtc, encoder); > - for_each_intel_crtc_in_pipe_mask(>drm, slave_crtc, > - > intel_crtc_bigjoiner_slave_pipes(crtc_state)) > - intel_update_active_dpll(state, slave_crtc, encoder); > } > > static void > diff --git a/drivers/gpu/drm/i915/display/intel_display.c > b/drivers/gpu/drm/i915/display/intel_display.c > index 916c13a149fd5..e1ea53fd6a288 100644 > --- 
a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -1631,31 +1631,12 @@ static void hsw_configure_cpu_transcoder(const > struct intel_crtc_state *crtc_sta > hsw_set_transconf(crtc_state); > } > >
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]
On Thu, 22 Feb 2024 23:43:41 GMT, Brent Christian wrote:

>> Thanks for finding my misspelling, djelinski.
>
> The use of "(un)successful(ly)" in relation to `Reference.enqueue()` is quite
> deliberate (and builds on the previous wording, "successful").
>
> The intention was to use it consistently (is that not the case somewhere?).
> For example, it's also used in the new **Memory Consistency Properties**
> section of the `java.lang.ref` package docs ("The enqueueing of a
> reference...by a successful call to `Reference.enqueue()`...").
>
> A "successful call to `enqueue()`" is meant to be shorthand for:
> "the reference has been enqueued, and the enqueuing was performed by the
> `enqueue()` method (rather than by the garbage collector). Therefore there is
> a _happens-before_ edge between the `enqueue()` method call and the dequeuing
> of the Reference (whereas there would not be this _happens-before_ if the GC
> had already enqueued the Reference at the time of the `enqueue()` call)."
>
> The text emphasis with italics is to indicate this added significance of the
> result of the `enqueue()` call -- ala `happens-before`.
>
> I'm not aware of a similar scenario covered in the JLS, so AFAIK there is not
> precedent to be consistent with in that regard.

Sounds good, thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1501127639
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]
On Mon, 27 Nov 2023 22:41:25 GMT, Hans Boehm wrote:

>> Brent Christian has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>>   Cleaner thread dequeue happens-before running cleaning action
>
> src/java.base/share/classes/java/lang/ref/Reference.java line 568:
>
>> 566:      *
>> 567:      * @apiNote
>> 568:      * Reference processing or finalization may occur whenever the virtual machine detects that no
>
> How about "detects that all needed data from the object is available
> elsewhere, and no reference to that object will ever be stored ..." Otherwise
> this seems needlessly mysterious to me.

I find the additional suggested "detects that all needed data from the object is available elsewhere" more mysterious and confusing. The current wording seems clearer, as it sets the scene for, and motivates, when and why the `reachabilityFence()` might be needed or used. I may be missing the significance of the suggested "all needed data from the object is available elsewhere" at this point in the description.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1499668692
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]
On Thu, 22 Feb 2024 01:42:17 GMT, Brent Christian wrote:

>> Classes in the `java.lang.ref` package would benefit from an update to bring
>> the spec in line with how the VM already behaves. The changes would focus on
>> _happens-before_ edges at some key points during reference processing.
>>
>> A couple key things we want to be able to say are:
>> - `Reference.reachabilityFence(x)` _happens-before_ reference processing
>> occurs for 'x'.
>> - `Cleaner.register()` _happens-before_ the Cleaner thread runs the
>> registered cleaning action.
>>
>> This will bring Cleaner in line (or close) with the memory visibility
>> guarantees made for finalizers in [JLS
>> 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5):
>> _"There is a happens-before edge from the end of a constructor of an object
>> to the start of a finalizer (§12.6) for that object."_
>
> Brent Christian has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   Cleaner thread dequeue happens-before running cleaning action

Looks good; just some casual remarks on verbiage & font at a couple of places.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16644#pullrequestreview-1896527410
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref [v9]
On Thu, 22 Feb 2024 12:05:31 GMT, Daniel Jeliński wrote:

>> src/java.base/share/classes/java/lang/ref/Reference.java line 491:
>>
>>> 489:      * If this reference is not registered with a queue, or was already enqueued
>>> 490:      * (by the garbage collector, or a previous call to {@code enqueue}), this
>>> 491:      * method is unnsuccessful and returns false.
>>
>> Suggestion:
>>
>>      * method is unsuccessful and returns false.
>
> or, better yet, `fails`

I note that the adjective(s) (un)successful and the adverb(s) (un)successfully are used at several places in these comments, so it might make sense to use those terms here as well, such that the documentation is internally consistent in its use of success or failure of actions, in particular if this terminology is consistent with precedent in the official JLS spec.

However, I note that there are places where these terms are italicized and places where they aren't. I am not sure I follow the convention for italicization. In general, the first use (i.e. introduction) of a term that the reader might want to pay attention to calls for italicization when documents are read sequentially, such as in research papers. These javadoc specs will usually not be read sequentially. But considering that someone does read them in order, I'd suggest italicizing only the first use of the term or, if not, then perhaps none. Alternatively, you might want to italicize all uses (but why?).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1499714011
RE: GBM as standalone buffer allocator
+ Abhinav

From: Srinivas Pullakavi (QUIC)
Sent: Monday, January 22, 2024 10:44 PM
To: 'Yiwei Zhang'
Cc: Rob Clark; mesa-dev@lists.freedesktop.org
Subject: RE: GBM as standalone buffer allocator

Hi Yiwei,

Looks like this thread is closed.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038#note_2243187
Can we collaborate on this?

Thanks,
Srinivas

From: Yiwei Zhang <zzyi...@chromium.org>
Sent: Monday, November 20, 2023 4:38 AM
To: Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com>
Cc: Rob Clark <robdcl...@gmail.com>; mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

There's https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038. It is quite appealing to me considering a VK only scenario.

On Thu, Nov 2, 2023 at 5:50 AM Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com> wrote:

Hi Rob,

Thanks for your inputs.

We are planning to use DMA-Buf for the GBM backend. The heaps supported by DMA-buf are listed in /dev/dma_heap/.
The GBM backend selects the best heap based on usage. For example, secure buffers will be allocated from the secure heap.

Sample output:

    # ls /dev/dma_heap
    reserved  system

Sample code to allocate a buffer from the system heap:

    int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
    struct dma_heap_allocation_data heap_data = {
        .len = size, // length of data to be allocated in bytes
        .fd_flags = O_RDWR | O_CLOEXEC, // permissions for the memory to be allocated
    };
    int status = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &heap_data);
    if (status == 0) {
        int buffer_fd = heap_data.fd;
    }

In this case, there is no dependency on the display / graphics driver. But GBM create device still expects a device fd to be passed.
Can we make it optional to pass the device fd?

Thanks,
Srinivas

-----Original Message-----
From: Rob Clark <robdcl...@gmail.com>
Sent: Tuesday, October 24, 2023 1:06 AM
To: Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com>
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

On Mon, Oct 23, 2023 at 6:22 AM Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com> wrote:
>
> Hi,
>
> We are planning to enhance GBM as a standalone buffer allocator, which
> can be used for all multi-media clients. Ex: video, camera, display
> etc;
>
> GBM create device expects a file descriptor to be passed, which points to drm
> node. This brings in a dependency on display for buffer allocation. On
> headless devices where display driver is not present, GBM cannot be used for
> buffer allocations. E.g. Recording cases where pipeline is setup between
> Camera, Video, Graphics.
>

Note that you need some sort of device to allocate buffers from. With
mesa and upstream kernel, that would be the drm device. (However as
Adam points out, a drm device does not necessarily need a display.. for
example, several vendors have compute-only GPUs (pci) which have no
display outputs.)

You might want to look at ChromeOS's minigbm. It already handles these
cases (buffer sharing across display/gpu/video/camera).

BR,
-R

[1] https://chromium.googlesource.com/chromiumos/platform/minigbm/

>
> Could you please share your comments on what will be a good design to make
> GBM flexible for above?
>
> Thanks,
> Srinivas
>
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Thu, 8 Feb 2024 20:04:20 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski),
>>
>> The new ArraysSortNew.Java has compilation issues:
>>
>>
>> error: DualPivotQuicksort is not public in java.util; cannot be accessed
>> from outside package
>> java.util.DualPivotQuicksort.sort(b, PARALLELISM, 0, b.length);
>>
>> Have you run into this issue?
>>
>> Thanks,
>> Vamsi
>
> Hi Vamsi (@vamsi-parasa),
>
> My fault, there was an incorrect version of ArraysSortNew.java. Methods, of
> course, should be
>
> @Benchmark
> public void sort() {
>     Arrays.sort(b);
> }
>
> @Benchmark
> public void p_sort() {
>     Arrays.parallelSort(b);
> }
>
> I uploaded correct version, see
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew.java
>
> I also comment that pom.xml contains additional options (I guess you have the
> same)
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
> --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED
> full text is there
> https://github.com/iaroslavski/sorting/blob/master/radixsort/pom.xml
>
> and command to run test is
> java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -jar
> target/benchmarks.jar
>
> I assume that each variant of DPQS (DualPivotQuicksort_jdk,
> DualPivotQuicksort_r20p, DualPivotQuicksort_r20s, DualPivotQuicksort_r25p,
> DualPivotQuicksort_r25s) is renamed to DualPivotQuicksort and put into
> package java.util. Then benchmarking for a given variant with patched JDK is
> executed.
>
> Thank you,
> Vladimir

Hello Vladimir (@iaroslavski),

Please see the data below. Each DPQS class was copied to java.util and the JDK was recompiled.
Thanks,
Vamsi

Benchmark | (builder) | (size) | Stock JDK | r20p | r20s | r25p | r25s
-- | -- | -- | -- | -- | -- | -- | --
ArraysSort.Int.p_sort | RANDOM | 600 | 1.618 | 2.601 | 2.966 | 2.898 | 3.269
ArraysSort.Int.p_sort | RANDOM | 2000 | 7.433 | 8.438 | 8.463 | 8.414 | 8.65
ArraysSort.Int.p_sort | RANDOM | 9 | 258.853 | 355.261 | 326.378 | 347.65 | 321.894
ArraysSort.Int.p_sort | RANDOM | 40 | 842.085 | 1225.929 | 899.852 | 1278.681 | 932.627
ArraysSort.Int.p_sort | RANDOM | 300 | 5723.659 | 8711.108 | 6086.974 | 8948.101 | 6122.612
ArraysSort.Int.p_sort | REPEATED | 600 | 0.52 | 0.585 | 0.629 | 0.586 | 0.579
ArraysSort.Int.p_sort | REPEATED | 2000 | 1.18 | 1.225 | 1.21 | 1.225 | 1.238
ArraysSort.Int.p_sort | REPEATED | 9 | 102.142 | 85.79 | 86.131 | 87.954 | 86.036
ArraysSort.Int.p_sort | REPEATED | 40 | 244.508 | 229.142 | 227.613 | 228.608 | 228.367
ArraysSort.Int.p_sort | REPEATED | 300 | 2752.745 | 2584.103 | 2544.192 | 2576.803 | 2609.833
ArraysSort.Int.p_sort | STAGGER | 600 | 1.146 | 0.894 | 0.898 | 0.904 | 0.912
ArraysSort.Int.p_sort | STAGGER | 2000 | 3.712 | 3.096 | 3.121 | 3.03 | 3.049
ArraysSort.Int.p_sort | STAGGER | 9 | 72.763 | 77.575 | 78.366 | 79.158 | 77.199
ArraysSort.Int.p_sort | STAGGER | 40 | 212.455 | 228.331 | 225.888 | 224.686 | 225.728
ArraysSort.Int.p_sort | STAGGER | 300 | 2290.327 | 2216.741 | 2196.138 | 2236.658 | 2262.472
ArraysSort.Int.p_sort | SHUFFLE | 600 | 2.01 | 2.92 | 2.907 | 2.91 | 2.926
ArraysSort.Int.p_sort | SHUFFLE | 2000 | 7.06 | 7.759 | 7.776 | 7.688 | 8.062
ArraysSort.Int.p_sort | SHUFFLE | 9 | 157.728 | 151.871 | 151.101 | 154.03 | 151.2
ArraysSort.Int.p_sort | SHUFFLE | 40 | 441.166 | 715.243 | 449.698 | 699.75 | 447.069
ArraysSort.Int.p_sort | SHUFFLE | 300 | 4326.88 | 7133.045 | 4205.47 | 7161.862 | 4337.321
ArraysSort.Int.sort | RANDOM | 600 | 1.671 | 2.707 | 2.741 | 2.698 | 2.779
ArraysSort.Int.sort | RANDOM | 2000 | 7.265 | 8.226 | 8.942 | 8.193 | 8.339
ArraysSort.Int.sort | RANDOM | 9 | 529.054 | 559.499 | 554.29 | 566.009 | 559.131
ArraysSort.Int.sort | RANDOM | 40 | 2448.226 | 2654.71 | 2622.964 | 2629.673 | 2619.051
ArraysSort.Int.sort | RANDOM | 300 | 21471.133 | 22670.45 | 22654.94 | 22811.7 | 22957.97
ArraysSort.Int.sort | REPEATED | 600 | 0.517 | 0.578 | 0.578 | 0.587 | 0.568
ArraysSort.Int.sort | REPEATED | 2000 | 1.136 | 1.228 | 1.215 | 1.377 | 1.222
ArraysSort.Int.sort | REPEATED | 9 | 57.575 | 56.406 | 56.542 | 56.068 | 56.77
ArraysSort.Int.sort | REPEATED | 40 | 178.874 | 173.883 | 176.098 | 171.975 | 172.067
ArraysSort.Int.sort | REPEATED | 300 | 1856.71 | 1588.104 | 1489.842 | 1480.34 | 1522.399
ArraysSort.Int.sort | STAGGER | 600 | 1.143 | 0.893 | 0.901 | 0.896 | 0.906
ArraysSort.Int.sort | STAGGER | 2000 | 3.726 | 3.062 | 3.18 | 3.061 | 3.169
ArraysSort.Int.sort | STAGGER | 9 | 138.503 | 135.008 | 134.023 | 136.328 | 136.026
ArraysSort.Int.sort | STAGGER | 40 | 615.732 | 608.269 | 609.348 | 606.986 | 603.287
ArraysSort.Int.sort | STAGGER | 300 | 4914.443 | 4578.733 | 4584.407 | 4591.832 | 4613.16
ArraysSort.Int.sort | SHUFFLE | 600 | 2.137 | 2.886 | 2.948 |
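
The compile error quoted earlier in this message arises because java.util.DualPivotQuicksort is not public, so code outside java.util can only reach it through the public Arrays.sort and Arrays.parallelSort entry points, which is what the corrected benchmark methods do. A minimal sketch of that access pattern, without JMH, might look as follows. The class name PublicEntryPoints, the helper names, and the fixed seed are assumptions made for illustration; they are not taken from ArraysSortNew.java.

```java
import java.util.Arrays;
import java.util.Random;

public class PublicEntryPoints {
    /** Sorts a copy of the input sequentially via the public API. */
    public static int[] sortCopy(int[] data) {
        int[] b = data.clone();
        Arrays.sort(b); // public entry point; the sort implementation stays internal to java.util
        return b;
    }

    /** Sorts a copy of the input via the public parallel API. */
    public static int[] parallelSortCopy(int[] data) {
        int[] b = data.clone();
        Arrays.parallelSort(b);
        return b;
    }

    /** Returns true if the array is in non-decreasing order. */
    public static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical input: 600 pseudo-random ints with a fixed seed.
        int[] data = new Random(42).ints(600).toArray();
        System.out.println("sort ok:         " + isSorted(sortCopy(data)));
        System.out.println("parallelSort ok: " + isSorted(parallelSortCopy(data)));
    }
}
```

This only checks correctness of the two call paths; it is not a substitute for the JMH harness, which handles warm-up and measurement.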
[PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd
Hi,

Thanks for the review, @Richard! I have tried to address most of your comments in this patch. The major updates include optimizing operator[] for masks, find_first_set and find_last_set.

My further comments on some of the pointed-out issues are:

a. Regarding the coverage of types supported for SVE: yes, all the types are covered by mapping any type using two simple rules: the size of the type and its signedness.
b. All the operator overloads now use infix operators. For division and remainder, the inactive elements are padded with 1 to avoid undefined behavior.
c. isnan is optimized to have only two cases, i.e. the finite_math_only case or the case where svcmpuo is used.
d. _S_load for masks (bool) now uses svld1 by reinterpret_casting the pointer to a uint8_t pointer and then performing a svunpklo. The same optimization is not done for masked_load and stores, as conversion of a mask from a higher size type to a lower size type is not optimal (sequential).
e. _S_unary_minus could not use svneg_x because it does not support unsigned types.
f. Added specializations for reductions.
g. find_first_set and find_last_set are optimized using svclastb.

libstdc++-v3/ChangeLog:

	* include/Makefile.am: Add simd_sve.h.
	* include/Makefile.in: Add simd_sve.h.
	* include/experimental/bits/simd.h: Add new SveAbi.
	* include/experimental/bits/simd_builtin.h: Use __no_sve_deduce_t to support existing Neon Abi.
	* include/experimental/bits/simd_converter.h: Convert sequentially when sve is available.
	* include/experimental/bits/simd_detail.h: Define sve specific macro.
	* include/experimental/bits/simd_math.h: Fallback frexp to execute sequentially when sve is available, to handle fixed_size_simd return type that always uses sve.
	* include/experimental/simd: Include bits/simd_sve.h.
	* testsuite/experimental/simd/tests/bits/main.h: Enable testing for sve128, sve256, sve512.
	* include/experimental/bits/simd_sve.h: New file.

Signed-off-by: Srinivas Yadav Singanaboina <vasu.srinivasvasu...@gmail.com>
---
 libstdc++-v3/include/Makefile.am                   |    1 +
 libstdc++-v3/include/Makefile.in                   |    1 +
 libstdc++-v3/include/experimental/bits/simd.h      |  131 +-
 .../include/experimental/bits/simd_builtin.h       |   35 +-
 .../experimental/bits/simd_converter.h             |   57 +-
 .../include/experimental/bits/simd_detail.h        |    7 +-
 .../include/experimental/bits/simd_math.h          |   14 +-
 .../include/experimental/bits/simd_sve.h           | 1863 +
 libstdc++-v3/include/experimental/simd             |    3 +
 .../experimental/simd/tests/bits/main.h            |    3 +
 10 files changed, 2084 insertions(+), 31 deletions(-)
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_sve.h

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 6209f390e08..1170cb047a6 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -826,6 +826,7 @@ experimental_bits_headers = \
 	${experimental_bits_srcdir}/simd_neon.h \
 	${experimental_bits_srcdir}/simd_ppc.h \
 	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_sve.h \
 	${experimental_bits_srcdir}/simd_x86.h \
 	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 596fa0d2390..bc44582a2da 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1172,6 +1172,7 @@ experimental_bits_headers = \
 	${experimental_bits_srcdir}/simd_neon.h \
 	${experimental_bits_srcdir}/simd_ppc.h \
 	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_sve.h \
 	${experimental_bits_srcdir}/simd_x86.h \
 	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 90523ea57dc..d274cd740fe 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -39,12 +39,16 @@
 #include
 #include
 #include
+#include
 #if _GLIBCXX_SIMD_X86INTRIN
 #include
 #elif _GLIBCXX_SIMD_HAVE_NEON
 #include
 #endif
+#if _GLIBCXX_SIMD_HAVE_SVE
+#include
+#endif
 /** @ingroup ts_simd
  * @{
@@ -83,6 +87,12 @@ using __m512d [[__gnu__::__vector_size__(64)]] = double;
 using __m512i [[__gnu__::__vector_size__(64)]] = long long;
 #endif
+#if _GLIBCXX_SIMD_HAVE_SVE
+constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS / 8;
+#else
+constexpr inline int __sve_vectorized_size_bytes = 0;
+#endif
+
 namespace simd_abi {
 // simd_abi forward declarations {{{
 // implementation details:
@@ -108,6
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Mon, 5 Feb 2024 21:31:36 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski),
>>
>> Please see the data below. All tests were run after putting the DPQS code in java.util package and recompiling the JDK for each case.
>>
>> Benchmark (us/op) | (builder) | (size) | Stock JDK | a15 | r20p | r20s
>> -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testParallelSort | RANDOM | 600 | 2.24 | 2.201 | 2.423 | 2.389
>> ArraysSort.Int.testParallelSort | RANDOM | 9000 | 35.318 | 35.961 | 79.028 | 83.774
>> ArraysSort.Int.testParallelSort | RANDOM | 2 | 118.729 | 120.872 | 134.829 | 138.349
>> ArraysSort.Int.testParallelSort | RANDOM | 40 | 822.676 | 822.44 | 1200.858 | 872.264
>> ArraysSort.Int.testParallelSort | RANDOM | 300 | 5864.514 | 5948.82 | 8800.391 | 6020.616
>> ArraysSort.Int.testParallelSort | REPEATED | 600 | 0.924 | 0.936 | 0.752 | 0.733
>> ArraysSort.Int.testParallelSort | REPEATED | 9000 | 9.896 | 9.317 | 31.409 | 24.896
>> ArraysSort.Int.testParallelSort | REPEATED | 2 | 58.265 | 42.189 | 40.92 | 40.101
>> ArraysSort.Int.testParallelSort | REPEATED | 40 | 256.952 | 253.217 | 236.568 | 239.163
>> ArraysSort.Int.testParallelSort | REPEATED | 300 | 2844.107 | 2851.088 | 2752.939 | 3040.423
>> ArraysSort.Int.testParallelSort | STAGGER | 600 | 2.245 | 2.296 | 2.15 | 2.219
>> ArraysSort.Int.testParallelSort | STAGGER | 9000 | 29.278 | 29.119 | 28.288 | 28.141
>> ArraysSort.Int.testParallelSort | STAGGER | 2 | 50.129 | 50.442 | 49.746 | 49.686
>> ArraysSort.Int.testParallelSort | STAGGER | 40 | 463.309 | 413.619 | 418.077 | 407.519
>> ArraysSort.Int.testParallelSort | STAGGER | 300 | 3687.198 | 4363.242 | 3732.777 | 3769.898
>> ArraysSort.Int.testParallelSort | SHUFFLE | 600 | 1.715 | 1.698 | 2.799 | 2.733
>> ArraysSort.Int.testParallelSort | SHUFFLE | 9000 | 27.69 | 27.183 | 32.883 | 32.373
>> ArraysSort.Int.testParallelSort | SHUFFLE | 2 | 62.067 | 60.987 | 63.281 | 52.89
>> ArraysSort.Int.testParalle...
>
> Hello Vamsi (@vamsi-parasa),
>
> Many thanks for the results! Now we can see that intrinsics are applied in all cases,
> but there are big differences between the same code.
>
> For example,
> parallelSort REPEATED 2: 58.265(Stock JDK) and 42.189(a15) with speedup x1.38
> parallelSort STAGGER 300: 3687.198(Stock JDK) 4363.242(a15) with speedup x0.85
>
> Case a15 is the current source code from JDK, but in one benchmarking it is faster,
> in other benchmarking it is slower (~15-30%).
>
> Other strange behaviour with new sorting: r20p and r20s have the same code for
> sequential sorting (no radix sort at all), but we can see that one case works much slower
>
> sort STAGGER 300: 34406.74(r20p) and 10467.03(r20s) - 3.3 times slower,
> whereas other sizes show more or less equal values.
>
> Vamsi (@vamsi-parasa),
> Could you please run benchmarking of 5 cases with **updated** test class **ArraysSortNew**?
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew.java
>
> Put the DPQS code in java.util package and recompiling the JDK for each case as you
> did before, but run new **ArraysSortNew**.
>
> Find the sources there:
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSortNew.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_jdk.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r25p.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r25s.java
>
> Thank you,
> Vladimir

Hi Vladimir (@iaroslavski),

The new ArraysSortNew.Java has compilation issues:

error: DualPivotQuicksort is not public in java.util; cannot be accessed from outside package
java.util.DualPivotQuicksort.sort(b, PARALLELISM, 0, b.length);

Have you run into this issue?

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1933243711
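
The speedup figures quoted in this message (x1.38 and x0.85) are consistent with dividing the Stock JDK time by the a15 time, so a value above 1 means a15 was faster and a value below 1 means it was slower. A small sketch of that arithmetic, using only the two pairs of scores quoted above, is shown here; the class and method names are assumptions for illustration.

```java
public class Speedup {
    /**
     * Ratio of baseline time to candidate time (both in us/op).
     * A result above 1.0 means the candidate is faster than the baseline.
     */
    public static double speedup(double baselineUs, double candidateUs) {
        return baselineUs / candidateUs;
    }

    public static void main(String[] args) {
        // parallelSort REPEATED: Stock JDK 58.265 us/op vs. a15 42.189 us/op
        double repeated = speedup(58.265, 42.189);
        // parallelSort STAGGER: Stock JDK 3687.198 us/op vs. a15 4363.242 us/op
        double stagger = speedup(3687.198, 4363.242);

        System.out.println("REPEATED speedup: " + repeated);
        System.out.println("STAGGER speedup:  " + stagger);
    }
}
```

Since a15 is described as the same source as the JDK's own implementation, both ratios would be expected to sit close to 1.0; the spread between them is the run-to-run discrepancy the message is pointing at.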
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Sun, 28 Jan 2024 22:23:38 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski),
>>
>> Please see the JMH data below.
>>
>> Thanks,
>> Vamsi
>>
>> Benchmark           (builder)  (size)  Mode  Cnt       Score      Error  Units
>> ArraysSort.Int.a15     RANDOM     600  avgt    4       7.096 ±    0.081  us/op
>> ArraysSort.Int.a15     RANDOM    2000  avgt    4      44.014 ±    1.717  us/op
>> ArraysSort.Int.a15     RANDOM       9  avgt    4    4451.444 ±   71.168  us/op
>> ArraysSort.Int.a15     RANDOM      40  avgt    4   22751.966 ±  683.597  us/op
>> ArraysSort.Int.a15     RANDOM     300  avgt    4  190326.306 ± 8008.512  us/op
>> ArraysSort.Int.a15   REPEATED     600  avgt    4       1.044 ±    0.016  us/op
>> ArraysSort.Int.a15   REPEATED    2000  avgt    4       2.272 ±    0.287  us/op
>> ArraysSort.Int.a15   REPEATED       9  avgt    4     412.331 ±   11.656  us/op
>> ArraysSort.Int.a15   REPEATED      40  avgt    4    1908.978 ±   30.241  us/op
>> ArraysSort.Int.a15   REPEATED     300  avgt    4   15163.443 ±  100.425  us/op
>> ArraysSort.Int.a15    STAGGER     600  avgt    4       1.055 ±    0.057  us/op
>> ArraysSort.Int.a15    STAGGER    2000  avgt    4       3.408 ±    0.096  us/op
>> ArraysSort.Int.a15    STAGGER       9  avgt    4     149.220 ±    4.022  us/op
>> ArraysSort.Int.a15    STAGGER      40  avgt    4     663.096 ±   30.252  us/op
>> ArraysSort.Int.a15    STAGGER     300  avgt    4    5206.890 ±  234.857  us/op
>> ArraysSort.Int.a15    SHUFFLE     600  avgt    4       4.611 ±    0.118  us/op
>> ArraysSort.Int.a15    SHUFFLE    2000  avgt    4      17.955 ±    0.356  us/op
>> ArraysSort.Int.a15    SHUFFLE       9  avgt    4    1410.357 ±   41.128  us/op
>> ArraysSort.Int.a15    SHUFFLE      40  avgt    4    5739.311 ±  128.270  us/op
>> ArraysSort.Int.a15    SHUFFLE     300  avgt    4   41501.980 ±  829.443  us/op
>> ArraysSort.Int.jdk     RANDOM     600  avgt    4       1.612 ±    0.088  us/op
>> ArraysSort.Int.jdk     RANDOM    2000  avgt    4       6.893 ±    0.375  us/op
>> ArraysSort.Int.jdk     RANDOM       9  avgt    4     522.749 ±   19.386  us/op
>> ArraysSort.Int.jdk     RANDOM      40  avgt    4    2424.204 ±   63.844  us/op
>> ArraysSort.Int.jdk     RANDOM     300  avgt    4   21000.434 ±  801.315  us/op
>> ArraysSort.Int.jdk   REPEATED     600  avgt    4       0.496 ±    0.030  us/op
>> ArraysSort.Int.jdk   REPEATED    2000  avgt    4       1.037 ±    0.083  us/op
>> ArraysSort.Int.jdk REPE...
>
> Hi Vamsi (@vamsi-parasa), Laurent(@bourgesl),
>
> The latest benchmarking compares the following versions:
> jdk - direct call of Arrays.sort();
> a15 - the current source of DualPivotQuicksort from the latest build (except renaming)
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/DualPivotQuicksort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> r20s - new version without Radix sort
> r20p - new version with Radix sort in parallel case only
>
> It is expected that timing of jdk and a15 should be more or less the same,
> but please look at the results:
>
> Benchmark | Data Type | Array Size | Arrays.sort() from jdk | Current source (a15)
> -- | -- | -- | -- | --
> ArraysSort.Int.testSort | RANDOM | 600 | 1.612 | 7.096
> ArraysSort.Int.testSort | RANDOM | 2000 | 6.893 | 44.014
> ArraysSort.Int.testSort | RANDOM | 9 | 522.749 | 4451.444
> ArraysSort.Int.testSort | RANDOM | 40 | 2424.204 | 22751.966
> ArraysSort.Int.testSort | RANDOM | 300 | 21000.434 | 190326.306
> ArraysSort.Int.testSort | REPEATED | 600 | 0.496 | 1.044
> ArraysSort.Int.testSort | REPEATED | 2000 | 1.037 | 2.272
> ArraysSort.Int.testSort | REPEATED | 9 | 57.763 | 412.331
> ArraysSort.Int.testSort | REPEATED | 40 | 182.238 | 1908.978
> ArraysSort.Int.testSort | REPEATED | 300 | 1708.082 | 15163.443
> ArraysSort.Int.testSort | STAGGER | 600 | 1.038 | 1.055
> ArraysSort.Int.testSort | STAGGER | 2000 | 3.434 | 3.408
> ArraysSort.Int.testSort | STAGGER | 9 | 148.638 | 149.220
> ArraysSort.Int.testSort | STAGGER | 40 | 663.076 | 663.096
> ArraysSort.Int.testSort | STAGGER | 300 | 5212.821 | 5206.890
> ArraysSort.Int.testSort | SHUFFLE | 600 | 1.926 | 4.611
> ArraysSort.Int.testSort | SHUFFLE | 2000 | 6.858 | 17.955
> ArraysSort.Int.testSort | SHUFFLE | 9 | 473.441 | 1410.357
> ArraysSort.Int.testSort | SHUFFLE | 40 | 2153.779 | 5739.311
> ArraysSort.Int.testSort | SHUFFLE | 300 | 18180.141 | 41501.980
>
> You can see that a15 (current source) works extremely slower than Arrays.sort(), but the code is the same
> with
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Thu, 18 Jan 2024 21:36:22 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski)
>>
>> Please see the data below using the latest version of AVX512 sort that got integrated into OpenJDK.
>>
>> Benchmark (us/op) | (builder) | Stock JDK | a10 | r14 | r17 | r18
>> -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 2.202 | 2.226 | 1.535 | 1.556 | 1.546
>> ArraysSort.Int.testSort | RANDOM | 35.128 | 34.804 | 30.808 | 30.914 | 31.284
>> ArraysSort.Int.testSort | RANDOM | 78.571 | 77.224 | 72.567 | 73.098 | 73.337
>> ArraysSort.Int.testSort | RANDOM | 2466.487 | 2470.66 | 2504.654 | 2494.051 | 2499.746
>> ArraysSort.Int.testSort | RANDOM | 20704.14 | 20668.19 | 21377.73 | 21362.63 | 21278.94
>> ArraysSort.Int.testSort | REPEATED | 0.877 | 0.892 | 0.74 | 0.724 | 0.718
>> ArraysSort.Int.testSort | REPEATED | 4.789 | 4.788 | 4.92 | 4.721 | 4.891
>> ArraysSort.Int.testSort | REPEATED | 11.172 | 11.778 | 11.53 | 11.467 | 11.406
>> ArraysSort.Int.testSort | REPEATED | 207.212 | 207.292 | 255.46 | 258.832 | 254.44
>> ArraysSort.Int.testSort | REPEATED | 1862.544 | 1876.759 | 1952.646 | 1957.978 | 1981.906
>> ArraysSort.Int.testSort | STAGGER | 2.092 | 2.137 | 1.999 | 2.031 | 2.015
>> ArraysSort.Int.testSort | STAGGER | 29.891 | 30.321 | 25.626 | 26.318 | 26.396
>> ArraysSort.Int.testSort | STAGGER | 60.979 | 83.439 | 57.864 | 57.213 | 79.762
>> ArraysSort.Int.testSort | STAGGER | 1227.933 | 1224.495 | 1236.133 | 1229.773 | 1228.877
>> ArraysSort.Int.testSort | STAGGER | 9514.873 | 9565.599 | 9491.509 | 9481.147 | 9481.905
>> ArraysSort.Int.testSort | SHUFFLE | 1.608 | 1.595 | 1.419 | 1.442 | 1.491
>> ArraysSort.Int.testSort | SHUFFLE | 31.566 | 32.789 | 28.718 | 28.768 | 28.671
>> ArraysSort.Int.testSort | SHUFFLE | 82.157 | 83.741 | 70.889 | 69.951 | 71.196
>> ArraysSort.Int.testSort | SHUFFLE | 2251.219 | 2248.496 | 2184.459 | 2163.969 | 2156.239
>> ArraysSort.Int.testSort | SHUFFLE | 18211.05 | 18223.24 | 17987.4 | 18114.26 | 17994.98
> ...
>
> Hello Vamsi (@vamsi-parasa),
>
> Could you please run the benchmarking of new DPQS in your environment with AVX?
>
> Take all classes below and put them in the package org.openjdk.bench.java.util.
> ArraysSort class contains all tests for the new versions and ready to use.
> (it will run all tests in one execution).
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
>
> Many thanks,
> Vladimir

Hi Vladimir (@iaroslavski),

Please see the JMH data below.

Thanks,
Vamsi

Benchmark           (builder)  (size)  Mode  Cnt       Score      Error  Units
ArraysSort.Int.a15     RANDOM     600  avgt    4       7.096 ±    0.081  us/op
ArraysSort.Int.a15     RANDOM    2000  avgt    4      44.014 ±    1.717  us/op
ArraysSort.Int.a15     RANDOM       9  avgt    4    4451.444 ±   71.168  us/op
ArraysSort.Int.a15     RANDOM      40  avgt    4   22751.966 ±  683.597  us/op
ArraysSort.Int.a15     RANDOM     300  avgt    4  190326.306 ± 8008.512  us/op
ArraysSort.Int.a15   REPEATED     600  avgt    4       1.044 ±    0.016  us/op
ArraysSort.Int.a15   REPEATED    2000  avgt    4       2.272 ±    0.287  us/op
ArraysSort.Int.a15   REPEATED       9  avgt    4     412.331 ±   11.656  us/op
ArraysSort.Int.a15   REPEATED      40  avgt    4    1908.978 ±   30.241  us/op
ArraysSort.Int.a15   REPEATED     300  avgt    4   15163.443 ±  100.425  us/op
ArraysSort.Int.a15    STAGGER     600  avgt    4       1.055 ±    0.057  us/op
ArraysSort.Int.a15    STAGGER    2000  avgt    4       3.408 ±    0.096  us/op
ArraysSort.Int.a15    STAGGER       9  avgt    4     149.220 ±    4.022  us/op
ArraysSort.Int.a15    STAGGER      40  avgt    4     663.096 ±   30.252  us/op
ArraysSort.Int.a15    STAGGER     300  avgt    4    5206.890 ±  234.857  us/op
ArraysSort.Int.a15    SHUFFLE     600  avgt    4       4.611 ±    0.118  us/op
ArraysSort.Int.a15    SHUFFLE    2000  avgt    4      17.955 ±    0.356  us/op
ArraysSort.Int.a15    SHUFFLE       9  avgt    4    1410.357 ±   41.128  us/op
ArraysSort.Int.a15    SHUFFLE      40  avgt    4    5739.311 ±  128.270  us/op
ArraysSort.Int.a15    SHUFFLE     300  avgt    4   41501.980 ±  829.443  us/op
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Thu, 18 Jan 2024 21:36:22 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski)
>>
>> Please see the data below using the latest version of AVX512 sort that got integrated into OpenJDK.
>>
>> Benchmark (us/op) | (builder) | Stock JDK | a10 | r14 | r17 | r18
>> -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 2.202 | 2.226 | 1.535 | 1.556 | 1.546
>> ArraysSort.Int.testSort | RANDOM | 35.128 | 34.804 | 30.808 | 30.914 | 31.284
>> ArraysSort.Int.testSort | RANDOM | 78.571 | 77.224 | 72.567 | 73.098 | 73.337
>> ArraysSort.Int.testSort | RANDOM | 2466.487 | 2470.66 | 2504.654 | 2494.051 | 2499.746
>> ArraysSort.Int.testSort | RANDOM | 20704.14 | 20668.19 | 21377.73 | 21362.63 | 21278.94
>> ArraysSort.Int.testSort | REPEATED | 0.877 | 0.892 | 0.74 | 0.724 | 0.718
>> ArraysSort.Int.testSort | REPEATED | 4.789 | 4.788 | 4.92 | 4.721 | 4.891
>> ArraysSort.Int.testSort | REPEATED | 11.172 | 11.778 | 11.53 | 11.467 | 11.406
>> ArraysSort.Int.testSort | REPEATED | 207.212 | 207.292 | 255.46 | 258.832 | 254.44
>> ArraysSort.Int.testSort | REPEATED | 1862.544 | 1876.759 | 1952.646 | 1957.978 | 1981.906
>> ArraysSort.Int.testSort | STAGGER | 2.092 | 2.137 | 1.999 | 2.031 | 2.015
>> ArraysSort.Int.testSort | STAGGER | 29.891 | 30.321 | 25.626 | 26.318 | 26.396
>> ArraysSort.Int.testSort | STAGGER | 60.979 | 83.439 | 57.864 | 57.213 | 79.762
>> ArraysSort.Int.testSort | STAGGER | 1227.933 | 1224.495 | 1236.133 | 1229.773 | 1228.877
>> ArraysSort.Int.testSort | STAGGER | 9514.873 | 9565.599 | 9491.509 | 9481.147 | 9481.905
>> ArraysSort.Int.testSort | SHUFFLE | 1.608 | 1.595 | 1.419 | 1.442 | 1.491
>> ArraysSort.Int.testSort | SHUFFLE | 31.566 | 32.789 | 28.718 | 28.768 | 28.671
>> ArraysSort.Int.testSort | SHUFFLE | 82.157 | 83.741 | 70.889 | 69.951 | 71.196
>> ArraysSort.Int.testSort | SHUFFLE | 2251.219 | 2248.496 | 2184.459 | 2163.969 | 2156.239
>> ArraysSort.Int.testSort | SHUFFLE | 18211.05 | 18223.24 | 17987.4 | 18114.26 | 17994.98
> ...
>
> Hello Vamsi (@vamsi-parasa),
>
> Could you please run the benchmarking of new DPQS in your environment with AVX?
>
> Take all classes below and put them in the package org.openjdk.bench.java.util.
> ArraysSort class contains all tests for the new versions and ready to use.
> (it will run all tests in one execution).
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
>
> Many thanks,
> Vladimir

Hi Vladimir (@iaroslavski),

I was able to figure out the issue and started the JMH benchmarking run. It's night time here; I will provide the data Friday morning (US PST).

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1911741126
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Thu, 18 Jan 2024 21:36:22 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski)
>>
>> Please see the data below using the latest version of AVX512 sort that got
>> integrated into OpenJDK.
>>
>> Benchmark (us/op) | (builder) | Stock JDK | a10 | r14 | r17 | r18
>> -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 2.202 | 2.226 | 1.535 | 1.556 | 1.546
>> ArraysSort.Int.testSort | RANDOM | 35.128 | 34.804 | 30.808 | 30.914 | 31.284
>> ArraysSort.Int.testSort | RANDOM | 78.571 | 77.224 | 72.567 | 73.098 | 73.337
>> ArraysSort.Int.testSort | RANDOM | 2466.487 | 2470.66 | 2504.654 | 2494.051 | 2499.746
>> ArraysSort.Int.testSort | RANDOM | 20704.14 | 20668.19 | 21377.73 | 21362.63 | 21278.94
>> ArraysSort.Int.testSort | REPEATED | 0.877 | 0.892 | 0.74 | 0.724 | 0.718
>> ArraysSort.Int.testSort | REPEATED | 4.789 | 4.788 | 4.92 | 4.721 | 4.891
>> ArraysSort.Int.testSort | REPEATED | 11.172 | 11.778 | 11.53 | 11.467 | 11.406
>> ArraysSort.Int.testSort | REPEATED | 207.212 | 207.292 | 255.46 | 258.832 | 254.44
>> ArraysSort.Int.testSort | REPEATED | 1862.544 | 1876.759 | 1952.646 | 1957.978 | 1981.906
>> ArraysSort.Int.testSort | STAGGER | 2.092 | 2.137 | 1.999 | 2.031 | 2.015
>> ArraysSort.Int.testSort | STAGGER | 29.891 | 30.321 | 25.626 | 26.318 | 26.396
>> ArraysSort.Int.testSort | STAGGER | 60.979 | 83.439 | 57.864 | 57.213 | 79.762
>> ArraysSort.Int.testSort | STAGGER | 1227.933 | 1224.495 | 1236.133 | 1229.773 | 1228.877
>> ArraysSort.Int.testSort | STAGGER | 9514.873 | 9565.599 | 9491.509 | 9481.147 | 9481.905
>> ArraysSort.Int.testSort | SHUFFLE | 1.608 | 1.595 | 1.419 | 1.442 | 1.491
>> ArraysSort.Int.testSort | SHUFFLE | 31.566 | 32.789 | 28.718 | 28.768 | 28.671
>> ArraysSort.Int.testSort | SHUFFLE | 82.157 | 83.741 | 70.889 | 69.951 | 71.196
>> ArraysSort.Int.testSort | SHUFFLE | 2251.219 | 2248.496 | 2184.459 | 2163.969 | 2156.239
>> ArraysSort.Int.testSort | SHUFFLE | 18211.05 | 18223.24 | 17987.4 | 18114.26 | 17994.98
> ...
>
> Hello Vamsi (@vamsi-parasa),
>
> Could you please run the benchmarking of new DQPS in your environment with
> AVX?
>
> Take all classes below and put them in the package
> org.openjdk.bench.java.util.
> ArraysSort class contains all tests for the new versions and ready to use.
> (it will run all tests in one execution).
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
>
> Many thanks,
> Vladimir

Hello Vladimir (@iaroslavski),

Could you please share your pom.xml, as I am running into issues when the JMH benchmark is run:

`java.lang.IllegalAccessError: class org.openjdk.bench.java.util.DualPivotQuicksort_a15 (in unnamed module @0x520a3426) cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module @0x520a3426`

I added the following add-exports in pom.xml, but it's still not working.

    org.apache.maven.plugins
    maven-compiler-plugin
    3.8.1
    --add-exports java.base/jdk.internal.misc=ALL-UNNAMED
    --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1911712234
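A possible reading of the error above, offered as an assumption rather than a diagnosis of this particular setup: `IllegalAccessError` is raised at run time, so `--add-exports` given only to the compiler plugin may let the classes compile without changing what the benchmark JVM exports. The sketch below uses the standard `java.lang.Module.isExported` call to report whether `java.base` exports a package to the module of the running class. The class name `ExportCheck` and its helper are hypothetical and are not part of the benchmark sources.

```java
// Hypothetical helper, not part of the JMH benchmark: reports whether java.base
// exports a package to the module this class was loaded into (normally the
// unnamed module when run from the class path).
final class ExportCheck {

    /** Returns true if java.base exports the given package to this class's module. */
    static boolean exportedToUs(String packageName) {
        Module javaBase = Object.class.getModule();
        Module self = ExportCheck.class.getModule();
        return javaBase.isExported(packageName, self);
    }

    public static void main(String[] args) {
        String[] packages = {
            "java.lang",                  // always exported
            "jdk.internal.misc",          // needs --add-exports at run time
            "jdk.internal.vm.annotation"  // needs --add-exports at run time
        };
        for (String pkg : packages) {
            System.out.println(pkg + " exported: " + exportedToUs(pkg));
        }
    }
}
```

Running it once plainly and once with `java --add-exports java.base/jdk.internal.misc=ALL-UNNAMED ExportCheck.java` should show the difference for `jdk.internal.misc`. For JMH the same flags would have to reach the forked benchmark JVM, for example through the `jvmArgsAppend` element of the `@Fork` annotation.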
RE: GBM as standalone buffer allocator
Hi Yiwei,

Looks like this thread is closed.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038#note_2243187

Can we collaborate on this?

Thanks,
Srinivas

From: Yiwei Zhang
Sent: Monday, November 20, 2023 4:38 AM
To: Srinivas Pullakavi (QUIC)
Cc: Rob Clark; mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

There’s https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26038. It is quite appealing to me considering a VK only scenario.

On Thu, Nov 2, 2023 at 5:50 AM Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com> wrote:

Hi Rob,

Thanks for your inputs.

We are planning to use DMA-Buf for the GBM backend. The supported DMA-buf heaps are listed in /dev/dma_heap/.
The GBM backend selects the best heap based on usage. For example, secure buffers will be allocated from the secure heap.

Sample output:

    # ls /dev/dma_heap
    reserved  system

Sample code to allocate a buffer from the system heap:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/dma-heap.h>

    int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
    struct dma_heap_allocation_data heap_data = {
        .len = size,                    // length of data to be allocated in bytes
        .fd_flags = O_RDWR | O_CLOEXEC, // permissions for the memory to be allocated
    };
    int status = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &heap_data);
    if (status == 0) {
        int buffer_fd = heap_data.fd;   // dma-buf fd for the new buffer
    }

In this case, there is no dependency on the display / graphics driver. But GBM create device still expects a device fd to be passed.
Can we make passing the device fd optional?

Thanks,
Srinivas

-----Original Message-----
From: Rob Clark <robdcl...@gmail.com>
Sent: Tuesday, October 24, 2023 1:06 AM
To: Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com>
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: GBM as standalone buffer allocator

On Mon, Oct 23, 2023 at 6:22 AM Srinivas Pullakavi (QUIC) <quic_spull...@quicinc.com> wrote:
>
> Hi,
>
> We are planning to enhance GBM as a standalone buffer allocator, which
> can be used for all multi-media clients. Ex: video, camera, display
> etc;
>
> GBM create device expects a file descriptor to be passed, which points to drm
> node. This brings in a dependency on display for buffer allocation. On
> headless devices where display driver is not present, GBM cannot be used for
> buffer allocations. E.g. Recording cases where pipeline is setup between
> Camera, Video, Graphics.
>

Note that you need some sort of device to allocate buffers from. With mesa and upstream kernel, that would be the drm device. (However as Adam points out, a drm device does not necessarily need a display.. for example, several vendors have compute-only GPUs (pci) which have no display outputs.)

You might want to look at ChromeOS's minigbm. It already handles these cases (buffer sharing across display/gpu/video/camera).

BR,
-R

[1] https://chromium.googlesource.com/chromiumos/platform/minigbm/

>
> Could you please share your comments on what will be a good design to make
> GBM flexible for above?
>
> Thanks,
>
> Srinivas
>
[sig-policy] Reminder: APNIC 57 Call for Policy Proposals
Dear SIG Members,

Happy New Year 2024!

This is a reminder that the deadline set by the Policy SIG Chair for
proposals to be discussed at APNIC 57 Open Policy Meeting (OPM) is
*Friday, 12 January 2024 at 23:59 UTC +7.*

If you have any ideas to improve policy, or wish to make an informational
presentation about an aspect of resource management, please follow the
instructions below.

To propose a new policy or submit an informational presentation synopsis,
please visit
https://www.apnic.net/community/policy/proposals/submit-a-policy-proposal/

We look forward to and encourage your participation in the APNIC 57 Open
Policy Meeting (OPM), which will be held on Thursday, 29 February 2024 in
Bangkok, Thailand.
https://conference.apnic.net/57/

Best Regards,
Sunny
APNIC Secretariat

On 6/11/2023 1:21 pm, Bertrand Cherrier wrote:

Dear Colleagues,

The APNIC 57 Open Policy Meeting (OPM) will be held on Thursday,
29 February 2024 in Bangkok, Thailand.

If you have any ideas to improve current policies, or propose new policy,
or wish to make an informational presentation about an aspect of resource
management, please follow the instructions below.

The submission deadline is Friday, 12 January 2024 at 23:59 UTC +7.

To propose a new policy or submit an informational presentation synopsis,
please visit:
https://www.apnic.net/community/policy/proposals/submit-a-policy-proposal/

We look forward to your participation in the APNIC 57 OPM.

Kind regards,
Bertrand, Shaila and Anupam
Policy SIG Chairs

___
SIG-policy - https://mailman.apnic.net/sig-policy@lists.apnic.net/
To unsubscribe send an email to sig-policy-le...@lists.apnic.net
Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd
>       constexpr _MaskMember<_Tp>
>       _S_broadcast(bool __x)
>       {
>         constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
>         __sve_bool_type __tr = __sve_vector_type<_Tp, _Np>::__sve_active_mask();
>         __sve_bool_type __fl = svnot_z(__tr, __tr);
>
> This can just be svpfalse_b();
>

Got it! Thanks!

>   template >
>     struct _MaskImplSve
>     {
> ...
>       template >
>         _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
>         _S_load(const bool* __mem)
>         {
>           _SveMaskWrapper> __r;
>
>           __execute_n_times>(
>             [&](auto __i) _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA {
>               __r._M_set(__i, __mem[__i]); });
>
>           return __r;
>         }
>
>       template >
>         static inline _SveMaskWrapper<_Bits, _Np>
>         _S_masked_load(_SveMaskWrapper<_Bits, _Np> __merge,
>                        _SveMaskWrapper<_Bits, _Np> __mask,
>                        const bool* __mem) noexcept
>         {
>           _SveMaskWrapper<_Bits, _Np> __r;
>
>           __execute_n_times<_Np>([&](auto __i) _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA {
>             if (__mask[__i])
>               __r._M_set(__i, __mem[__i]);
>             else
>               __r._M_set(__i, __merge[__i]);
>           });
>
>           return __r;
>         }
>
> If these are loading unpacked booleans, couldn't we just use svld1
> followed by a comparison?  Similarly the stores could use svdup_u8_z
> to load a vector of 1s and 0s and then use svst1 to store it.

Do you mean reinterpret-casting the input pointer (bool*) to (uint8*) and
performing a comparison?

>       template >
>         _GLIBCXX_SIMD_INTRINSIC static bool
>         _S_all_of(simd_mask<_Tp, _Abi> __k)
>         { return _S_popcount(__k) == simd_size_v<_Tp, _Abi>; }
>
> In principle, this should be better as !svptest_any(..., svnot_z (..., __k)),
> since we should then be able to use a single flag-setting predicate
> logic instruction.
>
> Incidentally, __k seems like a bit of an AVX-centric name :)
>
>       template >
>         _GLIBCXX_SIMD_INTRINSIC static bool
>         _S_any_of(simd_mask<_Tp, _Abi> __k)
>         { return _S_popcount(__k) > 0; }
>
>       template >
>         _GLIBCXX_SIMD_INTRINSIC static bool
>         _S_none_of(simd_mask<_Tp, _Abi> __k)
>         { return _S_popcount(__k) == 0; }
>
> These should map directly to svptest_any and !svptest_any respectively.
>

Got it! I will update with these changes.

>       template >
>         _GLIBCXX_SIMD_INTRINSIC static int
>         _S_find_first_set(simd_mask<_Tp, _Abi> __k)
>         {
>           constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
>
>           auto __first_index =
>             __sve_mask_type::__sve_mask_first_true();
>           for (int __idx = 0; __idx < _Np; __idx++)
>             {
>               if (__sve_mask_type::__sve_mask_active_count(
>                     __sve_vector_type<_Tp, _Np>::__sve_active_mask(),
>                     svand_z(__sve_vector_type<_Tp, _Np>::__sve_active_mask(), __k._M_data,
>                             __first_index)))
>                 return __idx;
>               __first_index =
>                 __sve_mask_type::__sve_mask_next_true(
>                   __sve_vector_type<_Tp, _Np>::__sve_active_mask(), __first_index);
>             }
>           return -1;
>         }
>
>       template >
>         _GLIBCXX_SIMD_INTRINSIC static int
>         _S_find_last_set(simd_mask<_Tp, _Abi> __k)
>         {
>           constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
>
>           int __ret = -1;
>           auto __first_index =
>             __sve_mask_type::__sve_mask_first_true();
>           for (int __idx = 0; __idx < _Np; __idx++)
>             {
>               if (__sve_mask_type::__sve_mask_active_count(
>                     __sve_vector_type<_Tp, _Np>::__sve_active_mask(),
>                     svand_z(__sve_vector_type<_Tp, _Np>::__sve_active_mask(), __k._M_data,
>                             __first_index)))
>                 __ret = __idx;
>               __first_index =
>                 __sve_mask_type::__sve_mask_next_true(
>                   __sve_vector_type<_Tp, _Np>::__sve_active_mask(), __first_index);
>             }
>           return __ret;
>         }
>
> _S_find_last_set should be able to use svclasta and an iota vector.
> _S_find_first_set could do the same with a leading svpfirst.
>

Thanks. This solution for find_last_set should significantly improve
performance. Could you please elaborate on the solution for find_first_set?
Another efficient solution for find_first_set that I have in mind is to use
svrev_b* and then perform a find_last_set.

Thank you,
Srinivas Yadav Singanaboina
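To make the predicate identities discussed in this review concrete, here is a scalar model in Java. It is only an illustration under stated assumptions: it works on plain boolean arrays and uses no SVE intrinsics, and the names `MaskModel`, `anyOf`, `allOf`, `noneOf`, `findLastSet` and `findFirstSet` are hypothetical, not part of the patch. It shows all_of written as "no lane of the inverted mask is set", and find_first_set derived from find_last_set on a reversed mask, which is the idea behind the svrev_b* suggestion.

```java
// Scalar model of the mask reductions; boolean[] stands in for an SVE predicate.
final class MaskModel {

    /** any_of: true if at least one lane is set. */
    static boolean anyOf(boolean[] k) {
        for (boolean lane : k) {
            if (lane) return true;
        }
        return false;
    }

    /** Lane-wise NOT of the mask. */
    static boolean[] not(boolean[] k) {
        boolean[] r = new boolean[k.length];
        for (int i = 0; i < k.length; i++) r[i] = !k[i];
        return r;
    }

    /** all_of expressed as "no lane of NOT k is set". */
    static boolean allOf(boolean[] k) {
        return !anyOf(not(k));
    }

    /** none_of is simply the negation of any_of. */
    static boolean noneOf(boolean[] k) {
        return !anyOf(k);
    }

    /** Index of the highest set lane, or -1 if no lane is set. */
    static int findLastSet(boolean[] k) {
        for (int i = k.length - 1; i >= 0; i--) {
            if (k[i]) return i;
        }
        return -1;
    }

    /** Mask with the lane order reversed. */
    static boolean[] reverse(boolean[] k) {
        boolean[] r = new boolean[k.length];
        for (int i = 0; i < k.length; i++) r[i] = k[k.length - 1 - i];
        return r;
    }

    /** find_first_set via reversal: the last set lane of the reversed mask, mapped back. */
    static int findFirstSet(boolean[] k) {
        int last = findLastSet(reverse(k));
        return last < 0 ? -1 : k.length - 1 - last;
    }

    public static void main(String[] args) {
        boolean[] k = {false, true, false, true};
        System.out.println("any=" + anyOf(k) + " all=" + allOf(k) + " none=" + noneOf(k));
        System.out.println("first=" + findFirstSet(k) + " last=" + findLastSet(k));
    }
}
```

The model says nothing about instruction counts. It only checks that the rewritten forms agree with the popcount-based definitions, including the -1 result for an all-false mask.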
[sig-policy] APNIC EC Endorses Proposal from APNIC 56
Dear colleagues

The APNIC Executive Council endorsed the proposal, prop-155: IPv6 PI
Assignment for Associate Members, at its meeting on 26-28 November 2023.

https://www.apnic.net/community/policy/proposals/prop-155/

The EC has also decided to waive the fees on IPv6 PI assignments under
this policy for a period of 12 months from the date of delegation. After
the 12 month period expires, the resources will become chargeable.

Next steps
----------

The Secretariat will begin the implementation process and inform the
community as soon as it is completed.

Regards,
Sunny

___
Srinivas (Sunny) Chendi (he/him)
Senior Advisor - Policy and Community Development

Asia Pacific Network Information Centre (APNIC) | Tel: +61 7 3858 3100
PO Box 3646 South Brisbane, QLD 4101 Australia | Fax: +61 7 3858 3199
6 Cordelia Street, South Brisbane, QLD | http://www.apnic.net
___
NOTICE: This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged information. Any
unauthorized review, use, disclosure or distribution is prohibited. If you
are not the intended recipient, please contact the sender by reply email
and destroy all copies of the original message.

___
SIG-policy - https://mailman.apnic.net/sig-policy@lists.apnic.net/
To unsubscribe send an email to sig-policy-le...@lists.apnic.net
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]
On Tue, 12 Dec 2023 15:42:09 GMT, Magnus Ihse Bursie wrote:

>> Thank you Magnus!
>
> @vamsi-parasa You said:
>> Made sure that OpenJDK builds without errors using both GCC 7.5 and GCC 6.4.
>
> but now we have https://bugs.openjdk.org/browse/JDK-8321688. Did you
> introduce any changes after you tested with GCC 7.5? It seems strange to me
> that the code simultaneously both works and not works with gcc 7.5.

Hi Magnus (@magicus),

I did a fresh pull of OpenJDK and was able to build it successfully (without any errors) using GCC 7.5.0 on an Ubuntu Linux machine.

(I am on vacation until Jan 7th, 2024. Our team will look into this issue.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1424352122
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Fri, 8 Dec 2023 20:08:22 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski),
>>
>> Please see the data below.
>>
>> Thanks,
>> Vamsi
>>
>> Benchmark (us/op) | (builder) | (size) | Stock JDK | r_02 | r_03 | r_04 | r_05 | r_06 | r_07 | r_08 | r_98 | r_99
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.634 | 1.651 | 1.659 | 1.671 | 1.646 | 1.611 | 1.661 | 1.642 | 1.671
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 37.697 | 38.075 | 37.927 | 39.693 | 38.989 | 37.86 | 38.163 | 39.222 | 38.835
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 91.494 | 89.683 | 87.971 | 90.231 | 90.141 | 90.515 | 90.415 | 89.571 | 90.308
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2816.5 | 2811.334 | 2833.15 | 2802.958 | 2813.012 | 2815.24 | 2825.526 | 2801.497 | 2816.25
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23661.09 | 23778.15 | 23748.91 | 23802.62 | 23746.3 | 23778.16 | 23631.1 | 23651.78 | 23859.91
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.929 | 0.955 | 0.944 | 0.927 | 0.928 | 0.953 | 0.918 | 0.934 | 0.93
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.059 | 4.832 | 5.162 | 4.965 | 4.973 | 5.518 | 5.003 | 5.435 | 4.971
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 12.238 | 12.474 | 12.482 | 12.351 | 12.338 | 12.372 | 12.394 | 12.688 | 13.477
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 261.709 | 264.572 | 263.203 | 260.822 | 260.475 | 262.03 | 260.356 | 265.976 | 264.273
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 2062.235 | 2079.128 | 2065.445 | 2053.24 | 2076.278 | 2049.799 | 2059.1 | 2073.191 | 2075.65
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.001 | 2.023 | 2.021 | 2.001 | 2.018 | 2.011 | 2.017 | 2.005 | 2.011
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 26.169 | 26.093 | 25.562 | 26.385 | 26.109 | 26.485 | 26.375 | 26.412 | 25.712
>> ArraysSort.Int.te...
>
> Hello Vamsi (@vamsi-parasa),
>
> I made the process simpler: added all variants to be compared into ArraysSort
> class
> (set the same package org.openjdk.bench.java.util). It will run all sorts
> incl. sort from jdk
> in the same environment. It should provide more accurate results, otherwise
> we see some anomalies.
>
> Could you please find time to run the benchmarking?
> Take all classes below and put them in the package
> org.openjdk.bench.java.util.
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a10.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r14.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r17.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r18.java
>
> Many thanks,
> Vladimir

Hi Vladimir (@iaroslavski)

Please see the data below using the latest version of AVX512 sort that got integrated into OpenJDK.

Benchmark (us/op) | (builder) | Stock JDK | a10 | r14 | r17 | r18
-- | -- | -- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM | 2.202 | 2.226 | 1.535 | 1.556 | 1.546
ArraysSort.Int.testSort | RANDOM | 35.128 | 34.804 | 30.808 | 30.914 | 31.284
ArraysSort.Int.testSort | RANDOM | 78.571 | 77.224 | 72.567 | 73.098 | 73.337
ArraysSort.Int.testSort | RANDOM | 2466.487 | 2470.66 | 2504.654 | 2494.051 | 2499.746
ArraysSort.Int.testSort | RANDOM | 20704.14 | 20668.19 | 21377.73 | 21362.63 | 21278.94
ArraysSort.Int.testSort | REPEATED | 0.877 | 0.892 | 0.74 | 0.724 | 0.718
ArraysSort.Int.testSort | REPEATED | 4.789 | 4.788 | 4.92 | 4.721 | 4.891
ArraysSort.Int.testSort | REPEATED | 11.172 | 11.778 | 11.53 | 11.467 | 11.406
ArraysSort.Int.testSort | REPEATED | 207.212 | 207.292 | 255.46 | 258.832 | 254.44
ArraysSort.Int.testSort | REPEATED | 1862.544 | 1876.759 | 1952.646 | 1957.978 | 1981.906
ArraysSort.Int.testSort | STAGGER | 2.092 | 2.137 | 1.999 | 2.031 | 2.015
ArraysSort.Int.testSort | STAGGER | 29.891 | 30.321 | 25.626 | 26.318 | 26.396
ArraysSort.Int.testSort | STAGGER | 60.979 | 83.439 | 57.864 | 57.213 | 79.762
ArraysSort.Int.testSort | STAGGER | 1227.933 | 1224.495 | 1236.133 | 1229.773 | 1228.877
ArraysSort.Int.testSort | STAGGER | 9514.873 | 9565.599 | 9491.509 | 9481.147 | 9481.905
ArraysSort.Int.testSort | SHUFFLE | 1.608 | 1.595 | 1.419 | 1.442 | 1.491
ArraysSort.Int.testSort | SHUFFLE | 31.566 | 32.789
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Fri, 8 Dec 2023 20:08:22 GMT, Vladimir Yaroslavskiy wrote:

>> Hi Vladimir (@iaroslavski),
>>
>> Please see the data below.
>>
>> Thanks,
>> Vamsi
>>
>> Benchmark (us/op) | (builder) | (size) | Stock JDK | r_02 | r_03 | r_04 | r_05 | r_06 | r_07 | r_08 | r_98 | r_99
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.634 | 1.651 | 1.659 | 1.671 | 1.646 | 1.611 | 1.661 | 1.642 | 1.671
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 37.697 | 38.075 | 37.927 | 39.693 | 38.989 | 37.86 | 38.163 | 39.222 | 38.835
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 91.494 | 89.683 | 87.971 | 90.231 | 90.141 | 90.515 | 90.415 | 89.571 | 90.308
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2816.5 | 2811.334 | 2833.15 | 2802.958 | 2813.012 | 2815.24 | 2825.526 | 2801.497 | 2816.25
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23661.09 | 23778.15 | 23748.91 | 23802.62 | 23746.3 | 23778.16 | 23631.1 | 23651.78 | 23859.91
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.929 | 0.955 | 0.944 | 0.927 | 0.928 | 0.953 | 0.918 | 0.934 | 0.93
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.059 | 4.832 | 5.162 | 4.965 | 4.973 | 5.518 | 5.003 | 5.435 | 4.971
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 12.238 | 12.474 | 12.482 | 12.351 | 12.338 | 12.372 | 12.394 | 12.688 | 13.477
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 261.709 | 264.572 | 263.203 | 260.822 | 260.475 | 262.03 | 260.356 | 265.976 | 264.273
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 2062.235 | 2079.128 | 2065.445 | 2053.24 | 2076.278 | 2049.799 | 2059.1 | 2073.191 | 2075.65
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.001 | 2.023 | 2.021 | 2.001 | 2.018 | 2.011 | 2.017 | 2.005 | 2.011
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 26.169 | 26.093 | 25.562 | 26.385 | 26.109 | 26.485 | 26.375 | 26.412 | 25.712
>> ArraysSort.Int.te...
>
> Hello Vamsi (@vamsi-parasa),
>
> I made the process simpler: added all variants to be compared into ArraysSort
> class
> (set the same package org.openjdk.bench.java.util). It will run all sorts
> incl. sort from jdk
> in the same environment. It should provide more accurate results, otherwise
> we see some anomalies.
>
> Could you please find time to run the benchmarking?
> Take all classes below and put them in the package
> org.openjdk.bench.java.util.
> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
>
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a10.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r14.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r17.java
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r18.java
>
> Many thanks,
> Vladimir

Sure Vladimir (@iaroslavski),

Will run the tests. Also, the baseline stock JDK has changed as a new PR which improves AVX512 sort by up to 35% has been integrated. The PR implements AVX2 sort (https://github.com/openjdk/jdk/pull/16534) but it also improves the performance of AVX512 sort.

Will use the new stock JDK for these measurements and provide the results by EOD Sunday (US pacific time).

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1847956297
Integrated: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays)
On Tue, 7 Nov 2023 00:12:41 GMT, Srinivas Vamsi Parasa wrote:

> The goal is to develop faster sort routines for x86_64 CPUs by taking
> advantage of AVX2 instructions. This enhancement provides an order of
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> For serial sort on random data, this PR shows up to ~7.5x improvement for
> 32-bit datatypes (int, float) on an Intel TigerLake machine as shown in the
> performance data below.
>
> For parallel sort on random data, this PR shows up to ~3.4x for 32-bit
> datatypes (int, float) as shown below.
>
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
>
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>
> ...

This pull request has now been integrated.

Changeset: ce108446
Author:    vamsi-parasa
Committer: Sandhya Viswanathan
URL:       https://git.openjdk.org/jdk/commit/ce108446ca1fe604ecc24bbefb0bf1c6318271c7
Stats:     4026 lines in 24 files changed: 2311 ins; 1560 del; 155 mod

8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays)

Reviewed-by: sviswanathan, ihse, jbhateja, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/16534
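For readers checking the Speedup column in the table above, the figures are consistent with baseline time divided by AVX2 time, rounded to one decimal place. The short Java sketch below recomputes three of the intSort entries. The class and method names are hypothetical, and the input values are copied from the table rather than measured.

```java
// Recomputes a few Speedup entries as baseline (us/op) divided by AVX2 (us/op).
final class SpeedupTable {

    /** Speedup as used in the table: time before divided by time after. */
    static double speedup(double baselineUsPerOp, double newUsPerOp) {
        return baselineUsPerOp / newUsPerOp;
    }

    public static void main(String[] args) {
        // {baseline us/op, AVX2 us/op} pairs copied from ArraysSort.intSort rows.
        double[][] rows = {
            {0.034, 0.029},
            {10.098, 4.282},
            {330.065, 43.383},
        };
        for (double[] row : rows) {
            System.out.printf("%.3f / %.3f = %.1fx%n", row[0], row[1], speedup(row[0], row[1]));
        }
    }
}
```

The three ratios round to 1.2, 2.4 and 7.6, matching the reported values for those rows.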
Integrated: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays)
On Tue, 7 Nov 2023 00:12:41 GMT, Srinivas Vamsi Parasa wrote:

> The goal is to develop faster sort routines for x86_64 CPUs by taking
> advantage of AVX2 instructions. This enhancement provides an order of
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> For serial sort on random data, this PR shows up to ~7.5x improvement for
> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the
> performance data below.
>
> For parallel sort on random data, this PR shows up to ~3.4x for 32-bit
> datatypes (int, float) as shown below.
>
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
>
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>
> ...

This pull request has now been integrated.

Changeset: ce108446
Author:    vamsi-parasa
Committer: Sandhya Viswanathan
URL:       https://git.openjdk.org/jdk/commit/ce108446ca1fe604ecc24bbefb0bf1c6318271c7
Stats:     4026 lines in 24 files changed: 2311 ins; 1560 del; 155 mod

8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays)

Reviewed-by: sviswanathan, ihse, jbhateja, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/16534
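Since the change targets the existing `Arrays.sort` methods, callers would not need to change anything to reach it. The following is a minimal sketch of the affected call sites for `int[]` and `float[]`; the class name, array sizes, and seeds are illustrative assumptions, and nothing in this code can observe whether the AVX2 path was actually taken on a given JVM or CPU.

```java
import java.util.Arrays;
import java.util.Random;

final class SortCallSites {
    // True if the array is in non-decreasing order.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // Same check for floats, using the total order of Float.compare.
    static boolean isSorted(float[] a) {
        for (int i = 1; i < a.length; i++) {
            if (Float.compare(a[i - 1], a[i]) > 0) return false;
        }
        return true;
    }

    static int[] randomInts(int n, long seed) {
        return new Random(seed).ints(n).toArray();
    }

    static float[] randomFloats(int n, long seed) {
        Random r = new Random(seed);
        float[] a = new float[n];
        for (int i = 0; i < n; i++) a[i] = r.nextFloat();
        return a;
    }

    public static void main(String[] args) {
        int[] ints = randomInts(1000, 42L);
        float[] floats = randomFloats(1000, 42L);
        Arrays.sort(ints);            // serial sort, the case measured in the table
        Arrays.parallelSort(floats);  // parallel sort, the other case the PR mentions
        System.out.println(isSorted(ints) && isSorted(floats));
    }
}
```

For timing comparisons like the ones quoted in the PR, a JMH benchmark is the appropriate tool; a hand-rolled loop around these calls would not give comparable numbers.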
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]
On Fri, 8 Dec 2023 22:37:26 GMT, Vladimir Kozlov wrote:

> I pushed closed changes.

Thanks Vladimir!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1847939767
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Thu, 7 Dec 2023 22:06:14 GMT, Vladimir Yaroslavskiy wrote:

>> Benchmark (us/op) | Builder | (size) | Stock JDK (+ AVX512 sort) | DPQS_r01 (+ AVX512 sort) | Speedup
>> -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.713 | 1.32
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 38.316 | 1.08
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 86.376 | 1.14
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2792.333 | 1.01
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23711.885 | 0.99
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.859 | 1.20
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.014 | 1.02
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 9.532 | 1.08
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 235.281 | 0.90
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 1955.258 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.157 | 0.99
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 29.931 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 2 | 67.096 | 66.543 | 1.01
>> ArraysSort.Int.testSort | STAGGER | 40 | 1247.53 | 1224.999 | 1.02
>> ArraysSort.Int.testSort | STAGGER | 300 | 9435.404 | 9495.189 | 0.99
>> ArraysSort.Int.testSort | SHUFFLE | 600 | 2.701 | 1.64 | 1.65
>> ArraysSort.Int.testSort | SHUFFLE | 9000 | 38.976 | 34.201 | 1.14
>> ArraysSort.Int.testSort | SHUFFLE | 2 | 96.399 | 79.616 | 1.21
>> ArraysSort.Int.testSort | SHUFFLE | 40 | 2566.338 | 2436.271 | 1.05
>> ArraysSort.Int.testSort | SHUFFLE | 300 | 20835.935 | 20071.12 | 1.04
>
> Hello Vamsi (@vamsi-parasa),
>
> Did you have a chance to run benchmarking?

Hi Vladimir (@iaroslavski),

Please see the data below.

Thanks,
Vamsi

Benchmark (us/op) | (builder) | (size) | Stock JDK | r_02 | r_03 | r_04 | r_05 | r_06 | r_07 | r_08 | r_98 | r_99
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.634 | 1.651 | 1.659 | 1.671 | 1.646 | 1.611 | 1.661 | 1.642 | 1.671
ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 37.697 | 38.075 | 37.927 | 39.693 | 38.989 | 37.86 | 38.163 | 39.222 | 38.835
ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 91.494 | 89.683 | 87.971 | 90.231 | 90.141 | 90.515 | 90.415 | 89.571 | 90.308
ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2816.5 | 2811.334 | 2833.15 | 2802.958 | 2813.012 | 2815.24 | 2825.526 | 2801.497 | 2816.25
ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23661.09 | 23778.15 | 23748.91 | 23802.62 | 23746.3 | 23778.16 | 23631.1 | 23651.78 | 23859.91
ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.929 | 0.955 | 0.944 | 0.927 | 0.928 | 0.953 | 0.918 | 0.934 | 0.93
ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.059 | 4.832 | 5.162 | 4.965 | 4.973 | 5.518 | 5.003 | 5.435 | 4.971
ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 12.238 | 12.474 | 12.482 | 12.351 | 12.338 | 12.372 | 12.394 | 12.688 | 13.477
ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 261.709 | 264.572 | 263.203 | 260.822 | 260.475 | 262.03 | 260.356 | 265.976 | 264.273
ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 2062.235 | 2079.128 | 2065.445 | 2053.24 | 2076.278 | 2049.799 | 2059.1 | 2073.191 | 2075.65
ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.001 | 2.023 | 2.021 | 2.001 | 2.018 | 2.011 | 2.017 | 2.005 | 2.011
ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 26.169 | 26.093 | 25.562 | 26.385 | 26.109 | 26.485 | 26.375 | 26.412 | 25.712
ArraysSort.Int.testSort | STAGGER | 2 | 67.096 | 77.157 | 63.636 | 64.479 | 58.697 | 59.728 | 58.913 | 59.482 | 58.633 | 76.904
ArraysSort.Int.testSort | STAGGER | 40 | 1247.53 | 1271.293 | 1236.158 | 1240.29 | 1261.469 | 1233.526 | 1153.822 | 1255.238 | 1224.071 | 1235.624
ArraysSort.Int.testSort | STAGGER | 300 | 9435.404 | 9612.98 | 9597.262 | 9590.393 | 9592.343 | 9616.005 | 9591.057 | 9637.881 | 9596.932 | 9570.482
ArraysSort.Int.testSort | SHUFFLE | 600 | 2.701 | 1.678 | 1.66 | 1.676 | 1.694 | 1.704 | 1.693 | 1.686 | 1.675 | 1.699
ArraysSort.Int.testSort | SHUFFLE | 9000 | 38.976 | 35.146 | 34.879 | 34.723 | 35.093 | 35.904 | 35.672 | 35.124 | 34.626 | 35.553
ArraysSort.Int.testSort | SHUFFLE | 2 | 96.399 | 81.651 | 83.113 | 81.186 | 80.802 | 82.464 | 81.473 | 83.511 | 82.289 | 81.794
ArraysSort.Int.testSort | SHUFFLE | 40 | 2566.338 | 2446.738 | 2424.526 | 2433.211 | 2459.019 | 2446.518 | 2450.989 | 2447.125 | 2449.441 | 2444.414
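The reply's table lists raw us/op values without a Speedup column. In the quoted DPQS_r01 table the Speedup values are consistent with baseline time divided by candidate time (for example 2.256 / 1.713 rounds to 1.32), so the same ratio can be derived for any r_NN column. A minimal sketch under that assumption; the class and method names are hypothetical and the sample pairs are copied from the Stock JDK and r_02 columns.

```java
import java.util.Locale;

final class SpeedupColumn {
    // Speedup = baseline time / candidate time, so values above 1.0
    // mean the candidate is faster than the baseline.
    static double speedup(double baselineUsPerOp, double candidateUsPerOp) {
        if (candidateUsPerOp <= 0.0) {
            throw new IllegalArgumentException("candidate time must be positive");
        }
        return baselineUsPerOp / candidateUsPerOp;
    }

    // Two decimals, matching the precision of the quoted Speedup column.
    static String format(double speedup) {
        return String.format(Locale.ROOT, "%.2f", speedup);
    }

    public static void main(String[] args) {
        // {Stock JDK, r_02} pairs in us/op, taken from the reply's table.
        double[][] rows = {
            {2.256, 1.634},   // RANDOM, size 600
            {10.3, 12.238},   // REPEATED, third size row
            {2.701, 1.678},   // SHUFFLE, size 600
        };
        for (double[] row : rows) {
            System.out.println(format(speedup(row[0], row[1])));
        }
    }
}
```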
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]
On Fri, 8 Dec 2023 00:31:26 GMT, Vladimir Kozlov wrote:

> Testing have only one failure in closed tests and I need to fix it before
> this can be pushed.

Thanks Vladimir for the update. Is the test failure because of this PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1846317507
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
On Thu, 7 Dec 2023 22:06:14 GMT, Vladimir Yaroslavskiy wrote:

>> Benchmark (us/op) | Builder | (size) | Stock JDK (+ AVX512 sort) | DPQS_r01 (+ AVX512 sort) | Speedup
>> -- | -- | -- | -- | -- | --
>> ArraysSort.Int.testSort | RANDOM | 600 | 2.256 | 1.713 | 1.32
>> ArraysSort.Int.testSort | RANDOM | 9000 | 41.457 | 38.316 | 1.08
>> ArraysSort.Int.testSort | RANDOM | 2 | 98.448 | 86.376 | 1.14
>> ArraysSort.Int.testSort | RANDOM | 40 | 2820.939 | 2792.333 | 1.01
>> ArraysSort.Int.testSort | RANDOM | 300 | 23426.411 | 23711.885 | 0.99
>> ArraysSort.Int.testSort | REPEATED | 600 | 1.032 | 0.859 | 1.20
>> ArraysSort.Int.testSort | REPEATED | 9000 | 5.114 | 5.014 | 1.02
>> ArraysSort.Int.testSort | REPEATED | 2 | 10.3 | 9.532 | 1.08
>> ArraysSort.Int.testSort | REPEATED | 40 | 210.742 | 235.281 | 0.90
>> ArraysSort.Int.testSort | REPEATED | 300 | 1948.589 | 1955.258 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 600 | 2.125 | 2.157 | 0.99
>> ArraysSort.Int.testSort | STAGGER | 9000 | 29.86 | 29.931 | 1.00
>> ArraysSort.Int.testSort | STAGGER | 2 | 67.096 | 66.543 | 1.01
>> ArraysSort.Int.testSort | STAGGER | 40 | 1247.53 | 1224.999 | 1.02
>> ArraysSort.Int.testSort | STAGGER | 300 | 9435.404 | 9495.189 | 0.99
>> ArraysSort.Int.testSort | SHUFFLE | 600 | 2.701 | 1.64 | 1.65
>> ArraysSort.Int.testSort | SHUFFLE | 9000 | 38.976 | 34.201 | 1.14
>> ArraysSort.Int.testSort | SHUFFLE | 2 | 96.399 | 79.616 | 1.21
>> ArraysSort.Int.testSort | SHUFFLE | 40 | 2566.338 | 2436.271 | 1.05
>> ArraysSort.Int.testSort | SHUFFLE | 300 | 20835.935 | 20071.12 | 1.04
>
> Hello Vamsi (@vamsi-parasa),
>
> Did you have a chance to run benchmarking?

Hello Vladimir (@iaroslavski),

Will provide the data by EOD Friday (US Pacific time). Had to wrap up some important things at work as I'll be going on a winter vacation for 4 weeks starting from Monday. Thanks for understanding!

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1846189152
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]
On Wed, 6 Dec 2023 23:09:01 GMT, Srinivas Vamsi Parasa wrote:

>>> LGTM, thanks!
>>
>> Thanks Jatin!
>
>> @vamsi-parasa, sorry, I was wrong. I missed that you need to check type
>> `bt`. Latest change is more complicated than it was before. Please revert it
>> back (undo last change). I will test previous version 09.
>
> @vnkozlov
> Vladimir, please see the commit reverted in the updated changes pushed now.

> @vamsi-parasa, please, remind me which tests check that code in
> `libsmdsort.so` is used?

@vnkozlov
Please see the tests for simd sort code in `test/jdk/java/util/Arrays/Sorting.java`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1843963054
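The thread only names `test/jdk/java/util/Arrays/Sorting.java` as the place where the SIMD sort code is exercised. The snippet below is not that test; it is a hypothetical sketch of the general kind of check such a test can make, comparing `Arrays.sort(float[])` against an independent insertion sort on inputs that include NaN, signed zeros, and infinities. All class and method names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

final class SortAgainstReference {
    // Reference: plain insertion sort under Float.compare's total order
    // (-0.0f before 0.0f, NaN last), which is the order documented for
    // Arrays.sort(float[]).
    static float[] referenceSort(float[] in) {
        float[] a = in.clone();
        for (int i = 1; i < a.length; i++) {
            float key = a[i];
            int j = i - 1;
            while (j >= 0 && Float.compare(a[j], key) > 0) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }

    // Random values plus a few special ones that a vectorized sort must handle.
    static float[] sampleInput(int n, long seed) {
        Random r = new Random(seed);
        float[] a = new float[n];
        for (int i = 0; i < n; i++) a[i] = r.nextFloat() * 200f - 100f;
        if (n >= 5) {
            a[0] = Float.NaN;
            a[1] = -0.0f;
            a[2] = 0.0f;
            a[3] = Float.NEGATIVE_INFINITY;
            a[4] = Float.POSITIVE_INFINITY;
        }
        return a;
    }

    // Arrays.equals on float[] distinguishes -0.0f from 0.0f and treats NaN as equal to NaN.
    static boolean matchesReference(float[] input) {
        float[] expected = referenceSort(input);
        float[] actual = input.clone();
        Arrays.sort(actual);
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 5, 64, 1000}) {
            System.out.println(n + ": " + matchesReference(sampleInput(n, n)));
        }
    }
}
```

A check like this verifies the result only. It cannot tell whether the library behind the intrinsic was actually used, which is the question Vladimir raises.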
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]
On Wed, 6 Dec 2023 17:44:24 GMT, Srinivas Vamsi Parasa wrote:

>> LGTM, thanks!
>
>> LGTM, thanks!
>
> Thanks Jatin!

> @vamsi-parasa, sorry, I was wrong. I missed that you need to check type `bt`.
> Latest change is more complicated than it was before. Please revert it back
> (undo last change). I will test previous version 09.

@vnkozlov
Vladimir, please see the commit reverted in the updated changes pushed now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1843834085
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v12]
> The goal is to develop faster sort routines for x86_64 CPUs by taking
> advantage of AVX2 instructions. This enhancement provides an order of
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> For serial sort on random data, this PR shows up to ~7.5x improvement for
> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the
> performance data below.
>
> For parallel sort on random data, this PR shows up to ~3.4x for 32-bit
> datatypes (int, float) as shown below.
>
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
>
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>
> ...

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  Revert "Change supported intrinsic check"

  This reverts commit 9621eb045c2958582f81ec06b237789a07481ddd.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/9621eb04..eadba369

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=16534=11
 - incr: https://webrevs.openjdk.org/?repo=jdk=16534=10-11

  Stats: 28 lines in 4 files changed: 0 ins; 20 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v11]
> The goal is to develop faster sort routines for x86_64 CPUs by taking
> advantage of AVX2 instructions. This enhancement provides an order of
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> For serial sort on random data, this PR shows up to ~7.5x improvement for
> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the
> performance data below.
>
> For parallel sort on random data, this PR shows up to ~3.4x for 32-bit
> datatypes (int, float) as shown below.
>
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
>
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>
> ...

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  Change supported intrinsic check

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/7e124581..9621eb04

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=16534=10
 - incr: https://webrevs.openjdk.org/?repo=jdk=16534=09-10

  Stats: 28 lines in 4 files changed: 20 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]
On Wed, 6 Dec 2023 18:41:26 GMT, Vladimir Kozlov wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>>   add missing header files
>
> src/hotspot/share/opto/library_call.cpp line 5393:
>
>> 5391:   if (!Matcher::supports_simd_sort(bt)) {
>> 5392:     return false;
>> 5393:   }
>
> This check should be in `C2Compiler::is_intrinsic_supported()`

Hi Vladimir (@vnkozlov), please see the updated changes which use `C2Compiler::is_intrinsic_supported(id, bt)`

> src/hotspot/share/opto/library_call.cpp line 5450:
>
>> 5448:   if (!Matcher::supports_simd_sort(bt)) {
>> 5449:     return false;
>> 5450:   }
>
> Same.

Please see the updated changes which use `C2Compiler::is_intrinsic_supported(id, bt)`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417946689
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417946968
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]
> The goal is to develop faster sort routines for x86_64 CPUs by taking
> advantage of AVX2 instructions. This enhancement provides an order of
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> For serial sort on random data, this PR shows up to ~7.5x improvement for
> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the
> performance data below.
>
> For parallel sort on random data, this PR shows up to ~3.4x for 32-bit
> datatypes (int, float) as shown below.
>
> **Note:** This PR also improves the performance of AVX512 sort by up to 35%.
>
> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>
> ...

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  add missing header files

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/c143e0b9..7e124581

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=16534=09
 - incr: https://webrevs.openjdk.org/?repo=jdk=16534=08-09

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]
On Wed, 6 Dec 2023 17:42:39 GMT, Jatin Bhateja wrote:

> LGTM, thanks!

Thanks Jatin!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1843372385
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]
On Tue, 5 Dec 2023 19:19:23 GMT, Jatin Bhateja wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   remove unused avx2 64 bit sort functions; add assertions
>
> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 50:
>
>> 48:         case JVM_T_DOUBLE:
>> 49:             avx2_fast_sort((double*)array, from_index, to_index, INSERTION_SORT_THRESHOLD_64BIT);
>> 50:             break;
>
> Please add safe assertions for missing types.

This comment refers to an older, now outdated commit. The assertions have been added in the other cases.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417706670
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  remove unused avx2 64 bit sort functions; add assertions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/bc590d9f..c143e0b9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=07-08

  Stats: 128 lines in 4 files changed: 12 ins; 116 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Tue, 5 Dec 2023 19:37:34 GMT, Jatin Bhateja wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
>>
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - add GCC version guards
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - Remove C++17 from C flags
>>  - add avoid masked stores operation
>>  - update the code to check for supported simd sort cpus
>>  - Disable AVX2 sort for 64-bit types
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - fix jcheck failures due to windows encoding
>>  - fix carriage return and change insertion sort thresholds
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/d4151e5b...bc590d9f
>
> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64:
>
>> 62:     }
>> 63:     return lut;
>> 64: }();
>
> Lut64 is needed for compress64 emulation, can be removed.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234:
>
>> 232:
>> 233:     vtype::mask_storeu(leftStore, left, temp);
>> 234: }
>
> Can be removed if not being used.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277:
>
>> 275:
>> 276:     return _mm_popcnt_u32(shortMask);
>> 277: }
>
> Can be removed if not being used.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44:
>
>> 42:             break;
>> 43:         case JVM_T_FLOAT:
>> 44:             avx2_fast_sort((float*)array, from_index, to_index, INSERTION_SORT_THRESHOLD_32BIT);
>
> Assertions for unsupported types.

Added in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56:
>
>> 54:         case JVM_T_FLOAT:
>> 55:             avx2_fast_partition((float*)array, from_index, to_index, pivot_indices, index_pivot1, index_pivot2);
>> 56:             break;
>
> Please add assertion for unsupported types.

Added in the latest commit...

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701182
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702999
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702251
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701469
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701705
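For readers following the "assertions for unsupported types" comments above, here is a minimal sketch of a type-dispatching sort entry point whose `default` branch asserts. It is not the code from the PR: `DemoElemType`, `demo_fast_sort`, the use of `std::sort` as a stand-in for the AVX2 routines, and treating `to_index` as exclusive are all assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical element-type tags standing in for the JVM_T_* constants.
enum DemoElemType { DEMO_T_INT, DEMO_T_FLOAT, DEMO_T_LONG, DEMO_T_DOUBLE };

// Sorts array[from_index, to_index) for the supported 32-bit types.
// Returns true if the type was handled; unsupported types trip the assertion
// in debug builds and return false when assertions are compiled out.
inline bool demo_fast_sort(void* array, DemoElemType type,
                           int32_t from_index, int32_t to_index) {
    switch (type) {
        case DEMO_T_INT: {
            int32_t* a = static_cast<int32_t*>(array);
            std::sort(a + from_index, a + to_index);  // stand-in for the SIMD sort
            return true;
        }
        case DEMO_T_FLOAT: {
            float* a = static_cast<float*>(array);
            std::sort(a + from_index, a + to_index);  // stand-in for the SIMD sort
            return true;
        }
        default:
            // 64-bit types are deliberately not handled in this sketch.
            assert(false && "Unexpected type");
            return false;
    }
}
```

The point of the `default` branch is that a caller passing a type the library no longer supports fails loudly in debug builds instead of silently leaving the array unsorted.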
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]
On Wed, 6 Dec 2023 11:59:19 GMT, Magnus Ihse Bursie wrote:

>> Hi Magnus (@magicus),
>>
>>> Are you saying that when compiling with GCC 6, it will just silently ignore `-std=c++17`? I'd have assumed that it printed a warning or error about an unknown or invalid option, if C++17 is not supported.
>>
>> The GCC compiler for versions 6 (and even 5) silently ignores the flag `-std=c++17`. It does not print any warning or error. I tested it with a toy C++ program and also by building OpenJDK using GCC 6.
>>
>>> You can't check for if compiler options should be enabled or not inside source code files.
>>
>> What I meant was that there are #ifdef guards using predefined macros in the C++ source code that check the GCC version and make the simdsort code available for compilation or not based on that version:
>>
>> // src/java.base/linux/native/libsimdsort/simdsort-support.hpp
>> #if defined(_LP64) && (defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 7) && (__GNUC_MINOR__ >= 5))))
>> #define __SIMDSORT_SUPPORTED_LINUX
>> #endif
>>
>> //src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp
>> #include "simdsort-support.hpp"
>> #ifdef __SIMDSORT_SUPPORTED_LINUX
>>
>> #endif
>
> Okay, then I guess I am fine with this.

Thank you Magnus!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417707661
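The version-guard pattern described in the message above can be tried in isolation with a small sketch like the following. It reuses the same GCC >= 7.5 test but leaves out the `_LP64` check so it stays portable; `DEMO_SIMDSORT_COMPILED`, `demo_simdsort_compiled`, and `demo_sort` are hypothetical names for this example, and `std::sort` stands in for the guarded SIMD code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Same shape as the quoted guard: true for GCC 7.5 or newer.
// Compilers that do not define __GNUC__ (or report an older version)
// take the fallback branch.
#if defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 7) && (__GNUC_MINOR__ >= 5)))
#define DEMO_SIMDSORT_COMPILED 1
#else
#define DEMO_SIMDSORT_COMPILED 0
#endif

// Reports at compile time whether the guarded path was built.
constexpr bool demo_simdsort_compiled() { return DEMO_SIMDSORT_COMPILED == 1; }

inline void demo_sort(std::vector<int>& v) {
#if DEMO_SIMDSORT_COMPILED
    // Guarded branch: a real library would call its SIMD implementation here.
    std::sort(v.begin(), v.end());
#else
    // Fallback branch: code compiled when the toolchain is too old.
    std::sort(v.begin(), v.end());
#endif
}
```

Because the decision is made by the preprocessor, an unsupported compiler never sees the guarded code at all, which is why no build-system check on the compiler flags is needed for this part.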
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Tue, 5 Dec 2023 19:33:48 GMT, Jatin Bhateja wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
>>
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - add GCC version guards
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - Remove C++17 from C flags
>>  - add avoid masked stores operation
>>  - update the code to check for supported simd sort cpus
>>  - Disable AVX2 sort for 64-bit types
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - fix jcheck failures due to windows encoding
>>  - fix carriage return and change insertion sort thresholds
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/d8b29378...bc590d9f
>
> src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235:
>
>> 233:         return avx512_double_compressstore>(
>> 234:             left_addr, right_addr, k, reg);
>> 235:     }
>
> Can be removed.

This is needed for AVX512 sort...

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417690992