[arrow] branch ARROW-17715a updated (019740ad9f -> 145e167753)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 019740ad9f  change LLVM version
     add 145e167753  disable JEMALLOC and PLASMA

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 4 ++++
 1 file changed, 4 insertions(+)
[arrow] branch ARROW-17715a updated (194f3f249a -> 019740ad9f)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 194f3f249a  add CLANG_TOOLS
     add 019740ad9f  change LLVM version

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[arrow] branch ARROW-17715a updated (0f753003a1 -> 194f3f249a)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 0f753003a1  disable to build COMPUTE and GANDIVA
     add 194f3f249a  add CLANG_TOOLS

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 1 +
 1 file changed, 1 insertion(+)
[arrow] branch ARROW-17715a created (now 0f753003a1)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git

      at 0f753003a1  disable to build COMPUTE and GANDIVA

This branch includes the following new commits:

     new 0f753003a1  disable to build COMPUTE and GANDIVA

The 1 revision listed above as "new" is entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[arrow] 01/01: disable to build COMPUTE and GANDIVA
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit 0f753003a1557c4e554ea879464ca001888e5c2f
Author: Kazuaki Ishizaki
AuthorDate: Mon Jan 9 04:01:38 2023 -0500

    disable to build COMPUTE and GANDIVA

    reduce parallelism to 1
---
 .travis.yml | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index a96e07f0c4..b508d60609 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -93,14 +93,16 @@ jobs:
         #   aws-sdk-cpp.
         DOCKER_RUN_ARGS: >-
           "
+          -e ARROW_COMPUTE=OFF
           -e ARROW_FLIGHT=ON
+          -e ARROW_GANDIVA=OFF
           -e ARROW_GCS=OFF
           -e ARROW_MIMALLOC=OFF
           -e ARROW_ORC=OFF
           -e ARROW_PARQUET=OFF
           -e ARROW_S3=OFF
           -e ARROW_SUBSTRAIT=OFF
-          -e CMAKE_BUILD_PARALLEL_LEVEL=2
+          -e CMAKE_BUILD_PARALLEL_LEVEL=1
           -e CMAKE_UNITY_BUILD=ON
           -e PARQUET_BUILD_EXAMPLES=OFF
           -e PARQUET_BUILD_EXECUTABLES=OFF
@@ -144,14 +146,16 @@ jobs:
         #   aws-sdk-cpp.
         DOCKER_RUN_ARGS: >-
           "
+          -e ARROW_COMPUTE=OFF
           -e ARROW_FLIGHT=ON
+          -e ARROW_GANDIVA=OFF
           -e ARROW_GCS=OFF
           -e ARROW_MIMALLOC=OFF
           -e ARROW_ORC=OFF
           -e ARROW_PARQUET=OFF
           -e ARROW_PYTHON=ON
           -e ARROW_S3=OFF
-          -e CMAKE_BUILD_PARALLEL_LEVEL=2
+          -e CMAKE_BUILD_PARALLEL_LEVEL=1
           -e CMAKE_UNITY_BUILD=ON
           -e PARQUET_BUILD_EXAMPLES=OFF
           -e PARQUET_BUILD_EXECUTABLES=OFF
[arrow-julia] 01/01: initial draft
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch issue270
in repository https://gitbox.apache.org/repos/asf/arrow-julia.git

commit 8bc9510d21b27ebdf6d10e8bd57d553287f066b0
Author: ishizaki
AuthorDate: Mon Jan 17 16:18:06 2022 +

    initial draft
---
 CONTRIBUTING.md | 37 +
 1 file changed, 37 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000..163d2b9
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,37 @@
+# How to contribute Apache Arrow Julia
+
+## Did you find a bug or have an improvement?
+
+We recommend you first search among existing [Github issues](https://github.com/apache/arrow-julia/issues). The community may already address the same idea. If you could find the issue, you may want to contribute to the existing issue.
+
+
+## How do you write a patch that fixes a bug or brings an improvement?
+If you cannot find the same idea in the issues, you first need to write a GitHub issue (e.g. [issues in Arrow-julia](https://github.com/apache/arrow-julia/issues)) for a bug fix or planned features for the improvement. To write an issue would help the community have visibility and opportunities for collaborations before a pull request (PR) shows up. This is for the [Apache way](http://theapacheway.com/). We can use GitHub labels to identify bugs.
+It should not be necessary to file an issue for some non-code changes, such as CI changes or minor documentation updates such as fixing typos.
+
+After writing the issue, you may want to write a code by creating [a PR](https://github.com/apache/arrow-julia/pulls). In the PR, it is preferable to refer to the issue number (e.g. `#1`) that you already created.
+
+
+## Do you want to propose a significant new feature or an important refactoring?
+
+We ask that all discussions about major changes in the codebase happen publicly on the [arrow-dev mailing-list](https://lists.apache.org/list.html?d...@arrow.apache.org).
+
+
+## Do you have questions about the source code, the build procedure or the development process?
+
+You can also ask on the mailing-list, see above.
+
+
+## Local Development
+
+When developing on Arrow.jl it is recommended that you run the following to ensure that any changes to ArrowTypes.jl are immediately available to Arrow.jl without requiring a release:
+
+```
+julia --project -e 'using Pkg; Pkg.develop(path="src/ArrowTypes")'
+```
+
+
+## Release cycle
+
+The Julia community would like an independent release cycle. Release for apache/arrow doesn't include the Julia implementation. The Julia implementation uses separated version scheme. (apache/arrow uses 6.0.0 as the next version but the next Julia implementation release doesn't use 6.0.0.)
+
[arrow-julia] branch issue270 created (now 8bc9510)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch issue270
in repository https://gitbox.apache.org/repos/asf/arrow-julia.git.

      at 8bc9510  initial draft

This branch includes the following new commits:

     new 8bc9510  initial draft

The 1 revision listed above as "new" is entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[arrow] branch master updated (dbb5b42 -> 3968146)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from dbb5b42  ARROW-13194: [Java][Document] Create prose document about Java algorithms
     add 3968146  ARROW-13032: [Java] Update guava version

No new revisions were added by this update.

Summary of changes:
 .../src/test/java/org/apache/arrow/flight/perf/TestPerf.java | 2 +-
 java/gandiva/pom.xml | 2 --
 java/pom.xml | 2 +-
 3 files changed, 2 insertions(+), 4 deletions(-)
[arrow-site] branch master updated: ARROW-13047: [Website] Add kiszk to committer list
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

The following commit(s) were added to refs/heads/master by this push:
     new 0a542fd  ARROW-13047: [Website] Add kiszk to committer list
0a542fd is described below

commit 0a542fdba3f56cf6d16298852e14a1a763bf29de
Author: ishizaki
AuthorDate: Tue Jun 15 01:44:26 2021 +

    ARROW-13047: [Website] Add kiszk to committer list

    Closes #118 from kiszk/Arrow-13047 and squashes the following commits:

    036bb29b4  add kiszk as a committer

    Authored-by: ishizaki
    Signed-off-by: Kazuaki Ishizaki
---
 _data/committers.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_data/committers.yml b/_data/committers.yml
index 56b6f62..27079ad 100644
--- a/_data/committers.yml
+++ b/_data/committers.yml
@@ -188,6 +188,10 @@
   role: Committer
   alias: jorisvandenbossche
   affiliation: Ursa Computing
+- name: Kazuaki Ishizaki
+  role: Committer
+  alias: kiszk
+  affiliation: IBM
 - name: Kenta Murata
   role: Committer
   alias: mrkn
[arrow] branch master updated (b81fcf7 -> 5173af0)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from b81fcf7  ARROW-13068: [GLib][Dataset] Change prefix to gdataset_ from gad_
     add 5173af0  ARROW-13026: [CI] Use LLVM 10 for s390x

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 3 +++
 1 file changed, 3 insertions(+)
[arrow-site] 01/01: add kiszk as a committer
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch Arrow-13047
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

commit f4fbc497212b4ca44e5c09482187b3ba41c1d243
Author: ishizaki
AuthorDate: Fri Jun 11 07:52:43 2021 +

    add kiszk as a committer
---
 _data/committers.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_data/committers.yml b/_data/committers.yml
index d473c60..1b63acf 100644
--- a/_data/committers.yml
+++ b/_data/committers.yml
@@ -184,6 +184,10 @@
   role: Committer
   alias: jorisvandenbossche
   affiliation: Ursa Computing
+- name: Kazuaki ishizaki
+  role: Committer
+  alias: kiszk
+  affiliation: IBM
 - name: Kenta Murata
   role: Committer
   alias: mrkn
[arrow-site] branch Arrow-13047 created (now f4fbc49)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch Arrow-13047
in repository https://gitbox.apache.org/repos/asf/arrow-site.git.

      at f4fbc49  add kiszk as a committer

This branch includes the following new commits:

     new f4fbc49  add kiszk as a committer

The 1 revision listed above as "new" is entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
svn commit: r35381 - /dev/spark/KEYS
Author: kiszk Date: Mon Aug 26 17:18:45 2019 New Revision: 35381 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Mon Aug 26 17:18:45 2019 @@ -993,12 +993,12 @@ ZTFPNYvCMMHM8A== =PEdD -END PGP PUBLIC KEY BLOCK- -pub rsa4096/7F0FEF75 2019-08-19 [SC] -uid [ultimate] Kazuaki Ishizaki (CODE SIGNING KEY) -sub rsa4096/7C3AEC68 2019-08-19 [E] +pub 4096R/7F0FEF75 2019-08-19 +uid Kazuaki Ishizaki (CODE SIGNING KEY) +sub 4096R/7C3AEC68 2019-08-19 -BEGIN PGP PUBLIC KEY BLOCK- -Version: GnuPG v2 +Version: GnuPG v1 mQINBF1a3YcBEAC7I6f1jWpY9WlJBkbwvLneYBjnD2BRwG1eKjkz49aUXVKkx4Du XB7b+agbhWL7EIPjQHVJf0RVGochOujKfcPxOz5bZwAV078EbsJpiAYIAeVEimQF @@ -1049,3 +1049,4 @@ au2shXGZFmo4V56uCJ5HqZTJJZaMceQx7u8uqZbh XJ5Dp1pqv9DC6cl9vLSHctRrM2kG =mQLW -END PGP PUBLIC KEY BLOCK- + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r35371 - in /dev/spark/v2.3.4-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: kiszk Date: Mon Aug 26 09:54:45 2019 New Revision: 35371 Log: Apache Spark v2.3.4-rc1 docs [This commit notification would consist of 1447 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r35370 - /dev/spark/v2.3.4-rc1-bin/
Author: kiszk Date: Mon Aug 26 09:00:20 2019 New Revision: 35370 Log: Apache Spark v2.3.4-rc1 Added: dev/spark/v2.3.4-rc1-bin/ dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz (with props) dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512 dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz (with props) dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.sha512 dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.6.tgz (with props) dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.6.tgz.asc dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.6.tgz.sha512 dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.7.tgz (with props) dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.7.tgz.asc dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.7.tgz.sha512 dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-without-hadoop.tgz (with props) dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-without-hadoop.tgz.asc dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-without-hadoop.tgz.sha512 dev/spark/v2.3.4-rc1-bin/spark-2.3.4.tgz (with props) dev/spark/v2.3.4-rc1-bin/spark-2.3.4.tgz.asc dev/spark/v2.3.4-rc1-bin/spark-2.3.4.tgz.sha512 Added: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz == Binary file - no diff available. Propchange: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc == --- dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc (added) +++ dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc Mon Aug 26 09:00:20 2019 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJFBAABCgAvFiEEgFK1nI3grK++CMoV5JoEbH8P73UFAl1jmEMRHGtpc3prQGFw +YWNoZS5vcmcACgkQ5JoEbH8P73Xmog//Qj/814bac4xbMnvsmEQyA9RfIRfv2i2T +jJNh2jHiwUefV4Wd+vXy+5YXSW/A9y8MOgBHXRRbdsv+wzuaccy+SayFCg8gWXOb +CihXw5gc3sUswIRFlxSsjwL0xkcqsxLkmPQtg7eOjIlq1LS3ynLzRPbnOov71que +45dHOnZi1PIEonhQiIgwWEVQiEyUQk0cBjiWDgprrZe4sZStHm0IbTsPJNAmJ3qX +KUZddOfEwmzm4u44oVYR1Z88YrRT/F7LOB8cNvCT/JLGNkn0Sf1DNN42E8gcSUyJ +EWU8cgjy0j2kBYLVdO123Qo/V/HJ8XJUrz9fd3p89ZX6z+q66lCHVypg9Chku/OI +CZ3pnTcBbaUKTMjB0R+r8Yj6OuIyEx95oMABoOi8ye98xrRSw7kEZ1CVIPHUiiDu +oZdP8XQyg5sLda4qFAs/6AGY9jXTDojk46zE+MqJ7jefXVn8lvdwWKVhVaIyZYDs +bDm9lGFTlXyakX0qxeMC7dCNkINMuXgQBZpMb+HMlUWDurneWA3IjwtzvJd2AfiU +ZvBo7Gzv6eBjbcJ9eaG3UXEv25dt3sK56fV7/7Jh+9LVLIZDIIdNwV+YDDmVX4HF +f7KHtaWIfQpy9lbHQqLuf6DikxntT3jIV1NUg7UbkWKrKg1wuBUozmiX4aqRTAnQ +4MKVJZuZmzU= +=h0w6 +-END PGP SIGNATURE- Added: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512 == --- dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512 (added) +++ dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512 Mon Aug 26 09:00:20 2019 @@ -0,0 +1,3 @@ +SparkR_2.3.4.tar.gz: 09173710 547AFB95 417F908E 8057C0FC C78C41E7 17F64233 + 440B8E58 B43AEB9F 15B9F5CC 1972750B 5A60D3BA AA702D22 + 7AEF3D79 495C323A 803F9F54 7EE5DB13 Added: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc == --- dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc (added) +++ dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc Mon Aug 26 09:00:20 2019 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJFBAABCgAvFiEEgFK1nI3grK++CMoV5JoEbH8P73UFAl1jmEURHGtpc3prQGFw +YWNoZS5vcmcACgkQ5JoEbH8P73Vd5BAAmzqGMEWC50Eet0e8Jpl2IT77dRfY+6zz +mj5Nf/4tAFZ8eys7rbr4qKkNoqV3+cfytmNQSC/va6hbb0ioOB19uhvQqUe+OXaF +93enkUjV0FGFwUgh8dD6x+9V0hAQ8lFA6V0Y1NYBa53t5xJFAJSrpVcXv/Af4y0A +p8vyZN9Fea15RQykBQBjszhaQuh8nMqZbZjd19Kmwk2Dfe+ABFRjljpwuZt/paaX +qZaaRpgVj30JmxkbKtXfVeDW6IstcntBJdmCoA2wwcgZmn7vTu5Fu1dd4xXhLq/H +LIlIJXTxzPEmZuHmt7kNMYrj/M1ulPj2GFI0Cm4zg0uw9wbA01VjQ79sFuS6n0HC +cC2JGm8inG6CHmWrZ4peBM1BxefL7yhfWYROQm2jwhfRpeI5EcmHkUlhoK8w6+F6 +2i6H187IXizL0UQjMcQu8WiGHtlcvTPrMP3BHwKuALZlgnrXFfcIrXD+oE1AakK3 +vVwTSt48RxX7dp89pRGx3bxS8zaIsh5bG2GlgYVxx8EtAyq6hK9nzHulLAcY1hS9 +A/8j8lQKZlCtDmr+JkOhcGuZsiUtB2elMwsMJmFn+qBbu0R+AT08x5kAILBNDkp6 +iN8xRoOpgVvcqzHZvraz7a6OqxfoPtQ53A4xNtT8gFTDs1Kq7jLOvmjntZotreUs +gJ3741FqslM= +=hX41 +-END PGP SIGNATURE- Added: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.sha512 == --- dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4
[spark] 01/01: Preparing Spark release v2.3.4-rc1
This is an automated email from the ASF dual-hosted git repository. kiszk pushed a commit to tag v2.3.4-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 8c6f8150f3c6298ff4e1c7e06028f12d7eaf0210 Author: Kazuaki Ishizaki AuthorDate: Sun Aug 25 14:38:17 2019 + Preparing Spark release v2.3.4-rc1 --- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 40 files changed, 40 insertions(+), 40 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 612a1b8..583b1bf 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 5547e97..29c2c58 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 119dde2..224b229 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index dba5224..c7f661e 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 56902a3..f33fb99 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 5302d95..a642cb2 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index 232ebfa..29bd7ba 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index f0baa2a..03f9b77 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT 
+2.3.4 ../../pom.xml diff --git a/core/pom.xml b/core/pom.xml index d4f5940..c9c1c7c 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.4-SNAPSHOT +2.3.4 ../pom.xml diff --git a/docs/_config.yml b/docs/_config.yml index dd46965..f30ff62 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -14,7 +14,7 @@ include: # These allow
[spark] 01/01: Preparing development version 2.3.5-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. kiszk pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git commit 3fb9e84c7a5ed6c7bde7a6c64cdeda974734dbc5 Author: Kazuaki Ishizaki AuthorDate: Sun Aug 25 14:38:22 2019 + Preparing development version 2.3.5-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 9124a88..d14017e 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.4 +Version: 2.3.5 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 583b1bf..0c36ce2 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.4 +2.3.5-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 29c2c58..a9ab9d5 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4 +2.3.5-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 224b229..f34618e 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4 +2.3.5-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index c7f661e..62901b9 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4 +2.3.5-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index f33fb99..8a64c64 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4 +2.3.5-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index a642cb2..abb43d3 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4 +2.3.5-SNAPSHOT ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index 29bd7ba..71e946a 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.4 +2.3.5-SNAPSHOT ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index 03f9b77..9fb92b7 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7
[spark] branch branch-2.3 updated (adb5255 -> 3fb9e84)
This is an automated email from the ASF dual-hosted git repository. kiszk pushed a change to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git. from adb5255 [SPARK-26895][CORE][2.3] prepareSubmitEnvironment should be called within doAs for proxy users add 8c6f815 Preparing Spark release v2.3.4-rc1 new 3fb9e84 Preparing development version 2.3.5-SNAPSHOT The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] tag v2.3.4-rc1 created (now 8c6f815)
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to tag v2.3.4-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.

      at 8c6f815  (commit)

This tag includes the following new commits:

     new 8c6f815  Preparing Spark release v2.3.4-rc1

The 1 revision listed above as "new" is entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r35304 - /dev/spark/KEYS
Author: kiszk Date: Mon Aug 19 18:04:54 2019 New Revision: 35304 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Mon Aug 19 18:04:54 2019 @@ -991,4 +991,61 @@ QRMaCSG2MOvUAI8Zzk6i1Gi5InRlP5v8sQdrMYvS meyB5uExVklZg9yaoH2zAFXLkjG1pftpkCb57UIyC+Tk5KAMZXyS2vHNGxsnI3FG ZTFPNYvCMMHM8A== =PEdD --END PGP PUBLIC KEY BLOCK- \ No newline at end of file +-END PGP PUBLIC KEY BLOCK- + +pub rsa4096/7F0FEF75 2019-08-19 [SC] +uid [ultimate] Kazuaki Ishizaki (CODE SIGNING KEY) +sub rsa4096/7C3AEC68 2019-08-19 [E] + +-BEGIN PGP PUBLIC KEY BLOCK- +Version: GnuPG v2 + +mQINBF1a3YcBEAC7I6f1jWpY9WlJBkbwvLneYBjnD2BRwG1eKjkz49aUXVKkx4Du +XB7b+agbhWL7EIPjQHVJf0RVGochOujKfcPxOz5bZwAV078EbsJpiAYIAeVEimQF +Pv/uqaf9DbIjZAnJtZhKlyXJaXLpuZbqEwBimpfbgvF5ib4ii7a9kY7BO/YsSXXc +ksLBIHKwNAeKSMIGmCQaxz/tNmRm1tAagFknCEoQ0CMsA8FesjXyS+U6nfJWdK3K ++678joAIhZvdn5k3f/bR94ifeDCh0QsY/zuG95er4Gp0rdr8EmRQbfJAUAwfkn8a +viQD1FkTs+aJn4MSClb+FDXu7hNrPPdayA5CI6PSMdir//+Z7Haox92mvhQT5pBJ +X21R4BDqF6bmL2d/RL3e2Zb1rmztDbTq43OL3Jm+x9R3OPg9UVwFJgHUy/xEirve +Nah5Y6GzV3po/VSJbRIdM/p8OENv6YahFbLr5rT5O9iZns/PXHUpXYXLQDfdFJD2 +oCNFxlQmjfbxIL3PIcdS2gY2o1FmEbYuaLi6Bb9FDTm/J78vHYtR3wLvwufLh3PX +5en9e6+g7o5w3jN/3J1skwXUUSOHK88mWBGt2B9ZwYS+7TQ0zWcgrXjwHQoi92nA +JEADyvQSxTB/zd5usCVel8038FSKhawkhrmLBk2UoJR4prhnPC364MnjgQARAQAB +tDZLYXp1YWtpIElzaGl6YWtpIChDT0RFIFNJR05JTkcgS0VZKSA8a2lzemtAYXBh +Y2hlLm9yZz6JAjgEEwECACIFAl1a3YcCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4B +AheAAAoJEOSaBGx/D+91w5AQALB6gff1BuaDyMSiSYaAGBGrBAxs1+ixQxlLX+ld +KG9y/u41S3s8pBn0GXp1jthdURnPm+raLqJk1lVPUZ4JqNYot0FL/nGBIZjRRG6J +TfmlWTza1AfgvzcROaO+7jVPMskBx/HZn8XxEOlMcnBv4P/v3m/QUW9/tH8j+6Bc +JwfiqD3LIaWZTicAMxWE9r7MREDcgkrFROJDDJPMFxoVKomIcc3vzXJeI7BfVtkG +5NHWYDVn4QTQygv+qes4ke9fcik7T5c9NcOjXgks6eF0z7Z/Rj6DUrIyVKleUwJZ +AWpBJcbNc8crg623DRaXpGhXsGvnD5PxcPvVjJ9Jud7o884OhVr2abxQ++rIv/+m +K5K99jbp2E/6Q6tR4ODEoPTGN6fSijziWfhuad26K/grN3878hayGmey57vPH3tx +LsBkUfc9bz46HjcdhfaU1dS82YOMmrFLLmgBEL1PViK628gk0TR7C6N4kHKGWd1f +tQz/bTFzoyXOTpS6bvceE88fZ2FSeepP0AgvZPZsUXxrHXo78oECZ9CAoO/q1P1J +OrKr5oG5om9pB+4SI3FhD2PKxt/+ayMCyA6PVBlw8HDI2XLBmBi9YkiP2ws7gJcF +A958J3CWc6Q7PstrU7LCmL0Apbl8T2Iqph7jB2Qiko2sOyxe5Vwkwh9vHYnhy1ox +YZ2quQINBF1a3YcBEADfvUJtKQKQEHl6ug/c2PxDL5pfEhCXQfBIkfPScUgiQCO9 +aiSigMUReiYa/7cau2jmGUcBktjgLwlAGywX6YTGt/ZIWCkGRdK8K3mVRNssGwXs ++oWcNinRbzIV1cvZu9zndzM7lzIMFriIP/Shsi9QPg6SibK1XhgkYr2pTN8i1zmQ +sd/FGnhEeGZxXDwW7wG6tPXvzQiAZgJEsUh90i9AbQzI/MWG2RqqjKGO423BcpQ8 +nHgUlj7JbgRI2knBjpnxAyKroDGw9dKXNBqYrGjQtbXcCkBTk6vDyOkXUWOz63Bc +AtVfXwL5+RILvYjzn8bZne5jt8fkNK3z29XTv7N3Ee8HRwPnGp6Ny7jGR/f740gP +3b8y4A6QI9YlyvOlp2SHIRPHEYKUQCLaTT1/b4DYN5SGtWwXA4GafCLBVBwD3fr+ +jIhCbInX0+MWOZwuTYuwpoE6nnsnWpsAd6ZOMJInULRyW1f7/zXoq2XvtFH8+IQN +DYtF1lr2C8lm7WUKqSg2bmVy6+gV6KvYqj6oihLQBxlnmrKBQFhkBeOyNYxRW8rf +c+nZZza/5QMZLD7mYL+BGmgHB2eycSuz7UkZ8H5DD0u7Wz74mmmHOg9EyJuJSa3z +UXgg1VNtZCW/m7ha5jedQTiXSYX1R7HjjoX6vWm85mRLAFbyW7DaKnfbYlJvjwAR +AQABiQIfBBgBAgAJBQJdWt2HAhsMAAoJEOSaBGx/D+91YNwQAIY41adyEUHRtwnP +sT90VjheUdz9++rAet8jstwGK8M3wrnhDet18E7wTxt52Knkw7vMS2wqjm3jxeFs +/pI/eA6Tq+AWLEySODegM9TGFxAtcP9TAR0bXGspw5LUWUKO+MJ17pyVs0M/0gb0 +GEjbVCjDn/h0Ozr3n81eokVDhvBZ8n2dUGoetmuZ77Wz1liPoV9G0paISKyLsj9d +iQkE3ExZlGkvX6OiNbJMoo1pHMA4knAo9ch62THofPaoLX5mCKwhNgQDECYd4k89 +ww176ndkrllV8t1v/UDHXPwmDWGK+mLeLk4e+fDJ+bOQrZ543AYk6MB1gRyb94G7 +bQniuoc2YvB+Cn6qOB83ARhDz0zPUGVj/85P8xwmcsZJxlLGpiPAXEQJX2Zk6zFR +1HLxy831IsHaEktglF9tBH+OxJqBg45fbRhuYclWfo724enVdm/rLtR1n93ybaJS +eNmw1Lomks7IsX6qdBR36zVB2WgmIcsnxjtMee+YqfFiAbzbm27lV6A7aTDyIPzQ 
+R2fSta747XADEy7rzYawV5zuCupmUHp/ZgfQK9xYDnZ+lJHHaipDgmIe4Mfe/3Je +au2shXGZFmo4V56uCJ5HqZTJJZaMceQx7u8uqZbhtHG+lLhbvHXVylaxxEYpqf2O +XJ5Dp1pqv9DC6cl9vLSHctRrM2kG +=mQLW +-END PGP PUBLIC KEY BLOCK- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] spark issue #23262: [SPARK-26312][SQL]Converting converters in RDDConversion...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23262 Good catch, LGTM cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23226: [SPARK-26286][TEST] Add MAXIMUM_PAGE_SIZE_BYTES exceptio...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23226 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23239 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23239 The change looks fine. Do we already have tests for cases 2 and 4? We know test for case 3 is [here](https://github.com/apache/spark/pull/23043). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21777: [WIP][SPARK-24498][SQL] Add JDK compiler for runt...
Github user kiszk closed the pull request at: https://github.com/apache/spark/pull/21777 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23206 cc @viirya @maropu --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23206: [SPARK-26249][SQL] Add ability to inject a rule i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23206#discussion_r238776051 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -235,10 +235,127 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) */ def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] = Nil + /** + * Seq of Optimizer rule to be added after or before a rule in a specific batch + */ + def optimizerRulesInOrder: Seq[RuleInOrder] = Nil + + /** + * Batches to add to the optimizer in a specific order with respect to a existing batch + * Seq of Tuple(existing batch name, order, Batch to add). + */ + def optimizerBatches: Seq[(String, Order.Value, Batch)] = Nil + + /** + * Return the batch after removing rules that need to be excluded + */ + private def handleExcludedRules(batch: Batch, excludedRules: Seq[String]): Seq[Batch] = { +// Excluded rules +val filteredRules = batch.rules.filter { rule => + val exclude = excludedRules.contains(rule.ruleName) + if (exclude) { +logInfo(s"Optimization rule '${rule.ruleName}' is excluded from the optimizer.") + } + !exclude +} +if (batch.rules == filteredRules) { + Seq(batch) +} else if (filteredRules.nonEmpty) { + Seq(Batch(batch.name, batch.strategy, filteredRules: _*)) +} else { + logInfo(s"Optimization batch '${batch.name}' is excluded from the optimizer " + +s"as all enclosed rules have been excluded.") + Seq.empty +} + } + + /** + * Add the customized rules and batch in order to the optimizer batches. + * excludedRules - rules that will be excluded --- End diff -- nit: `* @param excludedRules ...` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23190 LGTM excepts two comments --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23190#discussion_r238123212

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -24,7 +24,8 @@ import com.esotericsoftware.kryo.io.{Input, Output}
     import org.apache.spark.{SparkConf, SparkEnv, SparkException}
     import org.apache.spark.internal.config.MEMORY_OFFHEAP_ENABLED
    -import org.apache.spark.memory.{MemoryConsumer, StaticMemoryManager, TaskMemoryManager}
    +import org.apache.spark.memory.{MemoryConsumer, SparkOutOfMemoryError,
    --- End diff --

    Is it better to use `_`?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
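For readers unfamiliar with the shorthand in the comment above, the `_` refers to Scala's wildcard import, which would collapse the growing list of imported names into a single line. This is an illustration only, not the change that was made in the PR:

```scala
// Instead of enumerating each name:
// import org.apache.spark.memory.{MemoryConsumer, SparkOutOfMemoryError, TaskMemoryManager}
// the wildcard form imports everything in the package:
import org.apache.spark.memory._
```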
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23190 Is this follow-up of #23084? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23190 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23199: [SPARK-26245][SQL] Add Float literal
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23199 cc @maropu @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23199: [SPARK-26245][SQL] Add Float literal
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23199 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23146#discussion_r238104839 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -82,7 +82,72 @@ private[ml] class L2Regularization( } (0.5 * sum * regParam, Vectors.dense(gradient)) case _: SparseVector => -throw new IllegalArgumentException("Sparse coefficients are not currently supported.") +throw new IllegalArgumentException( + "Sparse coefficients are not currently supported.") +} + } +} + + +/** + * Implements regularization for Maximum A Posteriori (MAP) optimization + * based on prior means (coefficients) and precisions. + * + * @param priorMean Prior coefficients (multivariate mean). + * @param priorPrecisions Prior precisions. + * @param regParam The magnitude of the regularization. + * @param shouldApply A function (Int => Boolean) indicating whether a given index should have + *regularization applied to it. Usually we don't apply regularization to + *the intercept. + * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the + * feature standard deviation. Since we always standardize the data during + * training, if `standardization` is false, we have to reverse + * standardization by penalizing each component differently by this param. + * If `standardization` is true, this should be `None`. + */ +private[ml] class PriorRegularization( +priorMean: Array[Double], +priorPrecisions: Array[Double], +override val regParam: Double, +shouldApply: Int => Boolean, +applyFeaturesStd: Option[Int => Double]) +extends DifferentiableRegularization[Vector] { + + override def calculate(coefficients: Vector): (Double, Vector) = { +coefficients match { + case dv: DenseVector => +var sum = 0.0 +val gradient = new Array[Double](dv.size) +dv.values.indices.filter(shouldApply).foreach { j => + val coef = coefficients(j) + val priorCoef = priorMean(j) + val priorPrecision = priorPrecisions(j) + applyFeaturesStd match { +case Some(getStd) => + // If `standardization` is false, we still standardize the data + // to improve the rate of convergence; as a result, we have to + // perform this reverse standardization by penalizing each component + // differently to get effectively the same objective function when + // the training dataset is not standardized. + val std = getStd(j) + if (std != 0.0) { +val temp = (coef - priorCoef) / (std * std) +sum += (coef - priorCoef) * temp * priorPrecision +gradient(j) = regParam * priorPrecision * temp + } else { +0.0 --- End diff -- Who consumes `0.0`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
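For readers following the math in the `PriorRegularization.calculate` body quoted above: assuming the method returns `(0.5 * sum * regParam, Vectors.dense(gradient))` in the same way as the `L2Regularization` case visible at the top of this diff, the penalty and gradient it computes for a feature with non-zero standard deviation can be written as below. This is a reconstruction from the diff, with \(\lambda\) standing for `regParam`, \(\tau_j\) for `priorPrecisions(j)`, \(\mu_j\) for `priorMean(j)`, and \(s_j\) for the feature standard deviation (\(s_j = 1\) when standardization is not reversed):

```latex
R(\beta) = \frac{\lambda}{2} \sum_{j} \tau_j \, \frac{(\beta_j - \mu_j)^2}{s_j^2},
\qquad
\frac{\partial R}{\partial \beta_j} = \lambda \, \tau_j \, \frac{\beta_j - \mu_j}{s_j^2}
```

The gradient expression matches the `gradient(j) = regParam * priorPrecision * temp` line in the diff, which is consistent with the assumed `0.5 * sum * regParam` loss term.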
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238062387 --- Diff: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala --- @@ -37,7 +37,7 @@ import org.apache.spark.internal.Logging * tasks was performed by the ShuffleMemoryManager. * * @param lock a [[MemoryManager]] instance to synchronize on - * @param memoryMode the type of memory tracked by this pool (on- or off-heap) + * @param memoryMode the type of memory tracked by this pool (on-heap or off-heap) --- End diff -- Is this change related to this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238062396 --- Diff: core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala --- @@ -28,7 +28,7 @@ import org.apache.spark.storage.memory.MemoryStore * (caching). * * @param lock a [[MemoryManager]] instance to synchronize on - * @param memoryMode the type of memory tracked by this pool (on- or off-heap) + * @param memoryMode the type of memory tracked by this pool (on-heap or off-heap) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23194 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23194 Good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23177: [SPARK-26212][Build][test-maven] Upgrade maven version t...
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/23177

    Sure, updated. Thanks for letting me know about them.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23154: [SPARK-26195][SQL] Correct exception messages in some cl...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23154 LGTM cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23177: [SPARK-26212][Build][test-maven] Upgrade maven version t...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23177 I thought that it is automatically done by `build/mvn`, as you pointed out [before](https://github.com/apache/spark/pull/21905#issuecomment-408678119). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23177: [SPARK-26212][Build][test-maven] Upgrade maven version t...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23177 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23176: [SPARK-26211][SQL] Fix InSet for binary, and struct and ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23176 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23176: [SPARK-26211][SQL] Fix InSet for binary, and struct and ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23176 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23177: [SPARK-26212][Build][test-maven] Upgrade maven ve...
GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/23177

    [SPARK-26212][Build][test-maven] Upgrade maven version to 3.6.0

    ## What changes were proposed in this pull request?

    This PR updates the maven version from 3.5.4 to 3.6.0. The release note of the 3.6.0 is [here](https://maven.apache.org/docs/3.6.0/release-notes.html).

    From [the release note of the 3.6.0](https://maven.apache.org/docs/3.6.0/release-notes.html), the following are new features:
    1. There had been issues related to the project discovery time which has been increased in previous version which influenced some of our users.
    1. The output in the reactor summary has been improved.
    1. There was an issue related to the classpath ordering.

    ## How was this patch tested?

    Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-26212

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23177

commit a5587b5f8468eaf946b89a851e0949231445a4af
Author: Kazuaki Ishizaki
Date:   2018-11-29T08:14:09Z

    initial commit

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23124 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23154: [SPARK-26195][SQL] Correct exception messages in ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23154#discussion_r236919935 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -258,7 +258,7 @@ case class GeneratorOuter(child: Generator) extends UnaryExpression with Generat throw new UnsupportedOperationException(s"Cannot evaluate expression: $this") final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = -throw new UnsupportedOperationException(s"Cannot evaluate expression: $this") +throw new UnsupportedOperationException(s"Cannot generate code expression: $this") --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23154: [SPARK-26195][SQL] Correct exception messages in ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23154#discussion_r236919395 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -204,10 +204,10 @@ case class UnresolvedGenerator(name: FunctionIdentifier, children: Seq[Expressio throw new UnsupportedOperationException(s"Cannot evaluate expression: $this") override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = -throw new UnsupportedOperationException(s"Cannot evaluate expression: $this") +throw new UnsupportedOperationException(s"Cannot generate code expression: $this") --- End diff -- Is it better to use `generate code for expression` or others rather than `generate code expression`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23154: [SPARK-26195][SQL] Correct exception messages in some cl...
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/23154

    @lcqzte10192193 I am sorry for my misunderstanding. The original code in `VectorizedRleValuesReader.java` was correct. Could you please revert your change?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23151#discussion_r236912228

    --- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ---
    @@ -105,5 +105,16 @@ abstract class SparkFunSuite
           logInfo(s"\n\n= FINISHED $shortSuiteName: '$testName' =\n")
         }
       }
    -
    +  /**
    +   * Creates a temporary directory, which is then passed to `f` and will be deleted after `f`
    +   * returns.
    +   *
    +   * @todo Probably this method should be moved to a more general place
    +   */
    +  protected def withCreateTempDir(f: File => Unit): Unit = {
    +    val dir = Utils.createTempDir()
    --- End diff --

    Is it better to call `.getCanonicalFile`, too?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23151#discussion_r236912182

    --- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ---
    @@ -105,5 +105,16 @@ abstract class SparkFunSuite
           logInfo(s"\n\n= FINISHED $shortSuiteName: '$testName' =\n")
         }
       }
    -
    +  /**
    +   * Creates a temporary directory, which is then passed to `f` and will be deleted after `f`
    +   * returns.
    +   *
    +   * @todo Probably this method should be moved to a more general place
    +   */
    +  protected def withCreateTempDir(f: File => Unit): Unit = {
    --- End diff --

    Is there any reason not to use `withTempDir` as a function name like other modules?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
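Taken together, the two review comments above, calling `.getCanonicalFile` on the created directory and reusing the conventional `withTempDir` name, point at a helper along the following lines. This is only a sketch of the reviewers' suggestions, not the code that was eventually merged, and it assumes Spark's `Utils.createTempDir` and `Utils.deleteRecursively` helpers are available as in other Spark test suites:

```scala
import java.io.File

import org.apache.spark.util.Utils

trait TempDirSuiteHelper {
  // Sketch of the suggested helper: canonicalize the temporary directory and delete it
  // after the test body runs, mirroring the withTempDir helpers in other Spark modules.
  protected def withTempDir(f: File => Unit): Unit = {
    val dir = Utils.createTempDir().getCanonicalFile
    try f(dir) finally Utils.deleteRecursively(dir)
  }
}
```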
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23151 Good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23154: [SQL] Correct two exception message in UnresolvedGenerat...
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/23154

    Good catch. I believe other files (e.g. `VectorizedRleValuesReader.java`, `Expression.scala`, and `generators.scala`) also have a similar problem. Can this PR address them?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23124#discussion_r236376102 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -89,7 +89,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { val msg1 = intercept[Exception] { df5.select(map_from_arrays($"k", $"v")).collect }.getMessage -assert(msg1.contains("Cannot use null as map key!")) +assert(msg1.contains("Cannot use null as map key")) --- End diff -- Message at Line 98 is also changed now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23141 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23124#discussion_r236284636 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, DataType, MapType} + +/** + * A builder of [[ArrayBasedMapData]], which fails if a null map key is detected, and removes + * duplicated map keys w.r.t. the last wins policy. + */ +class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable { + assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map") + + private lazy val keyToIndex = keyType match { +case _: AtomicType | _: CalendarIntervalType => mutable.HashMap.empty[Any, Int] +case _ => + // for complex types, use interpreted ordering to be able to compare unsafe data with safe + // data, e.g. UnsafeRow vs GenericInternalRow. + mutable.TreeMap.empty[Any, Int](TypeUtils.getInterpretedOrdering(keyType)) + } + + // TODO: specialize it + private lazy val keys = mutable.ArrayBuffer.empty[Any] + private lazy val values = mutable.ArrayBuffer.empty[Any] + + private lazy val keyGetter = InternalRow.getAccessor(keyType) + private lazy val valueGetter = InternalRow.getAccessor(valueType) + + def reset(): Unit = { +keyToIndex.clear() +keys.clear() +values.clear() + } + + def put(key: Any, value: Any): Unit = { +if (key == null) { + throw new RuntimeException("Cannot use null as map key.") +} + +val maybeExistingIdx = keyToIndex.get(key) +if (maybeExistingIdx.isDefined) { + // Overwrite the previous value, as the policy is last wins. + values(maybeExistingIdx.get) = value +} else { + keyToIndex.put(key, values.length) + keys.append(key) + values.append(value) +} + } + + // write a 2-field row, the first field is key and the second field is value. 
+ def put(entry: InternalRow): Unit = { +if (entry.isNullAt(0)) { + throw new RuntimeException("Cannot use null as map key.") +} +put(keyGetter(entry, 0), valueGetter(entry, 1)) + } + + def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = { +if (keyArray.length != valueArray.length) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.length) { + put(keyArray(i), valueArray(i)) + i += 1 +} + } + + def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = { +if (keyArray.numElements() != valueArray.numElements()) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.numElements()) { + put(keyGetter(keyArray, i), valueGetter(valueArray, i)) + i += 1 +} + } + + def build(): ArrayBasedMapData = { +new ArrayBasedMapData(new GenericArrayData(keys.toArray), new GenericArrayData(values.toArray)) + } + + def from(keyArray: ArrayData, valueArray: ArrayData): ArrayBasedMapData = { +assert(keyToIndex.isEmpty, "'from' can only be called with a fresh GenericMapBuilder.") +putAll(keyArray, valueArray) --- End diff -- Ah, you are right. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
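For context on the class under review, here is a small illustrative use of the `ArrayBasedMapBuilder` API exactly as it appears in this diff, showing the last-wins handling of duplicate keys and the null-key rejection the builder enforces. The values are hypothetical and this is not a test from the PR:

```scala
import org.apache.spark.sql.catalyst.util.ArrayBasedMapBuilder
import org.apache.spark.sql.types.{IntegerType, StringType}
import org.apache.spark.unsafe.types.UTF8String

val builder = new ArrayBasedMapBuilder(IntegerType, StringType)
builder.put(1, UTF8String.fromString("a"))
builder.put(2, UTF8String.fromString("b"))
builder.put(1, UTF8String.fromString("c"))   // duplicate key: the last value wins
val map = builder.build()                    // keys [1, 2], values ["c", "b"]
// builder.put(null, UTF8String.fromString("x")) would throw "Cannot use null as map key."
```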
[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23137 LGTM, pending Jenkins --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23135 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23135#discussion_r236104602 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -43,9 +43,24 @@ import org.apache.spark.sql.types._ * There are a few important traits: * * - [[Nondeterministic]]: an expression that is not deterministic. + * - [[Stateful]]: an expression that contains mutable state. For example, MonotonicallyIncreasingID + * and Rand. A stateful expression is always non-deterministic. * - [[Unevaluable]]: an expression that is not supposed to be evaluated. * - [[CodegenFallback]]: an expression that does not have code gen implemented and falls back to *interpreted mode. + * - [[NullIntolerant]]: an expression that is null intolerant (i.e. any null input will result in + * null output). + * - [[NonSQLExpression]]: a common base trait for the expressions that doesn't have SQL --- End diff -- nit: `doesn't` -> `do not` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23135#discussion_r236103936 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -43,9 +43,24 @@ import org.apache.spark.sql.types._ * There are a few important traits: * * - [[Nondeterministic]]: an expression that is not deterministic. + * - [[Stateful]]: an expression that contains mutable state. For example, MonotonicallyIncreasingID + * and Rand. A stateful expression is always non-deterministic. * - [[Unevaluable]]: an expression that is not supposed to be evaluated. * - [[CodegenFallback]]: an expression that does not have code gen implemented and falls back to *interpreted mode. + * - [[NullIntolerant]]: an expression that is null intolerant (i.e. any null input will result in + * null output). + * - [[NonSQLExpression]]: a common base trait for the expressions that doesn't have SQL + * expressions like representation. For example, `ScalaUDF`, `ScalaUDAF`, + * and object `MapObjects` and `Invoke`. + * - [[UserDefinedExpression]]: a common base trait for user-defined functions, including + * UDF/UDAF/UDTF. + * - [[HigherOrderFunction]]: a common base trait for higher order functions that take one or more + *(lambda) functions and applies these to some objects. The function + *produces a number of variables which can be consumed by some lambda + *function. --- End diff -- nit: `function` -> `functions` ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22512 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23022: [SPARK-26038] Decimal toScalaBigInt/toJavaBigInteger for...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23022 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23102: [SPARK-26137][CORE] Use Java system property "fil...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23102#discussion_r235975268 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -61,11 +62,12 @@ private[deploy] object DependencyUtils extends Logging { hadoopConf: Configuration, secMgr: SecurityManager): String = { val targetDir = Utils.createTempDir() +val fileSeparator = Pattern.quote(System.getProperty("file.separator")) Option(jars) .map { resolveGlobPaths(_, hadoopConf) .split(",") - .filterNot(_.contains(userJar.split("/").last)) + .filterNot(_.contains(userJar.split(fileSeparator).last)) --- End diff -- Beyond the original purpose of this PR, is it better to move `userJar.split(fileSeparator).last` before line 66? This is because `userJar` is not changed in `map { ... }`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
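To make the two points in this review concrete — why `Pattern.quote` is needed for the separator, and how `userJar.split(fileSeparator).last` could be hoisted out of the closure since `userJar` never changes inside `map { ... }` — here is a hedged, self-contained sketch. It is not the actual `DependencyUtils` code, and the helper name is invented.

```
import java.util.regex.Pattern

object FilterUserJarSketch {
  // Drop the user jar from a comma-separated jar list, splitting paths on the
  // platform file separator.
  def filterOutUserJar(jars: String, userJar: String): String = {
    // String.split takes a regex, and the Windows separator "\" is not a valid
    // regex on its own, so it must be quoted.
    val fileSeparator = Pattern.quote(System.getProperty("file.separator"))
    // Hoisted: computed once, because userJar does not change per element.
    val userJarName = userJar.split(fileSeparator).last
    jars.split(",").filterNot(_.contains(userJarName)).mkString(",")
  }
}
```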
[GitHub] spark issue #23102: [SPARK-26137][CORE] Use Java system property "file.separ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23102 @MaxGekk This PR may change the separator handling for a `userJar` that has `\` on Windows, since `resolveGlobPaths` is not applied to `userJar`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23124#discussion_r235952965 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, DataType, MapType} + +/** + * A builder of [[ArrayBasedMapData]], which fails if a null map key is detected, and removes + * duplicated map keys w.r.t. the last wins policy. + */ +class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable { + assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map") + + private lazy val keyToIndex = keyType match { +case _: AtomicType | _: CalendarIntervalType => mutable.HashMap.empty[Any, Int] +case _ => + // for complex types, use interpreted ordering to be able to compare unsafe data with safe + // data, e.g. UnsafeRow vs GenericInternalRow. + mutable.TreeMap.empty[Any, Int](TypeUtils.getInterpretedOrdering(keyType)) + } + + // TODO: specialize it + private lazy val keys = mutable.ArrayBuffer.empty[Any] + private lazy val values = mutable.ArrayBuffer.empty[Any] + + private lazy val keyGetter = InternalRow.getAccessor(keyType) + private lazy val valueGetter = InternalRow.getAccessor(valueType) + + def reset(): Unit = { +keyToIndex.clear() +keys.clear() +values.clear() + } + + def put(key: Any, value: Any): Unit = { +if (key == null) { + throw new RuntimeException("Cannot use null as map key.") +} + +val maybeExistingIdx = keyToIndex.get(key) +if (maybeExistingIdx.isDefined) { + // Overwrite the previous value, as the policy is last wins. + values(maybeExistingIdx.get) = value +} else { + keyToIndex.put(key, values.length) + keys.append(key) + values.append(value) +} + } + + // write a 2-field row, the first field is key and the second field is value. 
+ def put(entry: InternalRow): Unit = { +if (entry.isNullAt(0)) { + throw new RuntimeException("Cannot use null as map key.") +} +put(keyGetter(entry, 0), valueGetter(entry, 1)) + } + + def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = { +if (keyArray.length != valueArray.length) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.length) { + put(keyArray(i), valueArray(i)) + i += 1 +} + } + + def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = { +if (keyArray.numElements() != valueArray.numElements()) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.numElements()) { + put(keyGetter(keyArray, i), valueGetter(valueArray, i)) + i += 1 +} + } + + def build(): ArrayBasedMapData = { +new ArrayBasedMapData(new GenericArrayData(keys.toArray), new GenericArrayData(values.toArray)) --- End diff -- Is it better to call reset() after calling new ArrayBasedMapData to reduce memory consumption in Java heap? At caller side, ArrayBasedMapBuilder is not released. Therefore, until reset() will be called next time, each ArrayBasedMapBuilder keeps unused data in keys, values, and keyToIndex. They consumes Java heap unexpectedly. ---
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23124#discussion_r235950666 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, DataType, MapType} + +/** + * A builder of [[ArrayBasedMapData]], which fails if a null map key is detected, and removes + * duplicated map keys w.r.t. the last wins policy. + */ +class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable { + assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map") + + private lazy val keyToIndex = keyType match { +case _: AtomicType | _: CalendarIntervalType => mutable.HashMap.empty[Any, Int] +case _ => + // for complex types, use interpreted ordering to be able to compare unsafe data with safe + // data, e.g. UnsafeRow vs GenericInternalRow. + mutable.TreeMap.empty[Any, Int](TypeUtils.getInterpretedOrdering(keyType)) + } + + // TODO: specialize it + private lazy val keys = mutable.ArrayBuffer.empty[Any] + private lazy val values = mutable.ArrayBuffer.empty[Any] + + private lazy val keyGetter = InternalRow.getAccessor(keyType) + private lazy val valueGetter = InternalRow.getAccessor(valueType) + + def reset(): Unit = { +keyToIndex.clear() +keys.clear() +values.clear() + } + + def put(key: Any, value: Any): Unit = { +if (key == null) { + throw new RuntimeException("Cannot use null as map key.") +} + +val maybeExistingIdx = keyToIndex.get(key) +if (maybeExistingIdx.isDefined) { + // Overwrite the previous value, as the policy is last wins. + values(maybeExistingIdx.get) = value +} else { + keyToIndex.put(key, values.length) + keys.append(key) + values.append(value) +} + } + + // write a 2-field row, the first field is key and the second field is value. 
+ def put(entry: InternalRow): Unit = { +if (entry.isNullAt(0)) { + throw new RuntimeException("Cannot use null as map key.") +} +put(keyGetter(entry, 0), valueGetter(entry, 1)) + } + + def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = { +if (keyArray.length != valueArray.length) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.length) { + put(keyArray(i), valueArray(i)) + i += 1 +} + } + + def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = { +if (keyArray.numElements() != valueArray.numElements()) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.numElements()) { + put(keyGetter(keyArray, i), valueGetter(valueArray, i)) + i += 1 +} + } + + def build(): ArrayBasedMapData = { +new ArrayBasedMapData(new GenericArrayData(keys.toArray), new GenericArrayData(values.toArray)) + } + + def from(keyArray: ArrayData, valueArray: ArrayData): ArrayBasedMapData = { +assert(keyToIndex.isEmpty, "'from' can only be called with a fresh GenericMapBuilder.") +putAll(keyArray, valueArray) +if (keyToIndex.size == keyArray.numElements()) { + // If there is no duplicated map keys, creates the MapData with the input key and value array,
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23124#discussion_r235950148 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, DataType, MapType} + +/** + * A builder of [[ArrayBasedMapData]], which fails if a null map key is detected, and removes + * duplicated map keys w.r.t. the last wins policy. + */ +class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable { + assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map") + + private lazy val keyToIndex = keyType match { +case _: AtomicType | _: CalendarIntervalType => mutable.HashMap.empty[Any, Int] +case _ => + // for complex types, use interpreted ordering to be able to compare unsafe data with safe + // data, e.g. UnsafeRow vs GenericInternalRow. + mutable.TreeMap.empty[Any, Int](TypeUtils.getInterpretedOrdering(keyType)) + } + + // TODO: specialize it + private lazy val keys = mutable.ArrayBuffer.empty[Any] + private lazy val values = mutable.ArrayBuffer.empty[Any] + + private lazy val keyGetter = InternalRow.getAccessor(keyType) + private lazy val valueGetter = InternalRow.getAccessor(valueType) + + def reset(): Unit = { +keyToIndex.clear() +keys.clear() +values.clear() + } + + def put(key: Any, value: Any): Unit = { +if (key == null) { + throw new RuntimeException("Cannot use null as map key.") +} + +val maybeExistingIdx = keyToIndex.get(key) +if (maybeExistingIdx.isDefined) { + // Overwrite the previous value, as the policy is last wins. + values(maybeExistingIdx.get) = value +} else { + keyToIndex.put(key, values.length) + keys.append(key) + values.append(value) +} + } + + // write a 2-field row, the first field is key and the second field is value. 
+ def put(entry: InternalRow): Unit = { +if (entry.isNullAt(0)) { + throw new RuntimeException("Cannot use null as map key.") +} +put(keyGetter(entry, 0), valueGetter(entry, 1)) + } + + def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = { +if (keyArray.length != valueArray.length) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.length) { + put(keyArray(i), valueArray(i)) + i += 1 +} + } + + def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = { +if (keyArray.numElements() != valueArray.numElements()) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.numElements()) { + put(keyGetter(keyArray, i), valueGetter(valueArray, i)) + i += 1 +} + } + + def build(): ArrayBasedMapData = { +new ArrayBasedMapData(new GenericArrayData(keys.toArray), new GenericArrayData(values.toArray)) + } --- End diff -- Is it better to call `reset()` after calling `new ArrayBasedMapData` to reduce memory consumption? At caller side, `ArrayBasedMapBuilder` is not released. Therefore, until reset() will be called next time, each `ArrayBasedMapBuilder` keeps unused data in `keys`, `values`, and `keyToIndex`. They consumes Java heap unexpectedly. ---
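A minimal sketch of the idea suggested in the two comments above: materialize the result in `build()` and then clear the builder's buffers, so a long-lived builder does not keep the entries reachable until the next `reset()`. This is only an illustration under that assumption, not the actual Spark patch, and `BufferedBuilder` is an invented name. It also assumes callers do not rely on calling `build()` twice on the same contents.

```
import scala.collection.mutable

class BufferedBuilder {
  private val keys = mutable.ArrayBuffer.empty[Any]
  private val values = mutable.ArrayBuffer.empty[Any]

  def put(key: Any, value: Any): Unit = { keys += key; values += value }

  def build(): (Array[Any], Array[Any]) = {
    val result = (keys.toArray, values.toArray)
    // Clear right after copying out, so the builder no longer references the data
    // and it can be garbage collected even if reset() is never called again.
    reset()
    result
  }

  def reset(): Unit = { keys.clear(); values.clear() }
}
```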
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23124#discussion_r235947044 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, DataType, MapType} + +/** + * A builder of [[ArrayBasedMapData]], which fails if a null map key is detected, and removes + * duplicated map keys w.r.t. the last wins policy. + */ +class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable { + assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map") + + private lazy val keyToIndex = keyType match { +case _: AtomicType | _: CalendarIntervalType => mutable.HashMap.empty[Any, Int] +case _ => + // for complex types, use interpreted ordering to be able to compare unsafe data with safe + // data, e.g. UnsafeRow vs GenericInternalRow. + mutable.TreeMap.empty[Any, Int](TypeUtils.getInterpretedOrdering(keyType)) + } + + // TODO: specialize it + private lazy val keys = mutable.ArrayBuffer.empty[Any] + private lazy val values = mutable.ArrayBuffer.empty[Any] + + private lazy val keyGetter = InternalRow.getAccessor(keyType) + private lazy val valueGetter = InternalRow.getAccessor(valueType) + + def reset(): Unit = { +keyToIndex.clear() +keys.clear() +values.clear() + } + + def put(key: Any, value: Any): Unit = { +if (key == null) { + throw new RuntimeException("Cannot use null as map key.") +} + +val maybeExistingIdx = keyToIndex.get(key) +if (maybeExistingIdx.isDefined) { + // Overwrite the previous value, as the policy is last wins. + values(maybeExistingIdx.get) = value +} else { + keyToIndex.put(key, values.length) + keys.append(key) + values.append(value) +} + } + + // write a 2-field row, the first field is key and the second field is value. 
+ def put(entry: InternalRow): Unit = { +if (entry.isNullAt(0)) { + throw new RuntimeException("Cannot use null as map key.") +} +put(keyGetter(entry, 0), valueGetter(entry, 1)) + } + + def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = { +if (keyArray.length != valueArray.length) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.length) { + put(keyArray(i), valueArray(i)) + i += 1 +} + } + + def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = { +if (keyArray.numElements() != valueArray.numElements()) { + throw new RuntimeException( +"The key array and value array of MapData must have the same length.") +} + +var i = 0 +while (i < keyArray.numElements()) { + put(keyGetter(keyArray, i), valueGetter(valueArray, i)) + i += 1 +} + } + + def build(): ArrayBasedMapData = { +new ArrayBasedMapData(new GenericArrayData(keys.toArray), new GenericArrayData(values.toArray)) + } + + def from(keyArray: ArrayData, valueArray: ArrayData): ArrayBasedMapData = { +assert(keyToIndex.isEmpty, "'from' can only be called with a fresh GenericMapBuilder.") +putAll(keyArray, valueArray) --- End diff -- Can we call `new ArrayBasedMapData(keyArray, valueArray)` without calling
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23124#discussion_r235943290 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -751,171 +739,46 @@ case class MapFromEntries(child: Expression) extends UnaryExpression { s"${child.dataType.catalogString} type. $prettyName accepts only arrays of pair structs.") } + private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType) + override protected def nullSafeEval(input: Any): Any = { -val arrayData = input.asInstanceOf[ArrayData] -val numEntries = arrayData.numElements() +val entries = input.asInstanceOf[ArrayData] +val numEntries = entries.numElements() var i = 0 -if(nullEntries) { +if (nullEntries) { while (i < numEntries) { -if (arrayData.isNullAt(i)) return null +if (entries.isNullAt(i)) return null i += 1 } } -val keyArray = new Array[AnyRef](numEntries) -val valueArray = new Array[AnyRef](numEntries) + +mapBuilder.reset() i = 0 while (i < numEntries) { - val entry = arrayData.getStruct(i, 2) - val key = entry.get(0, dataType.keyType) - if (key == null) { -throw new RuntimeException("The first field from a struct (key) can't be null.") - } - keyArray.update(i, key) - val value = entry.get(1, dataType.valueType) - valueArray.update(i, value) + mapBuilder.put(entries.getStruct(i, 2)) i += 1 } -ArrayBasedMapData(keyArray, valueArray) +mapBuilder.build() } override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { val numEntries = ctx.freshName("numEntries") - val isKeyPrimitive = CodeGenerator.isPrimitiveType(dataType.keyType) - val isValuePrimitive = CodeGenerator.isPrimitiveType(dataType.valueType) - val code = if (isKeyPrimitive && isValuePrimitive) { -genCodeForPrimitiveElements(ctx, c, ev.value, numEntries) --- End diff -- This change allow us to focus on optimizing `ArrayBasedMapBuilder`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23101: [SPARK-26134][CORE] Upgrading Hadoop to 2.7.4 to fix jav...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23101 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23102: [SPARK-26137][CORE] Use Java system property "file.separ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23102 Would it be possible to update the PR description based on the template? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23102: [SPARK-26137][CORE] Use Java system property "file.separ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23102 Thank you for submitting a PR to fix a hard-coded character. Is this the only place that we have to fix regarding this hard-coded character? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23101: [SPARK-26134][CORE] Upgrading Hadoop to 2.7.4 to fix jav...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23101 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23101: [SPARK-26134][CORE] Upgrading Hadoop to 2.7.4 to fix jav...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23101 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23084: [SPARK-26117][CORE][SQL]use SparkOutOfMemoryError instea...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23084 I think that we need to take care of `UnsafeExternalSorterSuite.testGetIterator`, too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043 Do we need to consider `GenerateSafeProjection`, too? In other words, if the generated code or runtime does not use data in `Unsafe`, this `+0.0/-0.0` problem may still exist. Am I correct? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
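Background for the `+0.0`/`-0.0` concern: the two values are equal under `==`, but their raw bit patterns differ, so code that compares or hashes the underlying bytes (as unsafe rows do) can treat them as distinct keys. A small JVM-level illustration of facts that hold on any compliant JVM:

```
object MinusZeroDemo extends App {
  val pos = 0.0
  val neg = -0.0

  println(pos == neg) // true: numeric comparison treats them as equal
  // Raw bit patterns differ: 0 vs 8000000000000000
  println(java.lang.Double.doubleToRawLongBits(pos).toHexString)
  println(java.lang.Double.doubleToRawLongBits(neg).toHexString)
  // Boxed equality (and hashCode) also distinguish them
  println(java.lang.Double.valueOf(pos).equals(java.lang.Double.valueOf(neg))) // false
}
```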
[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043 Is it better to update this PR title now? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043 @srowen #21794 is what I was thinking of. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22779#discussion_r234204540 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -497,6 +498,17 @@ class KryoSerializerAutoResetDisabledSuite extends SparkFunSuite with SharedSpar deserializationStream.close() assert(serInstance.deserialize[Any](helloHello) === ((hello, hello))) } + + test("ByteBuffer.array -- UnsupportedOperationException") { --- End diff -- It would be good to add a prefix like "SPARK-25786: ...". --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23039: [SPARK-26066][SQL] Move truncatedString to sql/ca...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23039#discussion_r234202827 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1594,6 +1594,13 @@ object SQLConf { "WHERE, which does not follow SQL standard.") .booleanConf .createWithDefault(false) + + val MAX_TO_STRING_FIELDS = buildConf("spark.sql.debug.maxToStringFields") +.doc("Maximum number of fields of sequence-like entries that can be converted to strings " + --- End diff -- nit: `that` is not necessary if I am correct. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23043: [SPARK-26021][SQL] replace minus zero with zero i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23043#discussion_r233951725 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala --- @@ -56,17 +56,32 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean) val javaType = JavaCode.javaType(dataType) val value = CodeGenerator.getValue(ctx.INPUT_ROW, dataType, ordinal.toString) if (nullable) { -ev.copy(code = +var codeBlock = code""" |boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal); |$javaType ${ev.value} = ${ev.isNull} ? | ${CodeGenerator.defaultValue(dataType)} : ($value); - """.stripMargin) + """.stripMargin +codeBlock = codeBlock + genReplaceMinusZeroWithZeroCode(javaType.codeString, ev.value) +ev.copy(code = codeBlock) } else { -ev.copy(code = code"$javaType ${ev.value} = $value;", isNull = FalseLiteral) +var codeBlock = code"$javaType ${ev.value} = $value;" +codeBlock = codeBlock + genReplaceMinusZeroWithZeroCode(javaType.codeString, ev.value) +ev.copy(code = codeBlock, isNull = FalseLiteral) } } } + + private def genReplaceMinusZeroWithZeroCode(javaType: String, value: String): Block = { +val code = s"\nif ($value == -0.0%c) $value = 0.0%c;" +var formattedCode = "" --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23043: [SPARK-26021][SQL] replace minus zero with zero i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23043#discussion_r233951670 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala --- @@ -56,17 +56,32 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean) val javaType = JavaCode.javaType(dataType) val value = CodeGenerator.getValue(ctx.INPUT_ROW, dataType, ordinal.toString) if (nullable) { -ev.copy(code = +var codeBlock = code""" |boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal); |$javaType ${ev.value} = ${ev.isNull} ? | ${CodeGenerator.defaultValue(dataType)} : ($value); - """.stripMargin) + """.stripMargin +codeBlock = codeBlock + genReplaceMinusZeroWithZeroCode(javaType.codeString, ev.value) +ev.copy(code = codeBlock) } else { -ev.copy(code = code"$javaType ${ev.value} = $value;", isNull = FalseLiteral) +var codeBlock = code"$javaType ${ev.value} = $value;" --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043 IIUC, we discussed handling `+0.0` and `-0.0` before in another PR. @srowen do you remember the previous discussion? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23044: [SPARK-26073][SQL][FOLLOW-UP] remove invalid comment as ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23044 LGTM, pending Jenkins --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22976 gentle ping @rednaxelafx --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22976 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22976 cc @cloud-fan @mgaido91 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cle...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22993#discussion_r232488912 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -67,6 +67,59 @@ unaligned = _unaligned; } + // Access fields and constructors once and store them, for performance: + + private static final Constructor DBB_CONSTRUCTOR; + private static final Field DBB_CLEANER_FIELD; + static { +try { + Class cls = Class.forName("java.nio.DirectByteBuffer"); + Constructor constructor = cls.getDeclaredConstructor(Long.TYPE, Integer.TYPE); + constructor.setAccessible(true); + Field cleanerField = cls.getDeclaredField("cleaner"); + cleanerField.setAccessible(true); + DBB_CONSTRUCTOR = constructor; + DBB_CLEANER_FIELD = cleanerField; +} catch (ClassNotFoundException | NoSuchMethodException | NoSuchFieldException e) { + throw new IllegalStateException(e); +} + } + + private static final Method CLEANER_CREATE_METHOD; + static { +// The implementation of Cleaner changed from JDK 8 to 9 +int majorVersion = Integer.parseInt(System.getProperty("java.version").split("\\.")[0]); --- End diff -- From Java 9, here is a [new definition](https://docs.oracle.com/javase/9/migrate/toc.htm#JSMIG-GUID-3A71ECEF-5FC5-46FE-9BA9-88CBFCE828CB). I confirmed it can work for OpenJDK, OpenJ9, and IBM JDK 8 by running the following code ``` public class Version { public static void main(String[] args){ System.out.println("jave.specification.version=" + System.getProperty("java.specification.version")); System.out.println("jave.version=" + System.getProperty("java.version")); System.out.println("jave.version.split(\".\")[0]=" + System.getProperty("java.version").split("\\.")[0]); } } ``` OpenJDK ``` $ ../OpenJDK-8/java -version java version "1.8.0_162" Java(TM) SE Runtime Environment (build 1.8.0_162-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode) $ ../OpenJDK-8/java Version jave.specification.version=1.8 jave.version=1.8.0_162 jave.version.split(".")[0]=1 $ ../OpenJDK-9/java -version openjdk version "9" OpenJDK Runtime Environment (build 9+181) OpenJDK 64-Bit Server VM (build 9+181, mixed mode) $ ../OpenJDK-9/java Version jave.specification.version=9 jave.version=9 jave.version.split(".")[0]=9 $ ../OpenJDK-11/java -version openjdk version "11.0.1" 2018-10-16 OpenJDK Runtime Environment 18.9 (build 11.0.1+13) OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode) $ ../OpenJDK-11/java Version jave.specification.version=11 jave.version=11.0.1 jave.version.split(".")[0]=11 ``` OpenJ9 ``` $ ../OpenJ9-8/java -version openjdk version "1.8.0_192" OpenJDK Runtime Environment (build 1.8.0_192-b12) Eclipse OpenJ9 VM (build openj9-0.11.0, JRE 1.8.0 Windows 10 amd64-64-Bit Compressed References 20181019_105 (JIT enabled, AOT enabled) OpenJ9 - 090ff9dc OMR - ea548a66 JCL - 51609250b5 based on jdk8u192-b12) $ ../OpenJ9-8/java Version jave.specification.version=1.8 jave.version=1.8.0_192 jave.version.split(".")[0]=1 $ ../OpenJ9-9/java -version openjdk version "9.0.4-adoptopenjdk" OpenJDK Runtime Environment (build 9.0.4-adoptopenjdk+12) Eclipse OpenJ9 VM (build openj9-0.9.0, JRE 9 Windows 8.1 amd64-64-Bit Compressed References 20180814_161 (JIT enabled, AOT enabled) OpenJ9 - 24e53631 OMR - fad6bf6e JCL - feec4d2ae based on jdk-9.0.4+12) $ ../OpenJ9-9/java Version jave.specification.version=9 jave.version=9.0.4-adoptopenjdk jave.version.split(".")[0]=9 $ ../OpenJ9-11/java -version openjdk version "11.0.1" 2018-10-16 OpenJDK Runtime Environment AdoptOpenJDK (build 
11.0.1+13) Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.11.0, JRE 11 Windows 10 amd64-64-Bit Compressed References 20181020_83 (JIT enabled, AOT enabled) OpenJ9 - 090ff9dc OMR - ea548a66 JCL - f62696f378 based on jdk-11.0.1+13) $ ../OpenJ9-11/java Version jave.specification.version=11 jave.version=11.0.1 jave.version.split(".")[0]=11 ``` IBM JDK ``` $ ../IBMJDK-8/java -version java version "1.8.0" Java(TM) SE Runtime Environment (build pwa6480-20150129_02) IBM J9 VM (build 2.8, JRE 1.8.0 Windows 8.1 amd64-64 Compressed References
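As a side note to the version check discussed above: `java.version` begins with "1." on JDK 8 and earlier but with the major number from JDK 9 onward, so `split("\\.")[0]` yields 1 for JDK 8. Below is a hedged sketch of one common way to normalize this using `java.specification.version`; it is an illustration only, not necessarily the approach the PR adopted.

```
object JavaMajorVersion extends App {
  // "1.8" on JDK 8 and earlier, "9", "10", "11", ... afterwards
  val spec = System.getProperty("java.specification.version")
  val major =
    if (spec.startsWith("1.")) spec.stripPrefix("1.").toInt // pre-JDK 9 scheme
    else spec.takeWhile(_.isDigit).toInt                    // JDK 9+ scheme
  println(s"Java major version: $major")
}
```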
[GitHub] spark issue #23005: [SPARK-26005] [SQL] Upgrade ANTRL from 4.7 to 4.7.1
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23005 Files under `dev/deps/` should be updated, too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r232453690 --- Diff: R/pkg/R/SQLContext.R --- @@ -147,6 +147,55 @@ getDefaultSqlSource <- function() { l[["spark.sql.sources.default"]] } +writeToTempFileInArrow <- function(rdf, numPartitions) { + # R API in Arrow is not yet released. CRAN requires to add the package in requireNamespace + # at DESCRIPTION. Later, CRAN checks if the package is available or not. Therefore, it works + # around by avoiding direct requireNamespace. + requireNamespace1 <- requireNamespace + if (requireNamespace1("arrow", quietly = TRUE)) { +record_batch <- get("record_batch", envir = asNamespace("arrow"), inherits = FALSE) +record_batch_stream_writer <- get( + "record_batch_stream_writer", envir = asNamespace("arrow"), inherits = FALSE) +file_output_stream <- get( + "file_output_stream", envir = asNamespace("arrow"), inherits = FALSE) +write_record_batch <- get( + "write_record_batch", envir = asNamespace("arrow"), inherits = FALSE) + +# Currently arrow requires withr; otherwise, write APIs don't work. --- End diff -- nit: `arrow` -> `Arrow` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22998 I have two questions. 1. Has this PR already been tested with `"SPARK-25538: zero-out all bits for decimals"`? 2. How does this PR achieve the performance improvement? This PR may introduce some complexity, so we would like to know the trade-off between performance and ease of understanding. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r232443266 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -68,57 +68,50 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR genComparisons(ctx, ordering) } + /** + * Creates the variables for ordering based on the given order. + */ + private def createOrderKeys( +ctx: CodegenContext, +row: String, +ordering: Seq[SortOrder]): Seq[ExprCode] = { +ctx.INPUT_ROW = row +// to use INPUT_ROW we must make sure currentVars is null +ctx.currentVars = null +ordering.map(_.child.genCode(ctx)) + } + /** * Generates the code for ordering based on the given order. */ def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = { val oldInputRow = ctx.INPUT_ROW val oldCurrentVars = ctx.currentVars -val inputRow = "i" -ctx.INPUT_ROW = inputRow -// to use INPUT_ROW we must make sure currentVars is null -ctx.currentVars = null - -val comparisons = ordering.map { order => - val eval = order.child.genCode(ctx) - val asc = order.isAscending - val isNullA = ctx.freshName("isNullA") - val primitiveA = ctx.freshName("primitiveA") - val isNullB = ctx.freshName("isNullB") - val primitiveB = ctx.freshName("primitiveB") +val rowAKeys = createOrderKeys(ctx, "a", ordering) +val rowBKeys = createOrderKeys(ctx, "b", ordering) +val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, r), i) => + val dt = ordering(i).child.dataType + val asc = ordering(i).isAscending + val nullOrdering = ordering(i).nullOrdering s""" - ${ctx.INPUT_ROW} = a; - boolean $isNullA; - ${CodeGenerator.javaType(order.child.dataType)} $primitiveA; - { -${eval.code} -$isNullA = ${eval.isNull}; -$primitiveA = ${eval.value}; - } - ${ctx.INPUT_ROW} = b; - boolean $isNullB; - ${CodeGenerator.javaType(order.child.dataType)} $primitiveB; - { -${eval.code} -$isNullB = ${eval.isNull}; -$primitiveB = ${eval.value}; - } - if ($isNullA && $isNullB) { + ${l.code} --- End diff -- Would you update this to use | and .stripMargin? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
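For context, the `|` / `.stripMargin` idiom referred to here lets a multi-line interpolated string stay aligned with the surrounding Scala code while the leading indentation is stripped at runtime. A tiny illustration (not the generated-code fragment from the diff; the variable content is made up):

```
object StripMarginDemo extends App {
  val isNullA = "isNullA"
  val snippet =
    s"""
       |boolean $isNullA = false;
       |if ($isNullA) {
       |  return 0;
       |}
     """.stripMargin
  println(snippet)
}
```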
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r232443230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -133,7 +126,6 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR returnType = "int", makeSplitFunction = { body => s""" --- End diff -- Would you update this to use `|` and `.stripMargin`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r232443205 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -154,7 +146,6 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR // make sure INPUT_ROW is declared even if splitExpressions // returns an inlined block s""" --- End diff -- Can we use just `code`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22985: [SPARK-25510][SQL][TEST][FOLLOW-UP] Remove BenchmarkWith...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22985 LGTM, pending Jenkins --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22976 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r231886019 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -68,57 +68,51 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR genComparisons(ctx, ordering) } + /** + * Creates the variables for ordering based on the given order. + */ + private def createOrderKeys( +ctx: CodegenContext, +row: String, +ordering: Seq[SortOrder]): Seq[ExprCode] = { +ctx.INPUT_ROW = row +ctx.currentVars = null +ordering.map(_.child.genCode(ctx)) + } + /** * Generates the code for ordering based on the given order. */ def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = { val oldInputRow = ctx.INPUT_ROW val oldCurrentVars = ctx.currentVars -val inputRow = "i" -ctx.INPUT_ROW = inputRow // to use INPUT_ROW we must make sure currentVars is null ctx.currentVars = null - -val comparisons = ordering.map { order => - val eval = order.child.genCode(ctx) - val asc = order.isAscending - val isNullA = ctx.freshName("isNullA") - val primitiveA = ctx.freshName("primitiveA") - val isNullB = ctx.freshName("isNullB") - val primitiveB = ctx.freshName("primitiveB") +val rowAKeys = createOrderKeys(ctx, "a", ordering) +val rowBKeys = createOrderKeys(ctx, "b", ordering) +val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, r), i) => + val dt = ordering(i).child.dataType + val asc = ordering(i).isAscending + val nullOrdering = ordering(i).nullOrdering s""" - ${ctx.INPUT_ROW} = a; - boolean $isNullA; - ${CodeGenerator.javaType(order.child.dataType)} $primitiveA; - { -${eval.code} -$isNullA = ${eval.isNull}; -$primitiveA = ${eval.value}; - } - ${ctx.INPUT_ROW} = b; - boolean $isNullB; - ${CodeGenerator.javaType(order.child.dataType)} $primitiveB; - { -${eval.code} -$isNullB = ${eval.isNull}; -$primitiveB = ${eval.value}; - } - if ($isNullA && $isNullB) { + ${l.code} + ${r.code} + if (${l.isNull} && ${r.isNull}) { // Nothing - } else if ($isNullA) { + } else if (${l.isNull}) { return ${ - order.nullOrdering match { -case NullsFirst => "-1" -case NullsLast => "1" - }}; - } else if ($isNullB) { +nullOrdering match { --- End diff -- nit: indentation problem --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r231886071 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -68,57 +68,51 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR genComparisons(ctx, ordering) } + /** + * Creates the variables for ordering based on the given order. + */ + private def createOrderKeys( +ctx: CodegenContext, +row: String, +ordering: Seq[SortOrder]): Seq[ExprCode] = { +ctx.INPUT_ROW = row +ctx.currentVars = null +ordering.map(_.child.genCode(ctx)) + } + /** * Generates the code for ordering based on the given order. */ def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = { val oldInputRow = ctx.INPUT_ROW val oldCurrentVars = ctx.currentVars -val inputRow = "i" -ctx.INPUT_ROW = inputRow // to use INPUT_ROW we must make sure currentVars is null ctx.currentVars = null - -val comparisons = ordering.map { order => - val eval = order.child.genCode(ctx) - val asc = order.isAscending - val isNullA = ctx.freshName("isNullA") - val primitiveA = ctx.freshName("primitiveA") - val isNullB = ctx.freshName("isNullB") - val primitiveB = ctx.freshName("primitiveB") +val rowAKeys = createOrderKeys(ctx, "a", ordering) +val rowBKeys = createOrderKeys(ctx, "b", ordering) +val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, r), i) => + val dt = ordering(i).child.dataType + val asc = ordering(i).isAscending + val nullOrdering = ordering(i).nullOrdering s""" - ${ctx.INPUT_ROW} = a; - boolean $isNullA; - ${CodeGenerator.javaType(order.child.dataType)} $primitiveA; - { -${eval.code} -$isNullA = ${eval.isNull}; -$primitiveA = ${eval.value}; - } - ${ctx.INPUT_ROW} = b; - boolean $isNullB; - ${CodeGenerator.javaType(order.child.dataType)} $primitiveB; - { -${eval.code} -$isNullB = ${eval.isNull}; -$primitiveB = ${eval.value}; - } - if ($isNullA && $isNullB) { + ${l.code} + ${r.code} + if (${l.isNull} && ${r.isNull}) { // Nothing - } else if ($isNullA) { + } else if (${l.isNull}) { return ${ - order.nullOrdering match { -case NullsFirst => "-1" -case NullsLast => "1" - }}; - } else if ($isNullB) { +nullOrdering match { + case NullsFirst => "-1" + case NullsLast => "1" +}}; + } else if (${r.isNull}) { return ${ - order.nullOrdering match { -case NullsFirst => "1" -case NullsLast => "-1" - }}; +nullOrdering match { --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r231885902 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -68,57 +68,51 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR genComparisons(ctx, ordering) } + /** + * Creates the variables for ordering based on the given order. + */ + private def createOrderKeys( +ctx: CodegenContext, +row: String, +ordering: Seq[SortOrder]): Seq[ExprCode] = { +ctx.INPUT_ROW = row +ctx.currentVars = null +ordering.map(_.child.genCode(ctx)) + } + /** * Generates the code for ordering based on the given order. */ def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = { val oldInputRow = ctx.INPUT_ROW val oldCurrentVars = ctx.currentVars -val inputRow = "i" -ctx.INPUT_ROW = inputRow // to use INPUT_ROW we must make sure currentVars is null ctx.currentVars = null --- End diff -- Now, can we remove this line? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22976 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org