[arrow] branch ARROW-17715a updated (019740ad9f -> 145e167753)

2023-01-09 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 019740ad9f change LLVM version
 add 145e167753 disable JEMALLOC and PLASMA

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 4 
 1 file changed, 4 insertions(+)



[arrow] branch ARROW-17715a updated (194f3f249a -> 019740ad9f)

2023-01-09 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 194f3f249a add CLANG_TOOLS
 add 019740ad9f change LLVM version

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[arrow] branch ARROW-17715a updated (0f753003a1 -> 194f3f249a)

2023-01-09 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 0f753003a1 disable to build COMPUTE and GANDIVA
 add 194f3f249a add CLANG_TOOLS

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 1 +
 1 file changed, 1 insertion(+)



[arrow] branch ARROW-17715a created (now 0f753003a1)

2023-01-09 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git


  at 0f753003a1 disable to build COMPUTE and GANDIVA

This branch includes the following new commits:

 new 0f753003a1 disable to build COMPUTE and GANDIVA

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow] 01/01: disable to build COMPUTE and GANDIVA

2023-01-09 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch ARROW-17715a
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit 0f753003a1557c4e554ea879464ca001888e5c2f
Author: Kazuaki Ishizaki 
AuthorDate: Mon Jan 9 04:01:38 2023 -0500

disable to build COMPUTE and GANDIVA

reduce parallelism to 1
---
 .travis.yml | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index a96e07f0c4..b508d60609 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -93,14 +93,16 @@ jobs:
 # aws-sdk-cpp.
 DOCKER_RUN_ARGS: >-
   "
+  -e ARROW_COMPUTE=OFF
   -e ARROW_FLIGHT=ON
+  -e ARROW_GANDIVA=OFF
   -e ARROW_GCS=OFF
   -e ARROW_MIMALLOC=OFF
   -e ARROW_ORC=OFF
   -e ARROW_PARQUET=OFF
   -e ARROW_S3=OFF
   -e ARROW_SUBSTRAIT=OFF
-  -e CMAKE_BUILD_PARALLEL_LEVEL=2
+  -e CMAKE_BUILD_PARALLEL_LEVEL=1
   -e CMAKE_UNITY_BUILD=ON
   -e PARQUET_BUILD_EXAMPLES=OFF
   -e PARQUET_BUILD_EXECUTABLES=OFF
@@ -144,14 +146,16 @@ jobs:
 # aws-sdk-cpp.
 DOCKER_RUN_ARGS: >-
   "
+  -e ARROW_COMPUTE=OFF
   -e ARROW_FLIGHT=ON
+  -e ARROW_GANDIVA=OFF
   -e ARROW_GCS=OFF
   -e ARROW_MIMALLOC=OFF
   -e ARROW_ORC=OFF
   -e ARROW_PARQUET=OFF
   -e ARROW_PYTHON=ON
   -e ARROW_S3=OFF
-  -e CMAKE_BUILD_PARALLEL_LEVEL=2
+  -e CMAKE_BUILD_PARALLEL_LEVEL=1
   -e CMAKE_UNITY_BUILD=ON
   -e PARQUET_BUILD_EXAMPLES=OFF
   -e PARQUET_BUILD_EXECUTABLES=OFF



[arrow-julia] 01/01: initial draft

2022-01-17 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch issue270
in repository https://gitbox.apache.org/repos/asf/arrow-julia.git

commit 8bc9510d21b27ebdf6d10e8bd57d553287f066b0
Author: ishizaki 
AuthorDate: Mon Jan 17 16:18:06 2022 +

initial draft
---
 CONTRIBUTING.md | 37 +
 1 file changed, 37 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000..163d2b9
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,37 @@
+# How to contribute to Apache Arrow Julia
+
+## Did you find a bug or have an improvement?
+
+We recommend that you first search the existing [GitHub issues](https://github.com/apache/arrow-julia/issues). The community may have already addressed the same idea. If you find such an issue, you may want to contribute to it.
+
+
+## How do you write a patch that fixes a bug or brings an improvement?
+If you cannot find the same idea among the issues, first open a GitHub issue (e.g. [issues in arrow-julia](https://github.com/apache/arrow-julia/issues)) for the bug fix or the planned improvement. Writing an issue gives the community visibility and an opportunity to collaborate before a pull request (PR) shows up, in keeping with the [Apache way](http://theapacheway.com/). We can use GitHub labels to identify bugs.
+It is not necessary to file an issue for non-code changes, such as CI changes or minor documentation updates like fixing typos.
+
+After writing the issue, you can write the code and create [a PR](https://github.com/apache/arrow-julia/pulls). In the PR, it is preferable to refer to the issue number (e.g. `#1`) that you created.
+
+
+## Do you want to propose a significant new feature or an important refactoring?
+
+We ask that all discussions about major changes in the codebase happen publicly on the [arrow-dev mailing-list](https://lists.apache.org/list.html?d...@arrow.apache.org).
+
+
+## Do you have questions about the source code, the build procedure or the development process?
+
+You can also ask on the mailing list; see above.
+
+
+## Local Development
+
+When developing on Arrow.jl, it is recommended that you run the following to ensure that any changes to ArrowTypes.jl are immediately available to Arrow.jl without requiring a release:
+
+```
+julia --project -e 'using Pkg; Pkg.develop(path="src/ArrowTypes")'
+```
+
+
+## Release cycle
+
+The Julia community would like an independent release cycle. Releases of apache/arrow don't include the Julia implementation, and the Julia implementation uses a separate version scheme. (apache/arrow uses 6.0.0 as its next version, but the next Julia implementation release won't use 6.0.0.)
+


[arrow-julia] branch issue270 created (now 8bc9510)

2022-01-17 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch issue270
in repository https://gitbox.apache.org/repos/asf/arrow-julia.git.


  at 8bc9510  initial draft

This branch includes the following new commits:

 new 8bc9510  initial draft

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[arrow] branch master updated (dbb5b42 -> 3968146)

2021-07-05 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from dbb5b42  ARROW-13194: [Java][Document] Create prose document about 
Java algorithms
 add 3968146  ARROW-13032: [Java] Update guava version

No new revisions were added by this update.

Summary of changes:
 .../src/test/java/org/apache/arrow/flight/perf/TestPerf.java| 2 +-
 java/gandiva/pom.xml| 2 --
 java/pom.xml| 2 +-
 3 files changed, 2 insertions(+), 4 deletions(-)


[arrow-site] branch master updated: ARROW-13047: [Website] Add kiszk to committer list

2021-06-14 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/master by this push:
 new 0a542fd  ARROW-13047: [Website] Add kiszk to committer list
0a542fd is described below

commit 0a542fdba3f56cf6d16298852e14a1a763bf29de
Author: ishizaki 
AuthorDate: Tue Jun 15 01:44:26 2021 +

ARROW-13047: [Website] Add kiszk to committer list

Closes #118 from kiszk/Arrow-13047 and squashes the following commits:

036bb29b4  add kiszk as a committer

Authored-by: ishizaki 
Signed-off-by: Kazuaki Ishizaki 
---
 _data/committers.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/_data/committers.yml b/_data/committers.yml
index 56b6f62..27079ad 100644
--- a/_data/committers.yml
+++ b/_data/committers.yml
@@ -188,6 +188,10 @@
   role: Committer
   alias: jorisvandenbossche
   affiliation: Ursa Computing
+- name: Kazuaki Ishizaki
+  role: Committer
+  alias: kiszk
+  affiliation: IBM
 - name: Kenta Murata
   role: Committer
   alias: mrkn


[arrow] branch master updated (b81fcf7 -> 5173af0)

2021-06-14 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from b81fcf7  ARROW-13068: [GLib][Dataset] Change prefix to gdataset_ from 
gad_
 add 5173af0  ARROW-13026: [CI] Use LLVM 10 for s390x

No new revisions were added by this update.

Summary of changes:
 .travis.yml | 3 +++
 1 file changed, 3 insertions(+)


[arrow-site] 01/01: add kiszk as a committer

2021-06-11 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch Arrow-13047
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

commit f4fbc497212b4ca44e5c09482187b3ba41c1d243
Author: ishizaki 
AuthorDate: Fri Jun 11 07:52:43 2021 +

add kiszk as a committer
---
 _data/committers.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/_data/committers.yml b/_data/committers.yml
index d473c60..1b63acf 100644
--- a/_data/committers.yml
+++ b/_data/committers.yml
@@ -184,6 +184,10 @@
   role: Committer
   alias: jorisvandenbossche
   affiliation: Ursa Computing
+- name: Kazuaki ishizaki
+  role: Committer
+  alias: kiszk
+  affiliation: IBM
 - name: Kenta Murata
   role: Committer
   alias: mrkn


[arrow-site] branch Arrow-13047 created (now f4fbc49)

2021-06-11 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch Arrow-13047
in repository https://gitbox.apache.org/repos/asf/arrow-site.git.


  at f4fbc49  add kiszk as a committer

This branch includes the following new commits:

 new f4fbc49  add kiszk as a committer

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



svn commit: r35381 - /dev/spark/KEYS

2019-08-26 Thread kiszk
Author: kiszk
Date: Mon Aug 26 17:18:45 2019
New Revision: 35381

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Mon Aug 26 17:18:45 2019
@@ -993,12 +993,12 @@ ZTFPNYvCMMHM8A==
 =PEdD
 -END PGP PUBLIC KEY BLOCK-
 
-pub   rsa4096/7F0FEF75 2019-08-19 [SC]
-uid [ultimate] Kazuaki Ishizaki (CODE SIGNING KEY) 
-sub   rsa4096/7C3AEC68 2019-08-19 [E]
+pub   4096R/7F0FEF75 2019-08-19
+uid  Kazuaki Ishizaki (CODE SIGNING KEY) 
+sub   4096R/7C3AEC68 2019-08-19
 
 -BEGIN PGP PUBLIC KEY BLOCK-
-Version: GnuPG v2
+Version: GnuPG v1
 
 mQINBF1a3YcBEAC7I6f1jWpY9WlJBkbwvLneYBjnD2BRwG1eKjkz49aUXVKkx4Du
 XB7b+agbhWL7EIPjQHVJf0RVGochOujKfcPxOz5bZwAV078EbsJpiAYIAeVEimQF
@@ -1049,3 +1049,4 @@ au2shXGZFmo4V56uCJ5HqZTJJZaMceQx7u8uqZbh
 XJ5Dp1pqv9DC6cl9vLSHctRrM2kG
 =mQLW
 -END PGP PUBLIC KEY BLOCK-
+






svn commit: r35371 - in /dev/spark/v2.3.4-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark

2019-08-26 Thread kiszk
Author: kiszk
Date: Mon Aug 26 09:54:45 2019
New Revision: 35371

Log:
Apache Spark v2.3.4-rc1 docs


[This commit notification would consist of 1447 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r35370 - /dev/spark/v2.3.4-rc1-bin/

2019-08-26 Thread kiszk
Author: kiszk
Date: Mon Aug 26 09:00:20 2019
New Revision: 35370

Log:
Apache Spark v2.3.4-rc1

Added:
dev/spark/v2.3.4-rc1-bin/
dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz   (with props)
dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc
dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512
dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz   (with props)
dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc
dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.sha512
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.6.tgz   (with props)
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.6.tgz.asc
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.6.tgz.sha512
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.7.tgz   (with props)
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.7.tgz.asc
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-hadoop2.7.tgz.sha512
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-without-hadoop.tgz   (with props)
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-without-hadoop.tgz.asc
dev/spark/v2.3.4-rc1-bin/spark-2.3.4-bin-without-hadoop.tgz.sha512
dev/spark/v2.3.4-rc1-bin/spark-2.3.4.tgz   (with props)
dev/spark/v2.3.4-rc1-bin/spark-2.3.4.tgz.asc
dev/spark/v2.3.4-rc1-bin/spark-2.3.4.tgz.sha512

Added: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc
==
--- dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc (added)
+++ dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.asc Mon Aug 26 09:00:20 2019
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJFBAABCgAvFiEEgFK1nI3grK++CMoV5JoEbH8P73UFAl1jmEMRHGtpc3prQGFw
+YWNoZS5vcmcACgkQ5JoEbH8P73Xmog//Qj/814bac4xbMnvsmEQyA9RfIRfv2i2T
+jJNh2jHiwUefV4Wd+vXy+5YXSW/A9y8MOgBHXRRbdsv+wzuaccy+SayFCg8gWXOb
+CihXw5gc3sUswIRFlxSsjwL0xkcqsxLkmPQtg7eOjIlq1LS3ynLzRPbnOov71que
+45dHOnZi1PIEonhQiIgwWEVQiEyUQk0cBjiWDgprrZe4sZStHm0IbTsPJNAmJ3qX
+KUZddOfEwmzm4u44oVYR1Z88YrRT/F7LOB8cNvCT/JLGNkn0Sf1DNN42E8gcSUyJ
+EWU8cgjy0j2kBYLVdO123Qo/V/HJ8XJUrz9fd3p89ZX6z+q66lCHVypg9Chku/OI
+CZ3pnTcBbaUKTMjB0R+r8Yj6OuIyEx95oMABoOi8ye98xrRSw7kEZ1CVIPHUiiDu
+oZdP8XQyg5sLda4qFAs/6AGY9jXTDojk46zE+MqJ7jefXVn8lvdwWKVhVaIyZYDs
+bDm9lGFTlXyakX0qxeMC7dCNkINMuXgQBZpMb+HMlUWDurneWA3IjwtzvJd2AfiU
+ZvBo7Gzv6eBjbcJ9eaG3UXEv25dt3sK56fV7/7Jh+9LVLIZDIIdNwV+YDDmVX4HF
+f7KHtaWIfQpy9lbHQqLuf6DikxntT3jIV1NUg7UbkWKrKg1wuBUozmiX4aqRTAnQ
+4MKVJZuZmzU=
+=h0w6
+-END PGP SIGNATURE-

Added: dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512
==
--- dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512 (added)
+++ dev/spark/v2.3.4-rc1-bin/SparkR_2.3.4.tar.gz.sha512 Mon Aug 26 09:00:20 2019
@@ -0,0 +1,3 @@
+SparkR_2.3.4.tar.gz: 09173710 547AFB95 417F908E 8057C0FC C78C41E7 17F64233
+ 440B8E58 B43AEB9F 15B9F5CC 1972750B 5A60D3BA AA702D22
+ 7AEF3D79 495C323A 803F9F54 7EE5DB13

Added: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc
==
--- dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc (added)
+++ dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.asc Mon Aug 26 09:00:20 2019
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJFBAABCgAvFiEEgFK1nI3grK++CMoV5JoEbH8P73UFAl1jmEURHGtpc3prQGFw
+YWNoZS5vcmcACgkQ5JoEbH8P73Vd5BAAmzqGMEWC50Eet0e8Jpl2IT77dRfY+6zz
+mj5Nf/4tAFZ8eys7rbr4qKkNoqV3+cfytmNQSC/va6hbb0ioOB19uhvQqUe+OXaF
+93enkUjV0FGFwUgh8dD6x+9V0hAQ8lFA6V0Y1NYBa53t5xJFAJSrpVcXv/Af4y0A
+p8vyZN9Fea15RQykBQBjszhaQuh8nMqZbZjd19Kmwk2Dfe+ABFRjljpwuZt/paaX
+qZaaRpgVj30JmxkbKtXfVeDW6IstcntBJdmCoA2wwcgZmn7vTu5Fu1dd4xXhLq/H
+LIlIJXTxzPEmZuHmt7kNMYrj/M1ulPj2GFI0Cm4zg0uw9wbA01VjQ79sFuS6n0HC
+cC2JGm8inG6CHmWrZ4peBM1BxefL7yhfWYROQm2jwhfRpeI5EcmHkUlhoK8w6+F6
+2i6H187IXizL0UQjMcQu8WiGHtlcvTPrMP3BHwKuALZlgnrXFfcIrXD+oE1AakK3
+vVwTSt48RxX7dp89pRGx3bxS8zaIsh5bG2GlgYVxx8EtAyq6hK9nzHulLAcY1hS9
+A/8j8lQKZlCtDmr+JkOhcGuZsiUtB2elMwsMJmFn+qBbu0R+AT08x5kAILBNDkp6
+iN8xRoOpgVvcqzHZvraz7a6OqxfoPtQ53A4xNtT8gFTDs1Kq7jLOvmjntZotreUs
+gJ3741FqslM=
+=hX41
+-END PGP SIGNATURE-

Added: dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4.tar.gz.sha512
==
--- dev/spark/v2.3.4-rc1-bin/pyspark-2.3.4

[spark] 01/01: Preparing Spark release v2.3.4-rc1

2019-08-25 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to tag v2.3.4-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 8c6f8150f3c6298ff4e1c7e06028f12d7eaf0210
Author: Kazuaki Ishizaki 
AuthorDate: Sun Aug 25 14:38:17 2019 +

Preparing Spark release v2.3.4-rc1
---
 assembly/pom.xml  | 2 +-
 common/kvstore/pom.xml| 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 2 +-
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 hadoop-cloud/pom.xml  | 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 40 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 612a1b8..583b1bf 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 5547e97..29c2c58 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 119dde2..224b229 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index dba5224..c7f661e 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 56902a3..f33fb99 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 5302d95..a642cb2 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 232ebfa..29bd7ba 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index f0baa2a..03f9b77 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/core/pom.xml b/core/pom.xml
index d4f5940..c9c1c7c 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4-SNAPSHOT</version>
+    <version>2.3.4</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/docs/_config.yml b/docs/_config.yml
index dd46965..f30ff62 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -14,7 +14,7 @@ include:
 
 # These allow

[spark] 01/01: Preparing development version 2.3.5-SNAPSHOT

2019-08-25 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 3fb9e84c7a5ed6c7bde7a6c64cdeda974734dbc5
Author: Kazuaki Ishizaki 
AuthorDate: Sun Aug 25 14:38:22 2019 +

Preparing development version 2.3.5-SNAPSHOT
---
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/kvstore/pom.xml| 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 hadoop-cloud/pom.xml  | 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 9124a88..d14017e 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.3.4
+Version: 2.3.5
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 583b1bf..0c36ce2 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4</version>
+    <version>2.3.5-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 29c2c58..a9ab9d5 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4</version>
+    <version>2.3.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 224b229..f34618e 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4</version>
+    <version>2.3.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index c7f661e..62901b9 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4</version>
+    <version>2.3.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index f33fb99..8a64c64 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4</version>
+    <version>2.3.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a642cb2..abb43d3 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4</version>
+    <version>2.3.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 29bd7ba..71e946a 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.3.4</version>
+    <version>2.3.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index 03f9b77..9fb92b7 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7

[spark] branch branch-2.3 updated (adb5255 -> 3fb9e84)

2019-08-25 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git.


from adb5255  [SPARK-26895][CORE][2.3] prepareSubmitEnvironment should be 
called within doAs for proxy users
 add 8c6f815  Preparing Spark release v2.3.4-rc1
 new 3fb9e84  Preparing development version 2.3.5-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/kvstore/pom.xml| 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 hadoop-cloud/pom.xml  | 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)





[spark] tag v2.3.4-rc1 created (now 8c6f815)

2019-08-25 Thread kiszk
This is an automated email from the ASF dual-hosted git repository.

kiszk pushed a change to tag v2.3.4-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 8c6f815  (commit)
This tag includes the following new commits:

 new 8c6f815  Preparing Spark release v2.3.4-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






svn commit: r35304 - /dev/spark/KEYS

2019-08-19 Thread kiszk
Author: kiszk
Date: Mon Aug 19 18:04:54 2019
New Revision: 35304

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Mon Aug 19 18:04:54 2019
@@ -991,4 +991,61 @@ QRMaCSG2MOvUAI8Zzk6i1Gi5InRlP5v8sQdrMYvS
 meyB5uExVklZg9yaoH2zAFXLkjG1pftpkCb57UIyC+Tk5KAMZXyS2vHNGxsnI3FG
 ZTFPNYvCMMHM8A==
 =PEdD
--END PGP PUBLIC KEY BLOCK-
\ No newline at end of file
+-END PGP PUBLIC KEY BLOCK-
+
+pub   rsa4096/7F0FEF75 2019-08-19 [SC]
+uid [ultimate] Kazuaki Ishizaki (CODE SIGNING KEY) 
+sub   rsa4096/7C3AEC68 2019-08-19 [E]
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+Version: GnuPG v2
+
+mQINBF1a3YcBEAC7I6f1jWpY9WlJBkbwvLneYBjnD2BRwG1eKjkz49aUXVKkx4Du
+XB7b+agbhWL7EIPjQHVJf0RVGochOujKfcPxOz5bZwAV078EbsJpiAYIAeVEimQF
+Pv/uqaf9DbIjZAnJtZhKlyXJaXLpuZbqEwBimpfbgvF5ib4ii7a9kY7BO/YsSXXc
+ksLBIHKwNAeKSMIGmCQaxz/tNmRm1tAagFknCEoQ0CMsA8FesjXyS+U6nfJWdK3K
++678joAIhZvdn5k3f/bR94ifeDCh0QsY/zuG95er4Gp0rdr8EmRQbfJAUAwfkn8a
+viQD1FkTs+aJn4MSClb+FDXu7hNrPPdayA5CI6PSMdir//+Z7Haox92mvhQT5pBJ
+X21R4BDqF6bmL2d/RL3e2Zb1rmztDbTq43OL3Jm+x9R3OPg9UVwFJgHUy/xEirve
+Nah5Y6GzV3po/VSJbRIdM/p8OENv6YahFbLr5rT5O9iZns/PXHUpXYXLQDfdFJD2
+oCNFxlQmjfbxIL3PIcdS2gY2o1FmEbYuaLi6Bb9FDTm/J78vHYtR3wLvwufLh3PX
+5en9e6+g7o5w3jN/3J1skwXUUSOHK88mWBGt2B9ZwYS+7TQ0zWcgrXjwHQoi92nA
+JEADyvQSxTB/zd5usCVel8038FSKhawkhrmLBk2UoJR4prhnPC364MnjgQARAQAB
+tDZLYXp1YWtpIElzaGl6YWtpIChDT0RFIFNJR05JTkcgS0VZKSA8a2lzemtAYXBh
+Y2hlLm9yZz6JAjgEEwECACIFAl1a3YcCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4B
+AheAAAoJEOSaBGx/D+91w5AQALB6gff1BuaDyMSiSYaAGBGrBAxs1+ixQxlLX+ld
+KG9y/u41S3s8pBn0GXp1jthdURnPm+raLqJk1lVPUZ4JqNYot0FL/nGBIZjRRG6J
+TfmlWTza1AfgvzcROaO+7jVPMskBx/HZn8XxEOlMcnBv4P/v3m/QUW9/tH8j+6Bc
+JwfiqD3LIaWZTicAMxWE9r7MREDcgkrFROJDDJPMFxoVKomIcc3vzXJeI7BfVtkG
+5NHWYDVn4QTQygv+qes4ke9fcik7T5c9NcOjXgks6eF0z7Z/Rj6DUrIyVKleUwJZ
+AWpBJcbNc8crg623DRaXpGhXsGvnD5PxcPvVjJ9Jud7o884OhVr2abxQ++rIv/+m
+K5K99jbp2E/6Q6tR4ODEoPTGN6fSijziWfhuad26K/grN3878hayGmey57vPH3tx
+LsBkUfc9bz46HjcdhfaU1dS82YOMmrFLLmgBEL1PViK628gk0TR7C6N4kHKGWd1f
+tQz/bTFzoyXOTpS6bvceE88fZ2FSeepP0AgvZPZsUXxrHXo78oECZ9CAoO/q1P1J
+OrKr5oG5om9pB+4SI3FhD2PKxt/+ayMCyA6PVBlw8HDI2XLBmBi9YkiP2ws7gJcF
+A958J3CWc6Q7PstrU7LCmL0Apbl8T2Iqph7jB2Qiko2sOyxe5Vwkwh9vHYnhy1ox
+YZ2quQINBF1a3YcBEADfvUJtKQKQEHl6ug/c2PxDL5pfEhCXQfBIkfPScUgiQCO9
+aiSigMUReiYa/7cau2jmGUcBktjgLwlAGywX6YTGt/ZIWCkGRdK8K3mVRNssGwXs
++oWcNinRbzIV1cvZu9zndzM7lzIMFriIP/Shsi9QPg6SibK1XhgkYr2pTN8i1zmQ
+sd/FGnhEeGZxXDwW7wG6tPXvzQiAZgJEsUh90i9AbQzI/MWG2RqqjKGO423BcpQ8
+nHgUlj7JbgRI2knBjpnxAyKroDGw9dKXNBqYrGjQtbXcCkBTk6vDyOkXUWOz63Bc
+AtVfXwL5+RILvYjzn8bZne5jt8fkNK3z29XTv7N3Ee8HRwPnGp6Ny7jGR/f740gP
+3b8y4A6QI9YlyvOlp2SHIRPHEYKUQCLaTT1/b4DYN5SGtWwXA4GafCLBVBwD3fr+
+jIhCbInX0+MWOZwuTYuwpoE6nnsnWpsAd6ZOMJInULRyW1f7/zXoq2XvtFH8+IQN
+DYtF1lr2C8lm7WUKqSg2bmVy6+gV6KvYqj6oihLQBxlnmrKBQFhkBeOyNYxRW8rf
+c+nZZza/5QMZLD7mYL+BGmgHB2eycSuz7UkZ8H5DD0u7Wz74mmmHOg9EyJuJSa3z
+UXgg1VNtZCW/m7ha5jedQTiXSYX1R7HjjoX6vWm85mRLAFbyW7DaKnfbYlJvjwAR
+AQABiQIfBBgBAgAJBQJdWt2HAhsMAAoJEOSaBGx/D+91YNwQAIY41adyEUHRtwnP
+sT90VjheUdz9++rAet8jstwGK8M3wrnhDet18E7wTxt52Knkw7vMS2wqjm3jxeFs
+/pI/eA6Tq+AWLEySODegM9TGFxAtcP9TAR0bXGspw5LUWUKO+MJ17pyVs0M/0gb0
+GEjbVCjDn/h0Ozr3n81eokVDhvBZ8n2dUGoetmuZ77Wz1liPoV9G0paISKyLsj9d
+iQkE3ExZlGkvX6OiNbJMoo1pHMA4knAo9ch62THofPaoLX5mCKwhNgQDECYd4k89
+ww176ndkrllV8t1v/UDHXPwmDWGK+mLeLk4e+fDJ+bOQrZ543AYk6MB1gRyb94G7
+bQniuoc2YvB+Cn6qOB83ARhDz0zPUGVj/85P8xwmcsZJxlLGpiPAXEQJX2Zk6zFR
+1HLxy831IsHaEktglF9tBH+OxJqBg45fbRhuYclWfo724enVdm/rLtR1n93ybaJS
+eNmw1Lomks7IsX6qdBR36zVB2WgmIcsnxjtMee+YqfFiAbzbm27lV6A7aTDyIPzQ
+R2fSta747XADEy7rzYawV5zuCupmUHp/ZgfQK9xYDnZ+lJHHaipDgmIe4Mfe/3Je
+au2shXGZFmo4V56uCJ5HqZTJJZaMceQx7u8uqZbhtHG+lLhbvHXVylaxxEYpqf2O
+XJ5Dp1pqv9DC6cl9vLSHctRrM2kG
+=mQLW
+-END PGP PUBLIC KEY BLOCK-






[GitHub] spark issue #23262: [SPARK-26312][SQL]Converting converters in RDDConversion...

2018-12-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23262
  
Good catch, LGTM
cc @cloud-fan 


---




[GitHub] spark issue #23226: [SPARK-26286][TEST] Add MAXIMUM_PAGE_SIZE_BYTES exceptio...

2018-12-07 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23226
  
retest this please



---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-07 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23239
  
LGTM


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23239
  
The change looks fine.
Do we already have tests for cases 2 and 4? We know the test for case 3 is
[here](https://github.com/apache/spark/pull/23043).


---




[GitHub] spark pull request #21777: [WIP][SPARK-24498][SQL] Add JDK compiler for runt...

2018-12-06 Thread kiszk
Github user kiszk closed the pull request at:

https://github.com/apache/spark/pull/21777


---




[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...

2018-12-05 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23206
  
cc @viirya @maropu 


---




[GitHub] spark pull request #23206: [SPARK-26249][SQL] Add ability to inject a rule i...

2018-12-04 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23206#discussion_r238776051
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -235,10 +235,127 @@ abstract class Optimizer(sessionCatalog: 
SessionCatalog)
*/
   def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] = Nil
 
+  /**
+   * Seq of Optimizer rule to be added after or before a rule in a 
specific batch
+   */
+  def optimizerRulesInOrder: Seq[RuleInOrder] = Nil
+
+  /**
+   * Batches to add to the optimizer in a specific order with respect to a 
existing batch
+   * Seq of Tuple(existing batch name, order, Batch to add).
+   */
+  def optimizerBatches: Seq[(String, Order.Value, Batch)] = Nil
+
+  /**
+   * Return the batch after removing rules that need to be excluded
+   */
+  private def handleExcludedRules(batch: Batch, excludedRules: 
Seq[String]): Seq[Batch] = {
+// Excluded rules
+val filteredRules = batch.rules.filter { rule =>
+  val exclude = excludedRules.contains(rule.ruleName)
+  if (exclude) {
+logInfo(s"Optimization rule '${rule.ruleName}' is excluded from 
the optimizer.")
+  }
+  !exclude
+}
+if (batch.rules == filteredRules) {
+  Seq(batch)
+} else if (filteredRules.nonEmpty) {
+  Seq(Batch(batch.name, batch.strategy, filteredRules: _*))
+} else {
+  logInfo(s"Optimization batch '${batch.name}' is excluded from the 
optimizer " +
+s"as all enclosed rules have been excluded.")
+  Seq.empty
+}
+  }
+
+  /**
+   * Add the customized rules and batch in order to the optimizer batches.
+   * excludedRules - rules that will be excluded
--- End diff --

nit: `* @param excludedRules ...`


---




[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23190
  
LGTM except for two comments


---




[GitHub] spark pull request #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of...

2018-12-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23190#discussion_r238123212
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -24,7 +24,8 @@ import com.esotericsoftware.kryo.io.{Input, Output}
 
 import org.apache.spark.{SparkConf, SparkEnv, SparkException}
 import org.apache.spark.internal.config.MEMORY_OFFHEAP_ENABLED
-import org.apache.spark.memory.{MemoryConsumer, StaticMemoryManager, 
TaskMemoryManager}
+import org.apache.spark.memory.{MemoryConsumer, SparkOutOfMemoryError,
--- End diff --

Is it better to use `_`?
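
The `_` suggestion amounts to replacing the enumerated imports with a package-wide wildcard; a minimal sketch, assuming the same `org.apache.spark.memory` package as in the diff:

```scala
// Wildcard form of the import: pulls in MemoryConsumer, SparkOutOfMemoryError,
// StaticMemoryManager, TaskMemoryManager, and friends in one line.
import org.apache.spark.memory._
```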


---




[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23190
  
Is this follow-up of #23084?


---




[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23190
  
retest this please


---




[GitHub] spark issue #23199: [SPARK-26245][SQL] Add Float literal

2018-12-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23199
  
cc @maropu @gatorsmile 


---




[GitHub] spark issue #23199: [SPARK-26245][SQL] Add Float literal

2018-12-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23199
  
retest this please


---




[GitHub] spark pull request #23146: [SPARK-26173] [MLlib] Prior regularization for Lo...

2018-12-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23146#discussion_r238104839
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
 ---
@@ -82,7 +82,72 @@ private[ml] class L2Regularization(
 }
 (0.5 * sum * regParam, Vectors.dense(gradient))
   case _: SparseVector =>
-throw new IllegalArgumentException("Sparse coefficients are not 
currently supported.")
+throw new IllegalArgumentException(
+  "Sparse coefficients are not currently supported.")
+}
+  }
+}
+
+
+/**
+ * Implements regularization for Maximum A Posteriori (MAP) optimization
+ * based on prior means (coefficients) and precisions.
+ *
+ * @param priorMean Prior coefficients (multivariate mean).
+ * @param priorPrecisions Prior precisions.
+ * @param regParam The magnitude of the regularization.
+ * @param shouldApply A function (Int => Boolean) indicating whether a 
given index should have
+ *regularization applied to it. Usually we don't apply 
regularization to
+ *the intercept.
+ * @param applyFeaturesStd Option for a function which maps coefficient 
index (column major) to the
+ * feature standard deviation. Since we always 
standardize the data during
+ * training, if `standardization` is false, we 
have to reverse
+ * standardization by penalizing each component 
differently by this param.
+ * If `standardization` is true, this should be 
`None`.
+ */
+private[ml] class PriorRegularization(
+priorMean: Array[Double],
+priorPrecisions: Array[Double],
+override val regParam: Double,
+shouldApply: Int => Boolean,
+applyFeaturesStd: Option[Int => Double])
+extends DifferentiableRegularization[Vector] {
+
+  override def calculate(coefficients: Vector): (Double, Vector) = {
+coefficients match {
+  case dv: DenseVector =>
+var sum = 0.0
+val gradient = new Array[Double](dv.size)
+dv.values.indices.filter(shouldApply).foreach { j =>
+  val coef = coefficients(j)
+  val priorCoef = priorMean(j)
+  val priorPrecision = priorPrecisions(j)
+  applyFeaturesStd match {
+case Some(getStd) =>
+  // If `standardization` is false, we still standardize the 
data
+  // to improve the rate of convergence; as a result, we have 
to
+  // perform this reverse standardization by penalizing each 
component
+  // differently to get effectively the same objective 
function when
+  // the training dataset is not standardized.
+  val std = getStd(j)
+  if (std != 0.0) {
+val temp = (coef - priorCoef) / (std * std)
+sum += (coef - priorCoef) * temp * priorPrecision
+gradient(j) = regParam * priorPrecision * temp
+  } else {
+0.0
--- End diff --

Who consumes `0.0`?
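
The bare `0.0` sits in the `else` branch of a `Unit`-returning `foreach` body, so its value is evaluated and then discarded, making the branch a no-op. A minimal stand-alone sketch of that behavior (illustration only, not project code):

```scala
// The literal in the else branch is thrown away; nothing consumes it, so the
// branch is effectively empty (scalac typically warns about the pure
// expression in statement position).
object DiscardedBranch extends App {
  Seq(0.0, 2.0).foreach { std =>
    if (std != 0.0) {
      println(s"penalizing with std = $std")
    } else {
      0.0 // value is discarded
    }
  }
}
```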


---




[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases

2018-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23194#discussion_r238062387
  
--- Diff: 
core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala ---
@@ -37,7 +37,7 @@ import org.apache.spark.internal.Logging
  * tasks was performed by the ShuffleMemoryManager.
  *
  * @param lock a [[MemoryManager]] instance to synchronize on
- * @param memoryMode the type of memory tracked by this pool (on- or 
off-heap)
+ * @param memoryMode the type of memory tracked by this pool (on-heap or 
off-heap)
--- End diff --

Is this change related to this PR?


---




[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases

2018-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23194#discussion_r238062396
  
--- Diff: 
core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala ---
@@ -28,7 +28,7 @@ import org.apache.spark.storage.memory.MemoryStore
  * (caching).
  *
  * @param lock a [[MemoryManager]] instance to synchronize on
- * @param memoryMode the type of memory tracked by this pool (on- or 
off-heap)
+ * @param memoryMode the type of memory tracked by this pool (on-heap or 
off-heap)
--- End diff --

ditto


---




[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases

2018-12-01 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23194
  
ok to test


---




[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases

2018-12-01 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23194
  
Good catch


---




[GitHub] spark issue #23177: [SPARK-26212][Build][test-maven] Upgrade maven version t...

2018-11-30 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23177
  
Sure, updated. Thanks for letting me know about them.


---




[GitHub] spark issue #23154: [SPARK-26195][SQL] Correct exception messages in some cl...

2018-11-30 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23154
  
LGTM cc @cloud-fan 


---




[GitHub] spark issue #23177: [SPARK-26212][Build][test-maven] Upgrade maven version t...

2018-11-29 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23177
  
I thought that it is automatically done by `build/mvn`, as you pointed out 
[before](https://github.com/apache/spark/pull/21905#issuecomment-408678119).


---




[GitHub] spark issue #23177: [SPARK-26212][Build][test-maven] Upgrade maven version t...

2018-11-29 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23177
  
retest this please


---




[GitHub] spark issue #23176: [SPARK-26211][SQL] Fix InSet for binary, and struct and ...

2018-11-29 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23176
  
LGTM


---




[GitHub] spark issue #23176: [SPARK-26211][SQL] Fix InSet for binary, and struct and ...

2018-11-29 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23176
  
retest this please


---




[GitHub] spark pull request #23177: [SPARK-26212][Build][test-maven] Upgrade maven ve...

2018-11-29 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/23177

[SPARK-26212][Build][test-maven] Upgrade maven version to 3.6.0

## What changes were proposed in this pull request?

This PR updates the Maven version from 3.5.4 to 3.6.0. The release notes for 3.6.0 are [here](https://maven.apache.org/docs/3.6.0/release-notes.html).

From [the 3.6.0 release notes](https://maven.apache.org/docs/3.6.0/release-notes.html), the following are notable changes:
1. There had been issues related to the project discovery time, which had increased in a previous version and affected some of our users.
1. The output in the reactor summary has been improved.
1. There was an issue related to the classpath ordering.

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-26212

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23177


commit a5587b5f8468eaf946b89a851e0949231445a4af
Author: Kazuaki Ishizaki 
Date:   2018-11-29T08:14:09Z

initial commit




---




[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23124
  
retest this please


---




[GitHub] spark pull request #23154: [SPARK-26195][SQL] Correct exception messages in ...

2018-11-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23154#discussion_r236919935
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -258,7 +258,7 @@ case class GeneratorOuter(child: Generator) extends 
UnaryExpression with Generat
 throw new UnsupportedOperationException(s"Cannot evaluate expression: 
$this")
 
   final override protected def doGenCode(ctx: CodegenContext, ev: 
ExprCode): ExprCode =
-throw new UnsupportedOperationException(s"Cannot evaluate expression: 
$this")
+throw new UnsupportedOperationException(s"Cannot generate code 
expression: $this")
--- End diff --

ditto


---




[GitHub] spark pull request #23154: [SPARK-26195][SQL] Correct exception messages in ...

2018-11-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23154#discussion_r236919395
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -204,10 +204,10 @@ case class UnresolvedGenerator(name: 
FunctionIdentifier, children: Seq[Expressio
 throw new UnsupportedOperationException(s"Cannot evaluate expression: 
$this")
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode =
-throw new UnsupportedOperationException(s"Cannot evaluate expression: 
$this")
+throw new UnsupportedOperationException(s"Cannot generate code 
expression: $this")
--- End diff --

Is it better to use `generate code for expression` or something similar, rather than `generate code expression`?


---




[GitHub] spark issue #23154: [SPARK-26195][SQL] Correct exception messages in some cl...

2018-11-27 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23154
  
@lcqzte10192193 I am sorry for my misunderstanding. The original code in `VectorizedRleValuesReader.java` was correct. Could you please revert your change?


---




[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...

2018-11-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23151#discussion_r236912228
  
--- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ---
@@ -105,5 +105,16 @@ abstract class SparkFunSuite
   logInfo(s"\n\n= FINISHED $shortSuiteName: '$testName' =\n")
 }
   }
-
+  /**
+   * Creates a temporary directory, which is then passed to `f` and will 
be deleted after `f`
+   * returns.
+   *
+   * @todo Probably this method should be moved to a more general place
+   */
+  protected def withCreateTempDir(f: File => Unit): Unit = {
+val dir = Utils.createTempDir()
--- End diff --

Is it better to call `.getCanonicalFile`, too?
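
The helper under discussion typically follows a create/try/finally pattern in Spark test code; a minimal sketch, assuming `Utils.createTempDir` and `Utils.deleteRecursively` from `org.apache.spark.util` (not the exact code from this PR):

```scala
// Kept inside the org.apache.spark namespace because Utils is package-private.
package org.apache.spark.testsketch

import java.io.File

import org.apache.spark.util.Utils

abstract class TempDirHelper {
  // Create a temp directory, canonicalize it (the reviewer's suggestion),
  // pass it to the test body, and always delete it, even if the body throws.
  protected def withTempDir(f: File => Unit): Unit = {
    val dir = Utils.createTempDir().getCanonicalFile
    try f(dir) finally Utils.deleteRecursively(dir)
  }
}
```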


---




[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...

2018-11-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23151#discussion_r236912182
  
--- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ---
@@ -105,5 +105,16 @@ abstract class SparkFunSuite
   logInfo(s"\n\n= FINISHED $shortSuiteName: '$testName' =\n")
 }
   }
-
+  /**
+   * Creates a temporary directory, which is then passed to `f` and will 
be deleted after `f`
+   * returns.
+   *
+   * @todo Probably this method should be moved to a more general place
+   */
+  protected def withCreateTempDir(f: File => Unit): Unit = {
--- End diff --

Is there any reason not to use `withTempDir` as a function name like other 
modules?


---




[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...

2018-11-27 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23151
  
Good catch


---




[GitHub] spark issue #23154: [SQL] Correct two exception message in UnresolvedGenerat...

2018-11-27 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23154
  
Good catch. I believe other files (e.g. `VectorizedRleValuesReader.java`, `Expression.scala`, and `generators.scala`) also have a similar problem. Can this PR address them?


---




[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-26 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r236376102
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -89,7 +89,7 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 val msg1 = intercept[Exception] {
   df5.select(map_from_arrays($"k", $"v")).collect
 }.getMessage
-assert(msg1.contains("Cannot use null as map key!"))
+assert(msg1.contains("Cannot use null as map key"))
--- End diff --

Message at Line 98 is also changed now.


---




[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-26 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23141
  
LGTM


---




[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-26 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r236284636
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, 
DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is 
detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends 
Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map 
cannot be/contain map")
+
+  private lazy val keyToIndex = keyType match {
+case _: AtomicType | _: CalendarIntervalType => 
mutable.HashMap.empty[Any, Int]
+case _ =>
+  // for complex types, use interpreted ordering to be able to compare 
unsafe data with safe
+  // data, e.g. UnsafeRow vs GenericInternalRow.
+  mutable.TreeMap.empty[Any, 
Int](TypeUtils.getInterpretedOrdering(keyType))
+  }
+
+  // TODO: specialize it
+  private lazy val keys = mutable.ArrayBuffer.empty[Any]
+  private lazy val values = mutable.ArrayBuffer.empty[Any]
+
+  private lazy val keyGetter = InternalRow.getAccessor(keyType)
+  private lazy val valueGetter = InternalRow.getAccessor(valueType)
+
+  def reset(): Unit = {
+keyToIndex.clear()
+keys.clear()
+values.clear()
+  }
+
+  def put(key: Any, value: Any): Unit = {
+if (key == null) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+
+val maybeExistingIdx = keyToIndex.get(key)
+if (maybeExistingIdx.isDefined) {
+  // Overwrite the previous value, as the policy is last wins.
+  values(maybeExistingIdx.get) = value
+} else {
+  keyToIndex.put(key, values.length)
+  keys.append(key)
+  values.append(value)
+}
+  }
+
+  // write a 2-field row, the first field is key and the second field is 
value.
+  def put(entry: InternalRow): Unit = {
+if (entry.isNullAt(0)) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+put(keyGetter(entry, 0), valueGetter(entry, 1))
+  }
+
+  def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = {
+if (keyArray.length != valueArray.length) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.length) {
+  put(keyArray(i), valueArray(i))
+  i += 1
+}
+  }
+
+  def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = {
+if (keyArray.numElements() != valueArray.numElements()) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.numElements()) {
+  put(keyGetter(keyArray, i), valueGetter(valueArray, i))
+  i += 1
+}
+  }
+
+  def build(): ArrayBasedMapData = {
+new ArrayBasedMapData(new GenericArrayData(keys.toArray), new 
GenericArrayData(values.toArray))
+  }
+
+  def from(keyArray: ArrayData, valueArray: ArrayData): ArrayBasedMapData 
= {
+assert(keyToIndex.isEmpty, "'from' can only be called with a fresh 
GenericMapBuilder.")
+putAll(keyArray, valueArray)
--- End diff --

Ah, you are right.


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23137
  
LGTM, pending Jenkins


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23135
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...

2018-11-25 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23135#discussion_r236104602
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -43,9 +43,24 @@ import org.apache.spark.sql.types._
  * There are a few important traits:
  *
  * - [[Nondeterministic]]: an expression that is not deterministic.
+ * - [[Stateful]]: an expression that contains mutable state. For example, 
MonotonicallyIncreasingID
+ * and Rand. A stateful expression is always 
non-deterministic.
  * - [[Unevaluable]]: an expression that is not supposed to be evaluated.
  * - [[CodegenFallback]]: an expression that does not have code gen 
implemented and falls back to
  *interpreted mode.
+ * - [[NullIntolerant]]: an expression that is null intolerant (i.e. any 
null input will result in
+ *   null output).
+ * - [[NonSQLExpression]]: a common base trait for the expressions that 
doesn't have SQL
--- End diff --

nit: `doesn't` -> `do not`


---




[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...

2018-11-25 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23135#discussion_r236103936
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -43,9 +43,24 @@ import org.apache.spark.sql.types._
  * There are a few important traits:
  *
  * - [[Nondeterministic]]: an expression that is not deterministic.
+ * - [[Stateful]]: an expression that contains mutable state. For example, 
MonotonicallyIncreasingID
+ * and Rand. A stateful expression is always 
non-deterministic.
  * - [[Unevaluable]]: an expression that is not supposed to be evaluated.
  * - [[CodegenFallback]]: an expression that does not have code gen 
implemented and falls back to
  *interpreted mode.
+ * - [[NullIntolerant]]: an expression that is null intolerant (i.e. any 
null input will result in
+ *   null output).
+ * - [[NonSQLExpression]]: a common base trait for the expressions that 
doesn't have SQL
+ * expressions like representation. For example, 
`ScalaUDF`, `ScalaUDAF`,
+ * and object `MapObjects` and `Invoke`.
+ * - [[UserDefinedExpression]]: a common base trait for user-defined 
functions, including
+ *  UDF/UDAF/UDTF.
+ * - [[HigherOrderFunction]]: a common base trait for higher order 
functions that take one or more
+ *(lambda) functions and applies these to some 
objects. The function
+ *produces a number of variables which can be 
consumed by some lambda
+ *function.
--- End diff --

nit: `function` -> `functions` ?


---




[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...

2018-11-23 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22512
  
retest this please


---




[GitHub] spark issue #23022: [SPARK-26038] Decimal toScalaBigInt/toJavaBigInteger for...

2018-11-23 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23022
  
LGTM


---




[GitHub] spark pull request #23102: [SPARK-26137][CORE] Use Java system property "fil...

2018-11-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23102#discussion_r235975268
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala 
---
@@ -61,11 +62,12 @@ private[deploy] object DependencyUtils extends Logging {
   hadoopConf: Configuration,
   secMgr: SecurityManager): String = {
 val targetDir = Utils.createTempDir()
+val fileSeparator = Pattern.quote(System.getProperty("file.separator"))
 Option(jars)
   .map {
 resolveGlobPaths(_, hadoopConf)
   .split(",")
-  .filterNot(_.contains(userJar.split("/").last))
+  .filterNot(_.contains(userJar.split(fileSeparator).last))
--- End diff --

Beyond the original purpose of this PR, is it better to move `userJar.split(fileSeparator).last` before line 66, since `userJar` is not changed inside `map { ... }`?
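
For illustration, a minimal, self-contained sketch of hoisting that invariant computation out of the closure; the object and variable names are hypothetical, and the Spark-specific pieces (`resolveGlobPaths`, `hadoopConf`) are elided:

```
import java.util.regex.Pattern

object HoistExample {
  def main(args: Array[String]): Unit = {
    val sep = java.io.File.separator
    val userJar = s"some${sep}dir${sep}app.jar"
    val jars = "dep-a.jar,dep-b.jar,app.jar"

    // userJar does not change inside the closure below, so split it once here.
    val fileSeparator = Pattern.quote(System.getProperty("file.separator"))
    val userJarName = userJar.split(fileSeparator).last

    val filtered = jars.split(",").filterNot(_.contains(userJarName))
    println(filtered.mkString(","))  // dep-a.jar,dep-b.jar
  }
}
```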


---




[GitHub] spark issue #23102: [SPARK-26137][CORE] Use Java system property "file.separ...

2018-11-23 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23102
  
@MaxGekk This PR may change the behavior for a `userJar` whose path uses `\` as the separator on Windows, because `resolveGlobPaths` is not applied to `userJar`.
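
To illustrate the concern, a small standalone sketch; the Windows path and the backslash separator are hard-coded here so it runs on any OS, and on Windows `Pattern.quote(System.getProperty("file.separator"))` evaluates to the same pattern:

```
import java.util.regex.Pattern

object SeparatorExample {
  def main(args: Array[String]): Unit = {
    // A Windows-style userJar path, written literally for portability.
    val userJar = """C:\libs\app.jar"""

    // Old behavior: splitting on "/" does not split at all, so .last is the whole path.
    println(userJar.split("/").last)                  // C:\libs\app.jar

    // New behavior: splitting on the quoted separator yields just the file name.
    println(userJar.split(Pattern.quote("\\")).last)  // app.jar
  }
}
```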


---




[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235952965
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, 
DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is 
detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends 
Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map 
cannot be/contain map")
+
+  private lazy val keyToIndex = keyType match {
+case _: AtomicType | _: CalendarIntervalType => 
mutable.HashMap.empty[Any, Int]
+case _ =>
+  // for complex types, use interpreted ordering to be able to compare 
unsafe data with safe
+  // data, e.g. UnsafeRow vs GenericInternalRow.
+  mutable.TreeMap.empty[Any, 
Int](TypeUtils.getInterpretedOrdering(keyType))
+  }
+
+  // TODO: specialize it
+  private lazy val keys = mutable.ArrayBuffer.empty[Any]
+  private lazy val values = mutable.ArrayBuffer.empty[Any]
+
+  private lazy val keyGetter = InternalRow.getAccessor(keyType)
+  private lazy val valueGetter = InternalRow.getAccessor(valueType)
+
+  def reset(): Unit = {
+keyToIndex.clear()
+keys.clear()
+values.clear()
+  }
+
+  def put(key: Any, value: Any): Unit = {
+if (key == null) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+
+val maybeExistingIdx = keyToIndex.get(key)
+if (maybeExistingIdx.isDefined) {
+  // Overwrite the previous value, as the policy is last wins.
+  values(maybeExistingIdx.get) = value
+} else {
+  keyToIndex.put(key, values.length)
+  keys.append(key)
+  values.append(value)
+}
+  }
+
+  // write a 2-field row, the first field is key and the second field is 
value.
+  def put(entry: InternalRow): Unit = {
+if (entry.isNullAt(0)) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+put(keyGetter(entry, 0), valueGetter(entry, 1))
+  }
+
+  def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = {
+if (keyArray.length != valueArray.length) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.length) {
+  put(keyArray(i), valueArray(i))
+  i += 1
+}
+  }
+
+  def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = {
+if (keyArray.numElements() != valueArray.numElements()) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.numElements()) {
+  put(keyGetter(keyArray, i), valueGetter(valueArray, i))
+  i += 1
+}
+  }
+
+  def build(): ArrayBasedMapData = {
+new ArrayBasedMapData(new GenericArrayData(keys.toArray), new 
GenericArrayData(values.toArray))
--- End diff --



Is it better to call `reset()` after calling `new ArrayBasedMapData` to reduce memory consumption in the Java heap?

At the caller side, `ArrayBasedMapBuilder` is not released. Therefore, until `reset()` is called the next time, each `ArrayBasedMapBuilder` keeps unused data in `keys`, `values`, and `keyToIndex`. They consume Java heap unexpectedly.
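
A rough sketch of that suggestion against the `build()` shown in the diff above; this is not the actual Spark code, and it assumes the `keys`, `values`, and `reset()` members from this diff:

```
// Sketch only: materialize the result first, then drop the buffered references
// so the builder does not pin them until the next reset().
def build(): ArrayBasedMapData = {
  val map = new ArrayBasedMapData(
    new GenericArrayData(keys.toArray), new GenericArrayData(values.toArray))
  reset()  // clears keyToIndex, keys, and values
  map
}
```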



---


[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235950666
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, 
DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is 
detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends 
Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map 
cannot be/contain map")
+
+  private lazy val keyToIndex = keyType match {
+case _: AtomicType | _: CalendarIntervalType => 
mutable.HashMap.empty[Any, Int]
+case _ =>
+  // for complex types, use interpreted ordering to be able to compare 
unsafe data with safe
+  // data, e.g. UnsafeRow vs GenericInternalRow.
+  mutable.TreeMap.empty[Any, 
Int](TypeUtils.getInterpretedOrdering(keyType))
+  }
+
+  // TODO: specialize it
+  private lazy val keys = mutable.ArrayBuffer.empty[Any]
+  private lazy val values = mutable.ArrayBuffer.empty[Any]
+
+  private lazy val keyGetter = InternalRow.getAccessor(keyType)
+  private lazy val valueGetter = InternalRow.getAccessor(valueType)
+
+  def reset(): Unit = {
+keyToIndex.clear()
+keys.clear()
+values.clear()
+  }
+
+  def put(key: Any, value: Any): Unit = {
+if (key == null) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+
+val maybeExistingIdx = keyToIndex.get(key)
+if (maybeExistingIdx.isDefined) {
+  // Overwrite the previous value, as the policy is last wins.
+  values(maybeExistingIdx.get) = value
+} else {
+  keyToIndex.put(key, values.length)
+  keys.append(key)
+  values.append(value)
+}
+  }
+
+  // write a 2-field row, the first field is key and the second field is 
value.
+  def put(entry: InternalRow): Unit = {
+if (entry.isNullAt(0)) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+put(keyGetter(entry, 0), valueGetter(entry, 1))
+  }
+
+  def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = {
+if (keyArray.length != valueArray.length) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.length) {
+  put(keyArray(i), valueArray(i))
+  i += 1
+}
+  }
+
+  def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = {
+if (keyArray.numElements() != valueArray.numElements()) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.numElements()) {
+  put(keyGetter(keyArray, i), valueGetter(valueArray, i))
+  i += 1
+}
+  }
+
+  def build(): ArrayBasedMapData = {
+new ArrayBasedMapData(new GenericArrayData(keys.toArray), new 
GenericArrayData(values.toArray))
+  }
+
+  def from(keyArray: ArrayData, valueArray: ArrayData): ArrayBasedMapData 
= {
+assert(keyToIndex.isEmpty, "'from' can only be called with a fresh 
GenericMapBuilder.")
+putAll(keyArray, valueArray)
+if (keyToIndex.size == keyArray.numElements()) {
+  // If there is no duplicated map keys, creates the MapData with the 
input key and value array,

[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235950148
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, 
DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is 
detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends 
Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map 
cannot be/contain map")
+
+  private lazy val keyToIndex = keyType match {
+case _: AtomicType | _: CalendarIntervalType => 
mutable.HashMap.empty[Any, Int]
+case _ =>
+  // for complex types, use interpreted ordering to be able to compare 
unsafe data with safe
+  // data, e.g. UnsafeRow vs GenericInternalRow.
+  mutable.TreeMap.empty[Any, 
Int](TypeUtils.getInterpretedOrdering(keyType))
+  }
+
+  // TODO: specialize it
+  private lazy val keys = mutable.ArrayBuffer.empty[Any]
+  private lazy val values = mutable.ArrayBuffer.empty[Any]
+
+  private lazy val keyGetter = InternalRow.getAccessor(keyType)
+  private lazy val valueGetter = InternalRow.getAccessor(valueType)
+
+  def reset(): Unit = {
+keyToIndex.clear()
+keys.clear()
+values.clear()
+  }
+
+  def put(key: Any, value: Any): Unit = {
+if (key == null) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+
+val maybeExistingIdx = keyToIndex.get(key)
+if (maybeExistingIdx.isDefined) {
+  // Overwrite the previous value, as the policy is last wins.
+  values(maybeExistingIdx.get) = value
+} else {
+  keyToIndex.put(key, values.length)
+  keys.append(key)
+  values.append(value)
+}
+  }
+
+  // write a 2-field row, the first field is key and the second field is 
value.
+  def put(entry: InternalRow): Unit = {
+if (entry.isNullAt(0)) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+put(keyGetter(entry, 0), valueGetter(entry, 1))
+  }
+
+  def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = {
+if (keyArray.length != valueArray.length) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.length) {
+  put(keyArray(i), valueArray(i))
+  i += 1
+}
+  }
+
+  def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = {
+if (keyArray.numElements() != valueArray.numElements()) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.numElements()) {
+  put(keyGetter(keyArray, i), valueGetter(valueArray, i))
+  i += 1
+}
+  }
+
+  def build(): ArrayBasedMapData = {
+new ArrayBasedMapData(new GenericArrayData(keys.toArray), new 
GenericArrayData(values.toArray))
+  }
--- End diff --

Is it better to call `reset()` after calling `new ArrayBasedMapData` to reduce memory consumption?

At the caller side, `ArrayBasedMapBuilder` is not released. Therefore, until `reset()` is called the next time, each `ArrayBasedMapBuilder` keeps unused data in `keys`, `values`, and `keyToIndex`. They consume Java heap unexpectedly.


---


[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235947044
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, 
DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is 
detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends 
Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map 
cannot be/contain map")
+
+  private lazy val keyToIndex = keyType match {
+case _: AtomicType | _: CalendarIntervalType => 
mutable.HashMap.empty[Any, Int]
+case _ =>
+  // for complex types, use interpreted ordering to be able to compare 
unsafe data with safe
+  // data, e.g. UnsafeRow vs GenericInternalRow.
+  mutable.TreeMap.empty[Any, 
Int](TypeUtils.getInterpretedOrdering(keyType))
+  }
+
+  // TODO: specialize it
+  private lazy val keys = mutable.ArrayBuffer.empty[Any]
+  private lazy val values = mutable.ArrayBuffer.empty[Any]
+
+  private lazy val keyGetter = InternalRow.getAccessor(keyType)
+  private lazy val valueGetter = InternalRow.getAccessor(valueType)
+
+  def reset(): Unit = {
+keyToIndex.clear()
+keys.clear()
+values.clear()
+  }
+
+  def put(key: Any, value: Any): Unit = {
+if (key == null) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+
+val maybeExistingIdx = keyToIndex.get(key)
+if (maybeExistingIdx.isDefined) {
+  // Overwrite the previous value, as the policy is last wins.
+  values(maybeExistingIdx.get) = value
+} else {
+  keyToIndex.put(key, values.length)
+  keys.append(key)
+  values.append(value)
+}
+  }
+
+  // write a 2-field row, the first field is key and the second field is 
value.
+  def put(entry: InternalRow): Unit = {
+if (entry.isNullAt(0)) {
+  throw new RuntimeException("Cannot use null as map key.")
+}
+put(keyGetter(entry, 0), valueGetter(entry, 1))
+  }
+
+  def putAll(keyArray: Array[Any], valueArray: Array[Any]): Unit = {
+if (keyArray.length != valueArray.length) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.length) {
+  put(keyArray(i), valueArray(i))
+  i += 1
+}
+  }
+
+  def putAll(keyArray: ArrayData, valueArray: ArrayData): Unit = {
+if (keyArray.numElements() != valueArray.numElements()) {
+  throw new RuntimeException(
+"The key array and value array of MapData must have the same 
length.")
+}
+
+var i = 0
+while (i < keyArray.numElements()) {
+  put(keyGetter(keyArray, i), valueGetter(valueArray, i))
+  i += 1
+}
+  }
+
+  def build(): ArrayBasedMapData = {
+new ArrayBasedMapData(new GenericArrayData(keys.toArray), new 
GenericArrayData(values.toArray))
+  }
+
+  def from(keyArray: ArrayData, valueArray: ArrayData): ArrayBasedMapData 
= {
+assert(keyToIndex.isEmpty, "'from' can only be called with a fresh 
GenericMapBuilder.")
+putAll(keyArray, valueArray)
--- End diff --

Can we call `new ArrayBasedMapData(keyArray, valueArray)` without calling 

[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235943290
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -751,171 +739,46 @@ case class MapFromEntries(child: Expression) extends 
UnaryExpression {
   s"${child.dataType.catalogString} type. $prettyName accepts only 
arrays of pair structs.")
   }
 
+  private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, 
dataType.valueType)
+
   override protected def nullSafeEval(input: Any): Any = {
-val arrayData = input.asInstanceOf[ArrayData]
-val numEntries = arrayData.numElements()
+val entries = input.asInstanceOf[ArrayData]
+val numEntries = entries.numElements()
 var i = 0
-if(nullEntries) {
+if (nullEntries) {
   while (i < numEntries) {
-if (arrayData.isNullAt(i)) return null
+if (entries.isNullAt(i)) return null
 i += 1
   }
 }
-val keyArray = new Array[AnyRef](numEntries)
-val valueArray = new Array[AnyRef](numEntries)
+
+mapBuilder.reset()
 i = 0
 while (i < numEntries) {
-  val entry = arrayData.getStruct(i, 2)
-  val key = entry.get(0, dataType.keyType)
-  if (key == null) {
-throw new RuntimeException("The first field from a struct (key) 
can't be null.")
-  }
-  keyArray.update(i, key)
-  val value = entry.get(1, dataType.valueType)
-  valueArray.update(i, value)
+  mapBuilder.put(entries.getStruct(i, 2))
   i += 1
 }
-ArrayBasedMapData(keyArray, valueArray)
+mapBuilder.build()
   }
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
   val numEntries = ctx.freshName("numEntries")
-  val isKeyPrimitive = CodeGenerator.isPrimitiveType(dataType.keyType)
-  val isValuePrimitive = 
CodeGenerator.isPrimitiveType(dataType.valueType)
-  val code = if (isKeyPrimitive && isValuePrimitive) {
-genCodeForPrimitiveElements(ctx, c, ev.value, numEntries)
--- End diff --

This change allows us to focus on optimizing `ArrayBasedMapBuilder`.


---




[GitHub] spark issue #23101: [SPARK-26134][CORE] Upgrading Hadoop to 2.7.4 to fix jav...

2018-11-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23101
  
LGTM


---




[GitHub] spark issue #23102: [SPARK-26137][CORE] Use Java system property "file.separ...

2018-11-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23102
  
Would it be possible to update the PR description based on the template?


---




[GitHub] spark issue #23102: [SPARK-26137][CORE] Use Java system property "file.separ...

2018-11-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23102
  
Thank you for submitting a PR to fix the hard-coded character. Is this the only place that we have to fix regarding this hard-coded character?


---




[GitHub] spark issue #23101: [SPARK-26134][CORE] Upgrading Hadoop to 2.7.4 to fix jav...

2018-11-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23101
  
retest this please


---




[GitHub] spark issue #23101: [SPARK-26134][CORE] Upgrading Hadoop to 2.7.4 to fix jav...

2018-11-20 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23101
  
ok to test


---




[GitHub] spark issue #23084: [SPARK-26117][CORE][SQL]use SparkOutOfMemoryError instea...

2018-11-19 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23084
  
I think that we need to take care of 
`UnsafeExternalSorterSuite.testGetIterator`, too.


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
Do we need to consider `GenerateSafeProjection`, too? In other words, if the generated code or the runtime does not use data in the `Unsafe` format, this `+0.0/-0.0` problem may still exist. Am I correct?
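
For reference, a minimal standalone sketch of why the problem can survive outside `Unsafe`: `0.0` and `-0.0` compare equal, but their bit patterns and hash codes differ, so anything that groups or hashes on the raw representation can still treat them as distinct keys (the object name is hypothetical):

```
object MinusZeroExample {
  def main(args: Array[String]): Unit = {
    val pos = 0.0
    val neg = -0.0
    println(pos == neg)                                // true
    println(java.lang.Double.doubleToLongBits(pos))    // 0
    println(java.lang.Double.doubleToLongBits(neg))    // -9223372036854775808
    println(java.lang.Double.valueOf(pos).hashCode())  // 0
    println(java.lang.Double.valueOf(neg).hashCode())  // -2147483648
  }
}
```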


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
Is it better to update this PR title now?


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
@srowen #21794 is what I thought.


---




[GitHub] spark pull request #22779: [SPARK-25786][CORE]If the ByteBuffer.hasArray is ...

2018-11-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22779#discussion_r234204540
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -497,6 +498,17 @@ class KryoSerializerAutoResetDisabledSuite extends 
SparkFunSuite with SharedSpar
 deserializationStream.close()
 assert(serInstance.deserialize[Any](helloHello) === ((hello, hello)))
   }
+
+  test("ByteBuffer.array -- UnsupportedOperationException") {
--- End diff --

It would be good to add a prefix like "SPARK-25786: ...".
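
For example, the quoted test could be renamed along these lines (illustration only; the body stays unchanged):

```
test("SPARK-25786: ByteBuffer.array -- UnsupportedOperationException") {
  // existing test body
}
```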


---




[GitHub] spark pull request #23039: [SPARK-26066][SQL] Move truncatedString to sql/ca...

2018-11-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23039#discussion_r234202827
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1594,6 +1594,13 @@ object SQLConf {
 "WHERE, which does not follow SQL standard.")
   .booleanConf
   .createWithDefault(false)
+
+  val MAX_TO_STRING_FIELDS = buildConf("spark.sql.debug.maxToStringFields")
+.doc("Maximum number of fields of sequence-like entries that can be 
converted to strings " +
--- End diff --

nit: `that` is not necessary if I am correct.


---




[GitHub] spark pull request #23043: [SPARK-26021][SQL] replace minus zero with zero i...

2018-11-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23043#discussion_r233951725
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -56,17 +56,32 @@ case class BoundReference(ordinal: Int, dataType: 
DataType, nullable: Boolean)
   val javaType = JavaCode.javaType(dataType)
   val value = CodeGenerator.getValue(ctx.INPUT_ROW, dataType, 
ordinal.toString)
   if (nullable) {
-ev.copy(code =
+var codeBlock =
   code"""
  |boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
  |$javaType ${ev.value} = ${ev.isNull} ?
  |  ${CodeGenerator.defaultValue(dataType)} : ($value);
-   """.stripMargin)
+   """.stripMargin
+codeBlock = codeBlock + 
genReplaceMinusZeroWithZeroCode(javaType.codeString, ev.value)
+ev.copy(code = codeBlock)
   } else {
-ev.copy(code = code"$javaType ${ev.value} = $value;", isNull = 
FalseLiteral)
+var codeBlock = code"$javaType ${ev.value} = $value;"
+codeBlock = codeBlock + 
genReplaceMinusZeroWithZeroCode(javaType.codeString, ev.value)
+ev.copy(code = codeBlock, isNull = FalseLiteral)
   }
 }
   }
+
+  private def genReplaceMinusZeroWithZeroCode(javaType: String, value: 
String): Block = {
+val code = s"\nif ($value == -0.0%c) $value = 0.0%c;"
+var formattedCode = ""
--- End diff --

ditto


---




[GitHub] spark pull request #23043: [SPARK-26021][SQL] replace minus zero with zero i...

2018-11-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23043#discussion_r233951670
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -56,17 +56,32 @@ case class BoundReference(ordinal: Int, dataType: 
DataType, nullable: Boolean)
   val javaType = JavaCode.javaType(dataType)
   val value = CodeGenerator.getValue(ctx.INPUT_ROW, dataType, 
ordinal.toString)
   if (nullable) {
-ev.copy(code =
+var codeBlock =
   code"""
  |boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
  |$javaType ${ev.value} = ${ev.isNull} ?
  |  ${CodeGenerator.defaultValue(dataType)} : ($value);
-   """.stripMargin)
+   """.stripMargin
+codeBlock = codeBlock + 
genReplaceMinusZeroWithZeroCode(javaType.codeString, ev.value)
+ev.copy(code = codeBlock)
   } else {
-ev.copy(code = code"$javaType ${ev.value} = $value;", isNull = 
FalseLiteral)
+var codeBlock = code"$javaType ${ev.value} = $value;"
--- End diff --

ditto


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
IIUC, we discussed handling `+0.0` and `-0.0` before in another PR. 
@srowen do you remember the previous discussion?


---




[GitHub] spark issue #23044: [SPARK-26073][SQL][FOLLOW-UP] remove invalid comment as ...

2018-11-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23044
  
LGTM, pending Jenkins


---




[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...

2018-11-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22976
  
gentle ping @rednaxelafx


---




[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...

2018-11-13 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22976
  
LGTM


---




[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...

2018-11-11 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22976
  
cc @cloud-fan @mgaido91 


---




[GitHub] spark pull request #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cle...

2018-11-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22993#discussion_r232488912
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -67,6 +67,59 @@
 unaligned = _unaligned;
   }
 
+  // Access fields and constructors once and store them, for performance:
+
+  private static final Constructor DBB_CONSTRUCTOR;
+  private static final Field DBB_CLEANER_FIELD;
+  static {
+try {
+  Class cls = Class.forName("java.nio.DirectByteBuffer");
+  Constructor constructor = cls.getDeclaredConstructor(Long.TYPE, 
Integer.TYPE);
+  constructor.setAccessible(true);
+  Field cleanerField = cls.getDeclaredField("cleaner");
+  cleanerField.setAccessible(true);
+  DBB_CONSTRUCTOR = constructor;
+  DBB_CLEANER_FIELD = cleanerField;
+} catch (ClassNotFoundException | NoSuchMethodException | 
NoSuchFieldException e) {
+  throw new IllegalStateException(e);
+}
+  }
+
+  private static final Method CLEANER_CREATE_METHOD;
+  static {
+// The implementation of Cleaner changed from JDK 8 to 9
+int majorVersion = 
Integer.parseInt(System.getProperty("java.version").split("\\.")[0]);
--- End diff --

Starting with Java 9, the version string follows a [new definition](https://docs.oracle.com/javase/9/migrate/toc.htm#JSMIG-GUID-3A71ECEF-5FC5-46FE-9BA9-88CBFCE828CB).

I confirmed that this works for OpenJDK, OpenJ9, and IBM JDK 8 by running the following code:
```
public class Version {
  public static void main(String[] args){
System.out.println("jave.specification.version=" + 
System.getProperty("java.specification.version"));
System.out.println("jave.version=" + 
System.getProperty("java.version"));
System.out.println("jave.version.split(\".\")[0]=" + 
System.getProperty("java.version").split("\\.")[0]);
  }
}
```

OpenJDK
```
$ ../OpenJDK-8/java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)

$ ../OpenJDK-8/java Version
jave.specification.version=1.8
jave.version=1.8.0_162
jave.version.split(".")[0]=1

$ ../OpenJDK-9/java -version
openjdk version "9"
OpenJDK Runtime Environment (build 9+181)
OpenJDK 64-Bit Server VM (build 9+181, mixed mode)

$ ../OpenJDK-9/java Version
jave.specification.version=9
jave.version=9
jave.version.split(".")[0]=9

$ ../OpenJDK-11/java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

$ ../OpenJDK-11/java Version
jave.specification.version=11
jave.version=11.0.1
jave.version.split(".")[0]=11
```

OpenJ9
```
$ ../OpenJ9-8/java -version
openjdk version "1.8.0_192"
OpenJDK Runtime Environment (build 1.8.0_192-b12)
Eclipse OpenJ9 VM (build openj9-0.11.0, JRE 1.8.0 Windows 10 amd64-64-Bit 
Compressed References 20181019_105 (JIT enabled, AOT enabled)
OpenJ9   - 090ff9dc
OMR  - ea548a66
JCL  - 51609250b5 based on jdk8u192-b12)

$ ../OpenJ9-8/java Version
jave.specification.version=1.8
jave.version=1.8.0_192
jave.version.split(".")[0]=1

$ ../OpenJ9-9/java -version
openjdk version "9.0.4-adoptopenjdk"
OpenJDK Runtime Environment (build 9.0.4-adoptopenjdk+12)
Eclipse OpenJ9 VM (build openj9-0.9.0, JRE 9 Windows 8.1 amd64-64-Bit 
Compressed References 20180814_161 (JIT enabled, AOT enabled)
OpenJ9   - 24e53631
OMR  - fad6bf6e
JCL  - feec4d2ae based on jdk-9.0.4+12)

$ ../OpenJ9-9/java Version
jave.specification.version=9
jave.version=9.0.4-adoptopenjdk
jave.version.split(".")[0]=9


$ ../OpenJ9-11/java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.1+13)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.11.0, JRE 11 Windows 10 
amd64-64-Bit Compressed References 20181020_83 (JIT enabled, AOT enabled)
OpenJ9   - 090ff9dc
OMR  - ea548a66
JCL  - f62696f378 based on jdk-11.0.1+13)

$ ../OpenJ9-11/java Version
jave.specification.version=11
jave.version=11.0.1
jave.version.split(".")[0]=11
```

IBM JDK
```
$ ../IBMJDK-8/java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build pwa6480-20150129_02)
IBM J9 VM (build 2.8, JRE 1.8.0 Windows 8.1 amd64-64 Compressed References

[GitHub] spark issue #23005: [SPARK-26005] [SQL] Upgrade ANTRL from 4.7 to 4.7.1

2018-11-11 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23005
  
Files under `dev/deps/` should be updated, too.


---




[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r232453690
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,55 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }
 
+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  # R API in Arrow is not yet released. CRAN requires to add the package 
in requireNamespace
+  # at DESCRIPTION. Later, CRAN checks if the package is available or not. 
Therefore, it works
+  # around by avoiding direct requireNamespace.
+  requireNamespace1 <- requireNamespace
+  if (requireNamespace1("arrow", quietly = TRUE)) {
+record_batch <- get("record_batch", envir = asNamespace("arrow"), 
inherits = FALSE)
+record_batch_stream_writer <- get(
+  "record_batch_stream_writer", envir = asNamespace("arrow"), inherits 
= FALSE)
+file_output_stream <- get(
+  "file_output_stream", envir = asNamespace("arrow"), inherits = FALSE)
+write_record_batch <- get(
+  "write_record_batch", envir = asNamespace("arrow"), inherits = FALSE)
+
+# Currently arrow requires withr; otherwise, write APIs don't work.
--- End diff --

nit: `arrow` -> `Arrow`


---




[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...

2018-11-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22998
  
I have two questions.
1. Is this PR already tested with `"SPARK-25538: zero-out all bits for decimals"`?
2. How does this PR achieve the performance improvement? This PR may introduce some complexity. We would like to know the trade-off between performance and ease of understanding.




---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r232443266
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala
 ---
@@ -68,57 +68,50 @@ object GenerateOrdering extends 
CodeGenerator[Seq[SortOrder], Ordering[InternalR
 genComparisons(ctx, ordering)
   }
 
+  /**
+   * Creates the variables for ordering based on the given order.
+   */
+  private def createOrderKeys(
+ctx: CodegenContext,
+row: String,
+ordering: Seq[SortOrder]): Seq[ExprCode] = {
+ctx.INPUT_ROW = row
+// to use INPUT_ROW we must make sure currentVars is null
+ctx.currentVars = null
+ordering.map(_.child.genCode(ctx))
+  }
+
   /**
* Generates the code for ordering based on the given order.
*/
   def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): 
String = {
 val oldInputRow = ctx.INPUT_ROW
 val oldCurrentVars = ctx.currentVars
-val inputRow = "i"
-ctx.INPUT_ROW = inputRow
-// to use INPUT_ROW we must make sure currentVars is null
-ctx.currentVars = null
-
-val comparisons = ordering.map { order =>
-  val eval = order.child.genCode(ctx)
-  val asc = order.isAscending
-  val isNullA = ctx.freshName("isNullA")
-  val primitiveA = ctx.freshName("primitiveA")
-  val isNullB = ctx.freshName("isNullB")
-  val primitiveB = ctx.freshName("primitiveB")
+val rowAKeys = createOrderKeys(ctx, "a", ordering)
+val rowBKeys = createOrderKeys(ctx, "b", ordering)
+val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, 
r), i) =>
+  val dt = ordering(i).child.dataType
+  val asc = ordering(i).isAscending
+  val nullOrdering = ordering(i).nullOrdering
   s"""
-  ${ctx.INPUT_ROW} = a;
-  boolean $isNullA;
-  ${CodeGenerator.javaType(order.child.dataType)} $primitiveA;
-  {
-${eval.code}
-$isNullA = ${eval.isNull};
-$primitiveA = ${eval.value};
-  }
-  ${ctx.INPUT_ROW} = b;
-  boolean $isNullB;
-  ${CodeGenerator.javaType(order.child.dataType)} $primitiveB;
-  {
-${eval.code}
-$isNullB = ${eval.isNull};
-$primitiveB = ${eval.value};
-  }
-  if ($isNullA && $isNullB) {
+  ${l.code}
--- End diff --

Would you update this to use `|` and `.stripMargin`?
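
For illustration, a small standalone sketch of the suggested style, with hypothetical strings standing in for the generated fragments:

```
object StripMarginExample {
  def main(args: Array[String]): Unit = {
    val isNull = "isNullA_0"
    val value = "value_0"
    // Align the generated lines with `|` margins and strip them afterwards.
    val generated =
      s"""
         |boolean $isNull = i.isNullAt(0);
         |int $value = i.getInt(0);
       """.stripMargin
    println(generated)
  }
}
```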


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r232443230
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala
 ---
@@ -133,7 +126,6 @@ object GenerateOrdering extends 
CodeGenerator[Seq[SortOrder], Ordering[InternalR
   returnType = "int",
   makeSplitFunction = { body =>
 s"""
--- End diff --

Would you update this to use `|` and `.stripMargin`?


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r232443205
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala
 ---
@@ -154,7 +146,6 @@ object GenerateOrdering extends 
CodeGenerator[Seq[SortOrder], Ordering[InternalR
 // make sure INPUT_ROW is declared even if splitExpressions
 // returns an inlined block
 s"""
--- End diff --

Can we use just `code`?


---




[GitHub] spark issue #22985: [SPARK-25510][SQL][TEST][FOLLOW-UP] Remove BenchmarkWith...

2018-11-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22985
  
LGTM, pending Jenkins


---




[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...

2018-11-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22976
  
retest this please


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r231886019
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala
 ---
@@ -68,57 +68,51 @@ object GenerateOrdering extends 
CodeGenerator[Seq[SortOrder], Ordering[InternalR
 genComparisons(ctx, ordering)
   }
 
+  /**
+   * Creates the variables for ordering based on the given order.
+   */
+  private def createOrderKeys(
+ctx: CodegenContext,
+row: String,
+ordering: Seq[SortOrder]): Seq[ExprCode] = {
+ctx.INPUT_ROW = row
+ctx.currentVars = null
+ordering.map(_.child.genCode(ctx))
+  }
+
   /**
* Generates the code for ordering based on the given order.
*/
   def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): 
String = {
 val oldInputRow = ctx.INPUT_ROW
 val oldCurrentVars = ctx.currentVars
-val inputRow = "i"
-ctx.INPUT_ROW = inputRow
 // to use INPUT_ROW we must make sure currentVars is null
 ctx.currentVars = null
-
-val comparisons = ordering.map { order =>
-  val eval = order.child.genCode(ctx)
-  val asc = order.isAscending
-  val isNullA = ctx.freshName("isNullA")
-  val primitiveA = ctx.freshName("primitiveA")
-  val isNullB = ctx.freshName("isNullB")
-  val primitiveB = ctx.freshName("primitiveB")
+val rowAKeys = createOrderKeys(ctx, "a", ordering)
+val rowBKeys = createOrderKeys(ctx, "b", ordering)
+val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, 
r), i) =>
+  val dt = ordering(i).child.dataType
+  val asc = ordering(i).isAscending
+  val nullOrdering = ordering(i).nullOrdering
   s"""
-  ${ctx.INPUT_ROW} = a;
-  boolean $isNullA;
-  ${CodeGenerator.javaType(order.child.dataType)} $primitiveA;
-  {
-${eval.code}
-$isNullA = ${eval.isNull};
-$primitiveA = ${eval.value};
-  }
-  ${ctx.INPUT_ROW} = b;
-  boolean $isNullB;
-  ${CodeGenerator.javaType(order.child.dataType)} $primitiveB;
-  {
-${eval.code}
-$isNullB = ${eval.isNull};
-$primitiveB = ${eval.value};
-  }
-  if ($isNullA && $isNullB) {
+  ${l.code}
+  ${r.code}
+  if (${l.isNull} && ${r.isNull}) {
 // Nothing
-  } else if ($isNullA) {
+  } else if (${l.isNull}) {
 return ${
-  order.nullOrdering match {
-case NullsFirst => "-1"
-case NullsLast => "1"
-  }};
-  } else if ($isNullB) {
+nullOrdering match {
--- End diff --

nit: indentation problem


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r231886071
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala
 ---
@@ -68,57 +68,51 @@ object GenerateOrdering extends 
CodeGenerator[Seq[SortOrder], Ordering[InternalR
 genComparisons(ctx, ordering)
   }
 
+  /**
+   * Creates the variables for ordering based on the given order.
+   */
+  private def createOrderKeys(
+ctx: CodegenContext,
+row: String,
+ordering: Seq[SortOrder]): Seq[ExprCode] = {
+ctx.INPUT_ROW = row
+ctx.currentVars = null
+ordering.map(_.child.genCode(ctx))
+  }
+
   /**
* Generates the code for ordering based on the given order.
*/
   def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): 
String = {
 val oldInputRow = ctx.INPUT_ROW
 val oldCurrentVars = ctx.currentVars
-val inputRow = "i"
-ctx.INPUT_ROW = inputRow
 // to use INPUT_ROW we must make sure currentVars is null
 ctx.currentVars = null
-
-val comparisons = ordering.map { order =>
-  val eval = order.child.genCode(ctx)
-  val asc = order.isAscending
-  val isNullA = ctx.freshName("isNullA")
-  val primitiveA = ctx.freshName("primitiveA")
-  val isNullB = ctx.freshName("isNullB")
-  val primitiveB = ctx.freshName("primitiveB")
+val rowAKeys = createOrderKeys(ctx, "a", ordering)
+val rowBKeys = createOrderKeys(ctx, "b", ordering)
+val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, 
r), i) =>
+  val dt = ordering(i).child.dataType
+  val asc = ordering(i).isAscending
+  val nullOrdering = ordering(i).nullOrdering
   s"""
-  ${ctx.INPUT_ROW} = a;
-  boolean $isNullA;
-  ${CodeGenerator.javaType(order.child.dataType)} $primitiveA;
-  {
-${eval.code}
-$isNullA = ${eval.isNull};
-$primitiveA = ${eval.value};
-  }
-  ${ctx.INPUT_ROW} = b;
-  boolean $isNullB;
-  ${CodeGenerator.javaType(order.child.dataType)} $primitiveB;
-  {
-${eval.code}
-$isNullB = ${eval.isNull};
-$primitiveB = ${eval.value};
-  }
-  if ($isNullA && $isNullB) {
+  ${l.code}
+  ${r.code}
+  if (${l.isNull} && ${r.isNull}) {
 // Nothing
-  } else if ($isNullA) {
+  } else if (${l.isNull}) {
 return ${
-  order.nullOrdering match {
-case NullsFirst => "-1"
-case NullsLast => "1"
-  }};
-  } else if ($isNullB) {
+nullOrdering match {
+  case NullsFirst => "-1"
+  case NullsLast => "1"
+}};
+  } else if (${r.isNull}) {
 return ${
-  order.nullOrdering match {
-case NullsFirst => "1"
-case NullsLast => "-1"
-  }};
+nullOrdering match {
--- End diff --

ditto


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r231885902
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala
 ---
@@ -68,57 +68,51 @@ object GenerateOrdering extends 
CodeGenerator[Seq[SortOrder], Ordering[InternalR
 genComparisons(ctx, ordering)
   }
 
+  /**
+   * Creates the variables for ordering based on the given order.
+   */
+  private def createOrderKeys(
+ctx: CodegenContext,
+row: String,
+ordering: Seq[SortOrder]): Seq[ExprCode] = {
+ctx.INPUT_ROW = row
+ctx.currentVars = null
+ordering.map(_.child.genCode(ctx))
+  }
+
   /**
* Generates the code for ordering based on the given order.
*/
   def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): 
String = {
 val oldInputRow = ctx.INPUT_ROW
 val oldCurrentVars = ctx.currentVars
-val inputRow = "i"
-ctx.INPUT_ROW = inputRow
 // to use INPUT_ROW we must make sure currentVars is null
 ctx.currentVars = null
--- End diff --

Now, can we remove this line?


---




[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...

2018-11-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22976
  
ok to test


---



