[GitHub] felixcheung commented on issue #175: CVE-2018-11760
felixcheung commented on issue #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#issuecomment-458441893 sometimes it's the Latest News.
[spark] branch master updated: [SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 16990f9 [SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0 16990f9 is described below commit 16990f929921b3f784a85f3afbe1a22fbe77d895 Author: Bryan Cutler AuthorDate: Tue Jan 29 14:18:45 2019 +0800 [SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0 ## What changes were proposed in this pull request? Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0 Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users: * Safe cast fails from numpy float64 array with nans to integer, ARROW-4258 * Java, Reduce heap usage for variable width vectors, ARROW-4147 * Binary identity cast not implemented, ARROW-4101 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 * conversion to date object no longer needed, ARROW-3910 * Error reading IPC file with no record batches, ARROW-3894 * Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790 * from_pandas gives incorrect results when converting floating point to bool, ARROW-3428 * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048 * Java update to official Flatbuffers version 1.9.0, ARROW-3175 complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0) PySpark requires the following fixes to work with PyArrow 0.12.0 * Encrypted pyspark worker fails due to ChunkedStream missing closed property * pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64 * ArrowTests fails due to difference in raised error message * pyarrow.open_stream deprecated * tests fail because groupby adds index column with duplicate name ## How was this patch tested? Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0 Closes #23657 from BryanCutler/arrow-upgrade-012. 
Authored-by: Bryan Cutler Signed-off-by: Hyukjin Kwon --- dev/deps/spark-deps-hadoop-2.7 | 8 dev/deps/spark-deps-hadoop-3.1 | 8 pom.xml | 2 +- python/pyspark/serializers.py | 13 +++-- python/pyspark/sql/tests/test_arrow.py | 2 +- python/pyspark/sql/tests/test_pandas_udf_grouped_map.py | 10 +- python/pyspark/sql/types.py | 13 ++--- 7 files changed, 36 insertions(+), 20 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 1af29fc..0154fd2 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -14,9 +14,9 @@ apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar arpack_combined_all-0.1.jar -arrow-format-0.10.0.jar -arrow-memory-0.10.0.jar -arrow-vector-0.10.0.jar +arrow-format-0.12.0.jar +arrow-memory-0.12.0.jar +arrow-vector-0.12.0.jar automaton-1.11-8.jar avro-1.8.2.jar avro-ipc-1.8.2.jar @@ -58,7 +58,7 @@ datanucleus-core-3.2.10.jar datanucleus-rdbms-3.2.9.jar derby-10.12.1.1.jar eigenbase-properties-1.1.5.jar -flatbuffers-1.2.0-3f79e055.jar +flatbuffers-java-1.9.0.jar generex-1.0.1.jar gson-2.2.4.jar guava-14.0.1.jar diff --git a/dev/deps/spark-deps-hadoop-3.1 b/dev/deps/spark-deps-hadoop-3.1 index 05f180b..7d5325c 100644 --- a/dev/deps/spark-deps-hadoop-3.1 +++ b/dev/deps/spark-deps-hadoop-3.1 @@ -12,9 +12,9 @@ aopalliance-1.0.jar aopalliance-repackaged-2.4.0-b34.jar apache-log4j-extras-1.2.17.jar arpack_combined_all-0.1.jar -arrow-format-0.10.0.jar -arrow-memory-0.10.0.jar -arrow-vector-0.10.0.jar +arrow-format-0.12.0.jar +arrow-memory-0.12.0.jar +arrow-vector-0.12.0.jar automaton-1.11-8.jar avro-1.8.2.jar avro-ipc-1.8.2.jar @@ -57,7 +57,7 @@ derby-10.12.1.1.jar dnsjava-2.1.7.jar ehcache-3.3.1.jar eigenbase-properties-1.1.5.jar -flatbuffers-1.2.0-3f79e055.jar +flatbuffers-java-1.9.0.jar generex-1.0.1.jar geronimo-jcache_1.0_spec-1.0-alpha-1.jar gson-2.2.4.jar diff --git a/pom.xml b/pom.xml index 29a281a..6676c5d 100644 --- a/pom.xml +++ b/pom.xml @@ -193,7 +193,7 @@ If you are changing Arrow version specification, please check ./python/pyspark/sql/utils.py, ./python/run-tests.py and ./python/setup.py too. --> -0.10.0 +0.12.0 ${java.home} diff --git a/python/pyspark/serializers.py b/python/pyspark/serializers.py index 741dfb2..1d17053 100644 ---
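Below is a minimal PySpark sketch (not part of the commit) of the Arrow-backed conversion path that this pyarrow 0.12.0 upgrade affects. It assumes a local SparkSession with pandas and pyarrow >= 0.12.0 installed; the column names, values, and the config flag used are illustrative assumptions, not taken from the PR.

```python
from pyspark.sql import SparkSession

# Enable the Arrow-based path for toPandas()/createDataFrame(); this flag name
# is the Spark 2.x/3.0-era setting and is an assumption of this sketch.
spark = (SparkSession.builder
         .appName("arrow-0.12-check")
         .config("spark.sql.execution.arrow.enabled", "true")
         .getOrCreate())

df = spark.createDataFrame([(1, 1.5), (2, float("nan"))], ["id", "value"])

# With the flag set, toPandas() converts through pyarrow; the commit message
# notes that pyarrow 0.12.0 returns dates as objects by default and changes
# NaN handling in safe casts (ARROW-4258), which is why PySpark needed fixes.
pdf = df.toPandas()
print(pdf.dtypes)

spark.stop()
```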
[GitHub] dongjoon-hyun commented on issue #177: Update versioning policy with 2.2.x
dongjoon-hyun commented on issue #177: Update versioning policy with 2.2.x URL: https://github.com/apache/spark-website/pull/177#issuecomment-458377606 Thanks, @maropu and @HyukjinKwon .
[GitHub] maropu commented on issue #177: Update versioning policy with 2.2.x
maropu commented on issue #177: Update versioning policy with 2.2.x URL: https://github.com/apache/spark-website/pull/177#issuecomment-458359112 Really cool! Thanks, @dongjoon-hyun!
[GitHub] zsxwing closed pull request #176: Add Jose Torres to committers list
zsxwing closed pull request #176: Add Jose Torres to committers list URL: https://github.com/apache/spark-website/pull/176
[spark-website] branch asf-site updated: Add Jose Torres to committers list
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new fb1a7b4 Add Jose Torres to committers list fb1a7b4 is described below commit fb1a7b407e149e133e35bb506d48cfe034a4d351 Author: Jose Torres AuthorDate: Mon Jan 28 15:59:37 2019 -0800 Add Jose Torres to committers list Author: Jose Torres Closes #176 from jose-torres/addjose. --- committers.md| 1 + site/committers.html | 4 2 files changed, 5 insertions(+) diff --git a/committers.md b/committers.md index c3daf10..8049106 100644 --- a/committers.md +++ b/committers.md @@ -65,6 +65,7 @@ navigation: |Saisai Shao|Tencent| |Prashant Sharma|IBM| |Ram Sriharsha|Databricks| +|Jose Torres|Databricks| |DB Tsai|Apple| |Takuya Ueshin|Databricks| |Marcelo Vanzin|Cloudera| diff --git a/site/committers.html b/site/committers.html index ec5814b..3066b5d 100644 --- a/site/committers.html +++ b/site/committers.html @@ -431,6 +431,10 @@ Databricks + Jose Torres + Databricks + + DB Tsai Apple - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Update versioning policy with 2.2.x (#177)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 51aa3b7 Update versioning policy with 2.2.x (#177) 51aa3b7 is described below commit 51aa3b7e4ed52be7806326f8d55ddd061aa6b97e Author: Dongjoon Hyun AuthorDate: Mon Jan 28 15:23:24 2019 -0800 Update versioning policy with 2.2.x (#177) This PR updates EOL documents with release 2.2.x explicitly. --- site/versioning-policy.html | 4 ++-- versioning-policy.md| 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/site/versioning-policy.html b/site/versioning-policy.html index e2665f0..1ddf507 100644 --- a/site/versioning-policy.html +++ b/site/versioning-policy.html @@ -283,8 +283,8 @@ in between feature releases. Major releases do not happen according to a fixed s Maintenance Releases and EOL Feature release branches will, generally, be maintained with bug fix releases for a period of 18 months. -For example, branch 2.1.x is no longer considered maintained as of July 2018, 18 months after the release -of 2.1.0 in December 2016. No more 2.1.x releases should be expected after that point, even for bug fixes. +For example, branch 2.2.x is no longer considered maintained as of January 2019, 18 months after the release +of 2.2.0 in July 2017. No more 2.2.x releases should be expected after that point, even for bug fixes. diff --git a/versioning-policy.md b/versioning-policy.md index de6f1e6..7c104a1 100644 --- a/versioning-policy.md +++ b/versioning-policy.md @@ -67,5 +67,5 @@ in between feature releases. Major releases do not happen according to a fixed s Maintenance Releases and EOL Feature release branches will, generally, be maintained with bug fix releases for a period of 18 months. -For example, branch 2.1.x is no longer considered maintained as of July 2018, 18 months after the release -of 2.1.0 in December 2016. No more 2.1.x releases should be expected after that point, even for bug fixes. +For example, branch 2.2.x is no longer considered maintained as of January 2019, 18 months after the release +of 2.2.0 in July 2017. No more 2.2.x releases should be expected after that point, even for bug fixes. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] dongjoon-hyun merged pull request #177: Update versioning policy with 2.2.x
dongjoon-hyun merged pull request #177: Update versioning policy with 2.2.x URL: https://github.com/apache/spark-website/pull/177
[GitHub] dongjoon-hyun commented on issue #177: Update versioning policy with 2.2.x
dongjoon-hyun commented on issue #177: Update versioning policy with 2.2.x URL: https://github.com/apache/spark-website/pull/177#issuecomment-458342770 Thanks, @srowen !
[GitHub] dongjoon-hyun commented on issue #177: Update versioning policy with 2.2.x
dongjoon-hyun commented on issue #177: Update versioning policy with 2.2.x URL: https://github.com/apache/spark-website/pull/177#issuecomment-458340854 Hi, @maropu . This might be a better answer for the question you received. cc @srowen and @HyukjinKwon
[GitHub] dongjoon-hyun opened a new pull request #177: Update versioning policy with 2.2.x
dongjoon-hyun opened a new pull request #177: Update versioning policy with 2.2.x URL: https://github.com/apache/spark-website/pull/177 This PR updates EOL documents with release 2.2.x explicitly.
[GitHub] vanzin merged pull request #175: CVE-2018-11760
vanzin merged pull request #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175
[spark-website] branch asf-site updated: CVE-2018-11760 (#175)
This is an automated email from the ASF dual-hosted git repository. vanzin pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new d73a92d CVE-2018-11760 (#175) d73a92d is described below commit d73a92d911b3f9fa6cfab0acfd3558a5609dce72 Author: Imran Rashid AuthorDate: Mon Jan 28 17:03:42 2019 -0600 CVE-2018-11760 (#175) --- security.md| 30 ++ site/security.html | 35 +++ 2 files changed, 65 insertions(+) diff --git a/security.md b/security.md index c75e47b..340622b 100644 --- a/security.md +++ b/security.md @@ -18,6 +18,36 @@ non-public list that will reach the Apache Security team, as well as the Spark P Known Security Issues +CVE-2018-11760: Apache Spark local privilege escalation vulnerability + +Severity: Important + +Vendor: The Apache Software Foundation + +Versions affected: + +- All Spark 1.x, Spark 2.0.x, and Spark 2.1.x versions +- Spark 2.2.0 to 2.2.2 +- Spark 2.3.0 to 2.3.1 + +Description: + +When using PySpark, it's possible for a different local user +to connect to the Spark application and impersonate the user running +the Spark application. This affects versions 1.x, 2.0.x, 2.1.x, 2.2.0 to 2.2.2, and 2.3.0 to 2.3.1. + +Mitigation: + +- 1.x, 2.0.x, 2.1.x, and 2.2.x users should upgrade to 2.2.3 or newer +- 2.3.x users should upgrade to 2.3.2 or newer +- Otherwise, affected users should avoid using PySpark in +multi-user environments. + +Credit: + +- Luca Canali and Jose Carlos Luna Duran, CERN + + CVE-2018-17190: Unsecured Apache Spark standalone executes user code diff --git a/site/security.html b/site/security.html index 0eca9a3..cabfb47 100644 --- a/site/security.html +++ b/site/security.html @@ -211,6 +211,41 @@ non-public list that will reach the Apache Security team, as well as the Spark P Known Security Issues +CVE-2018-11760: Apache Spark local privilege escalation vulnerability + +Severity: Important + +Vendor: The Apache Software Foundation + +Versions affected: + + + All Spark 1.x, Spark 2.0.x, and Spark 2.1.x versions + Spark 2.2.0 to 2.2.2 + Spark 2.3.0 to 2.3.1 + + +Description: + +When using PySpark, its possible for a different local user +to connect to the Spark application and impersonate the user running +the Spark application. This affects versions 1.x, 2.0.x, 2.1.x, 2.2.0 to 2.2.2, and 2.3.0 to 2.3.1. + +Mitigation: + + + 1.x, 2.0.x, 2.1.x, and 2.2.x users should upgrade to 2.2.3 or newer + 2.3.x users should upgrade to 2.3.2 or newer + Otherwise, affected users should avoid using PySpark in +multi-user environments. + + +Credit: + + + Luca Canali and Jose Carlos Luna Duran, CERN + + CVE-2018-17190: Unsecured Apache Spark standalone executes user code Severity: Low - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] vanzin commented on issue #175: CVE-2018-11760
vanzin commented on issue #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#issuecomment-458337805 (Merge script seems broken, I'll just use the UI.)
[GitHub] vanzin commented on issue #175: CVE-2018-11760
vanzin commented on issue #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#issuecomment-458337464 Cool, I'll merge this then.
[GitHub] squito commented on issue #175: CVE-2018-11760
squito commented on issue #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#issuecomment-458327164 so it turns out I just had an old version of jekyll installed. After upgrading, those other changes disappear.
[spark] branch master updated: [SPARK-26747][SQL] Makes GetMapValue nullability more precise
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 92706e6 [SPARK-26747][SQL] Makes GetMapValue nullability more precise 92706e6 is described below commit 92706e657629b36c9b3dc1478b2b80e351562135 Author: Takeshi Yamamuro AuthorDate: Mon Jan 28 13:39:50 2019 -0800 [SPARK-26747][SQL] Makes GetMapValue nullability more precise ## What changes were proposed in this pull request? In master, `GetMapValue` nullable is always true; https://github.com/apache/spark/blob/cf133e611020ed178f90358464a1b88cdd9b7889/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L371 But, If input key is foldable, we could make its nullability more precise. This fix is the same with SPARK-26637(#23566). ## How was this patch tested? Added tests in `ComplexTypeSuite`. Closes #23669 from maropu/SPARK-26747. Authored-by: Takeshi Yamamuro Signed-off-by: Dongjoon Hyun --- .../catalyst/expressions/complexTypeExtractors.scala | 16 +++- .../sql/catalyst/expressions/ComplexTypeSuite.scala | 19 +++ 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala index 104ad98..55ed617 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala @@ -381,7 +381,21 @@ case class GetMapValue(child: Expression, key: Expression) override def right: Expression = key /** `Null` is returned for invalid ordinals. 
*/ - override def nullable: Boolean = true + override def nullable: Boolean = if (key.foldable && !key.nullable) { +val keyObj = key.eval() +child match { + case m: CreateMap if m.resolved => +m.keys.zip(m.values).filter { case (k, _) => k.foldable && !k.nullable }.find { + case (k, _) if k.eval() == keyObj => true + case _ => false +}.map(_._2.nullable).getOrElse(true) + case _ => +true +} + } else { +true + } + override def dataType: DataType = child.dataType.asInstanceOf[MapType].valueType diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala index d8d6571..d65b49f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala @@ -110,6 +110,25 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(GetMapValue(nestedMap, Literal("a")), Map("b" -> "c")) } + test("SPARK-26747 handles GetMapValue nullability correctly when input key is foldable") { +// String key test +val k1 = Literal("k1") +val v1 = AttributeReference("v1", StringType, nullable = true)() +val k2 = Literal("k2") +val v2 = AttributeReference("v2", StringType, nullable = false)() +val map1 = CreateMap(k1 :: v1 :: k2 :: v2 :: Nil) +assert(GetMapValue(map1, Literal("k1")).nullable) +assert(!GetMapValue(map1, Literal("k2")).nullable) +assert(GetMapValue(map1, Literal("non-existent-key")).nullable) + +// Complex type key test +val k3 = Literal.create((1, "a")) +val k4 = Literal.create((2, "b")) +val map2 = CreateMap(k3 :: v1 :: k4 :: v2 :: Nil) +assert(GetMapValue(map2, Literal.create((1, "a"))).nullable) +assert(!GetMapValue(map2, Literal.create((2, "b"))).nullable) + } + test("GetStructField") { val typeS = StructType(StructField("a", IntegerType) :: Nil) val struct = Literal.create(create_row(1), typeS) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
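A hedged PySpark illustration (not from the patch) of the user-visible effect: when the map is built inline with `create_map` and the lookup key is a foldable literal, the analyzer can now report a more precise nullability instead of always `nullable = true`. The expected schema output is an assumption mirroring the test added to `ComplexTypeSuite`.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-26747-demo").getOrCreate()

# Both map values are non-nullable: a cast of range()'s non-nullable "id" column and a literal.
m = F.create_map(F.lit("k1"), F.col("id").cast("string"),
                 F.lit("k2"), F.lit("constant"))

df = spark.range(3).select(
    m.getItem("k2").alias("v2"),           # foldable key that matches a non-nullable value
    m.getItem("missing").alias("v_miss"))  # foldable key with no matching entry

# Expectation under this change (an assumption): v2 is reported as
# nullable = false, while v_miss stays nullable = true.
df.printSchema()

spark.stop()
```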
[spark] branch master updated: [SPARK-26595][CORE] Allow credential renewal based on kerberos ticket cache.
This is an automated email from the ASF dual-hosted git repository. vanzin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a67dbf [SPARK-26595][CORE] Allow credential renewal based on kerberos ticket cache. 2a67dbf is described below commit 2a67dbfbd341af166b1c85904875f26a6dea5ba8 Author: Marcelo Vanzin AuthorDate: Mon Jan 28 13:32:34 2019 -0800 [SPARK-26595][CORE] Allow credential renewal based on kerberos ticket cache. This change addes a new mode for credential renewal that does not require a keytab; it uses the local ticket cache instead, so it works while the user keeps the cache valid. This can be useful for, e.g., people running long spark-shell sessions where their kerberos login is kept up-to-date. The main change to enable this behavior is in HadoopDelegationTokenManager, with a small change in the HDFS token provider. The other changes are to avoid creating duplicate tokens when submitting the application to YARN; they allow the tokens from the scheduler to be sent to the YARN AM, reducing the round trips to HDFS. For that, the scheduler initialization code was changed a little bit so that the tokens are available when the YARN client is initialized. That basically takes care of a long-standing TODO that was in the code to clean up configuration propagation to the driver's RPC endpoint (in CoarseGrainedSchedulerBackend). Tested with an app designed to stress this functionality, with both keytab and cache-based logins. Some basic kerberos tests on k8s also. Closes #23525 from vanzin/SPARK-26595. Authored-by: Marcelo Vanzin Signed-off-by: Marcelo Vanzin --- .../security/HadoopDelegationTokenManager.scala| 38 +++- .../security/HadoopFSDelegationTokenProvider.scala | 34 +- .../org/apache/spark/internal/config/package.scala | 10 ++ .../cluster/CoarseGrainedSchedulerBackend.scala| 42 -- docs/security.md | 32 - .../k8s/KubernetesClusterSchedulerBackend.scala| 12 +++ .../mesos/MesosCoarseGrainedSchedulerBackend.scala | 5 ++- .../org/apache/spark/deploy/yarn/Client.scala | 25 ++--- .../cluster/YarnClientSchedulerBackend.scala | 8 ++--- .../cluster/YarnClusterSchedulerBackend.scala | 2 ++ .../scheduler/cluster/YarnSchedulerBackend.scala | 15 11 files changed, 123 insertions(+), 100 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala index 2763a46..487291e 100644 --- a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala @@ -41,7 +41,7 @@ import org.apache.spark.util.{ThreadUtils, Utils} /** * Manager for delegation tokens in a Spark application. * - * When configured with a principal and a keytab, this manager will make sure long-running apps can + * When delegation token renewal is enabled, this manager will make sure long-running apps can * run without interruption while accessing secured services. It periodically logs in to the KDC * with user-provided credentials, and contacts all the configured secure services to obtain * delegation tokens to be distributed to the rest of the application. @@ -50,6 +50,11 @@ import org.apache.spark.util.{ThreadUtils, Utils} * elapsed. The new tokens are sent to the Spark driver endpoint. 
The driver is tasked with * distributing the tokens to other processes that might need them. * + * Renewal can be enabled in two different ways: by providing a principal and keytab to Spark, or by + * enabling renewal based on the local credential cache. The latter has the drawback that Spark + * can't create new TGTs by itself, so the user has to manually update the Kerberos ticket cache + * externally. + * * This class can also be used just to create delegation tokens, by calling the * `obtainDelegationTokens` method. This option does not require calling the `start` method nor * providing a driver reference, but leaves it up to the caller to distribute the tokens that were @@ -81,7 +86,11 @@ private[spark] class HadoopDelegationTokenManager( private var renewalExecutor: ScheduledExecutorService = _ /** @return Whether delegation token renewal is enabled. */ - def renewalEnabled: Boolean = principal != null + def renewalEnabled: Boolean = sparkConf.get(KERBEROS_RENEWAL_CREDENTIALS) match { +case "keytab" => principal != null +case "ccache" => UserGroupInformation.getCurrentUser().hasKerberosCredentials() +case _ => false + }
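As a hedged sketch of how the new mode might be switched on from PySpark: the diff above shows a `KERBEROS_RENEWAL_CREDENTIALS` setting with `"keytab"` and `"ccache"` modes, but the user-facing config key is not visible in the truncated excerpt, so the key name used below (`spark.kerberos.renewal.credentials`) is an assumption.

```python
from pyspark.sql import SparkSession

# Assumption: "spark.kerberos.renewal.credentials" is the key backing
# KERBEROS_RENEWAL_CREDENTIALS; "ccache" asks Spark to renew delegation tokens
# from the local Kerberos ticket cache instead of a keytab. The user must keep
# the TGT valid themselves (e.g. by re-running kinit), as the commit notes.
spark = (SparkSession.builder
         .appName("ccache-renewal-demo")
         .config("spark.kerberos.renewal.credentials", "ccache")
         .getOrCreate())

# ... long-running interactive work against secured HDFS/Hive/HBase ...

spark.stop()
```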
[GitHub] zsxwing edited a comment on issue #176: Add Jose Torres to committers list
zsxwing edited a comment on issue #176: Add Jose Torres to committers list URL: https://github.com/apache/spark-website/pull/176#issuecomment-458307738 LGTM. Welcome!
[GitHub] zsxwing commented on issue #176: Add Jose Torres to committers list
zsxwing commented on issue #176: Add Jose Torres to committers list URL: https://github.com/apache/spark-website/pull/176#issuecomment-458307738 LGTM
[GitHub] jose-torres opened a new pull request #176: Add Jose Torres to committers list
jose-torres opened a new pull request #176: Add Jose Torres to committers list URL: https://github.com/apache/spark-website/pull/176
[GitHub] vanzin commented on issue #175: CVE-2018-11760
vanzin commented on issue #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#issuecomment-458300023 > jekyll build updated a few more pages under site That's normal. There are indices and other stuff to update. Also maybe the person who made the previous change forgot to build the site, it happens sometimes. Should probably add those back.
[GitHub] abellina commented on issue #175: CVE-2018-11760
abellina commented on issue #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#issuecomment-458292634 oh ok, sounds good. +1 non binding
[GitHub] squito commented on issue #175: CVE-2018-11760
squito commented on issue #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#issuecomment-458286718 oops, I didn't realize I should also commit the generated page, `site/security.html` as well, just updated that. Oddly, `jekyll build` updated a few more pages under `site`, but I'm not including those changes here.
[GitHub] abellina commented on a change in pull request #175: CVE-2018-11760
abellina commented on a change in pull request #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175#discussion_r251576280 ## File path: security.md ## @@ -18,6 +18,36 @@ non-public list that will reach the Apache Security team, as well as the Spark P Known Security Issues +CVE-2018-11760: Apache Spark local privilege escalation vulnerability + +Severity: Important + +Vendor: The Apache Software Foundation + +Versions affected: + +- All Spark 1.x, Spark 2.0.x, and Spark 2.1.x versions +- Spark 2.2.0 to 2.2.2 +- Spark 2.3.0 to 2.3.1 + +Description: + +When using PySpark , it's possible for a different local user Review comment: minor nit, extra space after PySpark
[spark] branch branch-2.3 updated: [SPARK-26379][SS][BRANCH-2.3] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new a89f601 [SPARK-26379][SS][BRANCH-2.3] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp a89f601 is described below commit a89f601a788ab6f1c89cefb1b4097444fb9847a4 Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Mon Jan 28 12:13:51 2019 -0800 [SPARK-26379][SS][BRANCH-2.3] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes #23656 from HeartSaVioR/SPARK-26379-branch-2.3. Lead-authored-by: Jungtaek Lim (HeartSaVioR) Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../execution/streaming/MicroBatchExecution.scala | 5 ++- .../apache/spark/sql/streaming/StreamSuite.scala | 42 ++ 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala index 6a264ad..dbbb7e7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala @@ -430,8 +430,11 @@ class MicroBatchExecution( // Rewire the plan to use the new attributes that were returned by the source. val newAttributePlan = newBatchesPlan transformAllExpressions { case ct: CurrentTimestamp => +// CurrentTimestamp is not TimeZoneAwareExpression while CurrentBatchTimestamp is. +// Without TimeZoneId, CurrentBatchTimestamp is unresolved. Here, we use an explicit +// dummy string to prevent UnresolvedException and to prevent to be used in the future. 
CurrentBatchTimestamp(offsetSeqMetadata.batchTimestampMs, - ct.dataType) + ct.dataType, Some("Dummy TimeZoneId")) case cd: CurrentDate => CurrentBatchTimestamp(offsetSeqMetadata.batchTimestampMs, cd.dataType, cd.timeZoneId) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala index c65e5d3..92fdde8 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala @@ -33,6 +33,7 @@ import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart} import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.plans.logical.Range import org.apache.spark.sql.catalyst.streaming.InternalOutputModes +import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.execution.command.ExplainCommand import org.apache.spark.sql.execution.streaming._ import org.apache.spark.sql.execution.streaming.state.{StateStore, StateStoreConf, StateStoreId, StateStoreProvider} @@ -825,6 +826,47 @@ class StreamSuite extends StreamTest { assert(query.exception.isEmpty) } } + + test("SPARK-26379 Structured Streaming - Exception on adding current_timestamp " + +" to Dataset - use v2 sink") { +testCurrentTimestampOnStreamingQuery(useV2Sink = true) + } + + test("SPARK-26379 Structured Streaming - Exception on adding current_timestamp " + +" to Dataset - use v1 sink") { +testCurrentTimestampOnStreamingQuery(useV2Sink = false) + } + + private def testCurrentTimestampOnStreamingQuery(useV2Sink: Boolean): Unit = { +val input = MemoryStream[Int] +val df = input.toDS().withColumn("cur_timestamp", lit(current_timestamp())) + +def assertBatchOutputAndUpdateLastTimestamp( +rows: Seq[Row], +curTimestamp: Long, +curDate: Int, +expectedValue: Int): Long = { + assert(rows.size === 1) + val row = rows.head + assert(row.getInt(0) === expectedValue) + assert(row.getTimestamp(1).getTime >= curTimestamp) + row.getTimestamp(1).getTime +} + +var lastTimestamp = System.currentTimeMillis() +val currentDate = DateTimeUtils.millisToDays(lastTimestamp) +testStream(df, useV2Sink = useV2Sink) ( + AddData(input, 1), + CheckLastBatch { rows: Seq[Row] => +lastTimestamp =
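For context, here is a minimal PySpark sketch (not part of the commit) of the user-facing pattern this backport fixes: adding `current_timestamp()` to a streaming Dataset, which previously raised `UnresolvedException` on branch-2.3. It assumes a local SparkSession and something writing to a socket on localhost:9999 (e.g. `nc -lk 9999`).

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-26379-demo").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# This projection is what used to fail before the fix, because
# CurrentBatchTimestamp was left unresolved without a time zone id.
stamped = lines.withColumn("cur_timestamp", F.current_timestamp())

query = (stamped.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination(30)  # run briefly for demonstration purposes
query.stop()
spark.stop()
```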
[GitHub] squito opened a new pull request #175: CVE-2018-11760
squito opened a new pull request #175: CVE-2018-11760 URL: https://github.com/apache/spark-website/pull/175 ran `jekyll build` and `jekyll serve`, page looked OK locally.
[spark] branch master updated: [SPARK-26660][FOLLOWUP] Add warning logs when broadcasting large task binary
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8baf3ba [SPARK-26660][FOLLOWUP] Add warning logs when broadcasting large task binary 8baf3ba is described below commit 8baf3ba35bc13d72f8ab911284d7d75ce897cb5a Author: Sean Owen AuthorDate: Mon Jan 28 13:47:32 2019 -0600 [SPARK-26660][FOLLOWUP] Add warning logs when broadcasting large task binary ## What changes were proposed in this pull request? The warning introduced in https://github.com/apache/spark/pull/23580 has a bug: https://github.com/apache/spark/pull/23580#issuecomment-458000380 This just fixes the logic. ## How was this patch tested? N/A Closes #23668 from srowen/SPARK-26660.2. Authored-by: Sean Owen Signed-off-by: Sean Owen --- core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 2 +- .../main/scala/org/apache/spark/scheduler/TaskSetManager.scala| 8 .../scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala| 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala index ecb8ac0..75eb37c 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala @@ -1162,7 +1162,7 @@ private[spark] class DAGScheduler( partitions = stage.rdd.partitions } - if (taskBinaryBytes.length * 1000 > TaskSetManager.TASK_SIZE_TO_WARN_KB) { + if (taskBinaryBytes.length > TaskSetManager.TASK_SIZE_TO_WARN_KIB * 1024) { logWarning(s"Broadcasting large task binary with size " + s"${Utils.bytesToString(taskBinaryBytes.length)}") } diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala index 6f3f77c..b7bf069 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala @@ -494,12 +494,12 @@ private[spark] class TaskSetManager( abort(s"$msg Exception during serialization: $e") throw new TaskNotSerializableException(e) } -if (serializedTask.limit() > TaskSetManager.TASK_SIZE_TO_WARN_KB * 1024 && +if (serializedTask.limit() > TaskSetManager.TASK_SIZE_TO_WARN_KIB * 1024 && !emittedTaskSizeWarning) { emittedTaskSizeWarning = true logWarning(s"Stage ${task.stageId} contains a task of very large size " + -s"(${serializedTask.limit() / 1024} KB). The maximum recommended task size is " + -s"${TaskSetManager.TASK_SIZE_TO_WARN_KB} KB.") +s"(${serializedTask.limit() / 1024} KiB). The maximum recommended task size is " + +s"${TaskSetManager.TASK_SIZE_TO_WARN_KIB} KiB.") } addRunningTask(taskId) @@ -1101,5 +1101,5 @@ private[spark] class TaskSetManager( private[spark] object TaskSetManager { // The user will be warned if any stages contain a task that has a serialized size greater than // this. 
- val TASK_SIZE_TO_WARN_KB = 100 + val TASK_SIZE_TO_WARN_KIB = 100 } diff --git a/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala index d0f98b5..60acd3e 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala @@ -153,7 +153,7 @@ class FakeTaskScheduler(sc: SparkContext, liveExecutors: (String, String)* /* ex */ class LargeTask(stageId: Int) extends Task[Array[Byte]](stageId, 0, 0) { - val randomBuffer = new Array[Byte](TaskSetManager.TASK_SIZE_TO_WARN_KB * 1024) + val randomBuffer = new Array[Byte](TaskSetManager.TASK_SIZE_TO_WARN_KIB * 1024) val random = new Random(0) random.nextBytes(randomBuffer) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
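The arithmetic behind the fix, as a small illustrative snippet (the numbers are made up): the original check multiplied the byte length by 1000 before comparing it to a threshold expressed in KiB, so it warned for essentially every task binary; the corrected check compares bytes against the threshold times 1024.

```python
# Illustrative only; mirrors the before/after comparison in DAGScheduler.
TASK_SIZE_TO_WARN_KIB = 100
task_binary_len = 4 * 1024  # a tiny 4 KiB task binary

buggy_warns = task_binary_len * 1000 > TASK_SIZE_TO_WARN_KIB   # True: fires for almost anything
fixed_warns = task_binary_len > TASK_SIZE_TO_WARN_KIB * 1024   # False: only above 100 KiB
print(buggy_warns, fixed_warns)
```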
[spark] branch master updated: [SPARK-26432][CORE] Obtain HBase delegation token operation compatible with HBase 2.x.x version API
This is an automated email from the ASF dual-hosted git repository. vanzin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dfed439 [SPARK-26432][CORE] Obtain HBase delegation token operation compatible with HBase 2.x.x version API dfed439 is described below commit dfed439e33b7bf224dd412b0960402068d961c7b Author: s71955 AuthorDate: Mon Jan 28 10:08:23 2019 -0800 [SPARK-26432][CORE] Obtain HBase delegation token operation compatible with HBase 2.x.x version API ## What changes were proposed in this pull request? While obtaining token from hbase service , spark uses deprecated API of hbase , ```public static Token obtainToken(Configuration conf)``` This deprecated API is already been removed from hbase 2.x version as part of the hbase 2.x major release. https://issues.apache.org/jira/browse/HBASE-14713_ there is one more stable API in ```public static Token obtainToken(Connection conn)``` in TokenUtil class spark shall use this stable api for getting the delegation token. To invoke this api first connection object has to be retrieved from ConnectionFactory and the same connection can be passed to obtainToken(Connection conn) for getting token. eg: Call ```public static Connection createConnection(Configuration conf)``` , then call ```public static Token obtainToken( Connection conn)```. ## How was this patch tested? Manual testing is been done. Manual test result: Before fix: ![hbase-dep-obtaintok 1](https://user-images.githubusercontent.com/12999161/50699264-64cac200-106d-11e9-81b4-e50ae8097f27.png) After fix: 1. Create 2 tables in hbase shell >Launch hbase shell >Enter commands to create tables and load data create 'table1','cf' put 'table1','row1','cf:cid','20' create 'table2','cf' put 'table2','row1','cf:cid','30' >Show values command get 'table1','row1','cf:cid' will diplay value as 20 get 'table2','row1','cf:cid' will diplay value as 30 2.Run SparkHbasetoHbase class in testSpark.jar using spark-submit spark-submit --master yarn-cluster --class com.mrs.example.spark.SparkHbasetoHbase --conf "spark.yarn.security.credentials.hbase.enabled"="true" --conf "spark.security.credentials.hbase.enabled"="true" --keytab /opt/client/user.keytab --principal sen testSpark.jar The SparkHbasetoHbase test class will update the value of table2 with sum of values of table1 & table2. table2 = table1+table2 As we can see in the snapshot the spark job has been successfully able to interact with hbase service and able to update the row count. ![obtaintok_success 1](https://user-images.githubusercontent.com/12999161/50699393-bd9a5a80-106d-11e9-96c6-6c250d561efa.png) Closes #23429 from sujith71955/master_hbase_service. 
Authored-by: s71955 Signed-off-by: Marcelo Vanzin --- .../security/HBaseDelegationTokenProvider.scala| 56 -- 1 file changed, 52 insertions(+), 4 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala b/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala index 3bf8c14..e345b0b 100644 --- a/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala @@ -17,6 +17,8 @@ package org.apache.spark.deploy.security +import java.io.Closeable + import scala.reflect.runtime.universe import scala.util.control.NonFatal @@ -42,8 +44,8 @@ private[security] class HBaseDelegationTokenProvider try { val mirror = universe.runtimeMirror(Utils.getContextOrSparkClassLoader) val obtainToken = mirror.classLoader. -loadClass("org.apache.hadoop.hbase.security.token.TokenUtil"). -getMethod("obtainToken", classOf[Configuration]) +loadClass("org.apache.hadoop.hbase.security.token.TokenUtil") +.getMethod("obtainToken", classOf[Configuration]) logDebug("Attempting to fetch HBase security token.") val token = obtainToken.invoke(null, hbaseConf(hadoopConf)) @@ -52,12 +54,58 @@ private[security] class HBaseDelegationTokenProvider creds.addToken(token.getService, token) } catch { case NonFatal(e) => -logWarning(s"Failed to get token from service $serviceName", e) +logWarning(s"Failed to get token from service $serviceName due to " + e + + s" Retrying to fetch HBase security token with hbase connection parameter.") +// Seems to be spark is trying to get the token from HBase 2.x.x version or above where the +// obtainToken(Configuration conf) API has been removed. Lets try
[spark] branch master updated: [SPARK-26713][CORE] Interrupt pipe IO threads in PipedRDD when task is finished
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1280bfd [SPARK-26713][CORE] Interrupt pipe IO threads in PipedRDD when task is finished 1280bfd is described below commit 1280bfd7564ca6d201a6e4a54ecf93a20f142f3a Author: Xianjin YE AuthorDate: Mon Jan 28 10:54:18 2019 -0600 [SPARK-26713][CORE] Interrupt pipe IO threads in PipedRDD when task is finished ## What changes were proposed in this pull request? Manually release stdin writer and stderr reader thread when task is finished. This commit also marks ShuffleBlockFetchIterator as fully consumed if isZombie is set. ## How was this patch tested? Added new test Closes #23638 from advancedxy/SPARK-26713. Authored-by: Xianjin YE Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/rdd/PipedRDD.scala | 33 ++-- .../storage/ShuffleBlockFetcherIterator.scala | 18 +-- .../scala/org/apache/spark/rdd/PipedRDDSuite.scala | 24 + .../storage/ShuffleBlockFetcherIteratorSuite.scala | 59 ++ 4 files changed, 126 insertions(+), 8 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala b/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala index 02b28b7..f1daf62 100644 --- a/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala @@ -113,7 +113,7 @@ private[spark] class PipedRDD[T: ClassTag]( val childThreadException = new AtomicReference[Throwable](null) // Start a thread to print the process's stderr to ours -new Thread(s"stderr reader for $command") { +val stderrReaderThread = new Thread(s"${PipedRDD.STDERR_READER_THREAD_PREFIX} $command") { override def run(): Unit = { val err = proc.getErrorStream try { @@ -128,10 +128,11 @@ private[spark] class PipedRDD[T: ClassTag]( err.close() } } -}.start() +} +stderrReaderThread.start() // Start a thread to feed the process input from our parent's iterator -new Thread(s"stdin writer for $command") { +val stdinWriterThread = new Thread(s"${PipedRDD.STDIN_WRITER_THREAD_PREFIX} $command") { override def run(): Unit = { TaskContext.setTaskContext(context) val out = new PrintWriter(new BufferedWriter( @@ -156,7 +157,28 @@ private[spark] class PipedRDD[T: ClassTag]( out.close() } } -}.start() +} +stdinWriterThread.start() + +// interrupts stdin writer and stderr reader threads when the corresponding task is finished. +// Otherwise, these threads could outlive the task's lifetime. For example: +// val pipeRDD = sc.range(1, 100).pipe(Seq("cat")) +// val abnormalRDD = pipeRDD.mapPartitions(_ => Iterator.empty) +// the iterator generated by PipedRDD is never involved. If the parent RDD's iterator takes a +// long time to generate(ShuffledRDD's shuffle operation for example), the stdin writer thread +// may consume significant memory and CPU time even if task is already finished. 
+context.addTaskCompletionListener[Unit] { _ => + if (proc.isAlive) { +proc.destroy() + } + + if (stdinWriterThread.isAlive) { +stdinWriterThread.interrupt() + } + if (stderrReaderThread.isAlive) { +stderrReaderThread.interrupt() + } +} // Return an iterator that read lines from the process's stdout val lines = Source.fromInputStream(proc.getInputStream)(encoding).getLines @@ -219,4 +241,7 @@ private object PipedRDD { } buf } + + val STDIN_WRITER_THREAD_PREFIX = "stdin writer for" + val STDERR_READER_THREAD_PREFIX = "stderr reader for" } diff --git a/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala b/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala index 28decf0..f73c21b 100644 --- a/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala +++ b/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala @@ -141,7 +141,14 @@ final class ShuffleBlockFetcherIterator( /** * Whether the iterator is still active. If isZombie is true, the callback interface will no - * longer place fetched blocks into [[results]]. + * longer place fetched blocks into [[results]] and the iterator is marked as fully consumed. + * + * When the iterator is inactive, [[hasNext]] and [[next]] calls will honor that as there are + * cases the iterator is still being consumed. For example, ShuffledRDD + PipedRDD if the + * subprocess command is failed. The task will be marked as failed, then the iterator will be + * cleaned up at task completion, the [[next]] call (called
[spark] branch master updated: [SPARK-26719][SQL] Get rid of java.util.Calendar in DateTimeUtils
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 58e42cf [SPARK-26719][SQL] Get rid of java.util.Calendar in DateTimeUtils 58e42cf is described below commit 58e42cf50639d5b9d6c877b902ac559e132d041b Author: Maxim Gekk AuthorDate: Mon Jan 28 10:52:17 2019 -0600 [SPARK-26719][SQL] Get rid of java.util.Calendar in DateTimeUtils ## What changes were proposed in this pull request? - Replacing `java.util.Calendar` in `DateTimeUtils. truncTimestamp` and in `DateTimeUtils.getOffsetFromLocalMillis ` by equivalent code using Java 8 API for timestamp manipulations. The reason is `java.util.Calendar` is based on the hybrid calendar (Julian+Gregorian) but *java.time* classes use Proleptic Gregorian calendar which assumes by SQL standard. - Replacing `Calendar.getInstance()` in `DateTimeUtilsSuite` by similar code in `DateTimeTestUtils` using *java.time* classes ## How was this patch tested? The changes were tested by existing suites: `DateExpressionsSuite`, `DateFunctionsSuite` and `DateTimeUtilsSuite`. Closes #23641 from MaxGekk/cleanup-date-time-utils. Authored-by: Maxim Gekk Signed-off-by: Sean Owen --- docs/sql-migration-guide-upgrade.md| 2 + .../spark/sql/catalyst/util/DateTimeUtils.scala| 52 +-- .../sql/catalyst/util/DateTimeTestUtils.scala | 50 +++ .../sql/catalyst/util/DateTimeUtilsSuite.scala | 433 - 4 files changed, 231 insertions(+), 306 deletions(-) diff --git a/docs/sql-migration-guide-upgrade.md b/docs/sql-migration-guide-upgrade.md index 98fc9fa..41f27a3 100644 --- a/docs/sql-migration-guide-upgrade.md +++ b/docs/sql-migration-guide-upgrade.md @@ -95,6 +95,8 @@ displayTitle: Spark SQL Upgrading Guide - Since Spark 3.0, the JDBC options `lowerBound` and `upperBound` are converted to TimestampType/DateType values in the same way as casting strings to TimestampType/DateType values. The conversion is based on Proleptic Gregorian calendar, and time zone defined by the SQL config `spark.sql.session.timeZone`. In Spark version 2.4 and earlier, the conversion is based on the hybrid calendar (Julian + Gregorian) and on default system time zone. + - Since Spark 3.0, the `date_trunc`, `from_utc_timestamp`, `to_utc_timestamp`, and `unix_timestamp` functions use java.time API based on Proleptic Gregorian calendar. In Spark version 2.4 and earlier, the hybrid calendar (Julian + Gregorian) is used for the same purpose. Results of the functions returned by Spark 3.0 and previous versions can be different for dates before October 15, 1582 (Gregorian). + ## Upgrading From Spark SQL 2.3 to 2.4 - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index 911750e..a97c1f7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -18,10 +18,10 @@ package org.apache.spark.sql.catalyst.util import java.sql.{Date, Timestamp} -import java.time.{Instant, LocalDate, LocalDateTime, LocalTime, ZonedDateTime} +import java.time.{Instant, LocalDate, LocalDateTime, LocalTime, Month, ZonedDateTime} import java.time.Year.isLeap import java.time.temporal.IsoFields -import java.util.{Calendar, Locale, TimeZone} +import java.util.{Locale, TimeZone} import java.util.concurrent.{ConcurrentHashMap, TimeUnit} import java.util.function.{Function => JFunction} @@ -744,18 +744,17 @@ object DateTimeUtils { daysToMillis(prevMonday, timeZone) case TRUNC_TO_QUARTER => val dDays = millisToDays(millis, timeZone) -millis = daysToMillis(truncDate(dDays, TRUNC_TO_MONTH), timeZone) -val cal = Calendar.getInstance() -cal.setTimeInMillis(millis) -val quarter = getQuarter(dDays) -val month = quarter match { - case 1 => Calendar.JANUARY - case 2 => Calendar.APRIL - case 3 => Calendar.JULY - case 4 => Calendar.OCTOBER +val month = getQuarter(dDays) match { + case 1 => Month.JANUARY + case 2 => Month.APRIL + case 3 => Month.JULY + case 4 => Month.OCTOBER } -
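A hedged PySpark sketch (not from the patch) of the behavior called out in the migration-guide note above: `date_trunc` and related functions now run on java.time and the Proleptic Gregorian calendar, so results for dates before 1582-10-15 may differ from Spark 2.4. It assumes a local SparkSession; the literals are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-26719-demo").getOrCreate()

# Quarter truncation now goes through java.time rather than java.util.Calendar;
# the pre-Gregorian timestamp is where the two calendars can disagree.
spark.sql("""
    SELECT date_trunc('QUARTER', timestamp '2019-01-28 12:13:51') AS q_modern,
           date_trunc('QUARTER', timestamp '1582-10-01 00:00:00') AS q_pre_gregorian
""").show(truncate=False)

spark.stop()
```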
[spark] branch master updated: [SPARK-26700][CORE] enable fetch-big-block-to-disk by default
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ed71a82 [SPARK-26700][CORE] enable fetch-big-block-to-disk by default ed71a82 is described below commit ed71a825c56920327533ebb741707871848ccd6d Author: Wenchen Fan AuthorDate: Mon Jan 28 23:41:55 2019 +0800 [SPARK-26700][CORE] enable fetch-big-block-to-disk by default ## What changes were proposed in this pull request? This is a followup of #16989. The fetch-big-block-to-disk feature has been disabled by default because it is not compatible with external shuffle services prior to Spark 2.2: the client sends a stream request to fetch block chunks, and older shuffle services cannot handle it. After two years, Spark 2.2 has reached EOL, so it is now safe to turn this feature on by default. ## How was this patch tested? Existing tests. Closes #23625 from cloud-fan/minor. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../org/apache/spark/internal/config/package.scala | 14 +++-- docs/configuration.md | 24 ++ 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 71b0df4..32559ae 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -699,17 +699,19 @@ package object config { private[spark] val MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM = ConfigBuilder("spark.maxRemoteBlockSizeFetchToMem") .doc("Remote block will be fetched to disk when size of the block is above this threshold " + -"in bytes. This is to avoid a giant request takes too much memory. We can enable this " + -"config by setting a specific value(e.g. 200m). Note this configuration will affect " + -"both shuffle fetch and block manager remote block fetch. For users who enabled " + -"external shuffle service, this feature can only be worked when external shuffle" + -"service is newer than Spark 2.2.") +"in bytes. This is to avoid a giant request takes too much memory. Note this " + +"configuration will affect both shuffle fetch and block manager remote block fetch. " + +"For users who enabled external shuffle service, this feature can only work when " + +"external shuffle service is at least 2.3.0.") .bytesConf(ByteUnit.BYTE) // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might // as well use fetch-to-disk in that case. The message includes some metadata in addition // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave // extra room. - .createWithDefault(Int.MaxValue - 512) + .checkValue( +_ <= Int.MaxValue - 512, +"maxRemoteBlockSizeFetchToMem cannot be larger than (Int.MaxValue - 512) bytes.") + .createWithDefaultString("200m") private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES = ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses") diff --git a/docs/configuration.md b/docs/configuration.md index 7d3bbf9..806e16a 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -627,19 +627,6 @@ Apart from these, the following properties are also available, and may be useful - spark.maxRemoteBlockSizeFetchToMem - Int.MaxValue - 512 - -The remote block will be fetched to disk when size of the block is above this threshold in bytes. 
-This is to avoid a giant request that takes too much memory. By default, this is only enabled -for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are -available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much -memory on smaller blocks as well. Note this configuration will affect both shuffle fetch -and block manager remote block fetch. For users who enabled external shuffle service, -this feature can only be used when external shuffle service is newer than Spark 2.2. - - - spark.shuffle.compress true @@ -1519,6 +1506,17 @@ Apart from these, the following properties are also available, and may be useful you can set larger value. + + spark.maxRemoteBlockSizeFetchToMem + 200m + +Remote block will be fetched to disk when size of the block is above this threshold +in bytes. This is to avoid a giant request takes too much memory. Note this +configuration will affect both shuffle fetch and block manager remote block fetch. +For users who enabled external shuffle service, this
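For users tracking this default change, here is a minimal sketch of pinning the threshold explicitly instead of relying on the new 200m default. The application name and the 1g value are arbitrary assumptions for the example, not recommendations; with an external shuffle service the setting only takes effect when that service is new enough, as the updated documentation above notes.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: set spark.maxRemoteBlockSizeFetchToMem explicitly so that remote
// blocks larger than this threshold are fetched to disk rather than memory.
val spark = SparkSession.builder()
  .appName("fetch-to-disk-threshold-example")
  .config("spark.maxRemoteBlockSizeFetchToMem", "1g")
  .getOrCreate()
```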
[spark] branch master updated: [SPARK-26656][SQL] Benchmarks for date and timestamp functions
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bd027f6 [SPARK-26656][SQL] Benchmarks for date and timestamp functions bd027f6 is described below commit bd027f6e0e64b7e12ba9d3c77e0797a00c61e72d Author: Maxim Gekk AuthorDate: Mon Jan 28 14:21:21 2019 +0100 [SPARK-26656][SQL] Benchmarks for date and timestamp functions ## What changes were proposed in this pull request? Added the following benchmarks: - Extract components from timestamp like year, month, day and etc. - Current date and time - Date arithmetic like date_add, date_sub - Format dates and timestamps - Convert timestamps from/to UTC Closes #23661 from MaxGekk/datetime-benchmark. Authored-by: Maxim Gekk Signed-off-by: Herman van Hovell --- sql/core/benchmarks/DateTimeBenchmark-results.txt | 416 + .../execution/benchmark/DateTimeBenchmark.scala| 123 ++ 2 files changed, 539 insertions(+) diff --git a/sql/core/benchmarks/DateTimeBenchmark-results.txt b/sql/core/benchmarks/DateTimeBenchmark-results.txt new file mode 100644 index 000..8bbe310 --- /dev/null +++ b/sql/core/benchmarks/DateTimeBenchmark-results.txt @@ -0,0 +1,416 @@ + +Extract components + + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-ea-b03 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +cast to timestamp: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + +cast to timestamp wholestage off 276 / 290 36.2 27.6 1.0X +cast to timestamp wholestage on254 / 267 39.4 25.4 1.1X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-ea-b03 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +year of timestamp: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + +year of timestamp wholestage off 699 / 700 14.3 69.9 1.0X +year of timestamp wholestage on680 / 689 14.7 68.0 1.0X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-ea-b03 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +quarter of timestamp:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + +quarter of timestamp wholestage off848 / 864 11.8 84.8 1.0X +quarter of timestamp wholestage on 784 / 797 12.8 78.4 1.1X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-ea-b03 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +month of timestamp: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + +month of timestamp wholestage off 652 / 653 15.3 65.2 1.0X +month of timestamp wholestage on 671 / 677 14.9 67.1 1.0X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-ea-b03 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +weekofyear of timestamp: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + +weekofyear of timestamp wholestage off1233 / 1233 8.1 123.3 1.0X +weekofyear of timestamp wholestage on 1236 / 1240 8.1 123.6 1.0X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-ea-b03 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +day of timestamp:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + +day of timestamp wholestage off649 / 655 15.4 64.9 1.0X +day of timestamp wholestage on 670 / 678 14.9 67.0 1.0X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-ea-b03 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +dayofyear of timestamp: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative
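As a rough illustration of the kind of query these benchmarks time (not the benchmark's actual code; the row count, local master, and use of `max(...)` to force evaluation are assumptions for this sketch):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: generate n timestamps from a range and time the extraction of a
// single component, similar in spirit to the "year of timestamp" case above.
val spark = SparkSession.builder().master("local[*]").appName("datetime-sketch").getOrCreate()
val n = 10L * 1000 * 1000
val start = System.nanoTime()
spark.range(n)
  .selectExpr("cast(id as timestamp) AS ts")
  .selectExpr("max(year(ts))")
  .collect()
println(s"year(ts) over $n rows took ${(System.nanoTime() - start) / 1e6} ms")
```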