[GitHub] [spark-website] maropu commented on a change in pull request #264: Fix case of GitHub.

2020-05-24 Thread GitBox


maropu commented on a change in pull request #264:
URL: https://github.com/apache/spark-website/pull/264#discussion_r429607857



##
File path: site/examples.html
##
@@ -230,11 +230,11 @@ Word Count
 
 
 
-text_file = sc.textFile("hdfs://...")
-counts = text_file.flatMap(lambda line: line.split(" ")) \
-    .map(lambda word: (word, 1)) \
-    .reduceByKey(lambda a, b: a + b)
-counts.saveAsTextFile("hdfs://...")
+text_file = sc.textFile("hdfs://...")
+counts = text_file.flatMap(lambda line: line.split(" ")) \
+    .map(lambda word: (word, 1)) \

Review comment:
   It seems the changes are different from the PR title: `Fix case of GitHub`.
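
   For context, the hunk above is the site's rendered PySpark word count. As one complete script it reads (a minimal sketch; the `hdfs://...` paths are placeholders, exactly as on the site):

       from pyspark import SparkContext

       sc = SparkContext("local[1]", "word-count")

       text_file = sc.textFile("hdfs://...")   # placeholder input path
       counts = text_file.flatMap(lambda line: line.split(" ")) \
                         .map(lambda word: (word, 1)) \
                         .reduceByKey(lambda a, b: a + b)
       counts.saveAsTextFile("hdfs://...")      # placeholder output path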
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] jbampton commented on pull request #264: Fix case of GitHub.

2020-05-24 Thread GitBox


jbampton commented on pull request #264:
URL: https://github.com/apache/spark-website/pull/264#issuecomment-633193658


   It seems `jekyll build` created the extra changes.






[GitHub] [spark-website] maropu commented on pull request #264: Fix typos: Github to GitHub

2020-05-24 Thread GitBox


maropu commented on pull request #264:
URL: https://github.com/apache/spark-website/pull/264#issuecomment-633197472


   LGTM






[spark-website] branch asf-site updated: Fix typos: Github to GitHub

2020-05-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 9879569  Fix typos: Github to GitHub
9879569 is described below

commit 9879569826e89be4addf3d1f977924cc28062e2c
Author: John Bampton 
AuthorDate: Sun May 24 19:03:46 2020 +0900

Fix typos: Github to GitHub

Author: John Bampton 

Closes #264 from jbampton/fix-word-case.
---
 contributing.md   | 10 +-
 release-process.md|  4 ++--
 site/contributing.html| 10 +-
 site/downloads.html   |  2 +-
 site/release-process.html |  6 +++---
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/contributing.md b/contributing.md
index 3016a26..5c68d98 100644
--- a/contributing.md
+++ b/contributing.md
@@ -43,7 +43,7 @@ feedback on any performance or correctness issues found in the newer release.
 Contributing by Reviewing Changes
 
 Changes to Spark source code are proposed, reviewed and committed via 
-<a href="https://github.com/apache/spark/pulls">Github pull requests</a> (described later). 
+<a href="https://github.com/apache/spark/pulls">GitHub pull requests</a> (described later). 
 Anyone can view and comment on active changes here. 
 Reviewing others' changes is a good way to learn how the change process works and gain exposure 
 to activity in various parts of the code. You can help by reviewing the changes and asking 
@@ -243,7 +243,7 @@ Once you've downloaded Spark, you can find instructions for installing and build
 JIRA
 
 Generally, Spark uses JIRA to track logical issues, including bugs and improvements, and uses 
-Github pull requests to manage the review and merge of specific code changes. That is, JIRAs are 
+GitHub pull requests to manage the review and merge of specific code changes. That is, JIRAs are 
 used to describe _what_ should be fixed or changed, and high-level approaches, and pull requests 
 describe _how_ to implement that change in the project's source code. For example, major design 
 decisions are discussed in JIRA.
@@ -300,7 +300,7 @@ Example: `Fix typos in Foo scaladoc`
 
 Pull Request
 
-1. <a href="https://help.github.com/articles/fork-a-repo/">Fork</a> the Github repository at 
+1. <a href="https://help.github.com/articles/fork-a-repo/">Fork</a> the GitHub repository at 
 <a href="https://github.com/apache/spark">https://github.com/apache/spark</a> if you haven't already
 1. Clone your fork, create a new branch, push commits to the branch.
 1. Consider whether documentation or tests need to be added or updated as part of the change, 
@@ -341,9 +341,9 @@ the `master` branch of `apache/spark`. (Only in special cases would the PR be op
  <a href="https://spark-prs.appspot.com/">spark-prs.appspot.com</a> and 
  Title may be the JIRA's title or a more specific title describing the PR itself.
  1. If the pull request is still a work in progress, and so is not ready to be merged, 
- but needs to be pushed to Github to facilitate review, then add `[WIP]` after the component.
+ but needs to be pushed to GitHub to facilitate review, then add `[WIP]` after the component.
 1. Consider identifying committers or other contributors who have worked on the code being 
- changed. Find the file(s) in Github and click "Blame" to see a line-by-line annotation of 
+ changed. Find the file(s) in GitHub and click "Blame" to see a line-by-line annotation of 
  who changed the code last. You can add `@username` in the PR description to ping them 
  immediately.
  1. Please state that the contribution is your original work and that you license the work 
diff --git a/release-process.md b/release-process.md
index d3a9f9f..8165d24 100644
--- a/release-process.md
+++ b/release-process.md
@@ -264,7 +264,7 @@ pick the release version from the list, then click on "Release Notes". Copy this
 Then run `jekyll build` to update the `site` directory.
 
 After merging the change into the `asf-site` branch, you may need to create a follow-up empty
-commit to force synchronization between ASF's git and the web site, and also the github mirror.
+commit to force synchronization between ASF's git and the web site, and also the GitHub mirror.
 For some reason synchronization seems to not be reliable for this repository.
 
 On a related note, make sure the version is marked as released on JIRA. Go find the release page as above, eg.,
@@ -278,7 +278,7 @@ releasing Spark 1.2.0, set the current tag to v1.2.0-rc2 and the previous tag to
 Once you have generated the initial contributors list, it is highly likely that there will be
 warnings about author names not being properly translated. To fix this, run
 <a href="https://github.com/apache/spark/blob/branch-1.1/dev/create-release/translate-contributors.py">this other script</a>,
-which fetches potential replacements from Github and JIRA. For instance:
+which

[GitHub] [spark-website] asfgit closed pull request #264: Fix typos: Github to GitHub

2020-05-24 Thread GitBox


asfgit closed pull request #264:
URL: https://github.com/apache/spark-website/pull/264


   






[GitHub] [spark-website] maropu commented on pull request #264: Fix typos: Github to GitHub

2020-05-24 Thread GitBox


maropu commented on pull request #264:
URL: https://github.com/apache/spark-website/pull/264#issuecomment-633207582


   Thanks! Merged to asf-site.






[spark] branch branch-3.0 updated (576c224 -> 72c466e)

2020-05-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 576c224  [SPARK-31755][SQL][3.0] allow missing year/hour when parsing date/timestamp string
 add 72c466e  [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 18 
 .../sql/catalyst/expressions/arithmetic.scala  |  2 +-
 .../sql/catalyst/analysis/TypeCoercionSuite.scala  | 24 ++
 .../expressions/ArithmeticExpressionSuite.scala|  7 +--
 .../sql-functions/sql-expression-schema.md |  2 +-
 .../resources/sql-tests/results/operators.sql.out  |  8 
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  8 
 7 files changed, 57 insertions(+), 12 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator

2020-05-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 72c466e  [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator
72c466e is described below

commit 72c466e0c37e4cc639040161699b6c0bffde70d5
Author: sandeep katta 
AuthorDate: Sun May 24 21:39:16 2020 +0900

[SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator

### What changes were proposed in this pull request?
`IntegralDivide` operator returns Long DataType, so the integer overflow case should be handled.
If the operands are of type Int, they will be cast to Long.

### Why are the changes needed?
As `IntegralDivide` returns the Long datatype, integer overflow should not happen.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT and also tested in the local cluster

After fix


![image](https://user-images.githubusercontent.com/35216143/82603361-25eccc00-9bd0-11ea-9ca7-001c539e628b.png)

SQL Test

After fix

![image](https://user-images.githubusercontent.com/35216143/82637689-f0250300-9c22-11ea-85c3-886ab2c23471.png)

Before Fix

![image](https://user-images.githubusercontent.com/35216143/82637984-878a5600-9c23-11ea-9e47-5ce2fb923c01.png)
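
A quick way to see the overflow being handled here is the classic `Int.MinValue div -1` case; a minimal PySpark sketch (hypothetical session; the casts force Int operands, since a bare -2147483648 literal would already parse wider):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("SPARK-31761-demo").getOrCreate()

    # Int.MinValue div -1 is not representable in 32-bit arithmetic.
    # With the operands cast to Long first, the query returns 2147483648.
    spark.sql("SELECT CAST(-2147483648 AS INT) div CAST(-1 AS INT)").show()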

Closes #28628 from sandeep-katta/branch3Backport.

Authored-by: sandeep katta 
Signed-off-by: Takeshi Yamamuro 
---
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 18 
 .../sql/catalyst/expressions/arithmetic.scala  |  2 +-
 .../sql/catalyst/analysis/TypeCoercionSuite.scala  | 24 ++
 .../expressions/ArithmeticExpressionSuite.scala|  7 +--
 .../sql-functions/sql-expression-schema.md |  2 +-
 .../resources/sql-tests/results/operators.sql.out  |  8 
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  8 
 7 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
index c6e3f56..a6f8e12 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
@@ -61,6 +61,7 @@ object TypeCoercion {
   IfCoercion ::
   StackCoercion ::
   Division ::
+  IntegralDivision ::
   ImplicitTypeCasts ::
   DateTimeOperations ::
   WindowFrameCoercion ::
@@ -685,6 +686,23 @@ object TypeCoercion {
   }
 
   /**
+   * The DIV operator always returns long-type value.
+   * This rule cast the integral inputs to long type, to avoid overflow during calculation.
+   */
+  object IntegralDivision extends TypeCoercionRule {
+    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      case e if !e.childrenResolved => e
+      case d @ IntegralDivide(left, right) =>
+        IntegralDivide(mayCastToLong(left), mayCastToLong(right))
+    }
+
+    private def mayCastToLong(expr: Expression): Expression = expr.dataType match {
+      case _: ByteType | _: ShortType | _: IntegerType => Cast(expr, LongType)
+      case _ => expr
+    }
+  }
+
+  /**
    * Coerces the type of different branches of a CASE WHEN statement to a common type.
    */
   object CaseWhenCoercion extends TypeCoercionRule {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 354845d..7c52183 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -412,7 +412,7 @@ case class IntegralDivide(
     left: Expression,
     right: Expression) extends DivModLike {
 
-  override def inputType: AbstractDataType = TypeCollection(IntegralType, DecimalType)
+  override def inputType: AbstractDataType = TypeCollection(LongType, DecimalType)
 
   override def dataType: DataType = LongType
 
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
index e37555f..1ea1ddb 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
@@ -1559,6 +1559,30 @@ class TypeCoercionSuite extends AnalysisTest {
   Literal.create(n

[spark] branch master updated (cf7463f -> d0fe433)

2020-05-24 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cf7463f  [SPARK-31761][SQL] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator
 add d0fe433  [SPARK-31768][ML] add getMetrics in Evaluators

No new revisions were added by this update.

Summary of changes:
 .../evaluation/BinaryClassificationEvaluator.scala |  26 +-
 .../spark/ml/evaluation/ClusteringEvaluator.scala  | 559 +
 ...ringEvaluator.scala => ClusteringMetrics.scala} | 173 ++-
 .../MulticlassClassificationEvaluator.scala|  54 +-
 .../MultilabelClassificationEvaluator.scala|  36 +-
 .../spark/ml/evaluation/RankingEvaluator.scala |  28 +-
 .../spark/ml/evaluation/RegressionEvaluator.scala  |  28 +-
 .../spark/mllib/evaluation/MulticlassMetrics.scala |   3 +-
 .../BinaryClassificationEvaluatorSuite.scala   |  23 +
 .../ml/evaluation/ClusteringEvaluatorSuite.scala   |  16 +
 .../MulticlassClassificationEvaluatorSuite.scala   |  29 ++
 .../MultilabelClassificationEvaluatorSuite.scala   |  48 ++
 .../ml/evaluation/RankingEvaluatorSuite.scala  |  38 ++
 .../ml/evaluation/RegressionEvaluatorSuite.scala   |  33 ++
 14 files changed, 375 insertions(+), 719 deletions(-)
 copy mllib/src/main/scala/org/apache/spark/ml/evaluation/{ClusteringEvaluator.scala => ClusteringMetrics.scala} (80%)





[spark] branch master updated (d0fe433 -> 753636e)

2020-05-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d0fe433  [SPARK-31768][ML] add getMetrics in Evaluators
 add 753636e  [SPARK-31807][INFRA] Use python 3 style in release-build.sh

No new revisions were added by this update.

Summary of changes:
 dev/create-release/release-build.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31807][INFRA] Use python 3 style in release-build.sh

2020-05-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 40e5de5  [SPARK-31807][INFRA] Use python 3 style in release-build.sh
40e5de5 is described below

commit 40e5de54e6109762d3ee85197da4fa1a8b20e13b
Author: William Hyun 
AuthorDate: Mon May 25 10:25:43 2020 +0900

[SPARK-31807][INFRA] Use python 3 style in release-build.sh

### What changes were proposed in this pull request?

This PR aims to use the style that is compatible with both python 2 and 3.

### Why are the changes needed?

This will help python 3 migration.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual.
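
The pattern being standardized is the parenthesized print call, which parses the same way under both interpreters; a minimal sketch of the idiom used in `release-build.sh`:

    import random

    # `print(...)` is a function call in Python 3 and a print statement
    # applied to a parenthesized expression in Python 2, so the same
    # one-liner runs under either interpreter.
    print(random.randrange(3030, 4030))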

Closes #28632 from williamhyun/use_python3_style.

Authored-by: William Hyun 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 753636e86b1989a9b43ec691de5136a70d11f323)
Signed-off-by: HyukjinKwon 
---
 dev/create-release/release-build.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh
index 655b079..e3bcb72 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -380,7 +380,7 @@ if [[ "$1" == "publish-snapshot" ]]; then
   echo "" >> $tmp_settings
 
   # Generate random point for Zinc
-  export ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)")
+  export ZINC_PORT=$(python -S -c "import random; print(random.randrange(3030,4030))")
 
   $MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests $SCALA_2_12_PROFILES $PUBLISH_PROFILES deploy
 
@@ -412,7 +412,7 @@ if [[ "$1" == "publish-release" ]]; then
   tmp_repo=$(mktemp -d spark-repo-X)
 
   # Generate random point for Zinc
-  export ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)")
+  export ZINC_PORT=$(python -S -c "import random; print(random.randrange(3030,4030))")
 
   # TODO: revisit for Scala 2.13 support
 





[spark] branch master updated (753636e -> a61911c)

2020-05-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 753636e  [SPARK-31807][INFRA] Use python 3 style in release-build.sh
 add a61911c  [SPARK-31788][CORE][PYTHON] Fix UnionRDD of PairRDDs

No new revisions were added by this update.

Summary of changes:
 python/pyspark/context.py| 12 ++--
 python/pyspark/tests/test_rdd.py |  9 +
 2 files changed, 19 insertions(+), 2 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31788][CORE][PYTHON] Fix UnionRDD of PairRDDs

2020-05-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new be18718  [SPARK-31788][CORE][PYTHON] Fix UnionRDD of PairRDDs
be18718 is described below

commit be18718380fc501fe2a780debd089a1df91c1699
Author: schintap 
AuthorDate: Mon May 25 10:29:08 2020 +0900

[SPARK-31788][CORE][PYTHON] Fix UnionRDD of PairRDDs

### What changes were proposed in this pull request?
UnionRDD of PairRDDs was causing a bug. The fix is to check for the instance type before proceeding.

### Why are the changes needed?
Changes are needed to avoid users running into issues with the union RDD operation with any type other than JavaRDD.

### Does this PR introduce _any_ user-facing change?
Yes

Before:
SparkSession available as 'spark'.
>>> rdd1 = sc.parallelize([1,2,3,4,5])
>>> rdd2 = sc.parallelize([6,7,8,9,10])
>>> pairRDD1 = rdd1.zip(rdd2)
>>> unionRDD1 = sc.union([pairRDD1, pairRDD1])
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union
    jrdds[i] = rdds[i]._jrdd
  File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in setitem
  File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item
  File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.None. Trace:
py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD
  at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166)
  at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144)
  at py4j.commands.ArrayCommand.execute(ArrayCommand. [...]

After:
>>> rdd2 = sc.parallelize([6,7,8,9,10])
>>> pairRDD1 = rdd1.zip(rdd2)
>>> unionRDD1 = sc.union([pairRDD1, pairRDD1])
>>> unionRDD1.collect()
[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

### How was this patch tested?
Tested with the reproduced piece of code above manually
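
For convenience, the repro above as one self-contained script (a sketch; assumes a local PySpark installation):

    from pyspark import SparkContext

    sc = SparkContext("local[1]", "SPARK-31788-repro")

    rdd1 = sc.parallelize([1, 2, 3, 4, 5])
    rdd2 = sc.parallelize([6, 7, 8, 9, 10])
    pair_rdd = rdd1.zip(rdd2)  # backed by a JavaPairRDD rather than a JavaRDD

    # Raised py4j.protocol.Py4JError before this fix; the fix rewraps the
    # pair RDD via an identity map() so union() receives JavaRDD instances.
    union_rdd = sc.union([pair_rdd, pair_rdd])
    print(union_rdd.collect())
    # [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

    sc.stop()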

Closes #28603 from redsanket/SPARK-31788.

Authored-by: schintap 
Signed-off-by: HyukjinKwon 
(cherry picked from commit a61911c50c391e61038cf01611629d2186d17a76)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/context.py| 12 ++--
 python/pyspark/tests/test_rdd.py |  9 +
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/context.py b/python/pyspark/context.py
index d5f1506..3199aa7 100644
--- a/python/pyspark/context.py
+++ b/python/pyspark/context.py
@@ -25,6 +25,7 @@ from threading import RLock
 from tempfile import NamedTemporaryFile
 
 from py4j.protocol import Py4JError
+from py4j.java_gateway import is_instance_of
 
 from pyspark import accumulators
 from pyspark.accumulators import Accumulator
@@ -864,10 +865,17 @@ class SparkContext(object):
         first_jrdd_deserializer = rdds[0]._jrdd_deserializer
         if any(x._jrdd_deserializer != first_jrdd_deserializer for x in rdds):
             rdds = [x._reserialize() for x in rdds]
+        gw = SparkContext._gateway
         cls = SparkContext._jvm.org.apache.spark.api.java.JavaRDD
-        jrdds = SparkContext._gateway.new_array(cls, len(rdds))
+        is_jrdd = is_instance_of(gw, rdds[0]._jrdd, cls)
+        jrdds = gw.new_array(cls, len(rdds))
         for i in range(0, len(rdds)):
-            jrdds[i] = rdds[i]._jrdd
+            if is_jrdd:
+                jrdds[i] = rdds[i]._jrdd
+            else:
+                # zip could return JavaPairRDD hence we ensure `_jrdd`
+                # to be `JavaRDD` by wrapping it in a `map`
+                jrdds[i] = rdds[i].map(lambda x: x)._jrdd
         return RDD(self._jsc.union(jrdds), self, rdds[0]._jrdd_deserializer)
 
 def broadcast(self, value):
diff --git a/python/pyspark/tests/test_rdd.py b/python/pyspark/tests/test_rdd.py
index e2d910c..0f1ee5b 100644
--- a/python/pyspark/tests/test_rdd.py
+++ b/python/pyspark/tests/test_rdd.py
@@ -166,6 +166,15 @@ class RDDTests(ReusedPySparkTestCase):
             set([(x, (x, x)) for x in 'abc'])
         )
 
+    def test_union_pair_rdd(self):
+        # Regression test for SPARK-31788
+        rdd = self.sc.parallelize([1, 2])
+        pair_rdd = rdd.zip(rdd)
+        self.assertEqual(
+            self.sc.union([pair_rdd, pair_rdd]).collect(),
+            [((1, 1), (2, 2)), ((1, 1), (2, 2))]
+        )
+
     def test_deleting_input_files(self):
         # Regression test for SPARK-1025
         tempFile = tempfile.NamedTemporaryFile(delete=False)



[spark] branch master updated (a61911c -> b90e10c)

2020-05-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a61911c  [SPARK-31788][CORE][PYTHON] Fix UnionRDD of PairRDDs
 add b90e10c  [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/metric/SQLMetricsSuite.scala | 73 --
 1 file changed, 54 insertions(+), 19 deletions(-)





[spark] branch master updated (b90e10c -> 7f36310)

2020-05-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b90e10c  [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite
 add 7f36310  [SPARK-31802][SQL] Format Java date-time types in `Row.jsonValue` directly

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31802][SQL] Format Java date-time types in `Row.jsonValue` directly

2020-05-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 10208908 [SPARK-31802][SQL] Format Java date-time types in `Row.jsonValue` directly
10208908 is described below

commit 102089081e246bab306def2b63385bca35b87fe9
Author: Max Gekk 
AuthorDate: Mon May 25 12:50:38 2020 +0900

[SPARK-31802][SQL] Format Java date-time types in `Row.jsonValue` directly

### What changes were proposed in this pull request?
Use `format()` methods for Java date-time types in `Row.jsonValue`. The PR https://github.com/apache/spark/pull/28582 added the methods to avoid conversions to days and microseconds.

### Why are the changes needed?
To avoid unnecessary overhead of converting Java date-time types to micros/days before formatting. Also formatters have to convert input micros/days back to Java types to pass instances to standard library API.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing tests in `RowJsonSuite`.

Closes #28620 from MaxGekk/toJson-format-Java-datetime-types.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 7f36310500a7ab6e80cc1b1e6c764ae7b9657758)
Signed-off-by: HyukjinKwon 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
index 4487a2d..5b17f1d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
@@ -571,14 +571,10 @@ trait Row extends Serializable {
       case (s: String, _) => JString(s)
       case (b: Array[Byte], BinaryType) =>
         JString(Base64.getEncoder.encodeToString(b))
-      case (d: LocalDate, _) =>
-        JString(dateFormatter.format(DateTimeUtils.localDateToDays(d)))
-      case (d: Date, _) =>
-        JString(dateFormatter.format(DateTimeUtils.fromJavaDate(d)))
-      case (i: Instant, _) =>
-        JString(timestampFormatter.format(DateTimeUtils.instantToMicros(i)))
-      case (t: Timestamp, _) =>
-        JString(timestampFormatter.format(DateTimeUtils.fromJavaTimestamp(t)))
+      case (d: LocalDate, _) => JString(dateFormatter.format(d))
+      case (d: Date, _) => JString(dateFormatter.format(d))
+      case (i: Instant, _) => JString(timestampFormatter.format(i))
+      case (t: Timestamp, _) => JString(timestampFormatter.format(t))
       case (i: CalendarInterval, _) => JString(i.toString)
       case (a: Array[_], ArrayType(elementType, _)) =>
         iteratorToJsonArray(a.iterator, elementType)

