[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85160 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85160/testReport)**
 for PR 19884 at commit 
[`0047f7a`](https://github.com/apache/spark/commit/0047f7a6560bfbb46d7ee28df0c2781f7538b907).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20031
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20031
  
**[Test build #85177 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85177/testReport)**
 for PR 20031 at commit 
[`1c3e956`](https://github.com/apache/spark/commit/1c3e956313b78da492f917c003c38e981cce7877).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20031
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85177/
Test PASSed.


---




[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20020
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20020
  
**[Test build #85161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85161/testReport)**
 for PR 20020 at commit 
[`e25a9eb`](https://github.com/apache/spark/commit/e25a9eb285d56a771a56b77534413be59b9f111b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait DataWritingCommand extends Command `
  * `case class DataWritingCommandExec(cmd: DataWritingCommand, children: 
Seq[SparkPlan])`


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85164/
Test PASSed.


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20021
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-20 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/19954
  
> I don't think they are independent as architecturally they make sense 
together and represent a single concern: enabling use of remote dependencies 
through init-containers. Missing any one of the three makes the feature 
unusable. I would also argue that it won't necessarily make review easier as 
reviewers need to mentally connect them together to make sense of each change 
set. 

I agree with this. This is pretty much one cohesive unit, and splitting it up is probably going to lead to more difficulty in understanding it. From your comments @vanzin, it seems we definitely do need a good refactor here, and the community can undertake that in Q1 2018. This approach and code have been functionally tested over the last 3 releases of our fork, and I'd be fairly confident about their efficacy; broad changes at this point seem riskier to me from a 2.3 release perspective, given that we're still in the process of improving spark-k8s integration testing coverage against apache/spark. 

cc/ @mccheah 


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19884
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85165/
Test FAILed.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19884
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85165 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85165/testReport)**
 for PR 19884 at commit 
[`d92ae90`](https://github.com/apache/spark/commit/d92ae90e05f55955eaad8e7f55e6324bf333a6bc).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18029
  
**[Test build #85181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85181/testReport)**
 for PR 18029 at commit 
[`3c16c47`](https://github.com/apache/spark/commit/3c16c478257c8aed61b1cef4d75360b8bb8b166d).


---




[GitHub] spark issue #19991: [SPARK-22801][ML][PYSPARK] Allow FeatureHasher to treat ...

2017-12-20 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19991
  
@holdenk @sethah any other comments?


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19977
  
retest this please


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20021
  
> I checked some call sites. Here is one example that `extraArguments` has 
`ev.value` instead of local variable.

Hey, `ev.value` is not from children; it's the output of the current expression, which we can make sure is a local variable, e.g. 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L296
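
For illustration, a minimal sketch of this point (not the code at hash.scala#L296; the string-based `ExprCode` signature is the one shown later in this thread for PR 20043):

```scala
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

object EvValueSketch {
  def main(args: Array[String]): Unit = {
    val ctx = new CodegenContext
    // The current expression's result slot is a fresh *local* variable name,
    // so it is safe to pass through `extraArguments` when splitting generated
    // code, unlike values coming from children, which may live in global fields.
    val resultName = ctx.freshName("hashValue")   // e.g. "hashValue_0"
    val ev = ExprCode(
      code = s"int $resultName = 42;",            // declared inside the generated method
      isNull = "false",
      value = resultName)
    println(ev.code)
  }
}
```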


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r158004864
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -566,6 +568,21 @@ object TypeCoercion {
 }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to 
binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --

I think we should do it in this PR, because this is a new requirement for 
the new behavior introduced in this PR.
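
For reference, a rough sketch of the rule's intent (not the actual TypeCoercion code; the `spark.sql.function.concatBinaryAsString` flag name is an assumption standing in for whatever legacy config this PR introduces):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Concat, Expression}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{BinaryType, StringType}

// Sketch only: if any Concat input is non-binary (or the legacy flag asks for
// string semantics), cast every non-string child to StringType; an all-binary
// Concat is left untouched so its output type resolves to binary.
case class ConcatCoercionSketch(conf: SQLConf) extends Rule[LogicalPlan] {
  private def asString(e: Expression): Expression =
    if (e.dataType == StringType) e else Cast(e, StringType)

  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case c @ Concat(children)
        if children.nonEmpty &&
          (conf.getConfString("spark.sql.function.concatBinaryAsString", "false").toBoolean ||
           !children.forall(_.dataType == BinaryType)) =>
      c.copy(children = children.map(asString))
  }
}
```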


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-20 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r158005532
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -566,6 +568,21 @@ object TypeCoercion {
 }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to 
binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --

ok


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-12-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19498
  
Hi @brkyvz, could you take a look please?


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-12-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19498
  
retest this please


---




[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20023
  
Ideally we should avoid changing behaviors as much as we can, but since this behavior comes from Hive and Hive itself has also changed it, it might be OK to follow Hive and change it too? cc @hvanhovell too


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #85184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85184/testReport)**
 for PR 19498 at commit 
[`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).


---




[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2017-12-20 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20023
  
@cloud-fan yes, Hive changed it and, more importantly, at the moment we are not compliant with the SQL standard. So currently Spark returns results which are different from Hive and not compliant with the SQL standard. This is why I proposed this change.


---




[GitHub] spark issue #20002: [SPARK-22465][Core][WIP] Add a safety-check to RDD defau...

2017-12-20 Thread sujithjay
Github user sujithjay commented on the issue:

https://github.com/apache/spark/pull/20002
  
@tgravescs , could you please take a look when you have some time ?


---




[GitHub] spark pull request #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes...

2017-12-20 Thread foxish
GitHub user foxish opened a pull request:

https://github.com/apache/spark/pull/20032

[SPARK-22845] [Scheduler] Modify spark.kubernetes.allocation.batch.delay to 
take time instead of int

## What changes were proposed in this pull request?

Fixing a configuration that was taking an int but should take a time value. 
Discussion in https://github.com/apache/spark/pull/19946#discussion_r156682354
Made the granularity milliseconds, as opposed to seconds, since there is a use case for sub-second reactions in order to scale up rapidly, especially with dynamic allocation.
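
For reference, a rough sketch of what the changed entry might look like using Spark's internal `ConfigBuilder` API (doc text and default value here are illustrative, not copied from the PR):

```scala
package org.apache.spark.deploy.k8s

import java.util.concurrent.TimeUnit

import org.apache.spark.internal.config.ConfigBuilder

object KubernetesConfSketch {
  // Sketch only: previously an intConf interpreted as seconds; parsing it as a
  // time conf in milliseconds lets users specify sub-second delays such as
  // "500ms" for faster scale-up, e.g. with dynamic allocation.
  val KUBERNETES_ALLOCATION_BATCH_DELAY =
    ConfigBuilder("spark.kubernetes.allocation.batch.delay")
      .doc("Time to wait between each round of executor pod allocation.")
      .timeConf(TimeUnit.MILLISECONDS)
      .createWithDefaultString("1s")
}
```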

## How was this patch tested?

TODO: manual run of integration tests against this PR.
PTAL

cc/ @mccheah @liyinan926 @kimoonkim @vanzin @mridulm @jiangxb1987 @ueshin 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache-spark-on-k8s/spark fix-time-conf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20032


commit 48a3326faaea69bf74d97d028bffdd0552777ffe
Author: foxish 
Date:   2017-12-20T12:03:07Z

Change config to support millisecond based timeconf




---




[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20032
  
**[Test build #85185 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85185/testReport)**
 for PR 20032 at commit 
[`48a3326`](https://github.com/apache/spark/commit/48a3326faaea69bf74d97d028bffdd0552777ffe).


---




[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-20 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r158008588
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend that minikube be updated to the most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
by running `kubectl auth can-i <list|create|edit|delete> pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
+./sbin/build-push-docker-images.sh -r <repo> -t my-tag push
+
+Docker files are under the `dockerfiles/` directory and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image=<driver-image> \
+--conf spark.kubernetes.executor.docker.image=<executor-image> \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18029
  
**[Test build #85181 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85181/testReport)**
 for PR 18029 at commit 
[`3c16c47`](https://github.com/apache/spark/commit/3c16c478257c8aed61b1cef4d75360b8bb8b166d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class KinesisInitialPositions `
  * `public static class Latest implements KinesisInitialPosition, 
Serializable `
  * `public static class TrimHorizon implements KinesisInitialPosition, 
Serializable `
  * `public static class AtTimestamp implements KinesisInitialPosition, 
Serializable `


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85181/
Test PASSed.


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18029
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-12-20 Thread yashs360
Github user yashs360 commented on the issue:

https://github.com/apache/spark/pull/18029
  
Hi @brkyvz , I've added the new changes with the java classes. Had to make 
the classes serializable for passing them to the KinesisReceiver. Please have a 
look when you get time. Thanks.
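
For context, the earlier test report for this PR lists the new public classes; a rough Scala sketch of their shape (the PR defines them in Java, and the `AtTimestamp` constructor parameter is an assumption here):

```scala
import java.util.Date

// Sketch of the classes named in the test report above; Serializable so that
// instances can be shipped with the KinesisReceiver to the executors.
trait KinesisInitialPosition
object KinesisInitialPositions {
  class Latest extends KinesisInitialPosition with Serializable
  class TrimHorizon extends KinesisInitialPosition with Serializable
  class AtTimestamp(val timestamp: Date) extends KinesisInitialPosition with Serializable
}
```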


---




[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19946
  
**[Test build #85167 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85167/testReport)**
 for PR 19946 at commit 
[`74ac5c9`](https://github.com/apache/spark/commit/74ac5c9e5b495d0133e8e1378867a43f2bc1ff4a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19946
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85167/
Test PASSed.


---




[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19946
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20032
  
**[Test build #85185 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85185/testReport)**
 for PR 20032 at commit 
[`48a3326`](https://github.com/apache/spark/commit/48a3326faaea69bf74d97d028bffdd0552777ffe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20032
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20032
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85185/
Test PASSed.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85184/
Test PASSed.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #85184 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85184/testReport)**
 for PR 19498 at commit 
[`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20030: [SPARK-10496][CORE] Efficient RDD cumulative sum

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20030
  
**[Test build #85172 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85172/testReport)**
 for PR 20030 at commit 
[`4f1d5e2`](https://github.com/apache/spark/commit/4f1d5e269c5f84f6126fea97c201b6cd6fef461f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20030: [SPARK-10496][CORE] Efficient RDD cumulative sum

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20030
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85172/
Test FAILed.


---




[GitHub] spark issue #20030: [SPARK-10496][CORE] Efficient RDD cumulative sum

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20030
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19904
  
**[Test build #85183 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85183/testReport)**
 for PR 19904 at commit 
[`cad2104`](https://github.com/apache/spark/commit/cad210439b7a0bc3eb870f1d68fd96fbd0763aa8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85183/
Test PASSed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20008
  
**[Test build #85186 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85186/testReport)**
 for PR 20008 at commit 
[`19bcca1`](https://github.com/apache/spark/commit/19bcca13ab03c9a5cb5399476e1afac26a30ec49).


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-20 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20021
  
Oh, you are right. I misunderstood. After our optimizations, output is also 
a part of `arguments`. Let me check others again.


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158206309
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
 for i in range(0, len(arr)):
 jarr[i] = arr[i]
 return jarr
+
+
+def _require_minimum_pyarrow_version():
--- End diff --

@ueshin did we do the same thing for pandas?


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85232/
Test FAILed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210425
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
--- End diff --

actually, I think `spark.table("records")` is a better example.


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210374
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
--- End diff --

`.toDF` is not needed
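
Taken together with the previous comment, the suggested simplification might look like this (a sketch; `spark` and `SaveMode` come from the surrounding example file):

```scala
// Read the Hive table directly instead of sql("SELECT * FROM records").toDF()
val hiveTableDF = spark.table("records")
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
```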


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-20 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19954
  
I'll finish reading this by Friday, thanks!


---




[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20035
  
I think the test failure is not related to this change, but to the ongoing work to upgrade pyarrow.


---




[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...

2017-12-20 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/20041
  
Checked the test case failure, but I don't think it's related to this PR. 

```
org.apache.spark.sql.execution.datasources.parquet.ParquetQuerySuite.(It is 
not a test it is a sbt.testing.SuiteSelector)
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed 
to eventually never returned normally. Attempted 651 times over 10.008601144 
seconds. Last failure message: There are 1 possibly leaked file streams..
```


---




[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...

2017-12-20 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/20041
  
Jenkins retest this please


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158206546
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
 for i in range(0, len(arr)):
 jarr[i] = arr[i]
 return jarr
+
+
+def _require_minimum_pyarrow_version():
--- End diff --

No. I just checked if `ImportError` occurred or not. We should do the same 
thing for pandas later.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20043
  
cc @kiszk @cloud-fan 


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158208592
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", 
"age"))  # doctest: +SKIP
--- End diff --

The name change shouldn't have been committed; I'll change it back. I don't think we can make the doctests conditional on whether pandas/pyarrow is installed, so unless we make these required dependencies and have them installed on all the workers, we need to skip them.


---




[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20043#discussion_r158209659
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -56,7 +56,36 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
  * @param value A term for a (possibly primitive) value of the result of 
the evaluation. Not
  *  valid if `isNull` is set to `true`.
  */
-case class ExprCode(var code: String, var isNull: String, var value: 
String)
+case class ExprCode(var code: String, var isNull: ExprValue, var value: 
ExprValue)
+
+
+// An abstraction that represents the evaluation result of [[ExprCode]].
+abstract class ExprValue
+
+object ExprValue {
+  implicit def exprValueToString(exprValue: ExprValue): String = 
exprValue.toString
+}
+
+// A literal evaluation of [[ExprCode]].
+case class LiteralValue(val value: String) extends ExprValue {
+  override def toString: String = value
+}
+
+// A variable evaluation of [[ExprCode]].
+case class VariableValue(val variableName: String) extends ExprValue {
+  override def toString: String = variableName
+}
+
+// A statement evaluation of [[ExprCode]].
+case class StatementValue(val statement: String) extends ExprValue {
+  override def toString: String = statement
+}
+
+// A global variable evaluation of [[ExprCode]].
+case class GlobalValue(val value: String) extends ExprValue {
--- End diff --

For compacted global variables, we may get something like `arr[1]`, where `arr` is a global variable. Is `arr[1]` a statement or a global variable?


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85241 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85241/testReport)**
 for PR 20043 at commit 
[`d120750`](https://github.com/apache/spark/commit/d120750ff61bb066e7ceb628f3356fa37af462f5).


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85242 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85242/testReport)**
 for PR 19884 at commit 
[`ae84c84`](https://github.com/apache/spark/commit/ae84c8454875906e488b895e18ad78ddf6e9fbc9).


---




[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20043#discussion_r158210849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -56,7 +56,36 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
  * @param value A term for a (possibly primitive) value of the result of 
the evaluation. Not
  *  valid if `isNull` is set to `true`.
  */
-case class ExprCode(var code: String, var isNull: String, var value: 
String)
+case class ExprCode(var code: String, var isNull: ExprValue, var value: 
ExprValue)
+
+
+// An abstraction that represents the evaluation result of [[ExprCode]].
+abstract class ExprValue
+
+object ExprValue {
+  implicit def exprValueToString(exprValue: ExprValue): String = 
exprValue.toString
+}
+
+// A literal evaluation of [[ExprCode]].
+case class LiteralValue(val value: String) extends ExprValue {
+  override def toString: String = value
+}
+
+// A variable evaluation of [[ExprCode]].
+case class VariableValue(val variableName: String) extends ExprValue {
+  override def toString: String = variableName
+}
+
+// A statement evaluation of [[ExprCode]].
+case class StatementValue(val statement: String) extends ExprValue {
+  override def toString: String = statement
+}
+
+// A global variable evaluation of [[ExprCode]].
+case class GlobalValue(val value: String) extends ExprValue {
--- End diff --

It is considered a global variable now, as it can be accessed globally and doesn't/can't be parameterized.
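
Concretely, with the wrappers from the diff above, the classification under discussion might look like this sketch:

```scala
// Sketch of the discussion, not code from the PR: a compacted slot such as
// arr[1] is still backed by a class-level array field, so it is wrapped as a
// GlobalValue rather than a StatementValue.
val compacted: ExprValue = GlobalValue("arr[1]")
val local: ExprValue     = VariableValue("value_0")
val literal: ExprValue   = LiteralValue("true")
val statement: ExprValue = StatementValue("(a + b)")
```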


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210754
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
+
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+/*
+ * Save DataFrame to Hive External table as compatible parquet format.
+ * 1. Create Hive External table with storage format as parquet.
+ * Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS 
PARQUET;
+ * Since we are not explicitly providing hive database location, it 
automatically takes default warehouse location
+ * given to 'spark.sql.warehouse.dir' while creating SparkSession with 
enableHiveSupport().
+ * For example, we have given '/user/hive/warehouse/' as a Hive 
Warehouse location. It will create schema directories
+ * under '/user/hive/warehouse/' as 
'/user/hive/warehouse/database_name.db' and 
'/user/hive/warehouse/database_name'.
+ */
+
+// to make Hive parquet format compatible with spark parquet format
+spark.sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
+// Multiple parquet files could be created accordingly to volume of 
data under directory given.
+val hiveExternalTableLocation = 
s"/user/hive/warehouse/database_name.db/records"
+
hiveTableDF.write.mode(SaveMode.Overwrite).parquet(hiveExternalTableLocation)
+
+// turn on flag for Dynamic Partitioning
+spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
+spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", 
"nonstrict")
+// You can create partitions in Hive table, so downstream queries run 
much faster.
+hiveTableDF.write.mode(SaveMode.Overwrite).partitionBy("key")
+  .parquet(hiveExternalTableLocation)
+/*
+If Data volume is very huge, then every partitions would have many 
small-small files which may harm
+downstream query performance due to File I/O, Bandwidth I/O, Network 
I/O, Disk I/O.
+To improve performance you can create single parquet file under each 
partition directory using 'repartition'
+on partitioned key for Hive table. When you add partition to table, 
there will be change in table DDL.
+Ex: CREATE TABLE records(value string) PARTITIONED BY(key int) STORED 
AS PARQUET;
+ */
+hiveTableDF.repartition($"key").write.mode(SaveMode.Overwrite)
+  .partitionBy("key").parquet(hiveExternalTableLocation)
+
+/*
+ You can also do coalesce to control number of files under each 
partitions, repartition does full shuffle and equal
+ data distribution to all partitions. here coalesce can reduce number 
of files to given 'Int' argument without
+ full data shuffle.
+ */
+// coalesce of 10 could create 10 parquet files under each partitions,
+// if data is huge and make sense to do partitioning.
+hiveTableDF.coalesce(10).write.mode(SaveMode.Overwrite)
--- End diff --

ditto


---




[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/20029
  
It seems that each time we connect to the Thrift Server through beeline, `SessionState.start(state)` is called twice: once in `HiveSessionImpl:open` and once in `HiveClientImpl.newSession()` for `sql("use default")`. When the beeline connection is closed, only the HiveSession is closed via `HiveSessionImpl.close()`, but the object created by `HiveClientImpl.newSession()` is left over.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19977
  
retest this please


---




[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20035
  
Jenkins, retest this please


---




[GitHub] spark pull request #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documen...

2017-12-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r158205893
  
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by 
the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --

Yea I don't think you need to block this pr with this.



---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158206051
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", 
"age"))  # doctest: +SKIP
--- End diff --

why change `John Doe` to `John`? And are we going to re-enable these doctests later?


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205387
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
 case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
--- End diff --

Can't we just use `ShortDecimal`, `IntDecimal`...?


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205620
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
 case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
--- End diff --

Is this different from `forType` applied to `Literal.dataType`?


---




[GitHub] spark pull request #20035: [SPARK-22848][SQL] Eliminate mutable state from S...

2017-12-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20035


---




[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158206388
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
 case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
 DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): 
DecimalType = {
+// Assumptions:
+// precision >= scale
+// scale >= 0
+if (precision <= MAX_PRECISION) {
+  // Adjustment only needed when we exceed max precision
+  DecimalType(precision, scale)
+} else {
+  // Precision/scale exceed maximum precision. Result must be adjusted 
to MAX_PRECISION.
+  val intDigits = precision - scale
+  // If original scale less than MINIMUM_ADJUSTED_SCALE, use original 
scale value; otherwise
+  // preserve at least MINIMUM_ADJUSTED_SCALE fractional digits
+  val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)
--- End diff --

Sounds like `MAXIMUM_ADJUSTED_SCALE` instead of `MINIMUM_ADJUSTED_SCALE`.
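
For concreteness, a worked example of the adjustment (assuming `MAX_PRECISION = 38` and `MINIMUM_ADJUSTED_SCALE = 6`, the values Hive uses):

```scala
// precision = 45, scale = 30: intDigits = 15, minScaleValue = min(30, 6) = 6,
// adjustedScale = max(38 - 15, 6) = 23
adjustPrecisionScale(45, 30)  // DecimalType(38, 23)

// precision = 70, scale = 36: intDigits = 34, minScaleValue = min(36, 6) = 6,
// adjustedScale = max(38 - 34, 6) = 6 -- the constant acts as a floor on the scale
adjustPrecisionScale(70, 36)  // DecimalType(38, 6)
```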


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19884
  
I used a workaround for the timestamp casts that allows the tests to pass for 
me locally, and left a note to look into the root cause later. Hopefully this 
will pass now and we will be good to merge.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210132
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210714
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
+
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+/*
+ * Save DataFrame to Hive External table as compatible parquet format.
+ * 1. Create Hive External table with storage format as parquet.
+ * Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS 
PARQUET;
+ * Since we are not explicitly providing hive database location, it 
automatically takes default warehouse location
+ * given to 'spark.sql.warehouse.dir' while creating SparkSession with 
enableHiveSupport().
+ * For example, we have given '/user/hive/warehouse/' as a Hive 
Warehouse location. It will create schema directories
+ * under '/user/hive/warehouse/' as 
'/user/hive/warehouse/database_name.db' and 
'/user/hive/warehouse/database_name'.
+ */
+
+// to make Hive parquet format compatible with spark parquet format
+spark.sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
+// Multiple parquet files could be created accordingly to volume of 
data under directory given.
+val hiveExternalTableLocation = 
s"/user/hive/warehouse/database_name.db/records"
+
hiveTableDF.write.mode(SaveMode.Overwrite).parquet(hiveExternalTableLocation)
+
+// turn on flag for Dynamic Partitioning
+spark.sqlContext.setConf("hive.exec.dynamic.partition", "true")
+spark.sqlContext.setConf("hive.exec.dynamic.partition.mode", 
"nonstrict")
+// You can create partitions in Hive table, so downstream queries run 
much faster.
+hiveTableDF.write.mode(SaveMode.Overwrite).partitionBy("key")
+  .parquet(hiveExternalTableLocation)
+/*
+If Data volume is very huge, then every partitions would have many 
small-small files which may harm
+downstream query performance due to File I/O, Bandwidth I/O, Network 
I/O, Disk I/O.
+To improve performance you can create single parquet file under each 
partition directory using 'repartition'
+on partitioned key for Hive table. When you add partition to table, 
there will be change in table DDL.
+Ex: CREATE TABLE records(value string) PARTITIONED BY(key int) STORED 
AS PARQUET;
+ */
+hiveTableDF.repartition($"key").write.mode(SaveMode.Overwrite)
--- End diff --

This is not a standard usage, let's not put it in the example.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r158210666
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -102,8 +101,63 @@ object SparkHiveExample {
 // |  4| val_4|  4| val_4|
 // |  5| val_5|  5| val_5|
 // ...
-// $example off:spark_hive$
 
+/*
+ * Save DataFrame to Hive Managed table as Parquet format
+ * 1. Create Hive Database / Schema with location at HDFS if you want 
to mentioned explicitly else default
+ * warehouse location will be used to store Hive table Data.
+ * Ex: CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
+ * You don't have to explicitly give location for each table, every 
tables under specified schema will be located at
+ * location given while creating schema.
+ * 2. Create Hive Managed table with storage format as 'Parquet'
+ * Ex: CREATE TABLE records(key int, value string) STORED AS PARQUET;
+ */
+val hiveTableDF = sql("SELECT * FROM records").toDF()
+
hiveTableDF.write.mode(SaveMode.Overwrite).saveAsTable("database_name.records")
+
+/*
+ * Save DataFrame to Hive External table as compatible parquet format.
+ * 1. Create Hive External table with storage format as parquet.
+ * Ex: CREATE EXTERNAL TABLE records(key int, value string) STORED AS 
PARQUET;
--- End diff --

it's weird to create an external table without a location. Users may be 
confused about the difference between a managed table and an external table.
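
For example, the external-table DDL would more typically carry an explicit location (path below is illustrative only):

```scala
sql("""
  CREATE EXTERNAL TABLE records(key INT, value STRING)
  STORED AS PARQUET
  LOCATION '/user/hive/warehouse/records_external'
""")
```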


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85244 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85244/testReport)**
 for PR 19884 at commit 
[`b0200ef`](https://github.com/apache/spark/commit/b0200efd30c6fe77ec6e57d65f3bc828be0e1802).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158212056
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John Doe", 21)],
+   ...("id", "name", "age"))  # doctest: 
+SKIP
>>> df.select(slen("name").alias("slen(name)"), to_upper("name"), 
add_one("age")) \\
... .show()  # doctest: +SKIP
+--+--++
|slen(name)|to_upper(name)|add_one(age)|
+--+--++
-   | 8|  JOHN DOE|  22|
+   | 8|  JOHN|  22|
--- End diff --

oops, done!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85235/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #85235 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85235/testReport)**
 for PR 19977 at commit 
[`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20035
  
yea it's failing globally, I'm merging this PR, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85232/testReport)**
 for PR 20043 at commit 
[`d5c986a`](https://github.com/apache/spark/commit/d5c986a1cab410c4eb64a72119346875d7607be6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ExprCode(var code: String, var isNull: ExprValue, var 
value: ExprValue)`
  * `case class LiteralValue(val value: String) extends ExprValue `
  * `case class VariableValue(val variableName: String) extends ExprValue `
  * `case class StatementValue(val statement: String) extends ExprValue `
  * `case class GlobalValue(val value: String) extends ExprValue `
  * `case class SubExprEliminationState(isNull: ExprValue, value: 
ExprValue)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20008
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20008
  
**[Test build #85233 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85233/testReport)**
 for PR 20008 at commit 
[`ec07bc2`](https://github.com/apache/spark/commit/ec07bc2a463b089dd5798ab9e6bf8aea1b8ccd28).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20008
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85233/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #85243 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85243/testReport)**
 for PR 20043 at commit 
[`81c9b6e`](https://github.com/apache/spark/commit/81c9b6e73ee64adcd8fc931d51f3faa98b892e0b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r158211101
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
-   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
-   >>> @pandas_udf(StringType())
+   >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # 
doctest: +SKIP
+   >>> @pandas_udf(StringType())  # doctest: +SKIP
... def to_upper(s):
... return s.str.upper()
...
-   >>> @pandas_udf("integer", PandasUDFType.SCALAR)
+   >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
... def add_one(x):
... return x + 1
...
-   >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", 
"name", "age"))
+   >>> df = spark.createDataFrame([(1, "John Doe", 21)],
+   ...("id", "name", "age"))  # doctest: 
+SKIP
>>> df.select(slen("name").alias("slen(name)"), to_upper("name"), 
add_one("age")) \\
... .show()  # doctest: +SKIP
+--+--++
|slen(name)|to_upper(name)|add_one(age)|
+--+--++
-   | 8|  JOHN DOE|  22|
+   | 8|  JOHN|  22|
--- End diff --

nit: we should revert this too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20035
  
**[Test build #85237 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85237/testReport)**
 for PR 20035 at commit 
[`f0163e7`](https://github.com/apache/spark/commit/f0163e7b68aa09fef5c1dc7f25e00170354a1ab2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20036: [SPARK-18016][SQL][FOLLOW-UP] Code Generation: Constant ...

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20036
  
**[Test build #85236 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85236/testReport)**
 for PR 20036 at commit 
[`53661eb`](https://github.com/apache/spark/commit/53661eb72bba55376bc6112b51c25489522d309c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20035
  
**[Test build #85237 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85237/testReport)**
 for PR 20035 at commit 
[`f0163e7`](https://github.com/apache/spark/commit/f0163e7b68aa09fef5c1dc7f25e00170354a1ab2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20036: [SPARK-18016][SQL][FOLLOW-UP] Code Generation: Constant ...

2017-12-20 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20036
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19981
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158207539
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
 case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
 DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
--- End diff --

Not sure if this blog link will stay available in the long run.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205829
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
 case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
 DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): 
DecimalType = {
+// Assumptions:
+// precision >= scale
+// scale >= 0
+if (precision <= MAX_PRECISION) {
+  // Adjustment only needed when we exceed max precision
+  DecimalType(precision, scale)
--- End diff --

Shouldn't we also prevent `scale` > `MAX_SCALE`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158205151
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
 ---
@@ -243,17 +248,43 @@ object DecimalPrecision extends TypeCoercionRule {
 // Promote integers inside a binary expression with fixed-precision 
decimals to decimals,
 // and fixed-precision decimals in an expression with floats / doubles 
to doubles
 case b @ BinaryOperator(left, right) if left.dataType != 
right.dataType =>
-  (left.dataType, right.dataType) match {
-case (t: IntegralType, DecimalType.Fixed(p, s)) =>
-  b.makeCopy(Array(Cast(left, DecimalType.forType(t)), right))
-case (DecimalType.Fixed(p, s), t: IntegralType) =>
-  b.makeCopy(Array(left, Cast(right, DecimalType.forType(t
-case (t, DecimalType.Fixed(p, s)) if isFloat(t) =>
-  b.makeCopy(Array(left, Cast(right, DoubleType)))
-case (DecimalType.Fixed(p, s), t) if isFloat(t) =>
-  b.makeCopy(Array(Cast(left, DoubleType), right))
-case _ =>
-  b
-  }
+  nondecimalLiteralAndDecimal(b).lift((left, right)).getOrElse(
+nondecimalNonliteralAndDecimal(b).applyOrElse((left.dataType, 
right.dataType),
+  (_: (DataType, DataType)) => b))
   }
+
+  /**
+   * Type coercion for BinaryOperator in which one side is a non-decimal 
literal numeric, and the
+   * other side is a decimal.
+   */
+  private def nondecimalLiteralAndDecimal(
--- End diff --

Is this rule newly introduced?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...

2017-12-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20023#discussion_r158206693
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -136,10 +137,54 @@ object DecimalType extends AbstractDataType {
 case DoubleType => DoubleDecimal
   }
 
+  private[sql] def forLiteral(literal: Literal): DecimalType = 
literal.value match {
+case v: Short => fromBigDecimal(BigDecimal(v))
+case v: Int => fromBigDecimal(BigDecimal(v))
+case v: Long => fromBigDecimal(BigDecimal(v))
+case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
 DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
 
+  // scalastyle:off line.size.limit
+  /**
+   * Decimal implementation is based on Hive's one, which is itself 
inspired to SQLServer's one.
+   * In particular, when a result precision is greater than {@link 
#MAX_PRECISION}, the
+   * corresponding scale is reduced to prevent the integral part of a 
result from being truncated.
+   *
+   * For further reference, please see
+   * 
https://blogs.msdn.microsoft.com/sqlprogrammability/2006/03/29/multiplication-and-division-with-numerics/.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
+  // scalastyle:on line.size.limit
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): 
DecimalType = {
+// Assumptions:
+// precision >= scale
+// scale >= 0
+if (precision <= MAX_PRECISION) {
+  // Adjustment only needed when we exceed max precision
+  DecimalType(precision, scale)
+} else {
+  // Precision/scale exceed maximum precision. Result must be adjusted 
to MAX_PRECISION.
+  val intDigits = precision - scale
+  // If original scale less than MINIMUM_ADJUSTED_SCALE, use original 
scale value; otherwise
+  // preserve at least MINIMUM_ADJUSTED_SCALE fractional digits
+  val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)
+  val adjustedScale = Math.max(MAX_PRECISION - intDigits, 
minScaleValue)
--- End diff --

Sounds like `Math.min`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20035: [SPARK-22848][SQL] Eliminate mutable state from Stack

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20035
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85237/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


