date:20171101

[GitHub] spark issue #19607: [WIP][SPARK-22395][SQL][PYTHON] Fix the behavior of time...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19607
  
**[Test build #83320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83320/testReport)**
 for PR 19607 at commit 
[`1f096bf`](https://github.com/apache/spark/commit/1f096bf32f742945363cc7d9af978041ad77408b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

2017-11-01 Thread tengpeng

Github user tengpeng commented on the issue:

https://github.com/apache/spark/pull/19638
  
Would it be possible to add me to the white list for test? Thanks.

On Thu, Nov 2, 2017 at 12:17 AM UCB AMPLab wrote:

> Can one of the admins verify this patch?



---


[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83316/
Test PASSed.


---


[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19625
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19625
  
**[Test build #83316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83316/testReport)**
 for PR 19625 at commit 
[`2207dbe`](https://github.com/apache/spark/commit/2207dbed511bc9ca460d9794272f50a4b9ea7fe3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19638
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-01 Thread tengpeng

GitHub user tengpeng opened a pull request:

https://github.com/apache/spark/pull/19638

[SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

## What changes were proposed in this pull request?

This adds adjusted R2 as a regression metric. Adjusted R2 is implemented in 
all major statistical analysis tools.

In practice, no one looks at R2 alone, because R2 by itself is misleading: 
if we add more parameters, R2 never decreases; it only increases or stays the 
same, which encourages overfitting. Adjusted R2 addresses this issue by using 
the number of parameters as a "weight" for the sum of errors.
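
For reference, the usual textbook definition is adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below is plain Python and is not the code in this pull request; the function name `adjusted_r2` and its parameter names are assumptions made for illustration, and it assumes the model includes an intercept.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Textbook adjusted R2 (hypothetical helper, assumes an intercept term)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need n_obs > n_predictors + 1")
    # Penalize R2 by the ratio of total to residual degrees of freedom.
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof
```

For a fixed R2, adding a predictor shrinks the residual degrees of freedom and therefore lowers the adjusted value, which is the penalty described above.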


## How was this patch tested?

- Added a new unit test and passed.
- ./dev/run-tests all passed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tengpeng/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19638


commit adee7b418f9e9feb70ec9abfaba9ab34c789523b
Author: test 
Date:   2017-11-02T05:01:55Z

Implement Adjusted R2 with a new unit test

commit 692fcb3dd332c677d9dd4f75ebb3ed14db495d7c
Author: test 
Date:   2017-11-02T05:03:12Z

Merge branch 'master' of git://git.apache.org/spark




---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83317/
Test PASSed.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19636
  
**[Test build #83317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83317/testReport)**
 for PR 19636 at commit 
[`f157cfd`](https://github.com/apache/spark/commit/f157cfd79c655723c7da233d20c167c849c75080).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OrcOptions(`


---


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

2017-11-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19622
  
LGTM except for a comment 
[here](https://github.com/apache/spark/pull/19622#discussion_r148441747)


---


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

2017-11-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19622#discussion_r148441747
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -295,7 +297,7 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
 storage = table.storage.copy(
   locationUri = None,
   properties = storagePropsWithLocation),
-schema = table.partitionSchema,
+schema = StructType(EMPTY_DATA_SCHEMA ++ table.partitionSchema),
--- End diff --

I think this should be good for 2.3, but how about keeping it unchanged in 
2.2 for safety? I am just afraid this might break other libraries that rely on 
assumptions about our metastore.


---


[GitHub] spark issue #19637: [SPARK-22243][DStream]spark.yarn.jars should reload from...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19637
  
**[Test build #83319 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83319/testReport)**
 for PR 19637 at commit 
[`4a7d3d8`](https://github.com/apache/spark/commit/4a7d3d80dff14ba7bd9c71be1307f261051bed12).


---


[GitHub] spark issue #19637: [SPARK-22243][DStream]spark.yarn.jars should reload from...

2017-11-01 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19637
  
jenkins test this please


---


[GitHub] spark issue #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19607
  
> I tried to find out a workaround for old Pandas, but I haven't done yet.

I haven't looked at this closely yet, but I will definitely try to take a 
look soon and help. I would appreciate it if the problem (or just the symptoms, 
or just a pointer) could be described, if it is not too complex.


---


[GitHub] spark issue #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19607
  
Yea, that was my proposal. If anything is blocked by this, I think we 
should bump it up because, IMHO, the fixed version specification has technically 
not been released and published yet.

^ cc @cloud-fan, @srowen and @viirya 


---


[GitHub] spark issue #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp...

2017-11-01 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19607
  
@BryanCutler I guess the oldest version of Pandas is currently `0.13.0`, 
according to #18403, cc @HyukjinKwon. 


---


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-11-01 Thread hhbyyh

Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/19599
  
I have used two ways to match String params against the valid options:
1. In NaiveBayes: convert the StringParam value and the String constants to lowercase.
2. In LinearRegression: compare with `.equalsIgnoreCase`.

Currently most Spark algorithms use the first approach. 
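
The two approaches can be sketched in plain Python as follows. This is only an illustration of the pattern, not Spark code: the names `SUPPORTED_MODEL_TYPES`, `normalize_option`, and `matches_option` are hypothetical, and `str.casefold` stands in for Java's `equalsIgnoreCase`.

```python
# Valid options are stored as lowercase constants (hypothetical example values).
SUPPORTED_MODEL_TYPES = ("multinomial", "bernoulli")


def normalize_option(value: str) -> str:
    """Approach 1: lowercase the user value once, then compare exactly."""
    lowered = value.lower()
    if lowered not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported option: {value!r}")
    return lowered


def matches_option(value: str, option: str) -> bool:
    """Approach 2: compare case-insensitively at every use site."""
    return value.casefold() == option.casefold()
```

Approach 1 normalizes in one place, so later code can use plain equality; approach 2 keeps the user's original spelling but needs the case-insensitive comparison wherever the value is checked.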




---


[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-11-01 Thread hhbyyh

Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/19565#discussion_r148438581
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -497,40 +481,46 @@ final class OnlineLDAOptimizer extends LDAOptimizer 
with Logging {
   (u._1, u._2, u._3 + v._3)
 }
 
-val (statsSum: BDM[Double], logphatOption: Option[BDV[Double]], 
nonEmptyDocsN: Long) = stats
-  .treeAggregate((BDM.zeros[Double](k, vocabSize), 
logphatPartOptionBase(), 0L))(
-elementWiseSum, elementWiseSum
-  )
+val (statsSum: BDM[Double], logphatOption: Option[BDV[Double]], 
batchSize: Long) =
+  batch.treeAggregate((BDM.zeros[Double](k, vocabSize), 
logphatPartOptionBase(), 0L))({
+case (acc, (_, termCounts)) =>
+  val stat = BDM.zeros[Double](k, vocabSize)
--- End diff --

This is a per-record operation. Will it consume more memory than 
`mapPartitions`, especially when `k` and `vocabSize` are large?
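
To make the concern concrete, here is a small plain-Python sketch of the two allocation patterns. It is not the Spark or Breeze code from the diff: records are modelled as hypothetical `(topic, term, count)` tuples, matrices are nested lists, and the returned allocation counter exists only for illustration. Actual memory behaviour in Spark also depends on the JVM garbage collector, which this sketch does not model.

```python
def zeros(k, vocab_size):
    """Allocate a k x vocab_size matrix of zeros as nested lists."""
    return [[0.0] * vocab_size for _ in range(k)]


def per_record_sum(records, k, vocab_size):
    """Allocate a fresh k x vocab_size matrix for every record, then add it."""
    acc = zeros(k, vocab_size)
    allocations = 1
    for topic, term, count in records:
        stat = zeros(k, vocab_size)  # temporary matrix, one per record
        allocations += 1
        stat[topic][term] = count
        for i in range(k):
            for j in range(vocab_size):
                acc[i][j] += stat[i][j]
    return acc, allocations


def per_partition_sum(records, k, vocab_size):
    """Allocate one matrix for the whole partition and update it in place."""
    acc = zeros(k, vocab_size)
    for topic, term, count in records:
        acc[topic][term] += count
    return acc, 1
```

Both functions produce the same sum, but the per-record variant creates one extra `k` x `vocab_size` matrix per input record, so its temporary allocation grows with the number of records.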


---


[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-11-01 Thread hhbyyh

Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/19565#discussion_r148438759
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends LDAOptimizer 
with Logging {
   override private[clustering] def next(): OnlineLDAOptimizer = {
 val batch = docs.sample(withReplacement = sampleWithReplacement, 
miniBatchFraction,
   randomGenerator.nextLong())
-if (batch.isEmpty()) return this
--- End diff --

If you still want to remove the line, would you please add a unit test to 
ensure `submitMiniBatch` can handle an empty RDD? Thanks.


---


[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-11-01 Thread hhbyyh

Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/19565#discussion_r148437931
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends LDAOptimizer 
with Logging {
   override private[clustering] def next(): OnlineLDAOptimizer = {
 val batch = docs.sample(withReplacement = sampleWithReplacement, 
miniBatchFraction,
   randomGenerator.nextLong())
-if (batch.isEmpty()) return this
--- End diff --

I think isEmpty() is optimized to avoid a full materialization.
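
The idea behind such an emptiness check can be sketched in plain Python: pull at most one element instead of consuming the whole collection. This is an analogy only, under the assumption that the RDD check works along the lines of taking a single element; the helper name `is_empty` is hypothetical.

```python
def is_empty(items) -> bool:
    """Return True if `items` yields nothing, pulling at most one element."""
    sentinel = object()
    return next(iter(items), sentinel) is sentinel
```

When given a lazy generator, the function stops after the first element, so the remaining elements are never computed.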


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19636
  
Thank you for the review, @jiangxb1987.


---


[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83318/
Test PASSed.


---


[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #83318 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83318/testReport)**
 for PR 17886 at commit 
[`dfb1ee5`](https://github.com/apache/spark/commit/dfb1ee5fbf7469895f5f91fe9f9d63dc202ca1b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2017-11-01 Thread sadhen

Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/14638
  
yes


---


[GitHub] spark issue #19586: [SPARK-22367][WIP][CORE] Separate the serialization of c...

2017-11-01 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19586
  
I tend to agree with @cloud-fan. I think you can implement your own 
serializer outside of Spark that is specialized for your application; that will 
definitely be more efficient than the built-in one. But Spark's default 
solution should be general enough to cover all cases. Setting a flag or a 
configuration is not intuitive enough, from my understanding.

And for ML, can you please provide an example of how this could be 
improved with your approach? From my understanding, your approach is more useful 
when leveraging custom class definitions, like `Person` in your example. But for 
ML/SQL cases, all the types should be predefined or primitives, so will that 
improve things much?


---


[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17886
  
**[Test build #83318 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83318/testReport)**
 for PR 17886 at commit 
[`dfb1ee5`](https://github.com/apache/spark/commit/dfb1ee5fbf7469895f5f91fe9f9d63dc202ca1b5).


---


[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2017-11-01 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/17886
  
@gatorsmile master branch still has this issue.


---


[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-11-01 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19208#discussion_r148433035
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -117,6 +123,12 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") 
override val uid: String)
 instr.logParams(numFolds, seed, parallelism)
 logTuningParams(instr)
 
+val collectSubModelsParam = $(collectSubModels)
+
+var subModels: Option[Array[Array[Model[_]]]] = if (collectSubModelsParam) {
--- End diff --

So this var seems unnecessary; it seems like we'd be better off just 
collecting the values from modelFutures (then we can avoid the mutation 
on L145).


---


[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
@jsnowacki, sorry for being late again and again. Will probably take a 
final look once the comments are addressed.


---


[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19370#discussion_r148431331
  
--- Diff: bin/find-spark-home.cmd ---
@@ -0,0 +1,44 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements.  See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License.  You may obtain a copy of the License at
+rem
+rem    http://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+
+rem Path to Python script finding SPARK_HOME
+set FIND_SPARK_HOME_SCRIPT=%~dp0find_spark_home.py
+
+rem Default to standard python interpreter unless told otherwise
+set PYTHON_RUNNER=python
+if not "x%PYSPARK_DRIVER_PYTHON%" =="x" (
+  set PYTHON_RUNNER=%PYSPARK_DRIVER_PYTHON%
+)
+
+rem Only attempt to find SPARK_HOME if it is not set.
+if "x%SPARK_HOME%"=="x" (
+  rem We are pip installed, use the Python script to resolve a reasonable 
SPARK_HOME
+  if exist "%FIND_SPARK_HOME_SCRIPT%" (
+rem If there is no python installed it will fail with message:
+rem 'python' is not recognized as an internal or external command,
+for /f "delims=" %%i in ('%PYTHON_RUNNER% %FIND_SPARK_HOME_SCRIPT%') 
do set SPARK_HOME=%%i
--- End diff --

Hm.. actually, it looks like we also handle `PYSPARK_PYTHON`:


https://github.com/apache/spark/blob/a36a76ac43c36a3b897a748bd9f138b629dbc684/bin/find-spark-home#L38

```bash
$ PYSPARK_PYTHON=python3
$ echo ${PYSPARK_PYTHON:-"python"}
python3
```

```bash
$ echo ${PYSPARK_PYTHON:-"python"}
python
```


---


[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19370#discussion_r148432405
  
--- Diff: bin/find-spark-home.cmd ---
@@ -0,0 +1,44 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements.  See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License.  You may obtain a copy of the License at
+rem
+rem    http://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+
+rem Path to Python script finding SPARK_HOME
+set FIND_SPARK_HOME_SCRIPT=%~dp0find_spark_home.py
+
+rem Default to standard python interpreter unless told otherwise
+set PYTHON_RUNNER=python
+if not "x%PYSPARK_DRIVER_PYTHON%" =="x" (
+  set PYTHON_RUNNER=%PYSPARK_DRIVER_PYTHON%
+)
+
+rem Only attempt to find SPARK_HOME if it is not set.
+if "x%SPARK_HOME%"=="x" (
+  rem We are pip installed, use the Python script to resolve a reasonable 
SPARK_HOME
+  if exist "%FIND_SPARK_HOME_SCRIPT%" (
+rem If there is no python installed it will fail with message:
+rem 'python' is not recognized as an internal or external command,
+for /f "delims=" %%i in ('%PYTHON_RUNNER% %FIND_SPARK_HOME_SCRIPT%') 
do set SPARK_HOME=%%i
--- End diff --

`FIND_SPARK_HOME_SCRIPT` -> `FIND_SPARK_HOME_PYTHON_SCRIPT` to be 
consistent with:


https://github.com/apache/spark/blob/a36a76ac43c36a3b897a748bd9f138b629dbc684/bin/find-spark-home#L22


---


[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19370#discussion_r148432692
  
--- Diff: bin/find-spark-home.cmd ---
@@ -0,0 +1,44 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements.  See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License.  You may obtain a copy of the License at
+rem
+rem    http://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+
+rem Path to Python script finding SPARK_HOME
+set FIND_SPARK_HOME_SCRIPT=%~dp0find_spark_home.py
+
+rem Default to standard python interpreter unless told otherwise
+set PYTHON_RUNNER=python
+if not "x%PYSPARK_DRIVER_PYTHON%" =="x" (
+  set PYTHON_RUNNER=%PYSPARK_DRIVER_PYTHON%
+)
+
+rem Only attempt to find SPARK_HOME if it is not set.
+if "x%SPARK_HOME%"=="x" (
+  rem We are pip installed, use the Python script to resolve a reasonable 
SPARK_HOME
+  if exist "%FIND_SPARK_HOME_SCRIPT%" (
+rem If there is no python installed it will fail with message:
--- End diff --

Can we put the logic:


https://github.com/apache/spark/blob/a36a76ac43c36a3b897a748bd9f138b629dbc684/bin/find-spark-home#L37-L39

here like

```cmd
rem Fall back to PYSPARK_PYTHON, then to plain "python"
if "x%PYSPARK_DRIVER_PYTHON%"=="x" (
  if not "x%PYSPARK_PYTHON%"=="x" (
    set PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON%
  ) else (
    set PYSPARK_DRIVER_PYTHON=python
  )
)

...

for /f "delims=" %%i in ('%PYSPARK_DRIVER_PYTHON% %FIND_SPARK_HOME_SCRIPT%') do set SPARK_HOME=%%i
```

to be consistent?
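
The intended fallback order (driver-specific interpreter first, then `PYSPARK_PYTHON`, then plain `python`) can be sketched in Python as below. This is an illustration of the lookup order discussed here, not code from either script; the function name `resolve_python_runner` is hypothetical, and treating an empty value as unset is an assumption that mirrors the `"x%VAR%"=="x"` idiom.

```python
import os
from typing import Mapping, Optional


def resolve_python_runner(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the interpreter used to run the SPARK_HOME lookup script."""
    env = os.environ if env is None else env
    # Empty strings are falsy, so they fall through to the next candidate.
    return (
        env.get("PYSPARK_DRIVER_PYTHON")
        or env.get("PYSPARK_PYTHON")
        or "python"
    )
```

Passing the environment as a mapping keeps the function easy to check without modifying the real process environment.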


---


[GitHub] spark issue #19586: [SPARK-22367][WIP][CORE] Separate the serialization of c...

2017-11-01 Thread ConeyLiu

Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/19586
  
Hi @cloud-fan, in most cases the data type should be the same, so I think this 
optimization is valuable, because it can save considerable space and CPU 
resources. What about setting a flag on the RDD that indicates whether 
the RDD contains only a single type? If that is not acceptable, could we put it 
in the ml package as a special serializer that users could configure? But in 
that case, the exact ClassTag of the RDD must be provided for serialization, 
due to the relocation in the unsafe shuffle writer.


---


[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-01 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16578
  
Yeah, I think having a config for this optimization is good.


---


[GitHub] spark issue #19632: Added description to python spark Pi example

2017-11-01 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19632
  
Thanks for helping out with the Spark project, it's great to see folks 
looking to improve the examples :) I'm not sure the in-line comment adds much, 
but the docstring one looks like a good minor improvement :)


---


[GitHub] spark pull request #19632: Added description to python spark Pi example

2017-11-01 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19632#discussion_r148430264
  
--- Diff: examples/src/main/python/pi.py ---
@@ -27,12 +27,16 @@
 if __name__ == "__main__":
 """
 Usage: pi [partitions]
+
+Monte Carlo method is used to estimate Pi in the below example.
 """
 spark = SparkSession\
 .builder\
 .appName("PythonPi")\
 .getOrCreate()
-
+
+# If no arguments are passed(i.e. `len(sys.argv) < = 1` ) 
--- End diff --

So I think we should expect folks to read the examples in addition to 
running them. That being said I don't think we need this comment specifically.
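
For readers unfamiliar with the method the new docstring mentions, here is a minimal single-machine sketch of a Monte Carlo estimate of Pi. It is not the `pi.py` example itself and does not use Spark; the function name `estimate_pi` and the `seed` parameter are assumptions added for illustration.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate Pi by sampling points uniformly in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)  # local generator so runs are repeatable
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # point falls inside the unit circle
            inside += 1
    # The circle-to-square area ratio is Pi / 4.
    return 4.0 * inside / num_samples
```

The estimate improves slowly as the sample count grows, which is why a distributed version splits the sampling across partitions and sums the hit counts.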


---


[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19439#discussion_r148429895
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala 
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.image
+
+import java.awt.Color
+import java.awt.color.ColorSpace
+import java.io.ByteArrayInputStream
+import javax.imageio.ImageIO
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.input.PortableDataStream
+import org.apache.spark.sql.{DataFrame, Row, SparkSession}
+import org.apache.spark.sql.types._
+
+@Experimental
+@Since("2.3.0")
+object ImageSchema {
+
+  val undefinedImageType = "Undefined"
+
+  val imageFields: Array[String] = Array("origin", "height", "width", 
"nChannels", "mode", "data")
+
+  val ocvTypes: Map[String, Int] = Map(
+undefinedImageType -> -1,
+"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
+  )
+
+  /**
+   * Used for conversion to python
+   */
+  val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava
+
+  /**
+   * Schema for the image column: Row(String, Int, Int, Int, Int, 
Array[Byte])
+   */
+  val columnSchema = StructType(
+StructField(imageFields(0), StringType, true) ::
+StructField(imageFields(1), IntegerType, false) ::
+StructField(imageFields(2), IntegerType, false) ::
+StructField(imageFields(3), IntegerType, false) ::
+// OpenCV-compatible type: CV_8UC3 in most cases
+StructField(imageFields(4), IntegerType, false) ::
+// Bytes in OpenCV-compatible order: row-wise BGR in most cases
+StructField(imageFields(5), BinaryType, false) :: Nil)
+
+  /**
+   * DataFrame with a single column of images named "image" (nullable)
+   */
+  val imageSchema = StructType(StructField("image", columnSchema, true) :: Nil)
+
+  /**
+   * :: Experimental ::
+   * Gets the origin of the image
+   *
+   * @return The origin of the image
+   */
+  def getOrigin(row: Row): String = row.getString(0)
+
+  /**
+   * :: Experimental ::
+   * Gets the height of the image
+   *
+   * @return The height of the image
+   */
+  def getHeight(row: Row): Int = row.getInt(1)
+
+  /**
+   * :: Experimental ::
+   * Gets the width of the image
+   *
+   * @return The width of the image
+   */
+  def getWidth(row: Row): Int = row.getInt(2)
+
+  /**
+   * :: Experimental ::
+   * Gets the number of channels in the image
+   *
+   * @return The number of channels in the image
+   */
+  def getNChannels(row: Row): Int = row.getInt(3)
+
+  /**
+   * :: Experimental ::
+   * Gets the OpenCV representation as an int
+   *
+   * @return The OpenCV representation as an int
+   */
+  def getMode(row: Row): Int = row.getInt(4)
+
+  /**
+   * :: Experimental ::
+   * Gets the image data
+   *
+   * @return The image data
+   */
+  def getData(row: Row): Array[Byte] = row.getAs[Array[Byte]](5)
+
+  /**
+   * Default values for the invalid image
+   *
+   * @param origin Origin of the invalid image
+   * @return Row with the default values
+   */
+  private def invalidImageRow(origin: String): Row =
+    Row(Row(origin, -1, -1, -1, ocvTypes(undefinedImageType), Array.ofDim[Byte](0)))
+
+  /**
+   * Convert the compressed image (jpeg, png, etc.) into OpenCV
+   * representation and store it in DataFrame Row
+   *
+   * @param origin Arbitrary string that identifies the image
+   * @param bytes Image bytes (for example, jpeg)
+   * @return DataFrame Row or None (if the decompression fails)
+   */
+  private[spark] def decode(origin: String, bytes: Array[Byte]): Option[Row] = {
+

[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19636
  
**[Test build #83317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83317/testReport)**
 for PR 19636 at commit 
[`f157cfd`](https://github.com/apache/spark/commit/f157cfd79c655723c7da233d20c167c849c75080).


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19636
  
Retest this please.


---


[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19625
  
**[Test build #83316 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83316/testReport)**
 for PR 19625 at commit 
[`2207dbe`](https://github.com/apache/spark/commit/2207dbed511bc9ca460d9794272f50a4b9ea7fe3).


---


[GitHub] spark pull request #19625: [SPARK-22407][WEB-UI] Add rdd id column on storag...

2017-11-01 Thread caneGuy

Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19625#discussion_r148427237
  
--- Diff: core/src/main/scala/org/apache/spark/ui/storage/StoragePage.scala 
---
@@ -49,6 +49,7 @@ private[ui] class StoragePage(parent: StorageTab) extends WebUIPage("") {
 
   /** Header fields for the RDD table */
   private val rddHeader = Seq(
+    "RDD ID",
--- End diff --

Thanks @srowen. I will update right now.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83315/
Test FAILed.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19636
  
Retest this please.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83314/
Test FAILed.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-01 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19439
  
Will do as soon as I can!


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19439
  
I have two high-level concerns. One is the Python API shape 
(imatiach-msft#1); the other is Java API support related to `Map` 
(https://github.com/apache/spark/pull/19439#discussion_r148289879).
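
As a rough illustration of one possible reading of the `Map` concern (a sketch only, not what the PR does): a Scala `Map[String, Int]` converted with `.asJava` is typed `java.util.Map[String, Int]`, which may be awkward for Java callers because `Int` is a Scala primitive type. One option is to box the values explicitly. The names `OcvTypesForJava` and `javaOcvTypes` are hypothetical; the map entries are copied from the diff under review.

```scala
import scala.collection.JavaConverters._

object OcvTypesForJava {
  // Entries copied from the ocvTypes map in the diff.
  val ocvTypes: Map[String, Int] = Map(
    "Undefined" -> -1,
    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
  )

  // Hypothetical Java-facing view: values are boxed to java.lang.Integer
  // so the declared type is java.util.Map[String, Integer].
  val javaOcvTypes: java.util.Map[String, Integer] =
    ocvTypes.map { case (name, code) => name -> Int.box(code) }.asJava

  def main(args: Array[String]): Unit = {
    println(javaOcvTypes.get("CV_8UC3"))
  }
}
```

Whether this is preferable to the `_ocvTypes` field in the diff is a design question for the reviewers.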


---


[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19625
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83313/
Test FAILed.


---


[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19625
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19439
  
@jkbradley, BTW, would you mind checking the API structure? I did my best to 
review this for consistency with other components and code but, to be honest, 
my ML knowledge and familiarity are limited. I can help polish it, but I would 
like someone confident in this area to check the overall API shape.


---


[GitHub] spark issue #19637: [SPARK-22243][DStream]spark.yarn.jars should reload from...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19637
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive...

2017-11-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19636#discussion_r148423575
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala
 ---
@@ -27,7 +27,7 @@ import org.apache.spark.sql.internal.SQLConf
 /**
  * Options for the ORC data source.
  */
-private[orc] class OrcOptions(
+private[sql] class OrcOptions(
--- End diff --

Thank you, @HyukjinKwon !


---


[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-11-01 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/18538
  
@yanboliang @mgaido91 I just saw this PR. It creates a new test data 
directory. Could you please send a quick update to move the data to the 
existing data directory 
(https://github.com/apache/spark/tree/master/data/mllib)? Thanks!


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-01 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19439
  
Quick comment: I see that data are being added under 
mllib/src/test/resources/. That appears to be a new directory, created 
recently. The standard directory is 
https://github.com/apache/spark/tree/master/data/mllib, so could you please put 
the images there instead? I'll ping the PR that introduced the new data 
directory to see about correcting it.


---


[GitHub] spark issue #19637: [SPARK-22243][DStream]spark.yarn.jars should reload from...

2017-11-01 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19637
  
ok to test


---


[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-01 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16578
  
Thanks @CodingCat

+1 on config switch. I think that would be a good idea.




---


[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-01 Thread pralabhkumar

Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah, please find some time to look into the changes.

Please let me know if further changes are required.


---


[GitHub] spark issue #19637: [SPARK-22243][DStream]spark.yarn.jars should reload from...

2017-11-01 Thread ChenjunZou

Github user ChenjunZou commented on the issue:

https://github.com/apache/spark/pull/19637
  
add spark.yarn.jars to the checkpoint reload configs.
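
The idea can be sketched in plain Scala, assuming configuration is modeled as an immutable `Map`. The names `CheckpointConfReload`, `keysToReload`, and `recoverConf` are hypothetical and are not Spark's API: on recovery, values for selected keys are taken from the current launch configuration instead of the checkpointed one.

```scala
object CheckpointConfReload {
  // Hypothetical list of keys whose checkpointed values should not be trusted
  // after a restart; the PR proposes treating spark.yarn.jars this way.
  val keysToReload: Seq[String] = Seq("spark.yarn.jars")

  // Start from the checkpointed configuration and overwrite each reloadable
  // key with the value from the current launch, when one is present.
  def recoverConf(
      checkpointed: Map[String, String],
      current: Map[String, String]): Map[String, String] =
    keysToReload.foldLeft(checkpointed) { (conf, key) =>
      current.get(key) match {
        case Some(value) => conf.updated(key, value)
        case None => conf
      }
    }

  def main(args: Array[String]): Unit = {
    // Made-up values for illustration.
    val saved = Map("spark.yarn.jars" -> "hdfs:///old/jars/*", "spark.app.name" -> "demo")
    val now = Map("spark.yarn.jars" -> "hdfs:///new/jars/*")
    println(recoverConf(saved, now))
  }
}
```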


---


[GitHub] spark pull request #19637: [SPARK-22243][DStream]spark.yarn.jars should relo...

2017-11-01 Thread ChenjunZou

GitHub user ChenjunZou opened a pull request:

https://github.com/apache/spark/pull/19637

[SPARK-22243][DStream]spark.yarn.jars should reload from config when 
checkpoint recovery

The previous PR branch was deleted by mistake.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ChenjunZou/spark checkpoint-yarn-jars

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19637


commit 4a7d3d80dff14ba7bd9c71be1307f261051bed12
Author: ZouChenjun 
Date:   2017-10-10T12:34:07Z

set spark.yarn.jars reload from config




---


[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19625
  
ok to test


---


[GitHub] spark pull request #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19636#discussion_r148420534
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala
 ---
@@ -27,7 +27,7 @@ import org.apache.spark.sql.internal.SQLConf
 /**
  * Options for the ORC data source.
  */
-private[orc] class OrcOptions(
+private[sql] class OrcOptions(
--- End diff --

I believe `private[sql]` can be removed per 
[SPARK-16964](https://issues.apache.org/jira/browse/SPARK-16964).
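
For readers less familiar with Scala's qualified access modifiers, here is a minimal sketch of what the qualifier in `private[orc]` versus `private[sql]` controls. It uses nested objects as stand-ins for packages so it can run as a single script; the names `NarrowOptions`, `WideOptions`, and `codec` are hypothetical and unrelated to Spark's actual classes.

```scala
object sql {
  object orc {
    // Accessible only from code enclosed by `orc`.
    private[orc] class NarrowOptions(val compression: String)
    // Accessible from any code enclosed by `sql`, including `hive` below.
    private[sql] class WideOptions(val compression: String)
  }

  object hive {
    // Compiles: `hive` is enclosed by `sql`, the qualifier of WideOptions.
    def codec: String = new orc.WideOptions("snappy").compression

    // Would not compile: NarrowOptions is restricted to `orc`.
    // def bad: String = new orc.NarrowOptions("snappy").compression
  }
}
```

Dropping the modifier entirely, as suggested above, would make the class accessible from anywhere.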


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19628
  
BTW, the tests passed for 5e1bbf0. Somehow the build was triggered again 
against the same commit after I added it to the whitelist.


---


[GitHub] spark pull request #19628: [MINOR][DOC] automatic type inference supports al...

2017-11-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19628


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19628
  
Merged to master, branch-2.2 and branch-2.1.


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19628
  
**[Test build #83312 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83312/testReport)**
 for PR 19628 at commit 
[`5e1bbf0`](https://github.com/apache/spark/commit/5e1bbf04b01451ad504997a2751deafe55587a74).


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19628
  
add to whitelist


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19628
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19628
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83311/
Test PASSed.


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19628
  
**[Test build #83311 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83311/testReport)**
 for PR 19628 at commit 
[`5e1bbf0`](https://github.com/apache/spark/commit/5e1bbf04b01451ad504997a2751deafe55587a74).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19628
  
**[Test build #83311 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83311/testReport)**
 for PR 19628 at commit 
[`5e1bbf0`](https://github.com/apache/spark/commit/5e1bbf04b01451ad504997a2751deafe55587a74).


---


[GitHub] spark issue #19628: [MINOR][DOC] automatic type inference supports also Date...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19628
  
ok to test


---


[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19208
  
**[Test build #3972 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3972/testReport)**
 for PR 19208 at commit 
[`e009ee1`](https://github.com/apache/spark/commit/e009ee1145930a02c71db85c967a49f9fd7509e5).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---


[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19433
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83310/
Test FAILed.


---


[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19433
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #83309 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83309/testReport)**
 for PR 19468 at commit 
[`4b32134`](https://github.com/apache/spark/commit/4b3213422e6e67b11de7b627ad46d4031043be0e).


---


[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-11-01 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19459
  
I filed [SPARK-22417](https://issues.apache.org/jira/browse/SPARK-22417) to 
fix reading from timestamps without Arrow.


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19439
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83303/
Test PASSed.


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19439
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19439
  
**[Test build #83303 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83303/testReport)**
 for PR 19439 at commit 
[`84d9177`](https://github.com/apache/spark/commit/84d9177b16267fad5564d71981e84685cc08cf6f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-11-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19439#discussion_r148405401
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala 
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.image
+
+import java.awt.Color
+import java.awt.color.ColorSpace
+import java.io.ByteArrayInputStream
+import javax.imageio.ImageIO
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.input.PortableDataStream
+import org.apache.spark.sql.{DataFrame, Row, SparkSession}
+import org.apache.spark.sql.types._
+
+@Experimental
+@Since("2.3.0")
+object ImageSchema {
+
+  val undefinedImageType = "Undefined"
+
+  val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data")
+
+  val ocvTypes: Map[String, Int] = Map(
--- End diff --

Does anyone have an idea about this, please?
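
For context on the values in the quoted `ocvTypes` map, the following sketch shows how OpenCV derives its type codes: the depth code plus the channel count minus one, shifted left by three bits (OpenCV's `CV_MAKETYPE` macro with `CV_CN_SHIFT = 3`). The names `CvTypeCodes` and `makeType` are hypothetical helpers, not part of the PR.

```scala
object CvTypeCodes {
  // Depth code for unsigned 8-bit data in OpenCV.
  val Cv8U: Int = 0
  // Number of low bits reserved for the depth code (CV_CN_SHIFT in OpenCV).
  val ChannelShift: Int = 3

  // Mirrors OpenCV's CV_MAKETYPE(depth, cn) macro.
  def makeType(depth: Int, channels: Int): Int = {
    require(channels >= 1, "an image needs at least one channel")
    depth + ((channels - 1) << ChannelShift)
  }

  def main(args: Array[String]): Unit = {
    // Prints the codes for CV_8UC1, CV_8UC3 and CV_8UC4: 0, 16 and 24.
    Seq(1, 3, 4).foreach { cn =>
      println(s"CV_8UC$cn -> ${makeType(Cv8U, cn)}")
    }
  }
}
```

This matches the hard-coded entries in the diff, where `CV_8U` and `CV_8UC1` share the code 0.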


---


[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-11-01 Thread dakirsa

Github user dakirsa commented on a diff in the pull request:

https://github.com/apache/spark/pull/19439#discussion_r148404466
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala 
---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.image
+
+import java.awt.Color
+import java.awt.color.ColorSpace
+import java.io.ByteArrayInputStream
+import javax.imageio.ImageIO
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.input.PortableDataStream
+import org.apache.spark.sql.{DataFrame, Row, SparkSession}
+import org.apache.spark.sql.types._
+
+@Experimental
+@Since("2.3.0")
+object ImageSchema {
+
+  val undefinedImageType = "Undefined"
+
+  val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data")
+
+  val ocvTypes: Map[String, Int] = Map(
+    undefinedImageType -> -1,
+    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
+  )
+
+  /**
+   * Used for conversion to python
+   */
+  val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava
+
+  /**
+   * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte])
+   */
+  val columnSchema = StructType(
+    StructField(imageFields(0), StringType, true) ::
+    StructField(imageFields(1), IntegerType, false) ::
+    StructField(imageFields(2), IntegerType, false) ::
+    StructField(imageFields(3), IntegerType, false) ::
+    // OpenCV-compatible type: CV_8UC3 in most cases
+    StructField(imageFields(4), IntegerType, false) ::
+    // Bytes in OpenCV-compatible order: row-wise BGR in most cases
+    StructField(imageFields(5), BinaryType, false) :: Nil)
+
+  /**
+   * DataFrame with a single column of images named "image" (nullable)
+   */
+  val imageSchema = StructType(StructField("image", columnSchema, true) :: Nil)
+
+  /**
+   * :: Experimental ::
+   * Gets the origin of the image
+   *
+   * @return The origin of the image
+   */
+  def getOrigin(row: Row): String = row.getString(0)
--- End diff --

I can only echo the discussion you point to: these are convenience 
functions that let the user avoid indexing into the schema directly (which is 
a common source of mistakes, in my experience). We might consider adding them 
to the Python API too.
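
A small sketch of the point about positional indexing, assuming `spark-sql` is on the classpath. The accessors mirror the getters in the diff; `ImageRowAccess` and the `sample` row are made up for illustration.

```scala
import org.apache.spark.sql.Row

object ImageRowAccess {
  // Named accessors, mirroring the getters in the diff.
  def getHeight(row: Row): Int = row.getInt(1)
  def getWidth(row: Row): Int = row.getInt(2)

  // Made-up image row in the column order:
  // origin, height, width, nChannels, mode, data.
  val sample: Row =
    Row("file:/tmp/example.png", 480, 640, 3, 16, Array.ofDim[Byte](0))

  def main(args: Array[String]): Unit = {
    // Positional access compiles for any in-range index, so reading index 2
    // when the height was wanted silently yields the width instead.
    val positional = sample.getInt(2)
    println(s"height=${getHeight(sample)} width=${getWidth(sample)} index2=$positional")
  }
}
```

With the named getters, the field order is encoded once in `ImageSchema` rather than at every call site.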


---


[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19635
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83304/
Test FAILed.


---


[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19635
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19635
  
**[Test build #83304 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83304/testReport)**
 for PR 19635 at commit 
[`8fb9c9d`](https://github.com/apache/spark/commit/8fb9c9d423097a706187cf3484c74186d3490ead).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19623: [SPARK-22078][SQL] clarify exception behaviors for all d...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19623
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83300/
Test PASSed.


---


[GitHub] spark issue #19623: [SPARK-22078][SQL] clarify exception behaviors for all d...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19623
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19623: [SPARK-22078][SQL] clarify exception behaviors for all d...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19623
  
**[Test build #83300 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83300/testReport)**
 for PR 19623 at commit 
[`db45129`](https://github.com/apache/spark/commit/db45129d621f788136e88bfd645f658bee2afd2c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19636
  
**[Test build #83308 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83308/testReport)**
 for PR 19636 at commit 
[`a2ae1ad`](https://github.com/apache/spark/commit/a2ae1ad0c54a699b362f9c2dcf3d46fe8067c8b0).


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19636
  
Retest this please.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19636
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83307/
Test FAILed.


---


[GitHub] spark issue #19629: [SPARK-22408][SQL] RelationalGroupedDataset's distinct p...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19629
  
**[Test build #3971 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3971/testReport)**
 for PR 19629 at commit 
[`aa809e3`](https://github.com/apache/spark/commit/aa809e39baf222e698315a5efb2d583cab99aad7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `s...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19636
  
**[Test build #83306 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83306/testReport)**
 for PR 19636 at commit 
[`c9ca3b6`](https://github.com/apache/spark/commit/c9ca3b665127eea105b277d36cf9f064216027e5).


---


[GitHub] spark pull request #19636: [SPARK-22416][SQL] Move OrcOptions from `sql/hive...

2017-11-01 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/19636

[SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `sql/core`

## What changes were proposed in this pull request?

According to the
[discussion](https://github.com/apache/spark/pull/19571#issuecomment-339472976)
on SPARK-15474, we will add a new OrcFileFormat in the `sql/core` module and allow
users to use both the old and the new OrcFileFormat.

To do that, `OrcOptions` needs to be visible in the `sql/core` module as well,
e.g. as `private[sql]`. Previously, it was `private[orc]` in `sql/hive`.

## How was this patch tested?

Passes Jenkins with the existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-22416

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19636


commit c9ca3b665127eea105b277d36cf9f064216027e5
Author: Dongjoon Hyun 
Date:   2017-11-01T21:36:40Z

[SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `sql/core`




---


[GitHub] spark issue #19631: [SPARK-22372][core, yarn] Make cluster submission use Sp...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19631
  
**[Test build #83305 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83305/testReport)**
 for PR 19631 at commit 
[`cee6be2`](https://github.com/apache/spark/commit/cee6be231310b89d35d5b419bd30153f29dd4cb9).


---


[GitHub] spark issue #19631: [SPARK-22372][core, yarn] Make cluster submission use Sp...

2017-11-01 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19631
  
retest this please


---


[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19635
  
**[Test build #83304 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83304/testReport)**
 for PR 19635 at commit 
[`8fb9c9d`](https://github.com/apache/spark/commit/8fb9c9d423097a706187cf3484c74186d3490ead).


---

