date:20170904

[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-04 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19116#discussion_r136899937
  
--- Diff: 
repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -19,7 +19,9 @@ package org.apache.spark.repl
 
 import java.io.BufferedReader
 
+// scalastyle:off println
 import scala.Predef.{println => _, _}
+// scalastyle:on println
--- End diff --

I said it's weird because this is obviously not a place to print out 
something. Not much harm, actually.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19127: [SPARK-21916][SQL] Set isolationOn=true when create hive...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19127
  
**[Test build #81396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81396/testReport)**
 for PR 19127 at commit 
[`2d13ab8`](https://github.com/apache/spark/commit/2d13ab8a18955e281033c17a446022aba57865f8).


---


[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-04 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19116#discussion_r136898192
  
--- Diff: 
repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -19,7 +19,9 @@ package org.apache.spark.repl
 
 import java.io.BufferedReader
 
+// scalastyle:off println
 import scala.Predef.{println => _, _}
+// scalastyle:on println
--- End diff --

This actually looks valid, though. If I manually add 
`import scala.Predef.{println => _, _}` somewhere else, for example in 
`SQLConf` on the current master:

```
[error] 
.../spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:26:21:
 Are you sure you want to println? If yes, wrap the code block with
[error]   // scalastyle:off println
[error]   println(...)
[error]   // scalastyle:on println
[error] 
.../spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:26:0:
 scala.Predef. is in wrong order relative to scala.collection.immutable.
```

It looks like scalastyle recognises this as an error. It looks like 1.0.0 
fixes an issue with this kind of style checking and detection. We might have 
to fix the `println` token checker rule, but I guess that is orthogonal anyway.
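
To illustrate why the import line itself trips the check, here is a minimal sketch. It assumes the `println` rule works on tokens and matches the regex `^println$`, which is an assumption and not confirmed in this thread. The object `PrintlnTokenSketch` and its crude tokenisation are hypothetical and are not scalastyle's implementation.

```scala
object PrintlnTokenSketch {
  // Hypothetical, simplified stand-in for a token-based style rule.
  private val Off = "// scalastyle:off println"
  private val On = "// scalastyle:on println"
  private val Forbidden = "^println$".r

  /** Returns the 1-based numbers of lines holding a `println` token while the rule is active. */
  def violations(source: String): List[Int] = {
    var enabled = true
    val hits = List.newBuilder[Int]
    for ((line, idx) <- source.split("\n", -1).zipWithIndex) {
      val trimmed = line.trim
      if (trimmed == Off) enabled = false
      else if (trimmed == On) enabled = true
      else if (enabled) {
        // Crude tokenisation: split on anything that is not an identifier character.
        val tokens = trimmed.split("[^A-Za-z0-9_]+").filter(_.nonEmpty)
        if (tokens.exists(t => Forbidden.findFirstIn(t).isDefined)) hits += idx + 1
      }
    }
    hits.result()
  }
}
```

Under that assumption, a token rule cannot tell a call `println(...)` from the selector in `{println => _, _}`, so the import is flagged unless it sits between the off/on comments.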


---


[GitHub] spark pull request #19127: [SPARK-21916][SQL] Set isolationOn=true when crea...

2017-09-04 Thread jinxing64

GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/19127

[SPARK-21916][SQL] Set isolationOn=true when create hive client for 
metadata.

## What changes were proposed in this pull request?

In the current code, we set `isolationOn=!isCliSession()` when creating the 
hive client for metadata. However, the conf of `CliSessionState` points to a 
local dummy metastore 
(https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L416).
 Using `CliSessionState`, we fail to get metadata from a remote hive metastore. 
We can always set `isolationOn=true` when creating the hive client for metadata.

## How was this patch tested?

Existing.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21916

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19127


commit 2d13ab8a18955e281033c17a446022aba57865f8
Author: jinxing 
Date:   2017-09-05T05:28:06Z

Set isolationOn=true when create hive client for metadata.




---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19111
  
**[Test build #81395 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81395/testReport)**
 for PR 19111 at commit 
[`5d156be`](https://github.com/apache/spark/commit/5d156be92fd3cfe8af30094fd759909ce5455d8f).


---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19111
  
jenkins, retest this please


---


[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-04 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19116#discussion_r136894278
  
--- Diff: 
repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -19,7 +19,9 @@ package org.apache.spark.repl
 
 import java.io.BufferedReader
 
+// scalastyle:off println
 import scala.Predef.{println => _, _}
+// scalastyle:on println
--- End diff --

Nit: This looks a bit weird.


---


[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17014
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81393/
Test PASSed.


---


[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17014
  
**[Test build #81393 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81393/testReport)**
 for PR 17014 at commit 
[`f8fa957`](https://github.com/apache/spark/commit/f8fa9573a1b40ff236e9c52cf429e2742c8f2bd0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #19112: [SPARK-21901][SS] Define toString for StateOperat...

2017-09-04 Thread tejasapatil

Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/19112#discussion_r136892336
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -200,7 +202,7 @@ class SourceProgress protected[sql](
  */
 @InterfaceStability.Evolving
 class SinkProgress protected[sql](
-val description: String) extends Serializable {
--- End diff --

I'm not a committer, but I would like to leave these suggestions: 
- Codestyle changes are orthogonal to the motive of the PR and should be 
done separately. Generally, every PR should address one problem and not carry 
changes unrelated to it. In the event of a revert, or when bisecting commits to 
pinpoint a regression, following this practice helps a lot.
- It would be beneficial to see why checkstyle does not catch such 
instances and to fix that (along with making all such instances consistent with 
the rules). Otherwise this would be a one-off fix, and we would continue to pile 
up similar inconsistencies in future development without anyone realising it.


---


[GitHub] spark issue #19126: Model 1 and Model 2 ParamMaps Missing

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19126
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19124
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81394/
Test PASSed.


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19124
  
**[Test build #81394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81394/testReport)**
 for PR 19124 at commit 
[`a738943`](https://github.com/apache/spark/commit/a73894374d284484d9b28123db02dfe6f264567a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #19126: Model 1 and Model 2 ParamMaps Missing

2017-09-04 Thread marktab

GitHub user marktab opened a pull request:

https://github.com/apache/spark/pull/19126

Model 1 and Model 2 ParamMaps Missing

The original Scala code says
println("Model 2 was fit using parameters: " + 
model2.parent.extractParamMap)

The parent is lr

There is no method for accessing parent as is done in Scala.

This code has been tested in Python, and returns values consistent with 
Scala

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marktab/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19126


commit 76e5da7b14d71338cf82352e9cf5628640e732a2
Author: MarkTab marktab.net 
Date:   2017-09-05T03:26:07Z

Model 1 and Model 2 ParamMaps Missing

The original Scala code says
println("Model 2 was fit using parameters: " + 
model2.parent.extractParamMap)

The parent is lr

There is no method for accessing parent as is done in Scala.

This code has been tested in Python, and returns values consistent with 
Scala




---


[GitHub] spark issue #19121: [SPARK-21906][YARN][Spark Core]Don't runAsSparkUser to s...

2017-09-04 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19121
  
UGI is only used for security. Normally it is used so that a Spark application 
communicates with Hadoop as the correct user.

`doAs` already wraps the whole `CoarseGrainedExecutorBackend` process, so all 
the task threads forked in this process will honor this UGI. There is no need 
to wrap again for each task.


---


[GitHub] spark issue #19121: [SPARK-21906][YARN][Spark Core]Don't runAsSparkUser to s...

2017-09-04 Thread yaooqinn

Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19121
  
@jerryshao 
1. I didn't meet any problems; this code is OK to run even if it is 
unnecessary.
2. In Standalone mode, collaborating with a secured hdfs is something we might 
not support yet. Besides, this ugi `doAs` wraps the executors' initialization 
but not the running of tasks; if we truly want to `doAs` a `SPARK_USER`, this 
ugi may be used in both phases.


---


[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19125
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81392/
Test PASSed.


---


[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19125
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19125
  
**[Test build #81392 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81392/testReport)**
 for PR 19125 at commit 
[`241d565`](https://github.com/apache/spark/commit/241d56563ed278828567eb8f78029a8e70e96c5d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19124
  
**[Test build #81394 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81394/testReport)**
 for PR 19124 at commit 
[`a738943`](https://github.com/apache/spark/commit/a73894374d284484d9b28123db02dfe6f264567a).


---


[GitHub] spark issue #19121: [SPARK-21906][YARN][Spark Core]Don't runAsSparkUser to s...

2017-09-04 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19121
  
Can you please elaborate on the problem you met? Did you see any unexpected 
behavior?

The changes here get rid of the env variable "SPARK_USER". This might be OK for 
a yarn application, but what if a user runs in standalone mode and explicitly 
sets "SPARK_USER"? Your changes seem to break the semantics.


---


[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17014
  
**[Test build #81393 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81393/testReport)**
 for PR 17014 at commit 
[`f8fa957`](https://github.com/apache/spark/commit/f8fa9573a1b40ff236e9c52cf429e2742c8f2bd0).


---


[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-04 Thread zhengruifeng

Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17014
  
Jenkins, retest this please


---


[GitHub] spark issue #13794: [SPARK-15574][ML][PySpark] Python meta-algorithms in Sca...

2017-09-04 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13794
  
+1 @jkbradley For now it is better to keep the current implementation for 
the 4 meta-algorithms in pyspark.
@yinxusen Would you mind closing this PR? I still appreciate your 
contribution to this, though!


---


[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC datasource table ...

2017-09-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r136878102
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -169,6 +171,16 @@ class OrcFileFormat extends FileFormat with 
DataSourceRegister with Serializable
   }
 }
   }
+
+  private def checkFieldName(name: String): Unit = {
+// ,;{}()\n\t= and space are special characters in ORC schema
--- End diff --

Thank you for the review, @tejasapatil!
That's a good idea. Right, it's not an exhaustive list. I'll update the PR.


---


[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC datasource table ...

2017-09-04 Thread tejasapatil

Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r136877087
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -169,6 +171,16 @@ class OrcFileFormat extends FileFormat with 
DataSourceRegister with Serializable
   }
 }
   }
+
+  private def checkFieldName(name: String): Unit = {
+// ,;{}()\n\t= and space are special characters in ORC schema
--- End diff --

Is this an exhaustive list? E.g. it looks like `?` is not allowed either. Given 
that the underlying lib (ORC) can evolve to support or stop supporting certain 
chars, it's safer to rely on some method rather than coming up with a 
blacklist. Can you simply call `TypeInfoUtils.getTypeInfoFromTypeString` or any 
related method which would do this check?

```
Caused by: java.lang.IllegalArgumentException: Error: : expected at the 
position 8 of 'struct' but '?' is found.
  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfoFromTypeString(TypeInfoUtils.java:770)
  at 
org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:194)
  at 
org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:231)
  at 
org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:91)
...
...
```
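
The idea of relying on the library's own parser instead of a blacklist could be sketched roughly as follows. This is only an illustration under stated assumptions: `parseTypeString` is a hypothetical stand-in for whatever parser the library exposes (in this thread, presumably a wrapper around `TypeInfoUtils.getTypeInfoFromTypeString`), and the `struct<name:int>` probe string and the object name `DelegatedFieldNameCheck` are assumptions as well.

```scala
import scala.util.{Failure, Success, Try}

object DelegatedFieldNameCheck {
  /**
   * Validates `name` by asking a parser whether a one-field struct using it parses.
   * `parseTypeString` is a hypothetical stand-in for the library's own parser.
   */
  def checkFieldName(name: String, parseTypeString: String => Unit): Either[String, Unit] =
    Try(parseTypeString(s"struct<$name:int>")) match {
      case Success(_) => Right(())
      // A parse failure becomes a readable message instead of a task abort.
      case Failure(e: IllegalArgumentException) =>
        Left("Column name \"" + name + "\" is invalid: " + e.getMessage)
      // Anything else is unexpected and is rethrown unchanged.
      case Failure(other) => throw other
    }
}
```

With this shape the set of rejected characters follows the parser, so there is no separate list to keep in sync when the library changes.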


---


[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19125
  
**[Test build #81392 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81392/testReport)**
 for PR 19125 at commit 
[`241d565`](https://github.com/apache/spark/commit/241d56563ed278828567eb8f78029a8e70e96c5d).


---


[GitHub] spark pull request #19125: [SPARK-21913][SQL][TEST] `withDatabase` should dr...

2017-09-04 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/19125

[SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE

## What changes were proposed in this pull request?

Currently, `withDatabase` fails if the database is not empty. It would be 
great if we drop cleanly with CASCADE.
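
A minimal sketch of the cleanup behaviour described here, assuming a helper shaped like a loan pattern. `WithDatabaseSketch` and the `runSql` parameter are hypothetical stand-ins (in a real test suite the statement would go to a Spark session), so this is not the actual test utility.

```scala
object WithDatabaseSketch {
  /**
   * Runs `body`, then drops every database in `dbNames` even if `body` throws.
   * `runSql` is a hypothetical stand-in for the test session's SQL entry point.
   */
  def withDatabase(runSql: String => Unit)(dbNames: String*)(body: => Unit): Unit = {
    try body finally {
      dbNames.foreach { name =>
        // CASCADE also removes the tables inside, so a non-empty database does not fail the drop.
        runSql(s"DROP DATABASE IF EXISTS $name CASCADE")
      }
    }
  }
}
```

Without `CASCADE`, the drop in the `finally` block is what fails when the test body left tables behind.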

## How was this patch tested?

This is a change on test util. Pass the existing Jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-21913

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19125


commit 241d56563ed278828567eb8f78029a8e70e96c5d
Author: Dongjoon Hyun 
Date:   2017-09-04T23:23:37Z

[SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE




---


[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81390/
Test PASSed.


---


[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #81390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)**
 for PR 18692 at commit 
[`cfeae46`](https://github.com/apache/spark/commit/cfeae46766a6ccb1b1a0113fe41cdb52b16897f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19124
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81391/
Test PASSed.


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19124
  
**[Test build #81391 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81391/testReport)**
 for PR 19124 at commit 
[`808dfe0`](https://github.com/apache/spark/commit/808dfe0fcd9de2f43b33f0d1d084172b5624f2a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #19123: [SPARK-21418][SQL] NoSuchElementException: None.g...

2017-09-04 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19123


---


[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...

2017-09-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19119
  
Hi, @gatorsmile . 
Could you trigger Maven build, too?


---


[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...

2017-09-04 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/19123
  
LGTM, merging to master/2.2. Thanks for picking this up!


---


[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19123
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81388/
Test PASSed.


---


[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19123
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19124
  
**[Test build #81391 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81391/testReport)**
 for PR 19124 at commit 
[`808dfe0`](https://github.com/apache/spark/commit/808dfe0fcd9de2f43b33f0d1d084172b5624f2a8).


---


[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19123
  
**[Test build #81388 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81388/testReport)**
 for PR 19123 at commit 
[`735ca94`](https://github.com/apache/spark/commit/735ca949e042493632d297db23286a8f8f83a6ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC datasource table ...

2017-09-04 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/19124

[SPARK-21912][SQL] Creating ORC datasource table should check invalid 
column names

## What changes were proposed in this pull request?

Currently, users hit job abortions while creating ORC data source tables 
with invalid column names. We had better prevent this by raising 
**AnalysisException** with a hint to use aliases instead, as is done for 
Parquet data source tables.

**BEFORE**
```scala
scala> sql("CREATE TABLE orc1 USING ORC AS SELECT 1 `a b`")
17/09/04 13:28:21 ERROR Utils: Aborting task
java.lang.IllegalArgumentException: Error: : expected at the position 8 of 
'struct' but ' ' is found.
17/09/04 13:28:21 ERROR FileFormatWriter: Job job_20170904132821_0001 
aborted.
17/09/04 13:28:21 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkException: Task failed while writing rows.
```

**AFTER**
```scala
scala> sql("CREATE TABLE orc1 USING ORC AS SELECT 1 `a b`")
17/09/04 13:27:40 ERROR CreateDataSourceTableAsSelectCommand: Failed to 
write to table orc1
org.apache.spark.sql.AnalysisException: Attribute name "a b" contains 
invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
```
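
A minimal sketch of the kind of up-front check described here, built only from the character set shown in the AFTER message. `OrcFieldNameSketch` is a hypothetical name, and `IllegalArgumentException` stands in for `AnalysisException` so that the snippet runs without Spark on the classpath.

```scala
object OrcFieldNameSketch {
  // The characters listed in the error message above: space , ; { } ( ) newline tab =
  private val InvalidChars = " ,;{}()\n\t="

  /** Throws if `name` contains any of the characters in `InvalidChars`. */
  def checkFieldName(name: String): Unit = {
    if (name.exists(c => InvalidChars.indexOf(c) >= 0)) {
      // Printable form of the character set for the message.
      val shown = "\" ,;{}()\\n\\t=\""
      throw new IllegalArgumentException(
        "Attribute name \"" + name + "\" contains invalid character(s) among " +
          shown + ". Please use alias to rename it.")
    }
  }
}
```

Running the check before the write starts is what turns the task-level abort in BEFORE into the single analysis-time error in AFTER.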

## How was this patch tested?

Pass the Jenkins with a new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-21912

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19124


commit 808dfe0fcd9de2f43b33f0d1d084172b5624f2a8
Author: Dongjoon Hyun 
Date:   2017-09-04T20:46:15Z

[SPARK-21912][SQL] Creating ORC datasource table should check invalid 
column names




---


[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-09-04 Thread tejasapatil

Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/18692#discussion_r136868330
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala 
---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] 
with PredicateHelper {
   if (j.joinType == newJoinType) f else Filter(condition, 
j.copy(joinType = newJoinType))
   }
 }
+
+/**
+ * A rule that uses propagated constraints to infer join conditions. The 
optimization is applicable
+ * only to CROSS joins.
--- End diff --

Can you also mention why we are restricting this to cross joins only?

```
For other join types, adding inferred join conditions could introduce a 
shuffle of the children, because a child node's partitioning might no longer 
satisfy the JOIN node's requirements when it otherwise would have.
```
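
To make the general idea of inferring a join condition from constraints concrete, here is a rough Python sketch. It assumes the simplest possible case, where each constraint says that an attribute equals a literal value, and it is not the rule from this PR; `infer_join_conditions` and the attribute names are hypothetical.

```python
from collections import defaultdict


def infer_join_conditions(constraints, left_attrs, right_attrs):
    """Return sorted (left, right) attribute pairs known to equal the same value.

    `constraints` is an iterable of (attribute, value) equality constraints;
    `left_attrs` and `right_attrs` are sets of attribute names per join side.
    """
    # Group attributes by the value they are constrained to equal.
    by_value = defaultdict(set)
    for attr, value in constraints:
        by_value[value].add(attr)

    # Any left/right pair in the same group implies left = right.
    inferred = set()
    for attrs in by_value.values():
        for left in attrs & left_attrs:
            for right in attrs & right_attrs:
                inferred.add((left, right))
    return sorted(inferred)
```

For example, constraints `t1.a = 1` and `t2.b = 1` would yield the pair `("t1.a", "t2.b")`, which could then be used as an equi-join condition.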



---


[GitHub] spark pull request #16774: [SPARK-19357][ML] Adding parallel model evaluatio...

2017-09-04 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16774#discussion_r136868226
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -100,31 +113,53 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") 
override val uid: String)
 val eval = $(evaluator)
 val epm = $(estimatorParamMaps)
 val numModels = epm.length
-val metrics = new Array[Double](epm.length)
+
+// Create execution context based on $(parallelism)
+val executionContext = getExecutionContext
--- End diff --

In the corresponding PR for the PySpark implementation, the number of threads 
is limited by the number of models to be trained 
(https://github.com/WeichenXu123/spark/blob/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6/python/pyspark/ml/tuning.py#L261).
 We might do that here too, for instance by overriding the `getParallelism` 
method. What do you think about this?
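
For reference, a small Python sketch of capping the thread count by the number of models, in the spirit of the `min(self.getParallelism(), numModels)` line from the PySpark PR. `effective_threads`, `fit_all` and `fit_one` are hypothetical helper names used only for this illustration.

```python
from multiprocessing.pool import ThreadPool


def effective_threads(parallelism: int, num_models: int) -> int:
    """Never start more threads than there are models to train (minimum 1)."""
    return max(1, min(parallelism, num_models))


def fit_all(param_maps, fit_one, parallelism):
    """Fit one model per param map using a bounded thread pool."""
    pool = ThreadPool(processes=effective_threads(parallelism, len(param_maps)))
    try:
        # map() preserves the order of param_maps in its results.
        return pool.map(fit_one, param_maps)
    finally:
        pool.close()
        pool.join()
```

With `parallelism=8` and three param maps, only three worker threads would be created.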


---


[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #81390 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)**
 for PR 18692 at commit 
[`cfeae46`](https://github.com/apache/spark/commit/cfeae46766a6ccb1b1a0113fe41cdb52b16897f3).


---


[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-04 Thread markhamstra

Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/19115
  
And now I see that the title was changed to something more useful. Pardon 
any offense; the end result of the title changes looks good.


---


[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-04 Thread markhamstra

Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/19115
  
I realize this PR is now closed, but to follow up on Saisai's request 
concerning PR titles, I'll also note that the title of this PR isn't very 
useful even after the JIRA id and component tag are added. Titles like "fixed 
foo" or "updated bar" don't really tell reviewers or those looking at the 
commit logs in the future what the PR is about. The JIRA should tell us _why_ a 
change or addition is needed, the description in the PR should tell us _what_ 
was changed or added, and the PR title should give us enough of an idea of what 
is going on that we don't necessarily have to open the PR or look at the code 
changes just to see whether it is something that we are even at all interested 
in.


---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19111
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81389/
Test PASSed.


---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19111
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19111
  
**[Test build #81389 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81389/testReport)**
 for PR 19111 at commit 
[`5d156be`](https://github.com/apache/spark/commit/5d156be92fd3cfe8af30094fd759909ce5455d8f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-04 Thread tejasapatil

Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/18975
  
@gatorsmile : Yes. Hive is not 100% atomic, as things can go wrong between 
removing the old data and renaming the staging location. But it's superior in 
these regards:

- Hive would output "no data" OR "complete data". Here we can have "no 
data" OR "incomplete data" OR "complete data". The "incomplete data" part 
worries me. A staging dir helps achieve "you either see nothing OR everything" 
behaviour.
- The window of "you see nothing" is much bigger here compared to Hive, as 
the output location is cleaned up before execution.
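
The staging-directory pattern being discussed can be sketched on a local filesystem as follows. This is only an illustration in Python under simplifying assumptions (a single writer, a local directory, small text files); `write_via_staging` is a hypothetical helper and is not how Hive or Spark implement it.

```python
import os
import shutil
import tempfile


def write_via_staging(output_dir: str, files: dict) -> None:
    """Write `files` (name -> text) so readers never see a partial output."""
    parent = os.path.dirname(os.path.abspath(output_dir))
    # Stage next to the target so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for name, text in files.items():
            with open(os.path.join(staging, name), "w") as f:
                f.write(text)
        # Only after every file is written do we touch the output location.
        # Between these two steps a reader sees "no data", never partial data.
        if os.path.exists(output_dir):
            shutil.rmtree(output_dir)
        os.rename(staging, output_dir)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

The old data is removed only at the very end, so the "you see nothing" window is limited to the gap between the delete and the rename rather than the whole job.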




---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19111
  
**[Test build #81389 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81389/testReport)**
 for PR 19111 at commit 
[`5d156be`](https://github.com/apache/spark/commit/5d156be92fd3cfe8af30094fd759909ce5455d8f).


---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19111
  
jenkins, retest this please


---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19111
  
+ @shivaram could you do a quick review? given this change I'd love to get 
some feedback


---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19111
  
Yes, the issue is with random sampling, and this PR should fix all of these.
I'm not sure why I haven't seen them much before - they have been around 
for years - appreciate bringing these up, we should track them with JIRA.



---


[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19111
  
jenkins, retest this please


---


[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19123
  
**[Test build #81388 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81388/testReport)**
 for PR 19123 at commit 
[`735ca94`](https://github.com/apache/spark/commit/735ca949e042493632d297db23286a8f8f83a6ed).


---


[GitHub] spark pull request #19123: [SPARK-21418][SQL] NoSuchElementException: None.g...

2017-09-04 Thread srowen

GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/19123

[SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScanExec 
with sun.io.serialization.extendedDebugInfo=true

## What changes were proposed in this pull request?

If no SparkConf is available to Utils.redact, simply don't redact.
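
As an illustration of the "no conf, no redaction" behaviour, here is a small Python sketch. The function, the `redaction.regex` key and the replacement text are assumptions made for this example; they are not the actual `Utils.redact` signature or Spark configuration names.

```python
import re
from typing import Mapping, Optional

# Hypothetical replacement text for redacted matches.
REDACTION_TEXT = "*********(redacted)"


def redact(conf: Optional[Mapping[str, str]], text: str) -> str:
    """Redact `text` using a regex from `conf`; pass it through if no conf."""
    if conf is None:
        # No configuration available (e.g. outside an active session):
        # return the input unchanged instead of failing.
        return text
    pattern = conf.get("redaction.regex")  # hypothetical key
    if not pattern:
        return text
    return re.sub(pattern, REDACTION_TEXT, text)
```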

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-21418

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19123


commit 735ca949e042493632d297db23286a8f8f83a6ed
Author: Sean Owen 
Date:   2017-09-04T17:32:00Z

Don't fail with NPE in corner case where Utils.redact happens outside 
active session




---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81387/
Test PASSed.


---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81387 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81387/testReport)**
 for PR 19122 at commit 
[`be2f3d0`](https://github.com/apache/spark/commit/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2017-09-04 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19108
  
cc @yanboliang Thanks!


---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81387 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81387/testReport)**
 for PR 19122 at commit 
[`be2f3d0`](https://github.com/apache/spark/commit/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6).


---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81386/
Test PASSed.


---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81386 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81386/testReport)**
 for PR 19122 at commit 
[`57cf534`](https://github.com/apache/spark/commit/57cf53473e5bfb75095b0e519457dbdc973f3300).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HasParallelism(Params):`
  * `class CrossValidator(Estimator, ValidatorParams, HasParallelism, 
MLReadable, MLWritable):`
  * `class TrainValidationSplit(Estimator, ValidatorParams, HasParallelism, 
MLReadable, MLWritable):`


---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

2017-09-04 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19122#discussion_r136850665
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -255,18 +257,23 @@ def _fit(self, dataset):
 randCol = self.uid + "_rand"
 df = dataset.select("*", rand(seed).alias(randCol))
 metrics = [0.0] * numModels
+
+pool = ThreadPool(processes=min(self.getParallelism(), numModels))
+
 for i in range(nFolds):
 validateLB = i * h
 validateUB = (i + 1) * h
 condition = (df[randCol] >= validateLB) & (df[randCol] < 
validateUB)
-validation = df.filter(condition)
+validation = df.filter(condition).cache()
--- End diff --

This may need a discussion.
Currently PySpark caches neither the `train dataset` nor the `validation 
dataset`, while the Scala implementation caches both.
I prefer to cache the `validation dataset` but not the `train dataset`. The 
`validation dataset` is only `1/numFolds` of the input dataset, so it is worth 
caching; otherwise the input dataset is scanned again. The `train dataset`, 
however, is `(numFolds - 1)/numFolds` of the input dataset, so we can generate 
it by scanning the input dataset directly without slowing things down too 
much.
@BryanCutler @MLnick What do you think about it? Thanks!
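
To illustrate the size argument, here is a plain-Python sketch of the fold split from the diff above, using a list of `(rand, payload)` rows instead of a DataFrame. `fold_bounds` and `split_fold` are hypothetical helpers for this example only.

```python
def fold_bounds(n_folds: int):
    """Return [lower, upper) bounds on the random column for each fold."""
    h = 1.0 / n_folds
    return [(i * h, (i + 1) * h) for i in range(n_folds)]


def split_fold(rows, lower, upper):
    """Split (rand, payload) rows into (train, validation) for one fold."""
    validation = [r for r in rows if lower <= r[0] < upper]
    train = [r for r in rows if not (lower <= r[0] < upper)]
    return train, validation
```

With 4 folds, each validation split holds about a quarter of the rows and each train split about three quarters, which is why caching only the validation side is the cheaper option.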


---


[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81386 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81386/testReport)**
 for PR 19122 at commit 
[`57cf534`](https://github.com/apache/spark/commit/57cf53473e5bfb75095b0e519457dbdc973f3300).


---


[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

2017-09-04 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/19122

[SPARK-21911][ML][PySpark] Parallel Model Evaluation for ML Tuning in 
PySpark

## What changes were proposed in this pull request?

Add parallelism support for ML tuning in PySpark.

## How was this patch tested?

Test updated.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark par-ml-tuning-py

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19122


commit 57cf53473e5bfb75095b0e519457dbdc973f3300
Author: WeichenXu 
Date:   2017-09-04T16:03:55Z

init pr




---


[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16611
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16611
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81385/
Test PASSed.


---


[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16611
  
**[Test build #81385 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81385/testReport)**
 for PR 16611 at commit 
[`4c1a012`](https://github.com/apache/spark/commit/4c1a012e5cad648e81797ec494f44392189560ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19117
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19117
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81384/
Test FAILed.


---


[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19117
  
**[Test build #81384 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81384/testReport)**
 for PR 19117 at commit 
[`02815e7`](https://github.com/apache/spark/commit/02815e7faae23a32b04c7af08c826f4428c60f5c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19119
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81383/
Test PASSed.


---


[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19119
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19119
  
**[Test build #81383 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81383/testReport)**
 for PR 19119 at commit 
[`b96da49`](https://github.com/apache/spark/commit/b96da49aa0893f8bf34da2a2c111499fdbad7b5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18875
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81382/
Test PASSed.


---


[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18875
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18875
  
**[Test build #81382 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81382/testReport)**
 for PR 18875 at commit 
[`3ebbe67`](https://github.com/apache/spark/commit/3ebbe67e059dfb6a004ff50f3c661f6319d616b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16611
  
**[Test build #81385 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81385/testReport)**
 for PR 16611 at commit 
[`4c1a012`](https://github.com/apache/spark/commit/4c1a012e5cad648e81797ec494f44392189560ce).


---


[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19117
  
**[Test build #81384 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81384/testReport)**
 for PR 19117 at commit 
[`02815e7`](https://github.com/apache/spark/commit/02815e7faae23a32b04c7af08c826f4428c60f5c).


---


[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...

2017-09-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19117
  
retest this please


---


[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4

2017-09-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19113
  
With 2.7G of data, I ran a simple Java program using `CsvParser` with both 
2.5.4 and 2.2.1, plus simple e2e read tests. The elapsed-time difference was 
roughly -1.7% ~ +1.2%. I think there is virtually no difference (or 0.5 
improvement).

I think we generally trust the other communities and libraries we decided 
to add, such as ORC, Parquet and Jackson, and we de-duplicate such efforts by 
relying on community support. I think we discussed this before.


---


[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19110
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81381/
Test PASSed.


---


[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19110
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19110
  
**[Test build #81381 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81381/testReport)**
 for PR 19110 at commit 
[`fc6fd5e`](https://github.com/apache/spark/commit/fc6fd5e98edcaccc4e42abf8ba94250ea1dbdfba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18875
  
**[Test build #81382 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81382/testReport)**
 for PR 18875 at commit 
[`3ebbe67`](https://github.com/apache/spark/commit/3ebbe67e059dfb6a004ff50f3c661f6319d616b8).


---


[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19110
  
**[Test build #81381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81381/testReport)**
 for PR 19110 at commit 
[`fc6fd5e`](https://github.com/apache/spark/commit/fc6fd5e98edcaccc4e42abf8ba94250ea1dbdfba).


---


[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19119
  
**[Test build #81383 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81383/testReport)**
 for PR 19119 at commit 
[`b96da49`](https://github.com/apache/spark/commit/b96da49aa0893f8bf34da2a2c111499fdbad7b5a).


---


[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18875
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18875
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81380/
Test PASSed.


---


[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18875
  
**[Test build #81380 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81380/testReport)**
 for PR 18875 at commit 
[`d466524`](https://github.com/apache/spark/commit/d466524e918361891ef406e4fe9d9b3b638054c3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19116
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81378/


---


[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19116
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...

2017-09-04 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18576
  
@gatorsmile I thought a bit more about this issue and would like to propose 
another approach: how about just moving `output` into `QueryPlanConstraints` and 
having `output` always take the NULL constraints of its own logical plan into 
account? This does not solve all the existing nullability issues, but the fix is 
simple and not intrusive (I think it is a good first step). 
https://github.com/apache/spark/compare/master...maropu:SPARK-21351-4#diff-b40fcb6ac9b2e94b410f39a94a97e822R36


---


[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19116
  
**[Test build #81378 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81378/testReport)**
 for PR 19116 at commit 
[`2447fd0`](https://github.com/apache/spark/commit/2447fd0e152ace4dd074a92bd0d3cdc638b09b1a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

