[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19882#discussion_r155166536
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcPartitionDiscoverySuite.scala
 ---
@@ -186,8 +159,8 @@ class OrcPartitionDiscoverySuite extends QueryTest with 
TestHiveSingleton with B
   makePartitionDir(base, defaultPartitionName, "pi" -> pi, "ps" -> 
ps))
   }
 
-  read
-.option(ConfVars.DEFAULTPARTITIONNAME.varname, 
defaultPartitionName)
+  spark.read
+.option("hive.exec.default.partition.name", defaultPartitionName)
--- End diff --

The new ORC didn't change these config names?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19901
  
**[Test build #84534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84534/testReport)**
 for PR 19901 at commit 
[`c81d795`](https://github.com/apache/spark/commit/c81d795565ae347243c3571885820b5ac15bf494).


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155165795
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

These changes are pretty safe. In case we move the ORC source to another 
location, these entries will still refer to the right one.


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155165558
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

The `org.apache.spark.sql.execution.datasources.*` path is meant to be private 
too, but I think it's okay to leave these entries in for practical use cases, 
with some comments.


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155165388
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -587,7 +601,8 @@ object DataSource extends Logging {
 if (provider1.toLowerCase(Locale.ROOT) == "orc" ||
--- End diff --

`provider1` can't be "orc" anymore. It can only be 
`classOf[OrcFileFormat].getCanonicalName` or 
`"org.apache.spark.sql.hive.orc.OrcFileFormat"`.


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155165377
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

I think we should rename the variable and/or fix the comments there the next 
time we touch the surrounding code, to prevent confusion.


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155164921
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

For parquet, this is for historical reasons; see #13311.

Previously you could use parquet via 
`org.apache.spark.sql.execution.datasources.parquet` and 
`org.apache.spark.sql.execution.datasources.parquet.DefaultSource`, so those 
entries are also there for backward compatibility.

For the new ORC, that is not the case.


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155163332
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

This is for safety. We also do it for parquet.


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155162795
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

When I received [the 
advice](https://github.com/apache/spark/pull/19871#discussion_r154580007), I 
thought it was for consistency.


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155162572
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

Like `USING org.apache.spark.sql.hive.orc`, we want to use `USING 
org.apache.spark.sql.execution.datasources.orc`, don't we?


---




[GitHub] spark issue #19905: [SPARK-22710] ConfigBuilder.fallbackConf should trigger ...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19905
  
**[Test build #84533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84533/testReport)**
 for PR 19905 at commit 
[`028e0fa`](https://github.com/apache/spark/commit/028e0faf87f15c9a25acc3e4f6755d3d49bb7904).


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155160863
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

Ah, good catch! Sounds like we don't need a compatibility rule for the new ORC.


---




[GitHub] spark issue #19854: [SPARK-22660][BUILD] Use position() and limit() to fix a...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84525/
Test PASSed.


---




[GitHub] spark issue #19854: [SPARK-22660][BUILD] Use position() and limit() to fix a...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19854
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19854: [SPARK-22660][BUILD] Use position() and limit() to fix a...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19854
  
**[Test build #84525 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84525/testReport)**
 for PR 19854 at commit 
[`dbe6b3f`](https://github.com/apache/spark/commit/dbe6b3fdfce50f25872b559018bfb1b77e81d04f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19905: [SPARK-22710] ConfigBuilder.fallbackConf should trigger ...

2017-12-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/19905
  
cc @vanzin 


---




[GitHub] spark pull request #19905: [SPARK-22710] ConfigBuilder.fallbackConf should t...

2017-12-05 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/19905

[SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function

## What changes were proposed in this pull request?
I was looking at the config code today and found that configs defined using 
ConfigBuilder.fallbackConf didn't trigger the onCreate function. This patch 
fixes it.

This doesn't require backporting since we currently have no configs that 
use it.
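The described behavior can be sketched in miniature. This is a hypothetical, simplified Python stand-in for the Scala `ConfigBuilder`/`ConfigEntry` pair, not the real API: the builder stores an on-create callback that should fire for every entry it produces, and the fallback path is where the callback was originally missing.

```python
class ConfigEntry:
    """Simplified stand-in for Spark's internal ConfigEntry."""
    def __init__(self, key, default):
        self.key = key
        self.default = default


class ConfigBuilder:
    """Simplified stand-in for Spark's internal ConfigBuilder."""
    def __init__(self, key):
        self.key = key
        self._on_create = None

    def on_create(self, fn):
        self._on_create = fn
        return self

    def create_with_default(self, default):
        entry = ConfigEntry(self.key, default)
        if self._on_create:           # callback fires on the normal path
            self._on_create(entry)
        return entry

    def fallback_conf(self, fallback):
        # A fallback entry inherits its default from the entry it falls back to.
        entry = ConfigEntry(self.key, fallback.default)
        if self._on_create:           # the fix: fire the callback here too
            self._on_create(entry)
        return entry


created = []
base = ConfigBuilder("spark.base").create_with_default(10)
derived = ConfigBuilder("spark.derived").on_create(created.append).fallback_conf(base)
```

With the fix, `created` ends up holding the `derived` entry; before it, the fallback path would have left `created` empty.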

## How was this patch tested?
Added a test case for all the config final creator functions in 
ConfigEntrySuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-22710

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19905.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19905


commit 028e0faf87f15c9a25acc3e4f6755d3d49bb7904
Author: Reynold Xin 
Date:   2017-12-06T07:10:38Z

[SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function




---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r155160106
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -553,6 +557,8 @@ object DataSource extends Logging {
   "org.apache.spark.sql.execution.datasources.parquet.DefaultSource" 
-> parquet,
   "org.apache.spark.sql.hive.orc.DefaultSource" -> orc,
   "org.apache.spark.sql.hive.orc" -> orc,
+  "org.apache.spark.sql.execution.datasources.orc.DefaultSource" -> 
nativeOrc,
+  "org.apache.spark.sql.execution.datasources.orc" -> nativeOrc,
--- End diff --

This map is for backward compatibility in case we move data sources around. 
I think this `datasources.orc` source is newly added. Why do we need to add 
them here?
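The lookup this map supports can be sketched as a plain dictionary redirect. This is a simplified stand-in for what `DataSource.lookupDataSource` does with the compatibility map; the keys and the implementation class names are taken from the diff above.

```python
# Current implementation classes the old names should resolve to.
ORC = "org.apache.spark.sql.hive.orc.OrcFileFormat"
NATIVE_ORC = "org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"

# Old or package-level provider names redirected to current classes.
BACKWARD_COMPATIBILITY_MAP = {
    "org.apache.spark.sql.hive.orc.DefaultSource": ORC,
    "org.apache.spark.sql.hive.orc": ORC,
    "org.apache.spark.sql.execution.datasources.orc.DefaultSource": NATIVE_ORC,
    "org.apache.spark.sql.execution.datasources.orc": NATIVE_ORC,
}


def resolve_provider(provider: str) -> str:
    # Unknown names pass through unchanged; in Spark they would then be
    # resolved by class loading.
    return BACKWARD_COMPATIBILITY_MAP.get(provider, provider)
```

So a `USING org.apache.spark.sql.execution.datasources.orc` clause would resolve to the native `OrcFileFormat` even if the package were later moved.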


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r155159490
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -94,14 +94,16 @@ class CacheManager extends Logging {
   logWarning("Asked to cache already cached data.")
 } else {
   val sparkSession = query.sparkSession
-  cachedData.add(CachedData(
-planToCache,
-InMemoryRelation(
-  sparkSession.sessionState.conf.useCompression,
-  sparkSession.sessionState.conf.columnBatchSize,
-  storageLevel,
-  sparkSession.sessionState.executePlan(planToCache).executedPlan,
-  tableName)))
+  val inMemoryRelation = InMemoryRelation(
+sparkSession.sessionState.conf.useCompression,
+sparkSession.sessionState.conf.columnBatchSize,
+storageLevel,
+sparkSession.sessionState.executePlan(planToCache).executedPlan,
+tableName)
+  if (planToCache.conf.cboEnabled && 
planToCache.stats.rowCount.isDefined) {
--- End diff --

The statistics from the relation are based on file size; will that easily 
cause an OOM issue? I think in cases other than a cached query, we still use 
the relation's statistics. If this is an issue, doesn't it also affect those 
other cases?


---




[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-05 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r155159185
  
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
 else:
 # Columns
 assert all(isinstance(c, Column) for c in exprs), "all exprs 
should be Column"
-jdf = self._jgd.agg(exprs[0]._jc,
-_to_seq(self.sql_ctx._sc, [c._jc for c in 
exprs[1:]]))
+if isinstance(exprs[0], UDFColumn):
+assert all(isinstance(c, UDFColumn) for c in exprs)
--- End diff --

Ah, so what you're saying is that you don't support mixed Python & Java 
UDAFs? That's certainly something that needs to be communicated in both the 
documentation and the error message.

Is there a reason why we don't support this?


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19903
  
**[Test build #84532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84532/testReport)**
 for PR 19903 at commit 
[`d7d268d`](https://github.com/apache/spark/commit/d7d268d19e69b1477e2f3ca9cc8d3f1221d3663c).


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19903
  
Retest this please.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19903
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19903
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84528/
Test FAILed.


---




[GitHub] spark pull request #19901: [SPARK-22705][SQL] Case, Coalesce, and In use les...

2017-12-05 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19901#discussion_r155158174
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 ---
@@ -237,53 +237,58 @@ case class In(value: Expression, list: 
Seq[Expression]) extends Predicate {
 val javaDataType = ctx.javaType(value.dataType)
 val valueGen = value.genCode(ctx)
 val listGen = list.map(_.genCode(ctx))
-ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.value)
-ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
+// inValue  -1:isNull, 0:false, 1:true
+val inValue = ctx.freshName("value")
 val valueArg = ctx.freshName("valueArg")
 // All the blocks are meant to be inside a do { ... } while (false); 
loop.
 // The evaluation of variables can be stopped when we find a matching 
value.
 val listCode = listGen.map(x =>
   s"""
  |${x.code}
  |if (${x.isNull}) {
- |  ${ev.isNull} = true;
+ |  $inValue = -1; // isNull = true
  |} else if (${ctx.genEqual(value.dataType, valueArg, x.value)}) {
- |  ${ev.isNull} = false;
- |  ${ev.value} = true;
+ |  $inValue = 1; // value = TRUE
  |  continue;
  |}
""".stripMargin)
 
 val codes = ctx.splitExpressionsWithCurrentInputs(
   expressions = listCode,
   funcName = "valueIn",
-  extraArguments = (javaDataType, valueArg) :: Nil,
-  makeSplitFunction = body =>
+  extraArguments = (javaDataType, valueArg) :: (ctx.JAVA_BYTE, 
inValue) :: Nil,
+  returnType = ctx.JAVA_BYTE,
+  makeSplitFunction = { body =>
 s"""
|do {
|  $body
|} while (false);
- """.stripMargin,
-  foldFunctions = _.map { funcCall =>
-s"""
-   |$funcCall;
-   |if (${ev.value}) {
-   |  continue;
-   |}
+   |return $inValue;
  """.stripMargin
-  }.mkString("\n"))
+  },
+  foldFunctions = { funcCalls =>
+funcCalls.map(funcCall =>
+  s"""
+ |$inValue = $funcCall;
+ |if ($inValue == 1) {
+ |  continue;
+ |}
+   """.stripMargin).mkString("\n")
+  }
+)
 
 ev.copy(code =
   s"""
  |${valueGen.code}
- |${ev.value} = false;
- |${ev.isNull} = ${valueGen.isNull};
- |if (!${ev.isNull}) {
+ |byte $inValue = (byte)(${valueGen.isNull} ? -1 : 0); // isNull 
or FALSE
+ |if ($inValue != -1) {
  |  $javaDataType $valueArg = ${valueGen.value};
--- End diff --

`${valueGen.value}` may contain a constant `1.0D`.
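The `-1`/`0`/`1` byte encoding in the diff above can be illustrated with a small stand-alone function. This is a sketch of SQL `IN`'s three-valued semantics that the generated code implements, not the generated code itself: `-1` means NULL, `0` means FALSE, `1` means TRUE, and `value IN (list)` yields NULL when no element matches but some comparison involved NULL.

```python
def sql_in(value, items):
    """Return 1 (TRUE), 0 (FALSE), or -1 (NULL) for SQL `value IN items`."""
    if value is None:
        return -1          # NULL input -> NULL result
    state = 0              # FALSE until proven otherwise
    for item in items:
        if item is None:
            state = -1     # remember that a comparison was against NULL
        elif item == value:
            return 1       # match -> TRUE; stop early, like the `continue`
    return state
```

Packing all three states into one byte is what lets each split function return a single value instead of writing two global variables (`ev.value` and `ev.isNull`).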


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19903
  
**[Test build #84528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84528/testReport)**
 for PR 19903 at commit 
[`d7d268d`](https://github.com/apache/spark/commit/d7d268d19e69b1477e2f3ca9cc8d3f1221d3663c).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19882
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84524/
Test PASSed.


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19882
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19882
  
**[Test build #84524 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84524/testReport)**
 for PR 19882 at commit 
[`baec5fe`](https://github.com/apache/spark/commit/baec5fece95896a28127cd2ac014582874ed5e28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OrcFilterSuite extends OrcTest with SharedSQLContext `
  * `implicit class IntToBinary(int: Int) `
  * `case class OrcParData(intField: Int, stringField: String)`
  * `case class OrcParDataWithKey(intField: Int, pi: Int, stringField: 
String, ps: String)`
  * `abstract class OrcPartitionDiscoveryTest extends OrcTest `
  * `case class AllDataTypesWithNonPrimitiveType(`
  * `case class BinaryData(binaryData: Array[Byte])`
  * `case class Contact(name: String, phone: String)`
  * `case class Person(name: String, age: Int, contacts: Seq[Contact])`
  * `abstract class OrcQueryTest extends OrcTest `
  * `  test(\"Creating case class RDD table\") `
  * `  test(\"save and load case class RDD with `None`s as orc\") `
  * `class OrcQuerySuite extends OrcQueryTest with SharedSQLContext `
  * `case class OrcData(intField: Int, stringField: String)`
  * `abstract class OrcSuite extends OrcTest with BeforeAndAfterAll `
  * `class OrcSourceSuite extends OrcSuite with SharedSQLContext `
  * `abstract class OrcTest extends QueryTest with SQLTestUtils with 
BeforeAndAfterAll `
  * `class HiveOrcFilterSuite extends OrcTest with TestHiveSingleton `
  * `implicit class IntToBinary(int: Int) `
  * `class HiveOrcPartitionDiscoverySuite extends OrcPartitionDiscoveryTest 
with TestHiveSingleton  `
  * `class HiveOrcQuerySuite extends OrcQueryTest with TestHiveSingleton `
  * `class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton `
  * `class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite `


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19873
  
Thanks! @gatorsmile @cloud-fan


---




[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #84531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84531/testReport)**
 for PR 18581 at commit 
[`265dd48`](https://github.com/apache/spark/commit/265dd48ce16fd62058f4515a9e91c67942b45ed7).


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19903
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19903
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84526/
Test PASSed.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19903
  
**[Test build #84526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84526/testReport)**
 for PR 19903 at commit 
[`ba1e3ae`](https://github.com/apache/spark/commit/ba1e3ae39116c53a8f0d1ed7c71f737fa2e9d0b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19903
  
Thanks, @gatorsmile .


---




[GitHub] spark issue #19899: [SPARK-22704][SQL] Least and Greatest use less global va...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19899
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19899: [SPARK-22704][SQL] Least and Greatest use less global va...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84527/
Test FAILed.


---




[GitHub] spark issue #19899: [SPARK-22704][SQL] Least and Greatest use less global va...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19899
  
**[Test build #84527 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84527/testReport)**
 for PR 19899 at commit 
[`facaf1c`](https://github.com/apache/spark/commit/facaf1cda0fae41c86a860265b562f38acd63605).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r155151425
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
 ---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   spark.sql("DROP TABLE IF EXISTS libsvmTable")
 }
   }
+
+  def testLineSeparator(lineSep: String): Unit = {
+test(s"SPARK-21289: Support line separator - lineSep: '$lineSep'") {
+  val data = Seq(
+"1.0 1:1.0 3:2.0 5:3.0", "0.0", "0.0", "0.0 2:4.0 4:5.0 
6:6.0").mkString(lineSep)
+  val dataWithTrailingLineSep = s"$data$lineSep"
+
+  Seq(data, dataWithTrailingLineSep).foreach { lines =>
+val path0 = new File(tempDir.getCanonicalPath, "write0")
+val path1 = new File(tempDir.getCanonicalPath, "write1")
+try {
+  // Read
+  java.nio.file.Files.write(path0.toPath, 
lines.getBytes(StandardCharsets.UTF_8))
+  val df = spark.read
+.option("lineSep", lineSep)
+.format("libsvm")
+.load(path0.getAbsolutePath)
+
+  assert(df.columns(0) == "label")
+  assert(df.columns(1) == "features")
+  val row1 = df.first()
+  assert(row1.getDouble(0) == 1.0)
+  val v = row1.getAs[SparseVector](1)
+  assert(v == Vectors.sparse(6, Seq((0, 1.0), (2, 2.0), (4, 3.0
+
--- End diff --

OK, let me update it. It's easy to change anyway.


---




[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

2017-12-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19873


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19903
  
LGTM pending Jenkins on build #84528


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19873
  
Merged to master.


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19873
  
LGTM


---




[GitHub] spark issue #19876: [WIP][ML][SPARK-11171][SPARK-11239] Add PMML export to S...

2017-12-05 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19876
  
You can see some of the past discussion on 
https://github.com/apache/spark/pull/9207 cc @sethah @MLnick 


---




[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84530/testReport)**
 for PR 19717 at commit 
[`0e8ca01`](https://github.com/apache/spark/commit/0e8ca012dc6d5dfd5645f45b6d1bb84e1e9e72a5).


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19903
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84521/
Test PASSed.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19903
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19903
  
**[Test build #84521 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84521/testReport)**
 for PR 19903 at commit 
[`d6cdde7`](https://github.com/apache/spark/commit/d6cdde748fb25e51f7f100a9bbd48a1d17547ccf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19873
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84520/
Test PASSed.


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19873
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19873
  
**[Test build #84520 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84520/testReport)**
 for PR 19873 at commit 
[`d2375e0`](https://github.com/apache/spark/commit/d2375e0185185cd3516b25c14b48c12840957cbf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19873
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19903
  
@cloud-fan , @HyukjinKwon , @gatorsmile . The PR is updated again.
Please let me know if I need to update more. Thank you!


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19873
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84518/
Test PASSed.


---




[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19873
  
**[Test build #84518 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84518/testReport)**
 for PR 19873 at commit 
[`4775a02`](https://github.com/apache/spark/commit/4775a0249b34be16e7bc15dbd4b291fddfe92656).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ReqAndHandler(req: Request, handler: MemberHandler)`
  * `trait TypeCoercionRule extends Rule[LogicalPlan] with Logging `


---




[GitHub] spark issue #19899: [SPARK-22704][SQL] Least and Greatest use less global va...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19899
  
**[Test build #84529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84529/testReport)**
 for PR 19899 at commit 
[`d467192`](https://github.com/apache/spark/commit/d46719234bbed9ded30d6f7db6a6a403e9e25f93).


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155148270
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
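[Editor's note] The `fromCommandLineArgs` method in the diff above leans on Scala's `sliding(2, 2)`, which walks the argument array in non-overlapping (flag, value) pairs; an odd trailing element falls through to the catch-all case and raises an error. A minimal self-contained sketch of that parsing logic (simplified types; a plain `String` stands in for `MainAppResource`, and `ArgParseSketch` is an illustrative name, not Spark code):

```scala
// Sketch of sliding(2, 2) flag parsing: each window is a (flag, value) pair.
object ArgParseSketch {
  def parse(args: Array[String]): (Option[String], String, Seq[String]) = {
    var mainAppResource: Option[String] = None
    var mainClass: Option[String] = None
    val driverArgs = scala.collection.mutable.ArrayBuffer.empty[String]
    // sliding(2, 2) yields non-overlapping pairs; a dangling single element
    // (odd-length args) matches `other` and is rejected.
    args.sliding(2, 2).toList.foreach {
      case Array("--primary-java-resource", r) => mainAppResource = Some(r)
      case Array("--main-class", c)            => mainClass = Some(c)
      case Array("--arg", a)                   => driverArgs += a
      case other =>
        throw new RuntimeException(s"Unknown arguments: ${other.mkString(" ")}")
    }
    require(mainClass.isDefined, "Main class must be specified via --main-class")
    (mainAppResource, mainClass.get, driverArgs.toSeq)
  }
}
```

For example, `ArgParseSketch.parse(Array("--main-class", "Main", "--arg", "x"))` yields `(None, "Main", Seq("x"))`.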

[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18853
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84519/
Test PASSed.


---




[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18853
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18853
  
**[Test build #84519 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84519/testReport)**
 for PR 18853 at commit 
[`7802483`](https://github.com/apache/spark/commit/7802483075e4328b3a2299413376cf2c6440a2b0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r155147478
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -94,14 +94,16 @@ class CacheManager extends Logging {
   logWarning("Asked to cache already cached data.")
 } else {
   val sparkSession = query.sparkSession
-  cachedData.add(CachedData(
-planToCache,
-InMemoryRelation(
-  sparkSession.sessionState.conf.useCompression,
-  sparkSession.sessionState.conf.columnBatchSize,
-  storageLevel,
-  sparkSession.sessionState.executePlan(planToCache).executedPlan,
-  tableName)))
+  val inMemoryRelation = InMemoryRelation(
+sparkSession.sessionState.conf.useCompression,
+sparkSession.sessionState.conf.columnBatchSize,
+storageLevel,
+sparkSession.sessionState.executePlan(planToCache).executedPlan,
+tableName)
+  if (planToCache.conf.cboEnabled && 
planToCache.stats.rowCount.isDefined) {
--- End diff --

@viirya you're right! Thanks for clearing up the confusion.

However, to avoid using the relation's stats, which can be much smaller than the in-memory size and could lead to a potential OOM error, we should still keep this condition here (we can remove the cboEnabled check, though), right?
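[Editor's note] The sizing concern in this thread can be sketched outside Spark's APIs. The names below (`PlanStats`, `CacheSizeSketch`) are illustrative, not the actual CacheManager code: only trust optimizer statistics when CBO is on and a row count is defined, otherwise fall back to the (typically larger) in-memory estimate.

```scala
// Illustrative stand-in for a plan's statistics.
case class PlanStats(rowCount: Option[Long], sizeInBytes: Long)

object CacheSizeSketch {
  // Relation stats can be far smaller than the materialized in-memory size,
  // so only use them when CBO produced a defined row count.
  def estimate(cboEnabled: Boolean, stats: PlanStats, inMemorySizeInBytes: Long): Long =
    if (cboEnabled && stats.rowCount.isDefined) stats.sizeInBytes
    else inMemorySizeInBytes
}
```

The guard errs on the side of the conservative in-memory estimate, which is the safer default for avoiding under-provisioned cache memory.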




---




[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r155146548
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
 ---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   spark.sql("DROP TABLE IF EXISTS libsvmTable")
 }
   }
+
+  def testLineSeparator(lineSep: String): Unit = {
+test(s"SPARK-21289: Support line separator - lineSep: '$lineSep'") {
+  val data = Seq(
+"1.0 1:1.0 3:2.0 5:3.0", "0.0", "0.0", "0.0 2:4.0 4:5.0 
6:6.0").mkString(lineSep)
+  val dataWithTrailingLineSep = s"$data$lineSep"
+
+  Seq(data, dataWithTrailingLineSep).foreach { lines =>
+val path0 = new File(tempDir.getCanonicalPath, "write0")
+val path1 = new File(tempDir.getCanonicalPath, "write1")
+try {
+  // Read
+  java.nio.file.Files.write(path0.toPath, 
lines.getBytes(StandardCharsets.UTF_8))
+  val df = spark.read
+.option("lineSep", lineSep)
+.format("libsvm")
+.load(path0.getAbsolutePath)
+
+  assert(df.columns(0) == "label")
+  assert(df.columns(1) == "features")
+  val row1 = df.first()
+  assert(row1.getDouble(0) == 1.0)
+  val v = row1.getAs[SparseVector](1)
+  assert(v == Vectors.sparse(6, Seq((0, 1.0), (2, 2.0), (4, 3.0
+
--- End diff --

Here we change how each line is handled in the iteration. I think checking either a single line or repeated multiple lines is fine; many tests here already check only the first line.
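[Editor's note] The trailing-separator case the test exercises can be checked without Spark, assuming records are split the way `String.split` does it (trailing empty strings are dropped, so data with and without a trailing separator yields the same records). `LineSepSketch` below is an illustrative name, not part of the PR:

```scala
import java.util.regex.Pattern

// Sketch of record splitting on a custom line separator. Pattern.quote
// keeps separator characters from being interpreted as regex metacharacters.
object LineSepSketch {
  def records(data: String, lineSep: String): Seq[String] =
    data.split(Pattern.quote(lineSep)).toSeq
}
```

With `lines.mkString(sep)` as input, appending one more `sep` at the end does not produce an extra empty record, which is why the test can fold both inputs over the same assertions.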


---




[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r155146209
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
 ---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   spark.sql("DROP TABLE IF EXISTS libsvmTable")
 }
   }
+
+  def testLineSeparator(lineSep: String): Unit = {
+test(s"SPARK-21289: Support line separator - lineSep: '$lineSep'") {
+  val data = Seq(
+"1.0 1:1.0 3:2.0 5:3.0", "0.0", "0.0", "0.0 2:4.0 4:5.0 
6:6.0").mkString(lineSep)
+  val dataWithTrailingLineSep = s"$data$lineSep"
+
+  Seq(data, dataWithTrailingLineSep).foreach { lines =>
+val path0 = new File(tempDir.getCanonicalPath, "write0")
+val path1 = new File(tempDir.getCanonicalPath, "write1")
+try {
+  // Read
+  java.nio.file.Files.write(path0.toPath, 
lines.getBytes(StandardCharsets.UTF_8))
+  val df = spark.read
+.option("lineSep", lineSep)
+.format("libsvm")
+.load(path0.getAbsolutePath)
+
+  assert(df.columns(0) == "label")
+  assert(df.columns(1) == "features")
+  val row1 = df.first()
+  assert(row1.getDouble(0) == 1.0)
+  val v = row1.getAs[SparseVector](1)
+  assert(v == Vectors.sparse(6, Seq((0, 1.0), (2, 2.0), (4, 3.0
--- End diff --

OK, but let me use `==`; it seems to be what the test cases in this file already use.


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19903
  
**[Test build #84528 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84528/testReport)**
 for PR 19903 at commit 
[`d7d268d`](https://github.com/apache/spark/commit/d7d268d19e69b1477e2f3ca9cc8d3f1221d3663c).


---




[GitHub] spark pull request #19899: [SPARK-22704][SQL] Least and Greatest use less gl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19899#discussion_r155145718
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -668,22 +681,35 @@ case class Greatest(children: Seq[Expression]) 
extends Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val evalChildren = children.map(_.genCode(ctx))
-ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
-ctx.addMutableState(ctx.javaType(dataType), ev.value)
-def updateEval(eval: ExprCode): String = {
+val isNull = ctx.freshName("isNull")
+ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
+val evals = evalChildren.map(eval =>
   s"""
 ${eval.code}
-if (!${eval.isNull} && (${ev.isNull} ||
+if (!${eval.isNull} && (${isNull} ||
   ${ctx.genGreater(dataType, eval.value, ev.value)})) {
-  ${ev.isNull} = false;
+  $isNull = false;
   ${ev.value} = ${eval.value};
 }
   """
-}
-val codes = 
ctx.splitExpressionsWithCurrentInputs(evalChildren.map(updateEval))
+)
+
+val resultType = ctx.javaType(dataType)
+val codes = ctx.splitExpressionsWithCurrentInputs(
+  expressions = evals,
+  funcName = "least",
--- End diff --

greatest


---




[GitHub] spark pull request #19899: [SPARK-22704][SQL] Least and Greatest use less gl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19899#discussion_r155145590
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -602,23 +602,36 @@ case class Least(children: Seq[Expression]) extends 
Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val evalChildren = children.map(_.genCode(ctx))
-ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
-ctx.addMutableState(ctx.javaType(dataType), ev.value)
-def updateEval(eval: ExprCode): String = {
+val isNull = ctx.freshName("isNull")
--- End diff --

val greatestTmpIsNull = ctx.freshName("greatestTmpIsNull")


---




[GitHub] spark pull request #19899: [SPARK-22704][SQL] Least and Greatest use less gl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19899#discussion_r155145566
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -602,23 +602,36 @@ case class Least(children: Seq[Expression]) extends 
Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val evalChildren = children.map(_.genCode(ctx))
-ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
-ctx.addMutableState(ctx.javaType(dataType), ev.value)
-def updateEval(eval: ExprCode): String = {
+val isNull = ctx.freshName("leastTmpIsNull")
--- End diff --

`isNull`, `ev.isNull` and `eval.isNull` look too similar; the generated code gets hard to read.
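[Editor's note] The collision risk behind this naming discussion is what `CodegenContext.freshName` guards against: each call returns a distinct identifier, so generated locals such as `leastTmpIsNull` cannot clash with other variables in the emitted code. A minimal stand-alone sketch of such a generator (simplified; not Spark's actual implementation, which also handles reserved names):

```scala
// Per-base counter: freshName("isNull") -> "isNull_0", then "isNull_1", ...
class FreshNameSketch {
  private val counters =
    scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)

  def freshName(base: String): String = {
    val n = counters(base)
    counters(base) = n + 1
    s"${base}_$n"
  }
}
```

Using fresh local names (plus `splitExpressionsWithCurrentInputs`) is what lets this PR drop the per-expression global mutable state.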


---




[GitHub] spark pull request #19899: [SPARK-22704][SQL] Least and Greatest use less gl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19899#discussion_r155145502
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -602,23 +602,36 @@ case class Least(children: Seq[Expression]) extends 
Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val evalChildren = children.map(_.genCode(ctx))
-ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
-ctx.addMutableState(ctx.javaType(dataType), ev.value)
-def updateEval(eval: ExprCode): String = {
+val isNull = ctx.freshName("leastTmpIsNull")
--- End diff --

`val leastTmpIsNull = ctx.freshName("leastTmpIsNull")`


---




[GitHub] spark pull request #19899: [SPARK-22704][SQL] Least and Greatest use less gl...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19899#discussion_r155145453
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -602,23 +602,36 @@ case class Least(children: Seq[Expression]) extends 
Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val evalChildren = children.map(_.genCode(ctx))
-ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
-ctx.addMutableState(ctx.javaType(dataType), ev.value)
-def updateEval(eval: ExprCode): String = {
+val isNull = ctx.freshName("isNull")
--- End diff --

`greatestTmpIsNull`?


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19903
  
Yep. The test case will be updated together.


---




[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19783
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19783
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84517/
Test PASSed.


---




[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19783
  
**[Test build #84517 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84517/testReport)**
 for PR 19783 at commit 
[`d06`](https://github.com/apache/spark/commit/d06d172897dde096c40ce70ad2584bbefc03).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable ex...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19903#discussion_r155145012
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -602,7 +602,8 @@ object DataSource extends Logging {
   provider1.startsWith("org.apache.spark.sql.hive.orc")) {
   throw new AnalysisException(
 "Hive-based ORC data source must be used with Hive 
support enabled. " +
--- End diff --

Yep.


---




[GitHub] spark pull request #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable ex...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19903#discussion_r155144800
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -602,7 +602,8 @@ object DataSource extends Logging {
   provider1.startsWith("org.apache.spark.sql.hive.orc")) {
   throw new AnalysisException(
 "Hive-based ORC data source must be used with Hive 
support enabled. " +
-"Please use native ORC data source instead")
+"Please use native ORC data source instead by setting 
'spark.sql.orc.impl' " +
+"configuration to 'native'")
--- End diff --

Yep.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84523/
Test PASSed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19904
  
**[Test build #84523 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84523/testReport)**
 for PR 19904 at commit 
[`8bfcc94`](https://github.com/apache/spark/commit/8bfcc948696dd88a55d808322079abae994dbaa6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19899: [SPARK-22704][SQL] Least and Greatest use less global va...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19899
  
**[Test build #84527 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84527/testReport)**
 for PR 19899 at commit 
[`facaf1c`](https://github.com/apache/spark/commit/facaf1cda0fae41c86a860265b562f38acd63605).


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19903
  
Do you have a test case?


---




[GitHub] spark pull request #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable ex...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19903#discussion_r155144262
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -602,7 +602,8 @@ object DataSource extends Logging {
   provider1.startsWith("org.apache.spark.sql.hive.orc")) {
   throw new AnalysisException(
 "Hive-based ORC data source must be used with Hive 
support enabled. " +
--- End diff --

Hive built-in ORC data source must be used with Hive support enabled. 


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84522/
Test PASSed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19904
  
**[Test build #84522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84522/testReport)** for PR 19904 at commit [`7725fd8`](https://github.com/apache/spark/commit/7725fd8a86dddba6c61c7d053dfa510a114bebb8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable ex...

2017-12-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19903#discussion_r155144120
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -602,7 +602,8 @@ object DataSource extends Logging {
   provider1.startsWith("org.apache.spark.sql.hive.orc")) {
   throw new AnalysisException(
 "Hive-based ORC data source must be used with Hive 
support enabled. " +
-"Please use native ORC data source instead")
+"Please use native ORC data source instead by setting 
'spark.sql.orc.impl' " +
+"configuration to 'native'")
--- End diff --

Please use the native ORC data source by setting 'spark.sql.orc.impl' to 
'native'


---




[GitHub] spark pull request #14151: [SPARK-16496][SQL] Add wholetext as option for re...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14151#discussion_r155143739
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -313,11 +313,13 @@ def text(self, paths):
 Each line in the text file is a new row in the resulting DataFrame.
 
 :param paths: string, or list of strings, for input path(s).
+:param wholetext: if true, read each file from input path(s) as a 
single row.
 
 >>> df = spark.read.text('python/test_support/sql/text-test.txt')
 >>> df.collect()
 [Row(value=u'hello'), Row(value=u'this')]
 """
--- End diff --

Can you add a doctest for `wholetext` too?


---




[GitHub] spark pull request #14151: [SPARK-16496][SQL] Add wholetext as option for re...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14151#discussion_r15514
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala ---
@@ -97,14 +109,26 @@ class TextFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 assert(
   requiredSchema.length <= 1,
   "Text data source only produces a single data column named 
\"value\".")
-
+val textOptions = new TextOptions(options)
 val broadcastedHadoopConf =
   sparkSession.sparkContext.broadcast(new 
SerializableConfiguration(hadoopConf))
 
+readToUnsafeMem(broadcastedHadoopConf, requiredSchema, 
textOptions.wholeText)
+  }
+
+  private def readToUnsafeMem(conf: Broadcast[SerializableConfiguration],
+  requiredSchema: StructType, wholeTextMode: Boolean):
+  (PartitionedFile) => Iterator[UnsafeRow] = {
+
 (file: PartitionedFile) => {
-  val reader = new HadoopFileLinesReader(file, 
broadcastedHadoopConf.value.value)
+  val confValue = conf.value.value
+  var reader: Iterator[Text] with Closeable = null
+  if (!wholeTextMode) {
+reader = new HadoopFileLinesReader(file, confValue)
+  } else {
+reader = new HadoopFileWholeTextReader(file, confValue)
+  }
--- End diff --

We can avoid using `var`:
```scala
val reader = if (!wholeTextMode) {
  new HadoopFileLinesReader(file, confValue)
} else {
  new HadoopFileWholeTextReader(file, confValue)
}
```
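As a self-contained illustration of the `val`-plus-if-expression pattern being suggested (the reader classes below are toy stand-ins, not Spark's `HadoopFileLinesReader` or `HadoopFileWholeTextReader`):

```scala
import java.io.Closeable

// Toy stand-ins for the two reader types; only the construction pattern matters.
trait FileReader extends Closeable { def kind: String }
class LinesReader extends FileReader { val kind = "lines"; def close(): Unit = () }
class WholeTextReader extends FileReader { val kind = "whole"; def close(): Unit = () }

// `if` is an expression in Scala, so the chosen branch can initialize a
// `val` directly -- no mutable `var` or null placeholder is needed.
def makeReader(wholeTextMode: Boolean): FileReader =
  if (!wholeTextMode) new LinesReader() else new WholeTextReader()

assert(makeReader(wholeTextMode = false).kind == "lines")
assert(makeReader(wholeTextMode = true).kind == "whole")
```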


---




[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-12-05 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18610
  
@sethah I don't think we need to save the initial model right now, and if 
it's going to slow down the PR's progress, maybe dropping it makes sense -- but I 
guess it depends on how complicated that is?
For the second point, I also think creating an initial model from a set of 
coefficients could be useful, but that could be addressed in a follow-up 
issue -- what do you think?


---




[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r155141609
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala ---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   spark.sql("DROP TABLE IF EXISTS libsvmTable")
 }
   }
+
+  def testLineSeparator(lineSep: String): Unit = {
+test(s"SPARK-21289: Support line separator - lineSep: '$lineSep'") {
+  val data = Seq(
+"1.0 1:1.0 3:2.0 5:3.0", "0.0", "0.0", "0.0 2:4.0 4:5.0 
6:6.0").mkString(lineSep)
+  val dataWithTrailingLineSep = s"$data$lineSep"
+
+  Seq(data, dataWithTrailingLineSep).foreach { lines =>
+val path0 = new File(tempDir.getCanonicalPath, "write0")
+val path1 = new File(tempDir.getCanonicalPath, "write1")
+try {
+  // Read
+  java.nio.file.Files.write(path0.toPath, 
lines.getBytes(StandardCharsets.UTF_8))
+  val df = spark.read
+.option("lineSep", lineSep)
+.format("libsvm")
+.load(path0.getAbsolutePath)
+
+  assert(df.columns(0) == "label")
+  assert(df.columns(1) == "features")
+  val row1 = df.first()
+  assert(row1.getDouble(0) == 1.0)
+  val v = row1.getAs[SparseVector](1)
+  assert(v == Vectors.sparse(6, Seq((0, 1.0), (2, 2.0), (4, 3.0
--- End diff --

So I prefer to keep it consistent with the others.


---




[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r155141551
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala ---
@@ -184,4 +184,54 @@ class LibSVMRelationSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   spark.sql("DROP TABLE IF EXISTS libsvmTable")
 }
   }
+
+  def testLineSeparator(lineSep: String): Unit = {
+test(s"SPARK-21289: Support line separator - lineSep: '$lineSep'") {
+  val data = Seq(
+"1.0 1:1.0 3:2.0 5:3.0", "0.0", "0.0", "0.0 2:4.0 4:5.0 
6:6.0").mkString(lineSep)
+  val dataWithTrailingLineSep = s"$data$lineSep"
+
+  Seq(data, dataWithTrailingLineSep).foreach { lines =>
+val path0 = new File(tempDir.getCanonicalPath, "write0")
+val path1 = new File(tempDir.getCanonicalPath, "write1")
+try {
+  // Read
+  java.nio.file.Files.write(path0.toPath, 
lines.getBytes(StandardCharsets.UTF_8))
+  val df = spark.read
+.option("lineSep", lineSep)
+.format("libsvm")
+.load(path0.getAbsolutePath)
+
+  assert(df.columns(0) == "label")
+  assert(df.columns(1) == "features")
+  val row1 = df.first()
+  assert(row1.getDouble(0) == 1.0)
+  val v = row1.getAs[SparseVector](1)
+  assert(v == Vectors.sparse(6, Seq((0, 1.0), (2, 2.0), (4, 3.0
+
--- End diff --

The following test only includes checking `df` and `readbackDF` equality. 
But it seems we also need to test equality between the whole loaded `df` 
and the raw file content.
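The reviewer's point can be sketched without Spark: write a tiny libsvm-style payload with a custom separator, read the raw bytes back, and assert on both the raw file content and the parsed records (file names and the toy split-based parser are illustrative):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.util.regex.Pattern

// Round-trip check against the raw file content, not just write-then-read equality.
def roundTrip(lineSep: String): Unit = {
  val records = Seq("1.0 1:1.0 3:2.0 5:3.0", "0.0")
  val raw = records.mkString(lineSep)
  val path = Files.createTempFile("libsvm-sketch", ".txt")
  try {
    Files.write(path, raw.getBytes(StandardCharsets.UTF_8))
    val readBack = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
    assert(readBack == raw)                                        // raw-content equality
    assert(readBack.split(Pattern.quote(lineSep)).toSeq == records) // parsed records
  } finally {
    Files.delete(path)
  }
}

roundTrip(",")
roundTrip("\u0001")
```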


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19903
  
Thank you, @HyukjinKwon and @cloud-fan .


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19903
  
**[Test build #84526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84526/testReport)** for PR 19903 at commit [`ba1e3ae`](https://github.com/apache/spark/commit/ba1e3ae39116c53a8f0d1ed7c71f737fa2e9d0b6).


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r155141140
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -94,14 +94,16 @@ class CacheManager extends Logging {
   logWarning("Asked to cache already cached data.")
 } else {
   val sparkSession = query.sparkSession
-  cachedData.add(CachedData(
-planToCache,
-InMemoryRelation(
-  sparkSession.sessionState.conf.useCompression,
-  sparkSession.sessionState.conf.columnBatchSize,
-  storageLevel,
-  sparkSession.sessionState.executePlan(planToCache).executedPlan,
-  tableName)))
+  val inMemoryRelation = InMemoryRelation(
+sparkSession.sessionState.conf.useCompression,
+sparkSession.sessionState.conf.columnBatchSize,
+storageLevel,
+sparkSession.sessionState.executePlan(planToCache).executedPlan,
+tableName)
+  if (planToCache.conf.cboEnabled && 
planToCache.stats.rowCount.isDefined) {
--- End diff --

If a catalog table doesn't have statistics in its metadata, we will fill it 
with `defaultSizeInBytes`.


https://github.com/apache/spark/blob/326f1d6728a7734c228d8bfaa69442a1c7b92e9b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L121-L134


---




[GitHub] spark issue #19903: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19903
  
LGTM


---




[GitHub] spark issue #19854: [SPARK-22660][BUILD] Use position() and limit() to fix a...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19854
  
**[Test build #84525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84525/testReport)** for PR 19854 at commit [`dbe6b3f`](https://github.com/apache/spark/commit/dbe6b3fdfce50f25872b559018bfb1b77e81d04f).


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r155140756
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -94,14 +94,16 @@ class CacheManager extends Logging {
   logWarning("Asked to cache already cached data.")
 } else {
   val sparkSession = query.sparkSession
-  cachedData.add(CachedData(
-planToCache,
-InMemoryRelation(
-  sparkSession.sessionState.conf.useCompression,
-  sparkSession.sessionState.conf.columnBatchSize,
-  storageLevel,
-  sparkSession.sessionState.executePlan(planToCache).executedPlan,
-  tableName)))
+  val inMemoryRelation = InMemoryRelation(
+sparkSession.sessionState.conf.useCompression,
+sparkSession.sessionState.conf.columnBatchSize,
+storageLevel,
+sparkSession.sessionState.executePlan(planToCache).executedPlan,
+tableName)
+  if (planToCache.conf.cboEnabled && 
planToCache.stats.rowCount.isDefined) {
--- End diff --

`LogicalRelation` uses the statistics from the `relation` only when there 
is no given `catalogTable`.  In this case, it doesn't consider if CBO is 
enabled or not.

Only `catalogTable` considers CBO when computing its statistics in 
`toPlanStats`. It doesn't refer to relation's statistics, IIUC.
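Abstracted away from Spark's internals, the statistics selection being debated looks roughly like this (a toy model, not `CacheManager`'s actual code; all names are illustrative): keep the plan's statistics only when CBO is on and a row count is known, otherwise fall back to a size-only estimate.

```scala
// Toy model of choosing statistics for a cached relation.
case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt])

def statsForCache(cboEnabled: Boolean, planStats: Stats, fallbackSize: BigInt): Stats =
  if (cboEnabled && planStats.rowCount.isDefined) planStats
  else Stats(fallbackSize, None)

// CBO on with a known row count: carry the plan's stats over.
assert(statsForCache(cboEnabled = true, Stats(100, Some(10)), 9999) == Stats(100, Some(10)))
// CBO off: only the size-based fallback survives.
assert(statsForCache(cboEnabled = false, Stats(100, Some(10)), 9999) == Stats(9999, None))
```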


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19882
  
**[Test build #84524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84524/testReport)** for PR 19882 at commit [`baec5fe`](https://github.com/apache/spark/commit/baec5fece95896a28127cd2ac014582874ed5e28).


---




[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-05 Thread CodingCat
Github user CodingCat commented on the issue:

https://github.com/apache/spark/pull/19864
  
@viirya Sure, will add.


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r155140125
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -94,14 +94,16 @@ class CacheManager extends Logging {
   logWarning("Asked to cache already cached data.")
 } else {
   val sparkSession = query.sparkSession
-  cachedData.add(CachedData(
-planToCache,
-InMemoryRelation(
-  sparkSession.sessionState.conf.useCompression,
-  sparkSession.sessionState.conf.columnBatchSize,
-  storageLevel,
-  sparkSession.sessionState.executePlan(planToCache).executedPlan,
-  tableName)))
+  val inMemoryRelation = InMemoryRelation(
+sparkSession.sessionState.conf.useCompression,
+sparkSession.sessionState.conf.columnBatchSize,
+storageLevel,
+sparkSession.sessionState.executePlan(planToCache).executedPlan,
+tableName)
+  if (planToCache.conf.cboEnabled && 
planToCache.stats.rowCount.isDefined) {
--- End diff --

no, if CBO is disabled, the relation's sizeInBytes is the file size


https://github.com/apache/spark/blob/5c3a1f3fad695317c2fff1243cdb9b3ceb25c317/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala#L85





---




[GitHub] spark pull request #14151: [SPARK-16496][SQL] Add wholetext as option for re...

2017-12-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14151#discussion_r155139949
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala ---
@@ -39,6 +39,54 @@ class TextSuite extends QueryTest with SharedSQLContext {
 verifyFrame(spark.read.text(testFile))
   }
 
+  test("reading text file with option wholetext=true") {
+val df = spark.read.option("wholetext", "true")
+  .format("text").load(testFile)
+// schema
+assert(df.schema == new StructType().add("value", StringType))
+
+// verify content
+val data = df.collect()
+assert(data(0) ==
+  Row(
+// scalastyle:off nonascii
+"""This is a test file for the text data source
+  |1+1
+  |数据砖头
+  |"doh"
+  |""".stripMargin))
+// scalastyle:on
--- End diff --

nit: // scalastyle:on nonascii


---



