date:20180527

[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21416
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91207/
Test PASSed.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21420
  
**[Test build #91207 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91207/testReport)**
 for PR 21420 at commit 
[`e66ea49`](https://github.com/apache/spark/commit/e66ea49000860d593074296b2a86e8bbdf5f0261).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3631/
Test PASSed.


---


[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21010
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21010
  
**[Test build #91214 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91214/testReport)**
 for PR 21010 at commit 
[`9ccb648`](https://github.com/apache/spark/commit/9ccb6488f6f8309e0cfa71c4b332e6d680f24ffa).


---


[GitHub] spark pull request #20929: [SPARK-23772][SQL] Provide an option to ignore co...

2018-05-27 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20929#discussion_r191119148
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2408,4 +2408,24 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   spark.read.option("mode", "PERMISSIVE").option("encoding", 
"UTF-8").json(Seq(badJson).toDS()),
   Row(badJson))
   }
+
+  test("SPARK-23772 ignore column of all null values or empty array during 
schema inference") {
+ withTempPath { tempDir =>
+  val path = tempDir.getAbsolutePath
+  Seq(
+"""{"a":null, "b":[null, null], "c":null, "d":[[], [null]], 
"e":{}}""",
+"""{"a":null, "b":[null], "c":[], "d": [null, []], "e":{}}""",
+"""{"a":null, "b":[], "c":[], "d": null, "e":null}""")
--- End diff --

ok


---


[GitHub] spark pull request #20929: [SPARK-23772][SQL] Provide an option to ignore co...

2018-05-27 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20929#discussion_r191118958
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -379,6 +379,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* that should be used for parsing.
* `samplingRatio` (default is 1.0): defines fraction of input JSON 
objects used
* for schema inferring.
+   * `dropFieldIfAllNull` (default `false`): whether to ignore column 
of all null values or
--- End diff --

ok


---


[GitHub] spark pull request #20929: [SPARK-23772][SQL] Provide an option to ignore co...

2018-05-27 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20929#discussion_r191118916
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2408,4 +2408,24 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   spark.read.option("mode", "PERMISSIVE").option("encoding", 
"UTF-8").json(Seq(badJson).toDS()),
   Row(badJson))
   }
+
+  test("SPARK-23772 ignore column of all null values or empty array during 
schema inference") {
+ withTempPath { tempDir =>
+  val path = tempDir.getAbsolutePath
+  Seq(
+"""{"a":null, "b":[null, null], "c":null, "d":[[], [null]], 
"e":{}}""",
+"""{"a":null, "b":[null], "c":[], "d": [null, []], "e":{}}""",
+"""{"a":null, "b":[], "c":[], "d": null, "e":null}""")
+.toDS().write.mode("overwrite").text(path)
--- End diff --

ok


---


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21424#discussion_r191118494
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -106,11 +108,20 @@ private[execution] object HashedRelation {
   1),
 0)
 }
-
-if (key.length == 1 && key.head.dataType == LongType) {
-  LongHashedRelation(input, key, sizeEstimate, mm)
-} else {
-  UnsafeHashedRelation(input, key, sizeEstimate, mm)
+try {
+  if (key.length == 1 && key.head.dataType == LongType) {
+LongHashedRelation(input, key, sizeEstimate, mm)
+  } else {
+UnsafeHashedRelation(input, key, sizeEstimate, mm)
+  }
+} catch {
+  case oe: SparkOutOfMemoryError =>
+throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError 
happens in Spark driver," +
--- End diff --

It seems we don't need to change anything; maybe just add some comments to 
say where the OOM can occur, i.e. `RDD#collect` and `BroadcastMode#transform`.


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21416
  
LGTM


---


[GitHub] spark pull request #21379: [SPARK-24327][SQL] Add an option to quote a parti...

2018-05-27 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21379#discussion_r191117129
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
 ---
@@ -78,7 +79,12 @@ private[sql] object JDBCRelation extends Logging {
 // Overflow and silliness can happen if you subtract then divide.
 // Here we get a little roundoff, but that's (hopefully) OK.
 val stride: Long = upperBound / numPartitions - lowerBound / 
numPartitions
-val column = partitioning.column
+val column = if (jdbcOptions.quotePartitionColumnName) {
+  val dialect = JdbcDialects.get(jdbcOptions.url)
+  dialect.quoteIdentifier(partitioning.column)
--- End diff --

[The latest 
fix](https://github.com/apache/spark/pull/21379/commits/8d5fa9a25dff75327e8ff8ff13b756b895e6e512)
 changes an existing behaviour (when quoting non-partition column names), so 
I'm not sure this fix is acceptable. Any suggestion?


---


[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3630/
Test PASSed.


---


[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21379
  
**[Test build #91213 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91213/testReport)**
 for PR 21379 at commit 
[`8d5fa9a`](https://github.com/apache/spark/commit/8d5fa9a25dff75327e8ff8ff13b756b895e6e512).


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21416
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21416
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3629/
Test PASSed.


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21416
  
**[Test build #91212 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91212/testReport)**
 for PR 21416 at commit 
[`1332406`](https://github.com/apache/spark/commit/1332406d7f4ca7a9a4a85338f758430ecc334ff8).


---


[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-27 Thread e-dorigatti

Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191113977
  
--- Diff: python/pyspark/util.py ---
@@ -89,6 +93,33 @@ def majorMinorVersion(sparkVersion):
  " version numbers.")
 
 
+def fail_on_stopiteration(f):
+"""
+Wraps the input function to fail on 'StopIteration' by raising a 
'RuntimeError'
+prevents silent loss of data when 'f' is used in a for loop
+"""
+def wrapper(*args, **kwargs):
+try:
+return f(*args, **kwargs)
+except StopIteration as exc:
+raise RuntimeError(
+"Caught StopIteration thrown from user's code; failing the 
task",
+exc
+)
+
+# prevent inspect to fail
+# e.g. inspect.getargspec(sum) raises
+# TypeError: <built-in function sum> is not a Python function
+try:
+argspec = _get_argspec(f)
--- End diff --

You said to do it in `udf.UserDefinedFunction._create_judf`, but sent the 
code of `udf._create_udf`. I assume you meant the former, since we cannot do 
that in `_create_udf` (`UserDefinedFunction._wrapped` needs the original 
function for its documentation and other metadata). And yes, I will also 
simplify the code as you suggested.
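
For context, here is a minimal sketch of the failure mode the wrapper guards against, in plain Python with no Spark involved. `user_func` and the input list are made up for illustration, and the wrapper follows the diff above but leaves out the argspec handling.

```python
def fail_on_stopiteration(f):
    """Re-raise StopIteration from f as RuntimeError (simplified from the diff)."""
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError(
                "Caught StopIteration thrown from user's code; failing the task",
                exc
            )
    return wrapper


def user_func(x):
    # Hypothetical buggy user function: leaks StopIteration for one input.
    if x == 2:
        raise StopIteration()
    return x * 10


# Unwrapped: map() lets the StopIteration escape from its __next__, so the
# consumer treats it as normal exhaustion and the remaining items are dropped.
silent = list(map(user_func, [1, 2, 3]))

# Wrapped: the same bug now surfaces as an error instead of missing data.
try:
    list(map(fail_on_stopiteration(user_func), [1, 2, 3]))
    failed_loudly = False
except RuntimeError:
    failed_loudly = True

print(silent, failed_loudly)
```

In the unwrapped case the items after the failing input never show up and nothing is reported, which is the silent data loss the docstring refers to.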


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21416
  
@cloud-fan Unfortunately, a Scala vararg method cannot be overloaded this 
way, and the compiler reports the following error.

```scala
Error:(410, 32) ambiguous reference to overloaded definition,
both method isin in class Column of type (values: 
Iterable[_])org.apache.spark.sql.Column
and  method isin in class Column of type (list: 
Any*)org.apache.spark.sql.Column
match argument types (Seq[Int])
checkAnswer(df.filter($"a".isin(Seq(1, 2))),
```

I'm leaning toward using `isInCollection` now, and implementing the 
corresponding Python APIs.


---


[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-27 Thread e-dorigatti

Github user e-dorigatti commented on the issue:

https://github.com/apache/spark/pull/21383
  
Yes, the problem was that the signature is lost when the function is 
wrapped, and the worker needs the signature to know whether the function needs 
keys together with values or not.

What I meant is that fixing the worker side might fix this specific JIRA, but 
the bug will still occur when a UDF is used somewhere other than in a worker. 
If you fix it on the UDF side, however, the bug will never occur again, 
regardless of how and where the UDF is used. I don't know the codebase well 
enough to know whether this will be a real problem, though.
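
A small sketch of the signature problem described above, using only the standard library. `user_func`, `plain_wrap` and `wraps_wrap` are hypothetical names, and `functools.wraps` is shown only as one possible way to keep the signature discoverable; it is an assumption, not what the PR does.

```python
import functools
import inspect


def user_func(key, values):
    # Hypothetical two-argument function (keys together with values).
    return key, sum(values)


def plain_wrap(f):
    # A bare wrapper: callers inspecting it only see (*args, **kwargs).
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper


def wraps_wrap(f):
    # functools.wraps records the original function as __wrapped__.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper


original_args = inspect.getfullargspec(user_func).args
plain_args = inspect.getfullargspec(plain_wrap(user_func)).args
# inspect.signature follows __wrapped__ by default, so the names come back.
recovered = list(inspect.signature(wraps_wrap(user_func)).parameters)

print(original_args, plain_args, recovered)
```

With the bare wrapper, code that counts positional arguments can no longer tell a one-argument function from a two-argument one, which matches the symptom described above.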


---


[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-27 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16478
  
@metasim Thanks for the comment. This patch takes the approach of using Scala 
data types in the UDT and letting Spark SQL's encoder convert user data to the 
internal format.

I'm not sure which part of the serialization/deserialization cost you'd like 
to avoid. Following the link to `InternalRowTile`, it seems to wrap an 
`InternalRow` and access some fields in the row. You still need to deserialize 
to `InternalRow` before accessing them.


---


[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21435
  
Merged to master and branch-2.3.

There were minor conflicts, but I resolved them myself.


---


[GitHub] spark pull request #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experim...

2018-05-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21435


---


[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21435
  
Thanks @BryanCutler. Still LGTM.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3628/
Test PASSed.


---


[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21435
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21435
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91209/
Test PASSed.


---


[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21435
  
**[Test build #91209 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91209/testReport)**
 for PR 21435 at commit 
[`6b87330`](https://github.com/apache/spark/commit/6b873309b3db7022cfe47561c47d7ae320a46d65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3627/
Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #91211 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91211/testReport)**
 for PR 21288 at commit 
[`2c0d5cb`](https://github.com/apache/spark/commit/2c0d5cbf51268540653543b96de135a6923c6cef).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #91210 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91210/testReport)**
 for PR 21288 at commit 
[`b7859ed`](https://github.com/apache/spark/commit/b7859ed0905ce3e0476e5d327f65798acc7aba8c).


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-27 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r191109472
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
 ---
@@ -0,0 +1,437 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.io.File
+
+import scala.util.{Random, Try}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.sql.functions.monotonically_increasing_id
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.{Benchmark, Utils}
+
+
+/**
+ * Benchmark to measure read performance with Filter pushdown.
+ * To run this:
+ *  spark-submit --class  
+ */
+object FilterPushdownBenchmark {
+  val conf = new SparkConf()
+.setAppName("FilterPushdownBenchmark")
+.setIfMissing("spark.master", "local[1]")
+.setIfMissing("spark.driver.memory", "3g")
+.setIfMissing("spark.executor.memory", "3g")
+.setIfMissing("orc.compression", "snappy")
+.setIfMissing("spark.sql.parquet.compression.codec", "snappy")
+
+  private val spark = SparkSession.builder().config(conf).getOrCreate()
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val (keys, values) = pairs.unzip
+val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
+(keys, values).zipped.foreach(spark.conf.set)
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => spark.conf.set(key, value)
+case (key, None) => spark.conf.unset(key)
+  }
+}
+  }
+
+  private def prepareTable(
+  dir: File, numRows: Int, width: Int, useStringForValue: Boolean): 
Unit = {
+import spark.implicits._
+val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
+val valueCol = if (useStringForValue) {
+  monotonically_increasing_id().cast("string")
+} else {
+  monotonically_increasing_id()
+}
+val df = spark.range(numRows).map(_ => 
Random.nextLong).selectExpr(selectExpr: _*)
+  .withColumn("value", valueCol)
+  .sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def prepareStringDictTable(
+  dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = 
{
+val selectExpr = (0 to width).map {
+  case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
+  case i => s"CAST(rand() AS STRING) c$i"
+}
+val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
+
+saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
+saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
+  }
+
+  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").orc(dir)
+spark.read.orc(dir).createOrReplaceTempView("orcTable")
+  }
+
+  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
+df.write.mode("overwrite").parquet(dir)
+spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
+  }
+
+  def filterPushDownBenchmark(
+  values: Int,
+  title: String,
+  whereExpr: String,
+  selectExpr: String = "*"): Unit = {
+val benchmark = new Benchmark(title, values, minNumIters = 5)
+
+Seq(false, true).foreach { pushDownEnabled =>
+  val name

[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21435
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21435
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3626/
Test PASSed.


---


[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...

2018-05-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21432


---


[GitHub] spark issue #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21435
  
**[Test build #91209 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91209/testReport)**
 for PR 21435 at commit 
[`6b87330`](https://github.com/apache/spark/commit/6b873309b3db7022cfe47561c47d7ae320a46d65).


---


[GitHub] spark pull request #21435: [SPARK-24392][PYTHON] Label pandas_udf as Experim...

2018-05-27 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21435#discussion_r191107787
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1827,6 +1827,10 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
   - Since Spark 2.0, Spark converts Parquet Hive tables by default for 
better performance. Since Spark 2.4, Spark converts ORC Hive tables by default, 
too. It means Spark uses its own ORC support by default instead of Hive SerDe. 
As an example, `CREATE TABLE t(id int) STORED AS ORC` would be handled with 
Hive SerDe in Spark 2.3, and in Spark 2.4, it would be converted into Spark's 
ORC data source table and ORC vectorization would be applied. To set `false` to 
`spark.sql.hive.convertMetastoreOrc` restores the previous behavior.
   - In version 2.3 and earlier, CSV rows are considered as malformed if at 
least one column value in the row is malformed. CSV parser dropped such rows in 
the DROPMALFORMED mode or outputs an error in the FAILFAST mode. Since Spark 
2.4, CSV row is considered as malformed only when it contains malformed column 
values requested from CSV datasource, other values can be ignored. As an 
example, CSV file contains the "id,name" header and one row "1234". In Spark 
2.4, selection of the id column consists of a row with one column value 1234 
but in Spark 2.3 and earlier it is empty in the DROPMALFORMED mode. To restore 
the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to 
`false`.
 
+## Upgrading From Spark SQL 2.3.0 to 2.3.1 and Above
--- End diff --

sure


---


[GitHub] spark issue #21432: [SPARK-24373][SQL] Add AnalysisBarrier to RelationalGrou...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21432
  
thanks, merging to master/2.3!


---


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-27 Thread jinxing64

Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21424#discussion_r191107018
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -106,11 +108,20 @@ private[execution] object HashedRelation {
   1),
 0)
 }
-
-if (key.length == 1 && key.head.dataType == LongType) {
-  LongHashedRelation(input, key, sizeEstimate, mm)
-} else {
-  UnsafeHashedRelation(input, key, sizeEstimate, mm)
+try {
+  if (key.length == 1 && key.head.dataType == LongType) {
+LongHashedRelation(input, key, sizeEstimate, mm)
+  } else {
+UnsafeHashedRelation(input, key, sizeEstimate, mm)
+  }
+} catch {
+  case oe: SparkOutOfMemoryError =>
+throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError 
happens in Spark driver," +
--- End diff --

So, should I change it back?


---


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3625/
Test PASSed.


---


[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21389
  
**[Test build #91208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91208/testReport)**
 for PR 21389 at commit 
[`04f4028`](https://github.com/apache/spark/commit/04f40281e2a457ea27d425b5b1db0e07a0150aaf).


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21416
  
Yes. The PySpark API seems to have been designed a bit differently from the 
Scala/Java API from the beginning. If we are going to make them consistent, we 
either break Scala queries like `col.isin(Array[Byte](1,2,3))` or PySpark 
queries like `col.isin([1, 2, 3])`.
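
To make the PySpark side of this trade-off concrete, here is a minimal sketch in plain Python (no Spark needed). The helper name `normalize_isin_args` is hypothetical; it only mimics the convention that a single list argument is unpacked into its elements, and it is not PySpark's actual code.

```python
def normalize_isin_args(*values):
    """Return the values to test membership against.

    Hypothetical helper: a single list/set argument is unpacked so that
    isin([1, 2, 3]) and isin(1, 2, 3) mean the same thing.
    """
    if len(values) == 1 and isinstance(values[0], (list, set)):
        return list(values[0])
    return list(values)


# Both call styles end up with the same candidate values.
print(normalize_isin_args(1, 2, 3))
print(normalize_isin_args([1, 2, 3]))
```

Under this convention a list can never itself be one of the candidate values, which is roughly the ambiguity a Scala `Array[Byte]` argument would run into if the same unpacking were applied there.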


---


[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21319
  
Anyway, I think moving statistics to the physical plan is the ultimate solution; 
all others are workarounds, so we should pick the simplest workaround. I'm glad 
to take your stats visitor approach if it's simpler.


---


[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21319
  
> but in the first pass I see that it breaks equality by relying on a 
user-supplied reader instance.

Can you say more about this? I explicitly mentioned in the PR description 
that
> keep DataSourceReader as an optional parameter in the constructor of 
DataSourceV2Relation, exclude it in the equality definition but include it when 
copying.
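
As a rough illustration of "exclude it from equality but include it when copying", here is a small Python sketch. The class `RelationSketch` and its fields are hypothetical stand-ins, not the real `DataSourceV2Relation`.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class RelationSketch:
    """Hypothetical stand-in for a relation that carries an optional reader."""
    source: str
    options: tuple
    # compare=False keeps the reader out of __eq__ and __hash__.
    reader: Optional[object] = field(default=None, compare=False)


a = RelationSketch("parquet", (("path", "/tmp/t"),), reader=object())
b = RelationSketch("parquet", (("path", "/tmp/t"),), reader=object())

# Equal even though the reader instances differ.
print(a == b)

# Copying with a changed field still carries the original reader along.
c = replace(a, source="orc")
print(c.reader is a.reader)
```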


---


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19602#discussion_r191104615
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 
 val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
 
+object ExtractAttribute {
+  def unapply(expr: Expression): Option[Attribute] = {
+expr match {
+  case attr: Attribute => Some(attr)
+  case cast @ Cast(child, dt: StringType, _) if 
child.dataType.isInstanceOf[NumericType] =>
+unapply(child)
+  case cast @ Cast(child, dt: NumericType, _) if child.dataType == 
StringType =>
--- End diff --

I think we should only support safe upcasts; `cast("1.234" as int)` should 
be excluded. Can we follow `Cast.mayTruncate` strictly?
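
A small Python sketch of why a truncating cast is unsafe to see through when pruning partitions. It assumes, for illustration only, that casting the string `"1.234"` to an integer truncates to `1`; the partition values and function name are made up.

```python
def cast_string_to_int(s):
    # Assumption: the cast drops the fractional part instead of failing.
    return int(float(s))


partition_values = ["1", "1.234", "2"]

# Evaluating the predicate `cast(p as int) = 1` row by row.
matched_by_cast = [p for p in partition_values if cast_string_to_int(p) == 1]

# Stripping the cast and pushing down `p = '1'` as a string comparison.
matched_by_pushdown = [p for p in partition_values if p == "1"]

print(matched_by_cast)
print(matched_by_pushdown)
```

The two lists differ, so a pushed-down filter that ignores the cast would prune a partition the original predicate selects.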


---


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21424#discussion_r191104413
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -106,11 +108,20 @@ private[execution] object HashedRelation {
   1),
 0)
 }
-
-if (key.length == 1 && key.head.dataType == LongType) {
-  LongHashedRelation(input, key, sizeEstimate, mm)
-} else {
-  UnsafeHashedRelation(input, key, sizeEstimate, mm)
+try {
+  if (key.length == 1 && key.head.dataType == LongType) {
+LongHashedRelation(input, key, sizeEstimate, mm)
+  } else {
+UnsafeHashedRelation(input, key, sizeEstimate, mm)
+  }
+} catch {
+  case oe: SparkOutOfMemoryError =>
+throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError 
happens in Spark driver," +
--- End diff --

Ah, I see. So the `SparkOutOfMemoryError` is thrown by `BytesToBytesMap`, and 
we need to catch and rethrow it to attach the error message anyway.

I also found that we may throw OOM when calling 
`child.executeCollectIterator`, which calls `RDD#collect`, so it seems the 
previous code is correct.
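
The catch-and-rethrow pattern discussed here can be sketched in Python as follows. The exception class, the function name, and the hint wording are all hypothetical; the actual message in the PR is truncated in the diff above and is not reproduced.

```python
class SparkOutOfMemoryErrorSketch(MemoryError):
    """Hypothetical stand-in for the error raised while building the map."""


def build_relation_sketch(builder):
    """Run `builder`, re-raising its out-of-memory error with an added hint."""
    try:
        return builder()
    except SparkOutOfMemoryErrorSketch as e:
        # Illustrative hint text only; chaining with `from e` keeps the
        # original error available as __cause__.
        hint = "Not enough memory to build the relation in the driver: "
        raise SparkOutOfMemoryErrorSketch(hint + str(e)) from e


def failing_builder():
    raise SparkOutOfMemoryErrorSketch("cannot allocate page")
```

Calling `build_relation_sketch(failing_builder)` raises an error whose message carries both the hint and the original text.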


---


[GitHub] spark pull request #21397: [SPARK-24334] Fix race condition in ArrowPythonRu...

2018-05-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21397


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21416
  
@viirya good point. One thing I'm not sure about: does `isin(collection: 
Iterable)` conflict with `isin(list: Any*)`? If they don't conflict, then we 
can follow PySpark.


---


[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged to master and branch-2.3.


---


[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-27 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r191101660
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -219,7 +218,14 @@ object ReorderAssociativeOperator extends 
Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
--- End diff --

This improvement looks reasonable, but can we move it to a separate PR? 
It's not related to adding `isInCollection`.


---


[GitHub] spark issue #18390: [SPARK-21178][ML] Add support for label specific metrics...

2018-05-27 Thread thesuperzapper

Github user thesuperzapper commented on the issue:

https://github.com/apache/spark/pull/18390
  
@MLnick @WeichenXu123 sorry to ping again, but what is currently stopping 
this from merging?


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3624/
Test PASSed.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21420
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21420: [SPARK-24377][Spark Submit] make --py-files work in non ...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21420
  
**[Test build #91207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91207/testReport)**
 for PR 21420 at commit 
[`e66ea49`](https://github.com/apache/spark/commit/e66ea49000860d593074296b2a86e8bbdf5f0261).


---


[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...

2018-05-27 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21317#discussion_r191092872
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/SecretEnvUtils.scala
 ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.Container
+
+private[spark] object SecretEnvUtils {
+
+  def containerHasEnvVar(container: Container, envVarName: String): 
Boolean = {
--- End diff --

OK, will fix. Guys, I have one thing that worries me. YAML allows even 
reserved characters to be passed as ordinary characters. So if the name of the 
secret contained a character like `:`, we can't handle that right now. In YAML 
it would be escaped, or we could use double quotes. Here we don't have that 
logic, correct? Is it OK not to have the same level of expressiveness? At the 
end of the day YAML seems close to k8s, and I was wondering why we don't just 
specify all properties for pods (driver & executor) in YAML format, read it, 
and set everything via that method.
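
To show the kind of ambiguity being described, here is a Python sketch. It assumes, purely for illustration, that a secret reference is written as `name:key` in a single string; the format and the function name are not taken from the PR.

```python
def parse_secret_ref(spec):
    """Split a hypothetical `name:key` reference into its two parts."""
    parts = spec.split(":")
    if len(parts) != 2:
        # With no escaping or quoting rule, an extra `:` cannot be attributed
        # to either the name or the key.
        raise ValueError("ambiguous secret reference: %r" % spec)
    return parts[0], parts[1]


print(parse_secret_ref("db-secret:password"))
```

A value such as `my:secret:password` is rejected, because without an escaping or quoting rule there is no way to tell where the name ends.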


---


[GitHub] spark pull request #21434: [SPARK-24331][SparkR][SQL] Adding arrays_overlap,...

2018-05-27 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21434#discussion_r191090953
  
--- Diff: R/pkg/R/functions.R ---
@@ -3062,6 +3077,21 @@ setMethod("array_sort",
 column(jc)
   })
 
+#' @details
+#' \code{arrays_overlap}: Returns true if the input arrays have at least 
one non-null element in
+#' common. If not and both arrays are non-empty and any of them contains a 
null, it returns null.
+#' It returns false otherwise.
+#'
+#' @rdname column_collection_functions
+#' @aliases arrays_overlap arrays_overlap,Column-method
+#' @note arrays_overlap since 2.4.0
+setMethod("arrays_overlap",
+  signature(y = "Column", x = "Column"),
--- End diff --

Right, I don't know why they were (y, x) either. For some (one?) it was to 
match existing parameter names (like `atan2`), and then it stuck.

I think we should, first, name the first column `x` and, second, stay close 
to the parameter names in Scala for everything else.
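
For reference, the three-valued behavior that the `arrays_overlap` doc text in the diff above describes can be sketched in plain Python, with `None` standing in for null. This follows the wording of the doc comment only and is not Spark's implementation.

```python
def arrays_overlap(a, b):
    """Return True, False, or None following the documented rules."""
    common = {x for x in a if x is not None} & {y for y in b if y is not None}
    if common:
        # At least one non-null element in common.
        return True
    if a and b and (None in a or None in b):
        # No common element, both non-empty, and a null is present.
        return None
    return False


print(arrays_overlap([1, 2], [2, 3]))
print(arrays_overlap([1, None], [2]))
print(arrays_overlap([1], [2]))
```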



---


[GitHub] spark pull request #21434: [SPARK-24331][SparkR][SQL] Adding arrays_overlap,...

2018-05-27 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21434#discussion_r191090821
  
--- Diff: R/pkg/R/functions.R ---
@@ -207,7 +208,7 @@ NULL
 #' tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp))
 #' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1)))
 #' head(select(tmp, array_max(tmp$v1), array_min(tmp$v1)))
-#' head(select(tmp, array_position(tmp$v1, 21), array_sort(tmp$v1)))
+#' head(select(tmp, array_position(tmp$v1, 21), array_repeat(21, 5L), 
array_sort(tmp$v1)))
--- End diff --

Also, for `5L`, `5` should be OK and clearer as well.


---


[GitHub] spark pull request #21434: [SPARK-24331][SparkR][SQL] Adding arrays_overlap,...

2018-05-27 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21434#discussion_r191090863
  
--- Diff: R/pkg/R/functions.R ---
@@ -3048,6 +3048,26 @@ setMethod("array_position",
 column(jc)
   })
 
+#' @details
+#' \code{array_repeat}: Creates an array containing the left argument 
repeated the number of times
+#' given by the right argument.
+#'
+#' @param count Column or constant determining the number of repetitions.
+#' @rdname column_collection_functions
+#' @aliases array_repeat array_repeat,Column,numericOrColumn-method
+#' @note array_repeat since 2.4.0
+setMethod("array_repeat",
+  signature(x = "Column", count = "numericOrColumn"),
+  function(x, count) {
+if (class(count) == "Column") {
+count <- count@jc
+} else {
+count <- as.integer(count)
--- End diff --

The indent is actually 2 spaces; could you update this and line L3063?


---


[GitHub] spark pull request #21434: [SPARK-24331][SparkR][SQL] Adding arrays_overlap,...

2018-05-27 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21434#discussion_r191090619
  
--- Diff: R/pkg/R/functions.R ---
@@ -207,7 +208,7 @@ NULL
 #' tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp))
 #' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1)))
 #' head(select(tmp, array_max(tmp$v1), array_min(tmp$v1)))
-#' head(select(tmp, array_position(tmp$v1, 21), array_sort(tmp$v1)))
+#' head(select(tmp, array_position(tmp$v1, 21), array_repeat(21, 5L), 
array_sort(tmp$v1)))
--- End diff --

This example is a bit unusual. Do you intend for the first param to be `21`, 
the constant? (Also, does that work?)


---


[GitHub] spark pull request #21434: [SPARK-24331][SparkR][SQL] Adding arrays_overlap,...

2018-05-27 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21434#discussion_r191090696
  
--- Diff: R/pkg/R/functions.R ---
@@ -3048,6 +3048,26 @@ setMethod("array_position",
 column(jc)
   })
 
+#' @details
+#' \code{array_repeat}: Creates an array containing the left argument 
repeated the number of times
+#' given by the right argument.
+#'
+#' @param count Column or constant determining the number of repetitions.
--- End diff --

change to `@param count a Column or constant determining the number of 
repetitions.`


---


[GitHub] spark pull request #21434: [SPARK-24331][SparkR][SQL] Adding arrays_overlap,...

2018-05-27 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21434#discussion_r191090725
  
--- Diff: R/pkg/R/functions.R ---
@@ -3048,6 +3048,26 @@ setMethod("array_position",
 column(jc)
   })
 
+#' @details
+#' \code{array_repeat}: Creates an array containing the left argument 
repeated the number of times
+#' given by the right argument.
--- End diff --

let's change this to
```
#' \code{array_repeat}: Creates an array containing \code{x} repeated the 
number of times
#' given by \code{count}.
```
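
As a quick sanity check of the proposed wording, the behavior it describes corresponds to this plain-Python sketch (not the SparkR or Scala implementation; the integer conversion mirrors the `as.integer(count)` step in the diff).

```python
def array_repeat(x, count):
    """Return a list containing `x` repeated `count` times."""
    return [x] * int(count)


print(array_repeat(21, 5))
```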


---


[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-27 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21416
  
@cloud-fan Let me know if the new API looks good to you. Thanks.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91206/
Test PASSed.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21370
  
**[Test build #91206 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91206/testReport)**
 for PR 21370 at commit 
[`425bee1`](https://github.com/apache/spark/commit/425bee1628917859b58dc87faccb7bc6146b7f1f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21424
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21424
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91203/
Test PASSed.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91204/
Test PASSed.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21424
  
**[Test build #91203 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91203/testReport)**
 for PR 21424 at commit 
[`8f224fb`](https://github.com/apache/spark/commit/8f224fbedda0d2e126810918c99291eb395d6bab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21424
  
**[Test build #91204 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91204/testReport)**
 for PR 21424 at commit 
[`3a9669c`](https://github.com/apache/spark/commit/3a9669cff3486aa98cf0a49e69cc0e08d927affd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91202/
Test PASSed.


---


[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16478
  
**[Test build #91202 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91202/testReport)**
 for PR 16478 at commit 
[`ae00de1`](https://github.com/apache/spark/commit/ae00de13dd779a2a09b142c54a2fcc144d7f8c23).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91201/
Test PASSed.


---


[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21397
  
**[Test build #91201 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91201/testReport)**
 for PR 21397 at commit 
[`756a73a`](https://github.com/apache/spark/commit/756a73aea843e8d5d90994d127c0d9d4c357c67b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #21397: [SPARK-24334] Fix race condition in ArrowPythonRu...

2018-05-27 Thread icexelloss

Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21397#discussion_r191082371
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4931,6 +4931,30 @@ def foo3(key, pdf):
 expected4 = udf3.func((), pdf)
 self.assertPandasEqual(expected4, result4)
 
+# Regression test for SPARK-24334
+def test_memory_leak(self):
--- End diff --

SGTM! Moved to PR description.


---


[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread icexelloss

Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
Sure! Added.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3623/
Test PASSed.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21370
  
**[Test build #91206 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91206/testReport)**
 for PR 21370 at commit 
[`425bee1`](https://github.com/apache/spark/commit/425bee1628917859b58dc87faccb7bc6146b7f1f).


---


[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080316
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and repl you're using supports 
eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.showRows
+  20
+  
+Default number of rows in HTML table.
+  
+
+
+  spark.sql.repl.eagerEval.truncate
--- End diff --

Yep, I just want to keep the same behavior as `dataframe.show`.
```
That's useful for console output, but not so much for notebooks.
```
Notebooks don't have a problem with too many characters within a cell, so 
should I just delete this?
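
For context, the truncation being discussed can be sketched in Python as below. It assumes the convention that, as far as I understand it, `dataframe.show` follows: cells longer than `truncate` are cut, with the last three kept characters replaced by `...` when there is room. The function name is hypothetical.

```python
def truncate_cell(s, truncate=20):
    """Shorten a cell string to at most `truncate` characters."""
    if truncate > 0 and len(s) > truncate:
        if truncate < 4:
            # Too short to fit an ellipsis, so cut hard.
            return s[:truncate]
        return s[:truncate - 3] + "..."
    # truncate <= 0 disables truncation entirely.
    return s


print(truncate_cell("a" * 25))
print(truncate_cell("short"))
```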


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3622/
Test PASSed.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080194
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -237,9 +238,13 @@ class Dataset[T] private[sql](
* @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
*   all cells will be aligned right.
* @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   * @param html If set to true, return output as html table.
--- End diff --

@viirya @gatorsmile @rdblue Sorry for the late commit; the refactor is done in 
94f3414. I spent some time testing and implementing the transformation of 
rows between Python and Scala.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21370
  
**[Test build #91205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91205/testReport)** for PR 21370 at commit [`94f3414`](https://github.com/apache/spark/commit/94f3414ebb689f4435018eab2e888e7d2974dc98).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91205/
Test FAILed.


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21370
  
Merged build finished. Test FAILed.


---


[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080082
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -358,6 +357,43 @@ class Dataset[T] private[sql](
     sb.toString()
   }
 
+  /**
+   * Transform current row string and append to builder
+   *
+   * @param row       Current row of string
+   * @param truncate  If set to more than 0, truncates strings to `truncate` characters and
+   *                  all cells will be aligned right.
+   * @param colWidths The width of each column
+   * @param html      If set to true, return output as html table.
+   * @param head      Set to true while current row is table head.
+   * @param sb        StringBuilder for current row.
+   */
+  private[sql] def appendRowString(
+      row: Seq[String],
+      truncate: Int,
+      colWidths: Array[Int],
+      html: Boolean,
+      head: Boolean,
+      sb: StringBuilder): Unit = {
+    val data = row.zipWithIndex.map { case (cell, i) =>
+      if (truncate > 0) {
+        StringUtils.leftPad(cell, colWidths(i))
+      } else {
+        StringUtils.rightPad(cell, colWidths(i))
+      }
+    }
+    (html, head) match {
+      case (true, true) =>
+        data.map(StringEscapeUtils.escapeHtml).addString(
+          sb, "", "\n", "\n")
--- End diff --

I changed the format in the Python \_repr\_html\_ in 94f3414.
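
For readers following the quoted hunk, the padding rule in `appendRowString` can be sketched in Python (chosen only for brevity; the PR code itself is Scala). This is a rough illustration, not the PR's implementation: `pad_row` is a hypothetical name, and `str.rjust`/`str.ljust` stand in for `StringUtils.leftPad`/`StringUtils.rightPad`, assuming space padding and no truncation of cells longer than the column width.

```python
def pad_row(row, truncate, col_widths):
    """Pad each cell to its column width (hypothetical helper).

    Mirrors the branch in the quoted diff: when truncate > 0 the cell is
    left-padded (so it appears right-aligned), otherwise it is right-padded.
    """
    padded = []
    for cell, width in zip(row, col_widths):
        if truncate > 0:
            padded.append(cell.rjust(width))  # stand-in for StringUtils.leftPad
        else:
            padded.append(cell.ljust(width))  # stand-in for StringUtils.rightPad
    return padded


right_aligned = pad_row(["a", "bb"], 20, [3, 3])
left_aligned = pad_row(["a", "bb"], 0, [3, 3])
```

Here `right_aligned` holds the cells padded on the left and `left_aligned` the same cells padded on the right; cells already wider than the column are returned unchanged.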


---


[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21370
  
**[Test build #91205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91205/testReport)** for PR 21370 at commit [`94f3414`](https://github.com/apache/spark/commit/94f3414ebb689f4435018eab2e888e7d2974dc98).


---


[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080049
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
          name | Bob
         """
         if isinstance(truncate, bool) and truncate:
-            print(self._jdf.showString(n, 20, vertical))
+            print(self._jdf.showString(n, 20, vertical, False))
         else:
-            print(self._jdf.showString(n, int(truncate), vertical))
+            print(self._jdf.showString(n, int(truncate), vertical, False))
--- End diff --

Fixed in 94f3414.


---


[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080066
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
          name | Bob
         """
         if isinstance(truncate, bool) and truncate:
-            print(self._jdf.showString(n, 20, vertical))
+            print(self._jdf.showString(n, 20, vertical, False))
         else:
-            print(self._jdf.showString(n, int(truncate), vertical))
+            print(self._jdf.showString(n, int(truncate), vertical, False))
 
     def __repr__(self):
         return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
 
+    def _repr_html_(self):
+        """Returns a dataframe with html code when you enabled eager evaluation
+        by 'spark.sql.repl.eagerEval.enabled', this only called by repr you're
--- End diff --

Thanks, changed to REPL in 94f3414.


---


[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080057
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,50 @@ def test_csv_sampling_ratio(self):
             .csv(rdd, samplingRatio=0.5).schema
         self.assertEquals(schema, StructType([StructField("_c0", IntegerType(), True)]))
 
+    def _get_content(self, content):
+        """
+        Strips leading spaces from content up to the first '|' in each line.
+        """
+        import re
+        pattern = re.compile(r'^ *\|', re.MULTILINE)
--- End diff --

Thanks! Fixed in 94f3414.
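
As a rough illustration of what the quoted pattern does, here is a minimal standalone sketch. The diff excerpt stops at the `re.compile` call, so the `re.sub` step and the function name `strip_margin` are assumptions made for this example, not code taken from the PR.

```python
import re


def strip_margin(content):
    """Remove leading spaces up to and including the first '|' on each line.

    Hypothetical standalone variant of the test helper quoted above; the
    substitution step is assumed, since the diff only shows the pattern.
    """
    # re.MULTILINE makes '^' match at the start of every line, not only
    # at the start of the whole string.
    pattern = re.compile(r'^ *\|', re.MULTILINE)
    return pattern.sub('', content)


stripped = strip_margin("  |a\n    |b|c\n")
```

Only the first `|` on each line is consumed, so later pipes (for example table borders in expected output) are left in place, and lines without a leading `|` pass through unchanged.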


---


[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080044
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
          name | Bob
         """
         if isinstance(truncate, bool) and truncate:
-            print(self._jdf.showString(n, 20, vertical))
+            print(self._jdf.showString(n, 20, vertical, False))
--- End diff --

Thanks, fixed in 94f3414.


---

