date:20180830

[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22274
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22274
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2725/


---


[GitHub] spark issue #21987: [SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7

2018-08-30 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21987
  
It seems that this change caused a permission issue:
```
export HADOOP_PROXY_USER=user_a
spark-sql
```
It creates the directory `/tmp/hive-$%7Buser.name%7D/user_a/`. Then switch to another user:
```
export HADOOP_PROXY_USER=user_b
spark-sql
```
Exception:
```
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=user_b, access=EXECUTE, inode="/tmp/hive-$%7Buser.name%7D/user_b/6b446017-a880-4f23-a8d0-b62f37d3c413":user_a:hadoop:drwx--
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
```

I'll do verification later.
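A side note for that verification, not part of the original report: the scratch path in the log is URL-encoded (`%7B` and `%7D` are `{` and `}`), and decoding it shows a literal, unexpanded `${user.name}` placeholder. That suggests, but does not prove, that both proxy users end up under the same root directory. The helper names below are hypothetical and use only Python's standard library.

```python
import posixpath
import re
from urllib.parse import unquote

# Paths taken from the report above; %7B and %7D are URL-encoded "{" and "}".
PATH_USER_A = "/tmp/hive-$%7Buser.name%7D/user_a/"
PATH_USER_B = "/tmp/hive-$%7Buser.name%7D/user_b/"

# Matches a ${...} variable that was never substituted.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def decode_path(path: str) -> str:
    """Undo the URL encoding seen in the scratch-dir path."""
    return unquote(path)


def has_unexpanded_placeholder(path: str) -> bool:
    """True if the decoded path still contains a literal ${...} variable."""
    return PLACEHOLDER.search(decode_path(path)) is not None


def scratch_root(path: str) -> str:
    """Parent of the per-user directory, e.g. /tmp/hive-${user.name}."""
    return posixpath.dirname(decode_path(path).rstrip("/"))


if __name__ == "__main__":
    print(decode_path(PATH_USER_A))
    # Same root for both users if the placeholder is never expanded.
    print(scratch_root(PATH_USER_A) == scratch_root(PATH_USER_B))
```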


---


[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22274
  
**[Test build #95520 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95520/testReport)**
 for PR 22274 at commit 
[`4b6cd9f`](https://github.com/apache/spark/commit/4b6cd9f532e07f08c86659dcd4a0f2d40995d8ef).


---


[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22270
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22270
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95516/


---


[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22270
  
**[Test build #95516 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95516/testReport)**
 for PR 22270 at commit 
[`53f4984`](https://github.com/apache/spark/commit/53f4984bd35d07da7382866960279233aadebea5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21721
  
@arunmahadevan, feel free to pick up the commits from my PR in your follow-up if they need to be changed. I will close mine.


---


[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan

Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21721
  
@rxin it's for streaming sources and sinks, as explained in the [doc](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/CustomMetrics.java#L23).

It had to be shared between classes in reader.streaming and writer.streaming, so it was added in the parent package (similar to other streaming-specific classes that exist here, like [StreamingWriteSupportProvider.java](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java) and [MicroBatchReadSupportProvider.java](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/MicroBatchReadSupportProvider.java)).

We could move all of it to a streaming package.


---


[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...

2018-08-30 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22274#discussion_r214246976
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
   expect_equal(currentDatabase(), "default")
   expect_error(setCurrentDatabase("default"), NA)
   expect_error(setCurrentDatabase("zxwtyswklpf"),
-"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist")
+   paste("Error in setCurrentDatabase : analysis error - Database",
--- End diff --

@felixcheung Sure.


---


[GitHub] spark issue #22183: [SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field ...

2018-08-30 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22183
  
As discussed in the JIRA, this is a partial fix, and we need to backport 
another 2 PRs, which is risky. Can we close it?


---


[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21721
  
I'm confused by this API. Is this for streaming only? If yes, why isn't it in the streaming package? If not, I only found a streaming implementation. Maybe I missed something.



---


[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...

2018-08-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214246268
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---
@@ -130,6 +134,12 @@ class RowBasedHashMapGenerator(
   }
 }.mkString(";\n")
 
+val nullByteWriter = if (groupingKeySchema.map(_.nullable).forall(_ == false)) {
--- End diff --

maybe name it `resetNullBits`?


---


[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...

2018-08-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214246211
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---
@@ -48,6 +48,12 @@ class RowBasedHashMapGenerator(
 val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
 val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)
 
+val numVarLenFields = groupingKeys.map(_.dataType).count {
--- End diff --

`groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt))`
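The suggested one-liner counts the variable-length grouping-key types in a single pass instead of a multi-line block. A rough Python analogue of the same idiom follows; the set of fixed-length type names is a simplified assumption standing in for `UnsafeRow.isFixedLength` (it ignores, for example, decimal precision and user-defined types), and the function names are hypothetical.

```python
from typing import Iterable

# Simplified, hypothetical stand-in for UnsafeRow.isFixedLength. Spark's real
# check also considers decimal precision and user-defined types.
ASSUMED_FIXED_LENGTH_TYPES = frozenset(
    {"boolean", "byte", "short", "int", "long", "float", "double", "date", "timestamp"}
)


def is_fixed_length(type_name: str) -> bool:
    """True if the (simplified) type is assumed to have a fixed width."""
    return type_name in ASSUMED_FIXED_LENGTH_TYPES


def count_var_len_fields(key_types: Iterable[str]) -> int:
    """Same shape as keys.map(_.dataType).count(dt => !isFixedLength(dt))."""
    return sum(1 for t in key_types if not is_fixed_length(t))
```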


---


[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214245829
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2546,15 +2546,37 @@ object functions {
   def soundex(e: Column): Column = withExpr { SoundEx(e.expr) }
 
   /**
-   * Splits str around pattern (pattern is a regular expression).
+   * Splits str around matches of the given regex.
*
-   * @note Pattern is a string representation of the regular expression.
+   * @param str a string expression to split
+   * @param regex a string representing a regular expression. The regex string should be
+   *  a Java regular expression.
*
* @group string_funcs
* @since 1.5.0
*/
-  def split(str: Column, pattern: String): Column = withExpr {
-StringSplit(str.expr, lit(pattern).expr)
+  def split(str: Column, regex: String): Column = withExpr {
+StringSplit(str.expr, Literal(regex), Literal(-1))
+  }
+
+  /**
+   * Splits str around matches of the given regex.
+   *
+   * @param str a string expression to split
+   * @param regex a string representing a regular expression. The regex string should be
+   *  a Java regular expression.
+   * @param limit an integer expression which controls the number of times the regex is applied.
+   *limit greater than 0: The resulting array's length will not be more than `limit`,
+   *  and the resulting array's last entry will contain all input beyond
+   *  the last matched regex.
+   *limit less than or equal to 0: `regex` will be applied as many times as possible, and
+   *   the resulting array can be of any size.
--- End diff --

The indentation here looks a bit odd, or at least inconsistent. Can you double-check the Scaladoc and format it correctly?
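The `limit` semantics described in that Scaladoc can be sketched in Python with `re.split`. This is an illustration under assumptions, not Spark's implementation: Python's `re` syntax differs from Java regex in places, the pattern is assumed to have no capturing groups (which `re.split` would include in its output), and `re.split` treats `maxsplit=0` as unlimited, so `limit == 1` needs its own branch. The function name is hypothetical.

```python
import re
from typing import List


def split_with_limit(s: str, regex: str, limit: int = -1) -> List[str]:
    """Illustrative sketch of the documented `limit` semantics (not Spark's code).

    limit > 0:  at most `limit` entries; the last entry keeps all input after
                the (limit - 1)-th match.
    limit <= 0: split as many times as possible; the result can be any size.
    Assumes `regex` has no capturing groups.
    """
    if limit == 1:
        # re.split treats maxsplit=0 as "no limit", so handle one entry explicitly.
        return [s]
    if limit > 1:
        # `limit` entries require at most `limit - 1` splits.
        return re.split(regex, s, maxsplit=limit - 1)
    return re.split(regex, s)


print(split_with_limit("ab12cd34ef", "[0-9]+", 2))   # ['ab', 'cd34ef']
print(split_with_limit("ab12cd34ef", "[0-9]+", -1))  # ['ab', 'cd', 'ef']
```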


---


[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214245703
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1669,20 +1669,36 @@ def repeat(col, n):
 return Column(sc._jvm.functions.repeat(_to_java_column(col), n))
 
 
-@since(1.5)
+@since(2.4)
 @ignore_unicode_prefix
-def split(str, pattern):
-"""
-Splits str around pattern (pattern is a regular expression).
-
-.. note:: pattern is a string represent the regular expression.
-
->>> df = spark.createDataFrame([('ab12cd',)], ['s',])
->>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
-[Row(s=[u'ab', u'cd'])]
-"""
-sc = SparkContext._active_spark_context
-return Column(sc._jvm.functions.split(_to_java_column(str), pattern))
+def split(str, regex, limit=-1):
--- End diff --

Please change `regex` back to `pattern`.


---


[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21721
  
Stuff like this merits API discussions, not just implementation changes...



---


[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21721
  
I actually thought all of them were part of DataSource V2. Why are we fine with changing those interfaces, but not this one, to the point of considering reverting it?

Of course, other things should be clarified if there are concerns. In this case, switching it to `Unstable` seems to alleviate the concerns listed here enough.


---


[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214244981
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1669,20 +1669,36 @@ def repeat(col, n):
 return Column(sc._jvm.functions.repeat(_to_java_column(col), n))
 
 
-@since(1.5)
+@since(2.4)
 @ignore_unicode_prefix
-def split(str, pattern):
-"""
-Splits str around pattern (pattern is a regular expression).
-
-.. note:: pattern is a string represent the regular expression.
-
->>> df = spark.createDataFrame([('ab12cd',)], ['s',])
->>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
-[Row(s=[u'ab', u'cd'])]
-"""
-sc = SparkContext._active_spark_context
-return Column(sc._jvm.functions.split(_to_java_column(str), pattern))
+def split(str, regex, limit=-1):
--- End diff --

I believe this would be a breaking API change for Python.
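The concern is that Python lets callers pass any named parameter by keyword, so renaming `pattern` to `regex` would break existing calls such as `split(df.s, pattern='[0-9]+')`, while only appending an optional `limit=-1` would not. A minimal sketch with hypothetical stand-in functions, not PySpark itself (the first parameter is named `col` here rather than `str` to avoid shadowing the builtin):

```python
from typing import Callable


def split_old(col, pattern):
    """Stand-in for the existing signature, split(str, pattern)."""
    return (col, pattern)


def split_renamed(col, regex, limit=-1):
    """Stand-in for the proposed signature with `pattern` renamed to `regex`."""
    return (col, regex, limit)


def split_compatible(col, pattern, limit=-1):
    """Keeps the old parameter name and only appends an optional `limit`."""
    return (col, pattern, limit)


def existing_caller(fn: Callable) -> str:
    """Simulates user code written against the old API, passing `pattern` by keyword."""
    try:
        fn("s", pattern="[0-9]+")
        return "ok"
    except TypeError:
        return "TypeError"


for fn in (split_old, split_renamed, split_compatible):
    print(fn.__name__, existing_caller(fn))
```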


---


[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214244918
  
--- Diff: R/pkg/R/functions.R ---
@@ -3410,13 +3410,14 @@ setMethod("collect_set",
 #' \dontrun{
 #' head(select(df, split_string(df$Sex, "a")))
 #' head(select(df, split_string(df$Class, "\\d")))
+#' head(select(df, split_string(df$Class, "\\d", 2)))
 #' # This is equivalent to the following SQL expression
 #' head(selectExpr(df, "split(Class, 'd')"))}
 #' @note split_string 2.3.0
 setMethod("split_string",
   signature(x = "Column", pattern = "character"),
-  function(x, pattern) {
-jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern)
+  function(x, pattern, limit = -1) {
+jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern, limit)
--- End diff --

You should use `as.integer(limit)` instead.
Could we add a test in R?


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2724/


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/



---


[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...

2018-08-30 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22213#discussion_r214244665
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -1144,6 +1144,46 @@ class SparkSubmitSuite
 conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")
 conf1.get("spark.submit.pyFiles") should (startWith("/"))
   }
+
+  test("handles natural line delimiters in --properties-file and --conf uniformly") {
+val delimKey = "spark.my.delimiter."
+val LF = "\n"
+val CR = "\r"
+
+val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> s"${LF}blah"
+val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" -> s"blah${CR}"
+val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> s"${CR}blah${LF}"
+val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " blah\f"
--- End diff --

Sorry for the stupid question. I guess I was thinking of something 
different.


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95519/


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
**[Test build #95519 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95519/testReport)**
 for PR 22298 at commit 
[`46c30cc`](https://github.com/apache/spark/commit/46c30cc27cd3a7279a116ec6a70a937b8502cd73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Merged build finished. Test FAILed.


---


[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...

2018-08-30 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22274#discussion_r214244580
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
   expect_equal(currentDatabase(), "default")
   expect_error(setCurrentDatabase("default"), NA)
   expect_error(setCurrentDatabase("zxwtyswklpf"),
-"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist")
+   paste("Error in setCurrentDatabase : analysis error - Database",
--- End diff --

I'd use `paste0` instead, to make the space that should follow `Database` explicit.

i.e. `paste0("Error in setCurrentDatabase : analysis error - Database ", "'zxwtyswklpf' does not exist"))`
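R's `paste()` joins its arguments with a space by default, while `paste0()` uses no separator, so both calls produce the same message; `paste0` just makes the space after `Database` visible in the literal itself. A minimal Python emulation for scalar strings only (R's functions are vectorized, and the helper names here are hypothetical):

```python
def r_paste(*parts: str, sep: str = " ") -> str:
    """Scalar-only emulation of R's paste(); the default separator is a space."""
    return sep.join(parts)


def r_paste0(*parts: str) -> str:
    """Scalar-only emulation of R's paste0(), i.e. paste() with sep = ""."""
    return "".join(parts)


EXPECTED = "Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist"

# Implicit space: inserted by paste() between the two arguments.
via_paste = r_paste("Error in setCurrentDatabase : analysis error - Database",
                    "'zxwtyswklpf' does not exist")
# Explicit space: written at the end of the first literal.
via_paste0 = r_paste0("Error in setCurrentDatabase : analysis error - Database ",
                      "'zxwtyswklpf' does not exist")
```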


---


[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95503/


---


[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95503/testReport)**
 for PR 22192 at commit 
[`2907c6b`](https://github.com/apache/spark/commit/2907c6b62495f8d25c0016883202239634685fec).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22281
  
For clarification, I am okay with targeting this to 3.0.0 since the code 
freeze will be very soon if I am not mistaken.


---


[GitHub] spark pull request #22291: [SPARK-25007][R]Add array_intersect/array_except/...

2018-08-30 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22291#discussion_r214244359
  
--- Diff: R/pkg/R/generics.R ---
@@ -799,10 +807,18 @@ setGeneric("array_sort", function(x) { standardGeneric("array_sort") })
 #' @name NULL
 setGeneric("arrays_overlap", function(x, y) { standardGeneric("arrays_overlap") })
 
+#' @rdname column_collection_functions
+#' @name NULL
+setGeneric("array_union", function(x, y) { standardGeneric("array_union") })
+
 #' @rdname column_collection_functions
 #' @name NULL
 setGeneric("arrays_zip", function(x, ...) { standardGeneric("arrays_zip") })
 
+#' @rdname column_collection_functions
+#' @name NULL
+setGeneric("shuffle", function(x) { standardGeneric("shuffle") })
--- End diff --

This should go below; this part of the list should be sorted alphabetically.


---


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

2018-08-30 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22048
  
retest this please


---


[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-30 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20637
  
With the test removed, do we still need this change?
https://github.com/apache/spark/pull/20637/files#diff-41747ec3f56901eb7bfb95d2a217e94dR226


---


[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22281
  
Yea, but the default fallback should rather be DataSource V2's. Both of you 
are super active in DataSource V2. Do you guys have some concerns about 
defaulting to DataSource V1's behaviour?


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/



---


[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/6#discussion_r214243817
  
--- Diff: R/pkg/R/functions.R ---
@@ -1697,8 +1697,8 @@ setMethod("to_date",
   })
 
 #' @details
-#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType},
-#' a \code{mapType} or array of \code{mapType} into a Column of JSON string.
+#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType}
+#' or an array into a Column of JSON string.
--- End diff --

Let's add one simple Python doctest as well.


---


[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...

2018-08-30 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22281
  
The USING syntax has to be there, but what USING can refer to may be limited to data source v1 and file formats.

IIUC, the agreement is: a data source v2 with a catalog can create a table with USING, and the data source should interpret the USING parameter. For example, `USING parquet` may have a different meaning in the Iceberg data source.


---


[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-30 Thread ifilonenko

Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/22298#discussion_r214243652
  
--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/SecretsTestsSuite.scala ---
@@ -53,6 +53,7 @@ private[spark] trait SecretsTestsSuite { k8sSuite: KubernetesSuite =>
   .delete()
   }
 
+  // TODO: [SPARK-25291] This test is flaky with regards to memory of executors
--- End diff --

@mccheah This specific test periodically fails to set the proper memory for executors. I have filed a JIRA: SPARK-25291.


---


[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread ifilonenko

Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22298
  
@rdblue @holdenk for review. This contains both unit and integration tests 
that verify [SPARK-25004] for K8S


---


[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-30 Thread ifilonenko

GitHub user ifilonenko opened a pull request:

https://github.com/apache/spark/pull/22298

[SPARK-25021][K8S] Add spark.executor.pyspark.memory limit for K8S

## What changes were proposed in this pull request?

Add spark.executor.pyspark.memory limit for K8S

## How was this patch tested?

Unit and Integration tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ifilonenko/spark SPARK-25021

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22298


commit b54a039da08aec93a6db9d1470d0b2eaaec08814
Author: Ilan Filonenko 
Date:   2018-08-30T00:19:40Z

initial WIP push for SPARK-25021

commit 75742a37687a7eb3ebaa34069ac7a62521a4e2f8
Author: Ilan Filonenko 
Date:   2018-08-30T05:26:27Z

add python.worker.reuse

commit 46c30cc27cd3a7279a116ec6a70a937b8502cd73
Author: Ilan Filonenko 
Date:   2018-08-31T04:32:22Z

final checks with e2e tests




---


[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21721
  
Note that the data source v2 API is not stable yet, and we may even change the abstraction of the APIs. The design of custom metrics may affect the design of the streaming source APIs.

I had a hard time figuring out the life cycle of custom metrics. It seems like its life cycle should be bound to an epoch, but unfortunately we don't have such an interface in continuous streaming to represent an epoch. Is it possible that we may end up with two sets of custom metrics APIs, one for micro-batch and one for continuous? The documentation added in this PR is not clear about this.


---


[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...

2018-08-30 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/6#discussion_r214243115
  
--- Diff: R/pkg/R/functions.R ---
@@ -1697,8 +1697,8 @@ setMethod("to_date",
   })
 
 #' @details
-#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType},
-#' a \code{mapType} or array of \code{mapType} into a Column of JSON string.
+#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType}
+#' or an array into a Column of JSON string.
--- End diff --

It should. Could we add some tests for this in R?



[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22232
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95508/
Test PASSed.



[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22232
  
**[Test build #95508 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95508/testReport)**
 for PR 22232 at commit 
[`1c32646`](https://github.com/apache/spark/commit/1c326466fbd24c432184be6e53afec93369970c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-30 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21732
  
> The only tricky thing is, Product is handled specially in the top level, 
being flattened into multiple columns.

@cloud-fan Unlike Option of Product, which was not supported before, the 
encoding of Product is existing behavior. I don't think we need to change it 
for now. WDYT?




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22227
  
Merged build finished. Test FAILed.



[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22227
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95511/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22227
  
**[Test build #95511 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95511/testReport)**
 for PR 22227 at commit 
[`a641106`](https://github.com/apache/spark/commit/a6411069c352b30f9094a83991c35f0730b5df55).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22186
  
**[Test build #95518 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)**
 for PR 22186 at commit 
[`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95518/
Test PASSed.



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread sadhen

Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
@srowen A PR for this "bug" has been proposed: 
https://github.com/scala/scala/pull/7156

Hopefully, Scala 2.12.7 will fix it.



[GitHub] spark issue #20086: [SPARK-22903]Fix already being created exception in stag...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20086
  
Can one of the admins verify this patch?



[GitHub] spark pull request #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in ...

2018-08-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22264



[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r214237818
  
--- Diff: python/pyspark/sql/session.py ---
@@ -252,6 +252,16 @@ def newSession(self):
         """
         return self.__class__(self._sc, self._jsparkSession.newSession())
 
+    @since(2.4)
+    def getActiveSession(self):
+        """
+        Returns the active SparkSession for the current thread, returned by the builder.
+        >>> s = spark.getActiveSession()
+        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
+        True
+        """
+        return self._jsparkSession.getActiveSession()
--- End diff --

Does this return a JVM instance?



[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...

2018-08-30 Thread gerashegalov

Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/22213#discussion_r214237801
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -1144,6 +1144,46 @@ class SparkSubmitSuite
     conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")
     conf1.get("spark.submit.pyFiles") should (startWith("/"))
   }
+
+  test("handles natural line delimiters in --properties-file and --conf uniformly") {
+    val delimKey = "spark.my.delimiter."
+    val LF = "\n"
+    val CR = "\r"
+
+    val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> s"${LF}blah"
+    val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" -> s"blah${CR}"
+    val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> s"${CR}blah${LF}"
+    val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " blah\f"
--- End diff --

@jerryshao I try not to spend time on issues unrelated to our production 
deployments. @steveloughran and this PR already pointed to the 
`Properties#load` method, whose documentation describes the format.

Line terminator characters can be included using the `\r` and `\n` escape 
sequences, and any character can be encoded as a `\uXXXX` escape.

In addition, you can take a look at the file generated by this code:
```
#test whitespace
#Thu Aug 30 20:20:33 PDT 2018
spark.my.delimiter.nonDelimSpaceFromFile=\ blah\f
spark.my.delimiter.infixDelimFromFile=\rblah\n
spark.my.delimiter.trailingDelimKeyFromFile=blah\r
spark.my.delimiter.leadingDelimKeyFromFile=\nblah
```
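The escape behaviour described above can be checked with a minimal, standalone round trip through `java.util.Properties`. The class `PropertiesEscapeDemo` and its helpers are hypothetical names; the keys and values mirror the test quoted above. This is a sketch of the JDK behaviour only, not of how `SparkSubmit` itself reads `--properties-file`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class PropertiesEscapeDemo {
    // Keys and values mirror the SparkSubmitSuite test under review.
    static Map<String, String> sampleValues() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("spark.my.delimiter.leadingDelimKeyFromFile", "\nblah");
        m.put("spark.my.delimiter.trailingDelimKeyFromFile", "blah\r");
        m.put("spark.my.delimiter.infixDelimFromFile", "\rblah\n");
        m.put("spark.my.delimiter.nonDelimSpaceFromFile", " blah\f");
        return m;
    }

    // Properties#store writes \r, \n and \f as escape sequences and escapes a leading space.
    static String store(Map<String, String> values) {
        Properties props = new Properties();
        props.putAll(values);
        StringWriter out = new StringWriter();
        try {
            props.store(out, "test whitespace");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Properties#load decodes the same escape sequences back into the original characters.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = store(sampleValues());
        System.out.print(text);
        Properties loaded = load(text);
        for (Map.Entry<String, String> e : sampleValues().entrySet()) {
            boolean same = e.getValue().equals(loaded.getProperty(e.getKey()));
            System.out.println(e.getKey() + " round-trips: " + same);
        }
    }
}
```

Entry order in the stored text depends on the `Properties` implementation, so compare individual lines rather than the whole file.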



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22273
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95514/
Test PASSed.



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22186
  
**[Test build #95518 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)**
 for PR 22186 at commit 
[`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22273
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22197
  
**[Test build #95517 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95517/testReport)**
 for PR 22197 at commit 
[`e0d6196`](https://github.com/apache/spark/commit/e0d61969b13bcfd9dfc95e2a013b14e111d2b832).



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22273
  
**[Test build #95514 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95514/testReport)**
 for PR 22273 at commit 
[`e8a2602`](https://github.com/apache/spark/commit/e8a2602476a52622a01c0cf4f72067f3119be96a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2723/
Test PASSed.



[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...

2018-08-30 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22297
  
cc @cloud-fan @HyukjinKwon 



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/22186
  
Jenkins, retest this please.



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/22186
  
I see, thanks for the explanation. I checked the code again, and you're right. 
Let me retrigger the tests; I'll merge it if everything passes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22270
  
**[Test build #95516 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95516/testReport)**
 for PR 22270 at commit 
[`53f4984`](https://github.com/apache/spark/commit/53f4984bd35d07da7382866960279233aadebea5).



[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22270
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22270
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2722/
Test PASSed.



[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...

2018-08-30 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214235758
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---
@@ -141,11 +151,8 @@ class RowBasedHashMapGenerator(
|if (buckets[idx] == -1) {
|  if (numRows < capacity && !isBatchFull) {
|// creating the unsafe for new entry
-   |org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter
-   |  = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(
-   |  ${groupingKeySchema.length}, ${numVarLenFields * 32});
|agg_rowWriter.reset(); //TODO: investigate if reset or zeroout are actually needed
--- End diff --

I think reset and zero-out are now needed? If so, maybe remove this TODO?
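To illustrate why reset and zero-out matter once a single writer is hoisted out of the loop and reused across rows, here is a deliberately simplified sketch. `SimpleRowWriter` and `writeKey` are hypothetical stand-ins, not Spark's `UnsafeRowWriter`. The sketch assumes that the reused writer keeps its variable-length cursor and its null bits between rows unless they are explicitly cleared, and that writing a value does not clear a stale null bit.

```java
public class ReusableWriterDemo {
    /** Hypothetical, simplified writer; the layout is invented for illustration. */
    static final class SimpleRowWriter {
        private long nullBits;                                     // bit i set => field i is null
        private final StringBuilder varLen = new StringBuilder(); // variable-length region

        void reset() { varLen.setLength(0); }       // rewind the variable-length cursor
        void zeroOutNullBytes() { nullBits = 0L; }  // forget the null flags of the previous row

        void setNullAt(int ordinal) { nullBits |= 1L << ordinal; }
        // Writing a value appends to the region but leaves any stale null bit untouched.
        void write(String s) { varLen.append(s); }

        boolean isNullAt(int ordinal) { return (nullBits & (1L << ordinal)) != 0; }
        String varLenRegion() { return varLen.toString(); }
    }

    /** Writes one grouping key into a shared writer, optionally skipping the reset calls. */
    static SimpleRowWriter writeKey(SimpleRowWriter w, String key,
                                    boolean nullSecondField, boolean resetFirst) {
        if (resetFirst) {
            w.reset();
            w.zeroOutNullBytes();
        }
        w.write(key);
        if (nullSecondField) {
            w.setNullAt(1);
        }
        return w;
    }

    public static void main(String[] args) {
        SimpleRowWriter shared = new SimpleRowWriter();
        writeKey(shared, "a", true, true);
        writeKey(shared, "b", false, false); // reused without reset: stale state leaks in
        System.out.println(shared.varLenRegion() + " null1=" + shared.isNullAt(1)); // ab null1=true
        writeKey(shared, "c", false, true);  // reused with reset: clean row
        System.out.println(shared.varLenRegion() + " null1=" + shared.isNullAt(1)); // c null1=false
    }
}
```

Allocating a fresh writer per row hides this problem, which is why the reset calls only become essential once the allocation is removed.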



[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...

2018-08-30 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214235660
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---
@@ -141,11 +151,8 @@ class RowBasedHashMapGenerator(
|if (buckets[idx] == -1) {
|  if (numRows < capacity && !isBatchFull) {
|// creating the unsafe for new entry
--- End diff --

Remove or update this comment?



[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2721/
Test PASSed.



[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22297
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22297
  
**[Test build #95515 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95515/testReport)**
 for PR 22297 at commit 
[`cc7a710`](https://github.com/apache/spark/commit/cc7a710a1ba8d050836f64d820f675546712b3c9).



[GitHub] spark pull request #22297: [SPARK-25290][Core][Test] Reduce the size of acqu...

2018-08-30 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/22297

[SPARK-25290][Core][Test] Reduce the size of acquired arrays to avoid OOM error

## What changes were proposed in this pull request?

`BytesToBytesMapOnHeapSuite`.`randomizedStressTest` caused 
`OutOfMemoryError` on several test runs. It seems better to reduce memory usage 
in this test.

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-25290

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22297


commit cc7a710a1ba8d050836f64d820f675546712b3c9
Author: Liang-Chi Hsieh 
Date:   2018-08-31T02:59:18Z

Reduce the size of acquired arrays to avoid OOM error.





[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-08-30 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/21860
  
cc @cloud-fan @maropu @kiszk 



[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...

2018-08-30 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22279#discussion_r214234325
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
@@ -103,6 +103,14 @@ private[spark] class MetricsSystem private (
 sinks.foreach(_.start)
   }
 
+  // Same as start but this method only registers sinks
--- End diff --

Would you please explain why only registering sinks could solve the problem 
here?



[GitHub] spark issue #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory operat...

2018-08-30 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/21968
  
cc @cloud-fan @maropu  



[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...

2018-08-30 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/22279
  
Hi @LucaCanali, do you have an example of the current AM metrics output? I 
would like to know what kind of metrics are output now.



[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

2018-08-30 Thread xuejianbest

Github user xuejianbest commented on the issue:

https://github.com/apache/spark/pull/22048
  
I see. I've pushed a new commit.
Thanks.



[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22197
  
Seems fine to me too.



[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-30 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22197#discussion_r214233946
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -44,7 +45,14 @@ private[parquet] class ParquetFilters(
 pushDownTimestamp: Boolean,
 pushDownDecimal: Boolean,
 pushDownStartWith: Boolean,
-pushDownInFilterThreshold: Int) {
+pushDownInFilterThreshold: Int,
+caseSensitive: Boolean) {
+
+  private case class ParquetField(
+  // field name in parquet file
--- End diff --

I'd just move those comments into a doc comment above this case class, for instance:

```
/**
 * blabla
 * @param blabla
 */
private case class ParquetField
```



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22273
  
> I thought the current information is enough to indicate which Arrow or Pandas we would use and test

Well, yes, it is when they are skipped, but my point was that having an 
additional positive confirmation that the tests were run would be nice. Maybe 
that's just me, though, so we don't have to merge this.

I am still a bit concerned about why my skip message was printed in only one 
of the tests here, so I'll run a few more to see if I can figure it out.



[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-08-30 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22289#discussion_r214233802
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -200,6 +200,7 @@ void addOptionString(List<String> cmd, String options) {
 
     addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
     addToClassPath(cp, getenv("YARN_CONF_DIR"));
+    addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir"));
--- End diff --

I'm wondering how we would update the classpath to switch to another set of 
Hadoop confs with InProcessLauncher. It seems the classpath here cannot be 
changed after the JVM is launched.



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22273
  
**[Test build #95514 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95514/testReport)**
 for PR 22273 at commit 
[`e8a2602`](https://github.com/apache/spark/commit/e8a2602476a52622a01c0cf4f72067f3119be96a).



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22273
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2720/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22273
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22264
  
Yeah, OK. I think this is acceptable as a potential "known issue" for Scala 
2.12 support, which we can accept for a beta release of 2.12 support with Spark 
2.4. I think I'd merge this and then see where we are.



[GitHub] spark issue #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAM...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22294
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAM...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22294
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95498/
Test PASSed.



[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...

2018-08-30 Thread gerashegalov

Github user gerashegalov commented on the issue:

https://github.com/apache/spark/pull/22213
  
@steveloughran Regarding the XML format, java.util.Properties has dedicated 
storeToXML/loadFromXML methods, which Spark does not use, so we don't need to 
check this.
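The distinction between the XML and line-oriented formats can be illustrated with a small standalone sketch. `PropertiesXmlDemo` and its helpers are hypothetical names; the sketch only exercises `java.util.Properties` and assumes, as the comment states, that Spark reads properties files with the line-oriented `load` rather than `loadFromXML`. A file written by `storeToXML` round-trips through `loadFromXML`, while `load` treats each XML line as an ordinary `key value` line, so the real key is never found.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesXmlDemo {
    // Writes a single entry with the dedicated XML serializer.
    static byte[] toXml(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.storeToXML(out, "xml demo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads the bytes back with loadFromXML, the counterpart of storeToXML.
    static Properties readWithLoadFromXml(byte[] xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads the same bytes with the line-oriented load(InputStream), which does not parse XML.
    static Properties readWithLoad(byte[] xml) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        byte[] xml = toXml("spark.app.name", "demo");
        System.out.println(readWithLoadFromXml(xml).getProperty("spark.app.name")); // demo
        System.out.println(readWithLoad(xml).getProperty("spark.app.name"));        // null
    }
}
```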



[GitHub] spark issue #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAM...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22294
  
**[Test build #95498 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95498/testReport)**
 for PR 22294 at commit 
[`1ed41dd`](https://github.com/apache/spark/commit/1ed41ddc922cd07f6d6c2384c5aa248699f9ef87).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #95513 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95513/testReport)**
 for PR 22138 at commit 
[`017c0bb`](https://github.com/apache/spark/commit/017c0bbf9365b32467de64c96a1a0d6aee1f6875).



[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22296
  
**[Test build #95512 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95512/testReport)**
 for PR 22296 at commit 
[`a847099`](https://github.com/apache/spark/commit/a8470991ba73eb959c0e7dbda31e5d391c2d34ef).



[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22296
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22296
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2719/
Test PASSed.



[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22173
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...

2018-08-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95499/
Test PASSed.


