[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22299
  
Thanks, @jerryshao, for pointing this out. I will close mine after we see 
which one we want.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21721
  
Can someone write a design doc for the metrics support? I think this is an 
important feature for data source v2 and we need to be careful here. The design 
doc should explain how custom metrics fit into the abstraction of the data 
source v2 API, what the metrics API would look like for batch, micro-batch and 
continuous (I feel metrics are also important for batch sources), and how the 
sources report metrics physically (via task-complete event? via heartbeat? via RPC?).

@rxin just sent an email to the dev list about the data source v2 API 
abstraction; it would be great if you could chime in there and talk about the 
metrics support.

It's very likely that the custom metrics API will be replaced by something 
totally different after we finish the design. I don't think we should rush into 
something that works but is not well designed.


---




[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22296
  
Let me leave this open in case we only want to mark this as unstable for 
now. Other changes are proposed in https://github.com/apache/spark/pull/22296


---




[GitHub] spark pull request #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...

2018-08-30 Thread HyukjinKwon
GitHub user HyukjinKwon reopened a pull request:

https://github.com/apache/spark/pull/22296

[SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Unstable APIs

## What changes were proposed in this pull request?

This PR proposes to switch the API stability annotation from `Evolving` to 
`Unstable` for now, given the discussion in the original PR.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-24748

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22296


commit a8470991ba73eb959c0e7dbda31e5d391c2d34ef
Author: hyukjinkwon 
Date:   2018-08-31T02:29:30Z

Switch custom metrics to Unstable APIs




---




[GitHub] spark pull request #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/22296


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21721
  
I skimmed how AccumulatorV2 works, and it looks like the values in a task are 
reported along with the CompletionEvent that is triggered when a task ends. In 
continuous mode the driver then never receives updated metrics, so the 
reporting should not be coupled to the lifecycle of a task.
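
For reference, a minimal sketch (not Spark's actual metrics plumbing) of the 
kind of AccumulatorV2 a source could use to carry a custom metric; as described 
above, its per-task value is merged on the driver at task completion, which is 
exactly the lifecycle coupling being questioned here:

```scala
import org.apache.spark.util.AccumulatorV2

// Hypothetical per-source metric: rows read by a streaming reader.
class RowsReadAccumulator extends AccumulatorV2[Long, Long] {
  private var count = 0L

  override def isZero: Boolean = count == 0L
  override def copy(): RowsReadAccumulator = {
    val acc = new RowsReadAccumulator
    acc.count = count
    acc
  }
  override def reset(): Unit = count = 0L
  override def add(v: Long): Unit = count += v
  override def merge(other: AccumulatorV2[Long, Long]): Unit = count += other.value
  override def value: Long = count
}

// Usage sketch: register on the driver, call add() from the executor-side reader.
// Per the discussion above, the merged value comes back with the task-completion
// event rather than continuously.
// val rowsRead = new RowsReadAccumulator
// spark.sparkContext.register(rowsRead, "rowsRead")
```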


---




[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22296
  
I am closing this per https://github.com/apache/spark/pull/22296


---




[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/22299
  
Seems there's another similar PR, #22296.


---




[GitHub] spark pull request #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark...

2018-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22186


---




[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-08-30 Thread ajithme
Github user ajithme commented on the issue:

https://github.com/apache/spark/pull/22277
  
@gatorsmile and @jiangxb1987, any inputs?


---




[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22274
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22274
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95520/
Test PASSed.


---




[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22274
  
**[Test build #95520 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95520/testReport)**
 for PR 22274 at commit 
[`4b6cd9f`](https://github.com/apache/spark/commit/4b6cd9f532e07f08c86659dcd4a0f2d40995d8ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-30 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20637
  
I believe we still need this change.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21721
  
> It seems like its life cycle should be bound to an epoch, but 
unfortunately we don't have such an interface in continuous streaming to 
represent an epoch. Is it possible that we may end up with 2 sets of custom 
metrics APIs for micro-batch and continuous?

@cloud-fan we could still report progress at the end of each epoch (e.g. 
[here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala#L231)
 and via the EpochCoordinator). There need not be separate interfaces for the 
progress or the custom metrics; only the mechanisms would differ.


---




[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/22186
  
Merging to master branch.


---




[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22296
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95512/
Test PASSed.


---




[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22296
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18877: [SPARK-17742][core] Handle child process exit in SparkLa...

2018-08-30 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18877
  
yes @danelkotev `asfgit closed this in cba826d on Aug 15, 2017`


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21721
  
My 2 cents: the root cause is that the lifecycle of reporting query progress 
is tied to `finishTrigger`, and we read the updated metrics from the executed 
plan; continuous mode has neither a `finishTrigger` nor a finished plan to 
read from.

I'm not sure how/when the updated information for the nodes of the physical 
plan is transmitted from the executors to the driver, but we should avoid using 
the executed plan as the source of that information and find an alternative 
that works for both micro-batch and continuous mode. This applies not only to 
metrics but also to watermarks.

I'm not sure it is viable, but it could be done via RPC or whatever lets us 
aggregate the information on the driver. Then each operator could send its 
information to the driver directly, and the driver could aggregate it and use 
it once a batch or an epoch is finished.
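
A purely hypothetical sketch of that idea (none of these names exist in Spark): 
executors push per-operator, per-partition metric updates to the driver over 
some channel (RPC, heartbeat, ...), and the driver snapshots the aggregate 
whenever a batch or epoch finishes, instead of reading a finished physical plan:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

// Hypothetical driver-side store of the latest update per (operator, partition).
class DriverMetricsAggregator {
  private val latest = new ConcurrentHashMap[(Long, Int), Map[String, Long]]()

  // Called whenever an update arrives from an executor.
  def report(operatorId: Long, partitionId: Int, metrics: Map[String, Long]): Unit =
    latest.put((operatorId, partitionId), metrics)

  // Called when a micro-batch or epoch finishes: sum each metric across partitions.
  def snapshot(): Map[Long, Map[String, Long]] =
    latest.asScala.toSeq
      .groupBy { case ((operatorId, _), _) => operatorId }
      .map { case (operatorId, entries) =>
        val summed = entries.map(_._2).foldLeft(Map.empty[String, Long]) { (acc, m) =>
          m.foldLeft(acc) { case (a, (k, v)) => a.updated(k, a.getOrElse(k, 0L) + v) }
        }
        operatorId -> summed
      }
}
```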


---




[GitHub] spark pull request #22293: [SPARK-25288][Tests]Fix flaky Kafka transaction t...

2018-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22293


---




[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22296
  
**[Test build #95512 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95512/testReport)**
 for PR 22296 at commit 
[`a847099`](https://github.com/apache/spark/commit/a8470991ba73eb959c0e7dbda31e5d391c2d34ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214253824
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
--- End diff --

Ah, it's called in the generated code. Maybe we can use an elementSize of `-1` 
to create a generic array.
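
A tiny self-contained illustration of that sentinel idea (names are 
illustrative only, not Spark's final API): generated Java code can simply pass 
`-1` where constructing a Scala `Option` would be awkward.

```scala
// Illustrative only: a negative elementSize means "no fixed-width element type",
// so the allocator falls back to a generic (object) array.
final case class ArrayAlloc(useUnsafe: Boolean, elementSize: Int, numElements: Int)

def allocateArrayData(elementSize: Int, numElements: Int): ArrayAlloc =
  if (elementSize > 0) ArrayAlloc(useUnsafe = true, elementSize, numElements)
  else ArrayAlloc(useUnsafe = false, -1, numElements)

allocateArrayData(8, 16)   // unsafe array of fixed-width 8-byte elements
allocateArrayData(-1, 16)  // generic array of object elements
```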


---




[GitHub] spark issue #22293: [SPARK-25288][Tests]Fix flaky Kafka transaction tests

2018-08-30 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/22293
  
Thanks! Merging to master.


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214253479
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
--- End diff --

`elementSize` is only used when creating an unsafe array. I think we can just 
have an `elementSize: Option[Int]` and remove the `isPrimitiveType` parameter.
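
A self-contained sketch of that shape (illustrative names, not Spark's 
implementation): folding the flag into the `Option` means the element size and 
the "is primitive" bit can no longer disagree.

```scala
sealed trait Alloc
case class UnsafeAlloc(elementSize: Int, numElements: Int) extends Alloc
case class GenericAlloc(numElements: Int) extends Alloc

// Some(size) => fixed-width primitive elements, None => generic object array.
def allocateArrayData(elementSize: Option[Int], numElements: Int): Alloc =
  elementSize match {
    case Some(size) => UnsafeAlloc(size, numElements)
    case None       => GenericAlloc(numElements)
  }

allocateArrayData(Some(8), 16) // e.g. LongType elements
allocateArrayData(None, 16)    // e.g. StringType elements
```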


---




[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22299
  
**[Test build #95521 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95521/testReport)**
 for PR 22299 at commit 
[`49a94c6`](https://github.com/apache/spark/commit/49a94c6016a0a4cd6076329797f4c2ac5a9cb588).


---




[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22299
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214253006
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -452,6 +452,16 @@ public UnsafeArrayData copy() {
 
   public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
+UnsafeArrayData result = createFreshArray(length, elementSize);
+final long headerInBytes = calculateHeaderPortionInBytes(length);
+final long valueRegionInBytes = (long)elementSize * length;
+final Object data = result.getBaseObject();
--- End diff --

Now the data is an `Object` instead of a `long[]`. Can we duplicate the code 
for now and think about how to deduplicate it later?


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21721
  
I created a follow-up PR to move CustomMetrics (and a few other 
streaming-specific interfaces in that package) to 'streaming' and mark the 
interfaces as Unstable here: https://github.com/apache/spark/pull/22299


---




[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22299
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...

2018-08-30 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request:

https://github.com/apache/spark/pull/22299

[SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Unstable APIs

- Mark custom-metrics-related APIs as unstable
- Move CustomMetrics (and a few other streaming interfaces in the parent 
package) to the streaming package

Ideally we could move `v2/reader/streaming` and `v2/writer/streaming` under 
`streaming/reader` and `streaming/writer`, but that can be a follow-up PR if 
required.

## How was this patch tested?
Existing unit tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arunmahadevan/spark refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22299


commit 49a94c6016a0a4cd6076329797f4c2ac5a9cb588
Author: Arun Mahadevan 
Date:   2018-08-31T05:53:57Z

[SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Unstable APIs

- Mark custom metrics related APIs as unstable
- Move streaming related interfaces to streaming package




---




[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22297
  
do we have a memory leak here? It seems these arrays are allocated in the 
loop and can be released soon.


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-30 Thread npoberezkin
Github user npoberezkin commented on the issue:

https://github.com/apache/spark/pull/22255
  
I got your idea now. Apparently I was a little confused by the description of 
the tickets.
I can try to implement this (writing info about writer.model, like "avro", 
etc. in Spark) if you give me some directions on how to do it and where to 
make the changes.
Also, I can add a "spark.version" property, but if I understood correctly we'll 
need to open a new issue in Parquet to do this, am I right?


---




[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22274
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22274
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2725/
Test PASSed.


---




[GitHub] spark issue #21987: [SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7

2018-08-30 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21987
  
It seems that this change caused a permission issue:
```
export HADOOP_PROXY_USER=user_a
spark-sql
```
It will create the dir `/tmp/hive-$%7Buser.name%7D/user_a/`. Then change to 
another user:
```
export HADOOP_PROXY_USER=user_b
spark-sql
```
exception:
```
Exception in thread "main" java.lang.RuntimeException: 
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=user_b, access=EXECUTE, 
inode="/tmp/hive-$%7Buser.name%7D/user_b/6b446017-a880-4f23-a8d0-b62f37d3c413":user_a:hadoop:drwx--
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
```

I'll do verification later.


---




[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22274
  
**[Test build #95520 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95520/testReport)**
 for PR 22274 at commit 
[`4b6cd9f`](https://github.com/apache/spark/commit/4b6cd9f532e07f08c86659dcd4a0f2d40995d8ef).


---




[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22270
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22270
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95516/
Test FAILed.


---




[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22270
  
**[Test build #95516 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95516/testReport)**
 for PR 22270 at commit 
[`53f4984`](https://github.com/apache/spark/commit/53f4984bd35d07da7382866960279233aadebea5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21721
  
@arunmahadevan, feel free to pick up the commits in my PR in your followup 
if they have to be changed. I will close mine.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21721
  
@rxin it's for streaming sources and sinks, as explained in the 
[doc](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/CustomMetrics.java#L23).

It had to be shared between classes in reader.streaming and writer.streaming, 
so it was added in the parent package, similar to other streaming-specific 
classes that exist there, like 
[StreamingWriteSupportProvider.java](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java) 
and 
[MicroBatchReadSupportProvider.java](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/MicroBatchReadSupportProvider.java).

We could move all of it to a streaming package.


---




[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...

2018-08-30 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22274#discussion_r214246976
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, 
setCurrentDatabase, listDatabases", {
   expect_equal(currentDatabase(), "default")
   expect_error(setCurrentDatabase("default"), NA)
   expect_error(setCurrentDatabase("zxwtyswklpf"),
-"Error in setCurrentDatabase : analysis error - Database 
'zxwtyswklpf' does not exist")
+   paste("Error in setCurrentDatabase : analysis error - 
Database",
--- End diff --

@felixcheung Sure.


---




[GitHub] spark issue #22183: [SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field ...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22183
  
As discussed in the JIRA, this is a partial fix, and we need to backport 
another 2 PRs, which is risky. Can we close it?


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21721
  
I'm confused by this API. Is this for streaming only? If yes, why is it not in 
the streaming package? If not, I only found a streaming implementation. Maybe I 
missed it.



---




[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214246268
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala
 ---
@@ -130,6 +134,12 @@ class RowBasedHashMapGenerator(
   }
 }.mkString(";\n")
 
+val nullByteWriter = if (groupingKeySchema.map(_.nullable).forall(_ == 
false)) {
--- End diff --

maybe name it `resetNullBits`?


---




[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21968#discussion_r214246211
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala
 ---
@@ -48,6 +48,12 @@ class RowBasedHashMapGenerator(
 val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
 val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)
 
+val numVarLenFields = groupingKeys.map(_.dataType).count {
--- End diff --

groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt))
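
A quick self-contained check of what that one-liner counts, using the 
`UnsafeRow.isFixedLength` helper referenced above (string and binary columns 
are the variable-length ones):

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.types._

val groupingKeyTypes: Seq[DataType] = Seq(IntegerType, StringType, LongType, BinaryType)
val numVarLenFields = groupingKeyTypes.count(dt => !UnsafeRow.isFixedLength(dt))
// numVarLenFields == 2 (StringType and BinaryType)
```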


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214245829
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2546,15 +2546,37 @@ object functions {
   def soundex(e: Column): Column = withExpr { SoundEx(e.expr) }
 
   /**
-   * Splits str around pattern (pattern is a regular expression).
+   * Splits str around matches of the given regex.
*
-   * @note Pattern is a string representation of the regular expression.
+   * @param str a string expression to split
+   * @param regex a string representing a regular expression. The regex 
string should be
+   *  a Java regular expression.
*
* @group string_funcs
* @since 1.5.0
*/
-  def split(str: Column, pattern: String): Column = withExpr {
-StringSplit(str.expr, lit(pattern).expr)
+  def split(str: Column, regex: String): Column = withExpr {
+StringSplit(str.expr, Literal(regex), Literal(-1))
+  }
+
+  /**
+   * Splits str around matches of the given regex.
+   *
+   * @param str a string expression to split
+   * @param regex a string representing a regular expression. The regex 
string should be
+   *  a Java regular expression.
+   * @param limit an integer expression which controls the number of times 
the regex is applied.
+   *limit greater than 0: The resulting array's length will not be 
more than `limit`,
+   *  and the resulting array's last entry will 
contain all input beyond
+   *  the last matched regex.
+   *limit less than or equal to 0: `regex` will be applied as many 
times as possible, and
+   *   the resulting array can be of any size.
--- End diff --

The indentation here looks a bit odd and, at the least, inconsistent. Can you 
double-check the Scaladoc and format this correctly?
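
To make the limit semantics described in the Scaladoc above concrete, a short 
usage sketch (assuming Spark 2.4+, where the three-argument `split` was added, 
and `import spark.implicits._` for `toDF`):

```scala
import org.apache.spark.sql.functions.{col, split}

val df = Seq("one,two,three").toDF("s")

df.select(split(col("s"), ",")).first.getSeq[String](0)
// -> "one", "two", "three"   (no limit: split on every match)

df.select(split(col("s"), ",", 2)).first.getSeq[String](0)
// -> "one", "two,three"      (limit 2: the last entry keeps the remainder)
```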


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214245703
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1669,20 +1669,36 @@ def repeat(col, n):
 return Column(sc._jvm.functions.repeat(_to_java_column(col), n))
 
 
-@since(1.5)
+@since(2.4)
 @ignore_unicode_prefix
-def split(str, pattern):
-"""
-Splits str around pattern (pattern is a regular expression).
-
-.. note:: pattern is a string represent the regular expression.
-
->>> df = spark.createDataFrame([('ab12cd',)], ['s',])
->>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
-[Row(s=[u'ab', u'cd'])]
-"""
-sc = SparkContext._active_spark_context
-return Column(sc._jvm.functions.split(_to_java_column(str), pattern))
+def split(str, regex, limit=-1):
--- End diff --

Please change `regex` back to `pattern`.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21721
  
Stuff like this merits api discussions. Not just implementation changes ...



---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21721
  
I actually thought all of those are part of DataSource V2. Why are we fine 
with changing those interfaces but not okay with this one, to the point of 
considering reverting it?

Of course, other things should be clarified if there are concerns. 
In this case, switching it to `Unstable` looks like it alleviates the concerns 
listed here well enough.


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214244981
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1669,20 +1669,36 @@ def repeat(col, n):
 return Column(sc._jvm.functions.repeat(_to_java_column(col), n))
 
 
-@since(1.5)
+@since(2.4)
 @ignore_unicode_prefix
-def split(str, pattern):
-"""
-Splits str around pattern (pattern is a regular expression).
-
-.. note:: pattern is a string represent the regular expression.
-
->>> df = spark.createDataFrame([('ab12cd',)], ['s',])
->>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
-[Row(s=[u'ab', u'cd'])]
-"""
-sc = SparkContext._active_spark_context
-return Column(sc._jvm.functions.split(_to_java_column(str), pattern))
+def split(str, regex, limit=-1):
--- End diff --

This would be a breaking API change for Python, I believe.


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r214244918
  
--- Diff: R/pkg/R/functions.R ---
@@ -3410,13 +3410,14 @@ setMethod("collect_set",
 #' \dontrun{
 #' head(select(df, split_string(df$Sex, "a")))
 #' head(select(df, split_string(df$Class, "\\d")))
+#' head(select(df, split_string(df$Class, "\\d", 2)))
 #' # This is equivalent to the following SQL expression
 #' head(selectExpr(df, "split(Class, 'd')"))}
 #' @note split_string 2.3.0
 setMethod("split_string",
   signature(x = "Column", pattern = "character"),
-  function(x, pattern) {
-jc <- callJStatic("org.apache.spark.sql.functions", "split", 
x@jc, pattern)
+  function(x, pattern, limit = -1) {
+jc <- callJStatic("org.apache.spark.sql.functions", "split", 
x@jc, pattern, limit)
--- End diff --

You should pass `as.integer(limit)` here instead.
Could we add a test in R?


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2724/
Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/



---




[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...

2018-08-30 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22213#discussion_r214244665
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -1144,6 +1144,46 @@ class SparkSubmitSuite
 conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")
 conf1.get("spark.submit.pyFiles") should (startWith("/"))
   }
+
+  test("handles natural line delimiters in --properties-file and --conf 
uniformly") {
+val delimKey = "spark.my.delimiter."
+val LF = "\n"
+val CR = "\r"
+
+val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> 
s"${LF}blah"
+val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" 
-> s"blah${CR}"
+val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> 
s"${CR}blah${LF}"
+val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " 
blah\f"
--- End diff --

Sorry for the stupid question. I guess I was thinking of something 
different.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95519/
Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22298
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
**[Test build #95519 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95519/testReport)**
 for PR 22298 at commit 
[`46c30cc`](https://github.com/apache/spark/commit/46c30cc27cd3a7279a116ec6a70a937b8502cd73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...

2018-08-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22274#discussion_r214244580
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, 
setCurrentDatabase, listDatabases", {
   expect_equal(currentDatabase(), "default")
   expect_error(setCurrentDatabase("default"), NA)
   expect_error(setCurrentDatabase("zxwtyswklpf"),
-"Error in setCurrentDatabase : analysis error - Database 
'zxwtyswklpf' does not exist")
+   paste("Error in setCurrentDatabase : analysis error - 
Database",
--- End diff --

I'd use paste0 instead, to make the implicit space that should follow 
`Database` explicit,

i.e. `paste0("Error in setCurrentDatabase : analysis error - Database ", 
"'zxwtyswklpf' does not exist")`


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95503/
Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95503/testReport)**
 for PR 22192 at commit 
[`2907c6b`](https://github.com/apache/spark/commit/2907c6b62495f8d25c0016883202239634685fec).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22281
  
For clarification, I am okay with targeting this for 3.0.0, since the code 
freeze will be very soon if I am not mistaken.


---




[GitHub] spark pull request #22291: [SPARK-25007][R]Add array_intersect/array_except/...

2018-08-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22291#discussion_r214244359
  
--- Diff: R/pkg/R/generics.R ---
@@ -799,10 +807,18 @@ setGeneric("array_sort", function(x) { 
standardGeneric("array_sort") })
 #' @name NULL
 setGeneric("arrays_overlap", function(x, y) { 
standardGeneric("arrays_overlap") })
 
+#' @rdname column_collection_functions
+#' @name NULL
+setGeneric("array_union", function(x, y) { standardGeneric("array_union") 
})
+
 #' @rdname column_collection_functions
 #' @name NULL
 setGeneric("arrays_zip", function(x, ...) { standardGeneric("arrays_zip") 
})
 
+#' @rdname column_collection_functions
+#' @name NULL
+setGeneric("shuffle", function(x) { standardGeneric("shuffle") })
--- End diff --

this should go below - this part of the list should be sorted alphabetically


---




[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

2018-08-30 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22048
  
retest this please


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20637
  
with the test removed, do we still need this change? 
https://github.com/apache/spark/pull/20637/files#diff-41747ec3f56901eb7bfb95d2a217e94dR226


---




[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22281
  
Yea, but the default fallback should rather be DataSource V2's. Both of you 
are super active in DataSource V2. Do you guys have some concerns about 
defaulting to DataSource V1's behaviour?


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22298
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/



---




[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/6#discussion_r214243817
  
--- Diff: R/pkg/R/functions.R ---
@@ -1697,8 +1697,8 @@ setMethod("to_date",
   })
 
 #' @details
-#' \code{to_json}: Converts a column containing a \code{structType}, array 
of \code{structType},
-#' a \code{mapType} or array of \code{mapType} into a Column of JSON 
string.
+#' \code{to_json}: Converts a column containing a \code{structType}, a 
\code{mapType}
+#' or an array into a Column of JSON string.
--- End diff --

Let's add one simple python doctest as well
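
For reference, a minimal Scala illustration of the array behavior this change 
enables (the Python doctest requested above would mirror it); assumes 
`import spark.implicits._` and a Spark build that includes this PR:

```scala
import org.apache.spark.sql.functions.to_json

val df = Seq(Tuple1(Seq(1, 2, 3))).toDF("a")
df.select(to_json($"a")).first.getString(0)
// "[1,2,3]"
```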


---




[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22281
  
The USING syntax has to be there, but what can follow USING may be limited to 
data source v1 implementations and file formats.

IIUC the agreement is: a data source v2 with a catalog can create a table with 
USING, and the data source should interpret the USING parameter; e.g. 
`USING parquet` may have a different meaning in the Iceberg data source.


---




[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-30 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/22298#discussion_r214243652
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/SecretsTestsSuite.scala
 ---
@@ -53,6 +53,7 @@ private[spark] trait SecretsTestsSuite { k8sSuite: 
KubernetesSuite =>
   .delete()
   }
 
+  // TODO: [SPARK-25291] This test is flaky with regards to memory of 
executors
--- End diff --

@mccheah This specific test periodically fails to set the proper memory for 
the executors. I have filed a JIRA: SPARK-25291


---




[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-30 Thread ifilonenko
Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22298
  
@rdblue @holdenk for review. This contains both unit and integration tests 
that verify [SPARK-25004] for K8S


---




[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...

2018-08-30 Thread ifilonenko
GitHub user ifilonenko opened a pull request:

https://github.com/apache/spark/pull/22298

[SPARK-25021][K8S] Add spark.executor.pyspark.memory limit for K8S

## What changes were proposed in this pull request?

Add spark.executor.pyspark.memory limit for K8S

## How was this patch tested?

Unit and Integration tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ifilonenko/spark SPARK-25021

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22298


commit b54a039da08aec93a6db9d1470d0b2eaaec08814
Author: Ilan Filonenko 
Date:   2018-08-30T00:19:40Z

initial WIP push for SPARK-25021

commit 75742a37687a7eb3ebaa34069ac7a62521a4e2f8
Author: Ilan Filonenko 
Date:   2018-08-30T05:26:27Z

add python.worker.reuse

commit 46c30cc27cd3a7279a116ec6a70a937b8502cd73
Author: Ilan Filonenko 
Date:   2018-08-31T04:32:22Z

final checks with e2e tests




---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21721
  
Note that the data source v2 API is not stable yet and we may even change the 
abstraction of the APIs. The design of custom metrics may affect the design of 
the streaming source APIs.

I had a hard time figuring out the life cycle of custom metrics. It seems 
like their life cycle should be bound to an epoch, but unfortunately we don't 
have such an interface in continuous streaming to represent an epoch. Is it 
possible that we may end up with two sets of custom metrics APIs, one for 
micro-batch and one for continuous? The documentation added in this PR is not 
clear about this.


---




[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...

2018-08-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/6#discussion_r214243115
  
--- Diff: R/pkg/R/functions.R ---
@@ -1697,8 +1697,8 @@ setMethod("to_date",
   })
 
 #' @details
-#' \code{to_json}: Converts a column containing a \code{structType}, array 
of \code{structType},
-#' a \code{mapType} or array of \code{mapType} into a Column of JSON 
string.
+#' \code{to_json}: Converts a column containing a \code{structType}, a 
\code{mapType}
+#' or an array into a Column of JSON string.
--- End diff --

It should.
Could we add some tests for this in R?


---




[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22232
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95508/
Test PASSed.


---




[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22232
  
**[Test build #95508 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95508/testReport)**
 for PR 22232 at commit 
[`1c32646`](https://github.com/apache/spark/commit/1c326466fbd24c432184be6e53afec93369970c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-30 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21732
  
> The only tricky thing is, Product is handled specially in the top level, 
being flattened into multiple columns.

@cloud-fan Compared with Option of Product, which was not supported before, 
the encoding of Product is the current behavior. I don't think we need to 
change it for now. WDYT?
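
For context, a small sketch of what this PR enables at the top level, assuming 
`import spark.implicits._` and a build that includes this change:

```scala
case class Point(x: Int, y: Int)

// Option[Product] now works as a top-level Dataset type.
val ds = Seq(Some(Point(1, 2)), None).toDS()
ds.collect()
// Array(Some(Point(1,2)), None)

// Unlike Dataset[Point], which is flattened into columns x and y, the Option
// variant is carried as a single nullable struct column.
```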



---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/7
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/7
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95511/
Test FAILed.


---




[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/7
  
**[Test build #95511 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95511/testReport)**
 for PR 7 at commit 
[`a641106`](https://github.com/apache/spark/commit/a6411069c352b30f9094a83991c35f0730b5df55).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95518/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22186
  
**[Test build #95518 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)**
 for PR 22186 at commit 
[`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...

2018-08-30 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22264
  
@srowen  A PR for this "bug" is proposed: 
https://github.com/scala/scala/pull/7156

Hopefully, Scala 2.12.7 will fix it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20086: [SPARK-22903]Fix already being created exception in stag...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20086
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in ...

2018-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22264


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r214237818
  
--- Diff: python/pyspark/sql/session.py ---
@@ -252,6 +252,16 @@ def newSession(self):
 """
 return self.__class__(self._sc, self._jsparkSession.newSession())
 
+@since(2.4)
+def getActiveSession(self):
+"""
+Returns the active SparkSession for the current thread, returned 
by the builder.
+>>> s = spark.getActiveSession()
+>>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
+True
+"""
+return self._jsparkSession.getActiveSession()
--- End diff --

Does this return JVM instance?
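For context, a minimal Scala sketch of the JVM-side counterpart (object name and master URL are illustrative): `SparkSession.getActiveSession` returns a `scala.Option[SparkSession]`, so calling it over Py4J hands back JVM objects that the Python wrapper would still need to unwrap and re-wrap:

```scala
import org.apache.spark.sql.SparkSession

object ActiveSessionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("active-session-demo").getOrCreate()

    // On the JVM side this is an Option[SparkSession], not a bare session object,
    // which is why a Py4J caller receives a JVM value rather than a Python SparkSession.
    val active: Option[SparkSession] = SparkSession.getActiveSession
    assert(active.contains(spark))

    spark.stop()
  }
}
```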


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...

2018-08-30 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/22213#discussion_r214237801
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -1144,6 +1144,46 @@ class SparkSubmitSuite
 conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")
 conf1.get("spark.submit.pyFiles") should (startWith("/"))
   }
+
+  test("handles natural line delimiters in --properties-file and --conf 
uniformly") {
+val delimKey = "spark.my.delimiter."
+val LF = "\n"
+val CR = "\r"
+
+val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> 
s"${LF}blah"
+val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" 
-> s"blah${CR}"
+val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> 
s"${CR}blah${LF}"
+val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " 
blah\f"
--- End diff --

@jerryshao I try not to spend time on issues unrelated to our production deployments. @steveloughran and this PR already pointed at the `Properties#load` javadoc, which documents the format.

Line terminator characters can be included using the `\r` and `\n` escape sequences, and any character can be encoded with a `\u` escape.

In addition you can take a look at the file generated by this code:
```
#test whitespace
#Thu Aug 30 20:20:33 PDT 2018
spark.my.delimiter.nonDelimSpaceFromFile=\ blah\f
spark.my.delimiter.infixDelimFromFile=\rblah\n
spark.my.delimiter.trailingDelimKeyFromFile=blah\r
spark.my.delimiter.leadingDelimKeyFromFile=\nblah
```
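A minimal Scala sketch of how `java.util.Properties#load` reads those escapes back (keys and values mirror the generated file above, minus the comment lines):

```scala
import java.io.StringReader
import java.util.Properties

object PropertiesEscapeDemo {
  def main(args: Array[String]): Unit = {
    // Triple-quoted Scala strings keep the backslashes literal, so this is the
    // same text that Properties#store wrote above.
    val text =
      """spark.my.delimiter.leadingDelimKeyFromFile=\nblah
        |spark.my.delimiter.trailingDelimKeyFromFile=blah\r
        |spark.my.delimiter.nonDelimSpaceFromFile=\ blah\f
        |""".stripMargin

    val props = new Properties()
    props.load(new StringReader(text))

    // \n, \r and \f come back as real control characters; "\ " preserves the leading space.
    assert(props.getProperty("spark.my.delimiter.leadingDelimKeyFromFile") == "\nblah")
    assert(props.getProperty("spark.my.delimiter.trailingDelimKeyFromFile") == "blah\r")
    assert(props.getProperty("spark.my.delimiter.nonDelimSpaceFromFile") == " blah\f")
  }
}
```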


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22273
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95514/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22186
  
**[Test build #95518 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)**
 for PR 22186 at commit 
[`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22273
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22197
  
**[Test build #95517 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95517/testReport)**
 for PR 22197 at commit 
[`e0d6196`](https://github.com/apache/spark/commit/e0d61969b13bcfd9dfc95e2a013b14e111d2b832).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...

2018-08-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22273
  
**[Test build #95514 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95514/testReport)**
 for PR 22273 at commit 
[`e8a2602`](https://github.com/apache/spark/commit/e8a2602476a52622a01c0cf4f72067f3119be96a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2723/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...

2018-08-30 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22297
  
cc @cloud-fan @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

2018-08-30 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/22186
  
Jenkins, retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


