date:20180910

[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22227
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-09-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21596
  
retest this please


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22337
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95909/
Test FAILed.


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22337
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22337
  
**[Test build #95909 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95909/testReport)**
 for PR 22337 at commit 
[`2d9e34a`](https://github.com/apache/spark/commit/2d9e34abdae13efa1fac9e906f331cdd04105e82).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest `
  * `class StreamingAggregationSuite extends StateStoreMetricsTest with 
Assertions `
  * `class StreamingDeduplicationSuite extends StateStoreMetricsTest `


---


[GitHub] spark issue #22384: [SPARK-25398][CORE][MESOS] Minor bugs from comparing unr...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22384
  
**[Test build #4334 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4334/testReport)**
 for PR 22384 at commit 
[`9e70b62`](https://github.com/apache/spark/commit/9e70b625992310a44880a9e42f6fead6c2068dc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...

2018-09-10 Thread seancxmao

Github user seancxmao commented on the issue:

https://github.com/apache/spark/pull/22343
  
@dongjoon-hyun It is a little complicated. There has been a discussion 
about this in #22184. Below are some key comments from @cloud-fan and 
@gatorsmile, just FYI.

* https://github.com/apache/spark/pull/22184#discussion_r212834477
* https://github.com/apache/spark/pull/22184#issuecomment-416885728


---


[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-09-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21860
  
LGTM cc @cloud-fan @hvanhovell


---


[GitHub] spark issue #22355: [SPARK-25358][SQL] MutableProjection supports fallback t...

2018-09-10 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22355
  
Thanks for your review, kirs! I'll update in a day.


---


[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-10 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22357#discussion_r216545091
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
 ---
@@ -110,7 +110,17 @@ private[sql] object ParquetSchemaPruning extends 
Rule[LogicalPlan] {
 val projectionRootFields = projects.flatMap(getRootFields)
 val filterRootFields = filters.flatMap(getRootFields)
 
-(projectionRootFields ++ filterRootFields).distinct
+// Kind of expressions don't need to access any fields of a root 
fields, e.g., `IsNotNull`.
+// For them, if there are any nested fields accessed in the query, we 
don't need to add root
+// field access of above expressions.
+// For example, for a query `SELECT name.first FROM contacts WHERE 
name IS NOT NULL`,
+// we don't need to read nested fields of `name` struct other than 
`first` field.
--- End diff --

I'm having trouble accepting this, but perhaps I'm reading too much into it 
(or not enough). Let me illustrate with a couple of queries and their physical 
plans.

Assuming the data model in `ParquetSchemaPruningSuite.scala`, the physical 
plan for the query

select employer.id from contacts where employer is not null

is

```
== Physical Plan ==
*(1) Project [employer#36.id AS id#46]
+- *(1) Filter isnotnull(employer#36)
   +- *(1) FileScan parquet [employer#36,p#37] Batched: false, Format: 
Parquet,
PartitionCount: 2, PartitionFilters: [], PushedFilters: 
[IsNotNull(employer)],
ReadSchema: struct>
```

The physical plan for the query

select employer.id from contacts where employer.id is not null

is

```
== Physical Plan ==
*(1) Project [employer#36.id AS id#47]
+- *(1) Filter (isnotnull(employer#36) && isnotnull(employer#36.id))
   +- *(1) FileScan parquet [employer#36,p#37] Batched: false, Format: 
Parquet,
PartitionCount: 2, PartitionFilters: [], PushedFilters: 
[IsNotNull(employer)],
ReadSchema: struct>
```

The read schemata are the same, but the query filters are not. The file 
scan for the second query looks as I would expect, but the scan for the first 
query appears to only read `employer.id` even though it needs to check 
`employer is not null`. If it only reads `employer.id`, how does it check that 
`employer.company` is not null? Perhaps `employer.id` is null but 
`employer.company` is not null for some row...

I have run some tests to validate that this PR is returning the correct 
results for both queries, and it is. But I don't understand why.
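
One possible explanation, offered as a sketch under assumptions rather than a description of what this PR does: Parquet stores a definition level with every leaf value, recording how many optional ancestors on the path are actually present. If that is what the reader relies on, the `employer.id` column alone is enough to tell whether `employer` itself is null, including the case where `employer.id` is null but `employer.company` is not. The record layout and helper names below are hypothetical.

```python
# Hypothetical schema:
#   optional group employer { optional int32 id; optional binary company }
# The leaf column employer.id then has a maximum definition level of 2.

def definition_level_of_id(record):
    """Definition level that would be stored for the leaf employer.id.

    0 -> employer itself is null
    1 -> employer is present, employer.id is null
    2 -> employer.id is present
    """
    employer = record.get("employer")
    if employer is None:
        return 0
    if employer.get("id") is None:
        return 1
    return 2


def employer_is_not_null(def_level):
    # Only the level taken from the employer.id column is consulted.
    return def_level >= 1


rows = [
    {"employer": None},
    {"employer": {"id": None, "company": "abc"}},
    {"employer": {"id": 7, "company": None}},
]
levels = [definition_level_of_id(r) for r in rows]
kept = [employer_is_not_null(level) for level in levels]
print(levels)
print(kept)
```

The second row is the case raised above: its level is 1, so `employer` is still reported as present without reading `employer.company`.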


---


[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...

2018-09-10 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22343
  
What I asked was the following, wasn't it? 
> In case-insensitive mode, when converting hive parquet table to parquet 
data source, we switch the duplicated fields resolution mode to ask parquet 
data source to pick the first matched field - the same behavior as hive parquet 
table - to keep behaviors consistent.

Spark should not pick the first matched field in any case, because this was 
treated as a correctness issue in the previous PR, which was backported to 
`branch-2.3`: https://github.com/apache/spark/pull/22183. I don't think we need 
to follow the incorrect Hive behavior.


---


[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22381
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95905/
Test PASSed.


---


[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22381
  
**[Test build #95905 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95905/testReport)**
 for PR 22381 at commit 
[`a8fc89d`](https://github.com/apache/spark/commit/a8fc89d51971e16c37783cf336daa07d18e6d3c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95902/
Test FAILed.


---


[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21688
  
**[Test build #95902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95902/testReport)**
 for PR 21688 at commit 
[`573390d`](https://github.com/apache/spark/commit/573390d0a933e7a2641f944046442a136bba6cd8).
 * This patch **fails from timeout after a configured wait of \`400m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2998/
Test PASSed.


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22387
  
**[Test build #95916 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95916/testReport)**
 for PR 22387 at commit 
[`a7b857c`](https://github.com/apache/spark/commit/a7b857c69fa20615108413d6f17a87978ca44ae2).


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22387
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22387
  
Retest this please


---


[GitHub] spark issue #22384: [SPARK-25398][CORE][MESOS] Minor bugs from comparing unr...

2018-09-10 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22384
  
Thanks, @srowen . If you ran `inspection` for all modules, what about 
removing all tags `[CORE][MESOS]`? Otherwise, could you put `[SQL]` on the 
title because four `sql` module fixes are included, too.



---


[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22376
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2996/
Test FAILed.


---


[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22385
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2997/
Test PASSed.


---


[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22385
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22376
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22376
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2996/



---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22387
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22387
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95910/
Test FAILed.


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22387
  
**[Test build #95910 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95910/testReport)**
 for PR 22387 at commit 
[`a7b857c`](https://github.com/apache/spark/commit/a7b857c69fa20615108413d6f17a87978ca44ae2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #95915 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95915/testReport)**
 for PR 22192 at commit 
[`447c5e5`](https://github.com/apache/spark/commit/447c5e5974ca2a176026e63518a7a6cf29b78008).


---


[GitHub] spark pull request #22384: [SPARK-25398][CORE][MESOS] Minor bugs from compar...

2018-09-10 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22384#discussion_r216540786
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
 ---
@@ -147,7 +147,7 @@ class PropagateEmptyRelationSuite extends PlanTest {
   .where(false)
   .select('a)
   .where('a > 1)
-  .where('a != 200)
+  .where('a =!= 200)
--- End diff --

Oh, thank you for fixing this.
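
For readers wondering why this one-character change matters: in Scala, `!=` is available on every object and returns a plain `Boolean`, whereas `=!=` is the column operator that builds an inequality expression. The sketch below is only a Python analogy with a hypothetical `Column` class, not Spark's API; it shows how a default comparison yields a constant while a dedicated method yields a predicate that is evaluated per row.

```python
class Column:
    """Hypothetical stand-in for an expression column (not Spark's class)."""

    def __init__(self, name):
        self.name = name

    # No __eq__/__ne__ override: `col != 200` falls back to object identity
    # and returns a plain bool, much like Scala's built-in `!=`.

    def not_equal(self, value):
        # Analogue of `=!=`: returns a predicate evaluated against each row.
        return lambda row: row[self.name] != value


a = Column("a")
rows = [{"a": 1}, {"a": 200}, {"a": 300}]

constant = a != 200            # a bool, decided before any row is seen
predicate = a.not_equal(200)   # a per-row predicate
filtered = [r["a"] for r in rows if predicate(r)]

print(constant)
print(filtered)
```

In the analogy, filtering on `constant` would keep every row, which is the kind of silent no-op the fix avoids.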


---


[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-09-10 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21710
  
I think we missed the window before the branch; I'll review in a few days.


---


[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-10 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22192
  
Jenkins, retest this please


---


[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22385
  
**[Test build #95914 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95914/testReport)**
 for PR 22385 at commit 
[`daf76ed`](https://github.com/apache/spark/commit/daf76ed592ed82aa4b390b444c4669ae65c9b355).


---


[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-10 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22385
  
Retest this please.


---


[GitHub] spark issue #22382: [SPARK-23243] [SPARK-20715][CORE][2.2] Fix RDD.repartiti...

2018-09-10 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22382
  
thanks, merging to 2.2!


---


[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22376
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2996/



---


[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-09-10 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21649#discussion_r216539767
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3939,7 +3929,15 @@ setMethod("hint",
   signature(x = "SparkDataFrame", name = "character"),
   function(x, name, ...) {
 parameters <- list(...)
-stopifnot(all(sapply(parameters, isTypeAllowedForSqlHint)))
+stopifnot(all(sapply(parameters, function(x) {
--- End diff --

If I recall, let's not have an inner scope that reuses the variable name `x` 
from the outer scope?


---


[GitHub] spark pull request #22370: don't link to deprecated function

2018-09-10 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22370#discussion_r216539411
  
--- Diff: R/pkg/R/catalog.R ---
@@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) {
 #' @param ... additional named parameters as options for the data source.
 #' @return A SparkDataFrame.
 #' @rdname createTable
-#' @seealso \link{createExternalTable}
--- End diff --

`registerTempTable` is because of the `@family` tag, so it's a bit 
different.


---


[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/22357
  
> FYI, after further checking the code and discussing predicate pushdown 
with @dbtsai, we know that predicate pushdown only works for primitive 
types on the Parquet datasource. So neither `IsNotNull(employer)` nor 
`IsNotNull(employer.id)` is actually pushed down to the Parquet reader.

I would expect `IsNotNull(employer.id)` to be pushed down. In any case, I 
misunderstood what the `PushedFilters` metadata item means in the `FileScan` 
part of the physical plan. I thought it was a Parquet filter, but sometimes 
it is not. I'm not concerned about supporting filter pushdown at this point. 
My concern was about its side effects, but that has been allayed.


---


[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r216538924
  
--- Diff: R/pkg/R/functions.R ---
@@ -3720,3 +3720,22 @@ setMethod("current_timestamp",
 jc <- callJStatic("org.apache.spark.sql.functions", 
"current_timestamp")
 column(jc)
   })
+
+#' @details
+#' \code{from_csv}: Parses a column containing a CSV string into a Column 
of \code{structType}
+#' with the specified \code{schema}.
+#' If the string is unparseable, the Column will contain the value NA.
+#'
+#' @rdname column_collection_functions
+#' @param schema a DDL-formatted string
+#' @aliases from_csv from_csv,Column,character-method
+#'
+#' @note from_csv since 3.0.0
+setMethod("from_csv", signature(x = "Column", schema = "character"),
+  function(x, schema, ...) {
--- End diff --

Can you add to the doc for `...` (in column_collection_functions) to 
indicate the options this function accepts, if there is anything new?
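
To make the documented behaviour concrete, here is a rough Python sketch of parsing one CSV string against a schema and yielding a missing value when the string cannot be parsed. The `parse_csv_row` helper and its list-of-pairs schema are illustrative assumptions; this is not the `from_csv` implementation, and it ignores any options passed through `...`.

```python
import csv
import io


def parse_csv_row(text, schema):
    """Parse one CSV line into a dict following `schema`.

    `schema` is a list of (field_name, converter) pairs. Returns None
    (the analogue of NA) when the line does not fit the schema.
    """
    try:
        fields = next(csv.reader(io.StringIO(text)))
    except (StopIteration, csv.Error):
        return None
    if len(fields) != len(schema):
        return None
    try:
        return {name: conv(raw) for (name, conv), raw in zip(schema, fields)}
    except ValueError:
        return None


schema = [("id", int), ("name", str)]
print(parse_csv_row("1,Alice", schema))
print(parse_csv_row("oops,Bob", schema))
```

The second call returns `None` because `"oops"` cannot be converted to an integer.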


---


[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22376
  
**[Test build #95913 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95913/testReport)**
 for PR 22376 at commit 
[`4a0cffb`](https://github.com/apache/spark/commit/4a0cffb3ce9e1bede43e6a89fdd7a7b912bf93d2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22357
  
If I recall, the Parquet reader itself can do filter pushdown? Is it only 
the Spark Parquet data source that cannot?


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22387
  
LGTM


---


[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....

2018-09-10 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22376
  
Jenkins, retest this please


---


[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22282
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22282
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95906/
Test PASSed.


---


[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22282
  
**[Test build #95906 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95906/testReport)**
 for PR 22282 at commit 
[`220bd0a`](https://github.com/apache/spark/commit/220bd0a90b0c606b8f74c227218bec7bb6614782).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...

2018-09-10 Thread seancxmao

Github user seancxmao commented on the issue:

https://github.com/apache/spark/pull/22343
  
Hi, @dongjoon-hyun
When we find duplicated field names in the case of convertMetastoreXXX, we 
have 2 options:
(1) Raise an exception, as the parquet data source does. Most end users do 
not know the difference between a hive parquet table and a parquet data source. 
If the conversion leads to different behaviors, they may be confused, and in 
some cases it may even lead to tricky data issues silently.
(2) Adjust the behavior of the parquet data source to keep behaviors 
consistent. This seems friendlier to end users and avoids any potential issues 
introduced by the conversion.

BTW, for a parquet data source which is not converted from a hive parquet 
table, we raise an exception when there is ambiguity, since this is more 
intuitive and reasonable.
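
The two options can be pictured as two resolution strategies for a case-insensitive field lookup. This is a simplified Python illustration with hypothetical function names and field names, not Spark's actual resolution code.

```python
def matches(field_names, requested):
    """All physical field names equal to `requested`, ignoring case."""
    return [n for n in field_names if n.lower() == requested.lower()]


def resolve_strict(field_names, requested):
    # Option (1): raise when more than one field matches.
    found = matches(field_names, requested)
    if len(found) > 1:
        raise ValueError(f"ambiguous field {requested!r}: {found}")
    return found[0] if found else None


def resolve_first_match(field_names, requested):
    # Option (2): pick the first matched field.
    found = matches(field_names, requested)
    return found[0] if found else None


parquet_fields = ["id", "Name", "NAME"]
print(resolve_first_match(parquet_fields, "name"))
try:
    resolve_strict(parquet_fields, "name")
except ValueError as err:
    print("error:", err)
```

With option (2) the value of `NAME` is silently ignored, which is the trade-off under discussion in this thread.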


---


[GitHub] spark issue #22386: [SPARK-25399] Continuous processing state should not aff...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22386
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95908/
Test PASSed.


---


[GitHub] spark issue #22386: [SPARK-25399] Continuous processing state should not aff...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22386
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22386: [SPARK-25399] Continuous processing state should not aff...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22386
  
**[Test build #95908 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95908/testReport)**
 for PR 22386 at commit 
[`c2f813b`](https://github.com/apache/spark/commit/c2f813bb46bd08ee808ef35ad9569fb9dc7194a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-10 Thread ConeyLiu

Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/22371
  
@squito, thanks for the review. I previously intended to use 
`ConcurrentHashMap[Int, AtomicReferenceArray]`.

After rethinking the code, I understand that the lock here prevents 
different attempts of the same task from committing the shuffle writer result 
at the same time. A task can have multiple attempts in the following cases:

1. Failed task or stage. In this case, the previous task attempt should have 
already finished (failed or killed), or its result is no longer used.

2. Speculative task. In this case, the speculative task can't be scheduled 
on the same executor as the other attempts.

So what is the real value of the lock? Maybe I'm wrong; I hope someone can 
clarify.
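
For readers following the thread, the scenario the lock guards against can be sketched as below. This is a simplified, self-contained model, not Spark's actual resolver code; `CommitRegistry` and `commit` are hypothetical names, and it assumes the intended rule is "the first attempt to commit for a given map task wins".

```scala
import scala.collection.mutable

// Hypothetical model: the first attempt to commit output for a
// (shuffleId, mapId) pair wins; later attempts get the winner back.
final class CommitRegistry {
  private val winners = mutable.Map.empty[(Int, Int), Long]

  /** Returns the attempt id whose output is kept for this map task. */
  def commit(shuffleId: Int, mapId: Int, attemptId: Long): Long = synchronized {
    // The lock makes "check whether someone committed, else record me" atomic.
    winners.getOrElseUpdate((shuffleId, mapId), attemptId)
  }
}
```

If two attempts of the same task can never run on the same executor at the same time, the `synchronized` block is never contended, which is the point the question raises.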




---


[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-10 Thread bersprockets

Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22192
  
retest this please.

It's that old "java.lang.reflect.InvocationTargetException: null" error 
we've seen many times.


---


[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22357
  
FYI, after checking the code further and discussing predicate pushdown with 
@dbtsai, we found that predicate pushdown only works for primitive types in 
the Parquet data source. So neither `IsNotNull(employer)` nor 
`IsNotNull(employer.id)` is actually pushed down to the Parquet reader.
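
The rule described above can be illustrated with a small, self-contained model. It is only a sketch under assumptions: `ColumnType`, `StructColumn`, and `canPushDown` are hypothetical names, not Spark or Parquet APIs, and the check is reduced to "the predicate must reference a top-level primitive column".

```scala
object PushdownSketch {
  sealed trait ColumnType
  case object IntColumn extends ColumnType
  case object StringColumn extends ColumnType
  final case class StructColumn(fields: Map[String, ColumnType]) extends ColumnType

  // Hypothetical check: only top-level primitive columns are pushable.
  def canPushDown(schema: Map[String, ColumnType], column: String): Boolean =
    schema.get(column) match {
      case Some(IntColumn) | Some(StringColumn) => true
      // Struct columns, and nested paths such as "employer.id" that are not
      // top-level names, stay in the engine-side filter.
      case _ => false
    }
}
```

Under this model, a filter on `employer` or `employer.id` would still be evaluated after the rows are read, which matches the observation in the comment.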


---


[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22374
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95904/
Test PASSed.


---


[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22374
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22388
  
**[Test build #95912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95912/testReport)**
 for PR 22388 at commit 
[`e31ecfa`](https://github.com/apache/spark/commit/e31ecfa574393971586fa04d93766343f7661399).


---


[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22374
  
**[Test build #95904 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95904/testReport)**
 for PR 22374 at commit 
[`bd4ebe4`](https://github.com/apache/spark/commit/bd4ebe44c3268311e1a3569f9c32b9875ccbb292).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22388
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2995/
Test PASSed.


---


[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22388
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #22388: Revert [SPARK-24882][SQL] improve data source v2 ...

2018-09-10 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/22388

Revert [SPARK-24882][SQL] improve data source v2 API from branch 2.4

## What changes were proposed in this pull request?

As discussed on the dev list, we don't want to include this change in Spark 
2.4, because it requires data source v2 users to change their implementations 
extensively, and they would need to change them again in the next release.

## How was this patch tested?

existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark revert

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22388.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22388


commit b4cf7701146675c682d51a72279c9c98a62e21c9
Author: Wenchen Fan 
Date:   2018-09-11T00:59:16Z

Revert "[SPARK-24882][SQL] improve data source v2 API"

This reverts commit e754887182304ad0d622754e33192ebcdd515965.

commit e31ecfa574393971586fa04d93766343f7661399
Author: Wenchen Fan 
Date:   2018-09-11T02:24:36Z

fix import




---


[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...

2018-09-10 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22388
  
cc @rxin @rdblue 


---


[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....

2018-09-10 Thread ifilonenko

Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22376
  
@holdenk @felixcheung can this be merged, since the error isn't related to 
the features in this PR?


---


[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22375
  
**[Test build #95911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95911/testReport)**
 for PR 22375 at commit 
[`51aa9d5`](https://github.com/apache/spark/commit/51aa9d58999a546678a8dff1660e3e9f6d73ec8b).


---


[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22375
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2994/
Test PASSed.


---


[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22375
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...

2018-09-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22375
  
retest this please


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2993/
Test PASSed.


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22387
  
**[Test build #95910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95910/testReport)**
 for PR 22387 at commit 
[`a7b857c`](https://github.com/apache/spark/commit/a7b857c69fa20615108413d6f17a87978ca44ae2).


---


[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22387
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22385
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95903/
Test FAILed.


---


[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22385
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22385
  
**[Test build #95903 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95903/testReport)**
 for PR 22385 at commit 
[`daf76ed`](https://github.com/apache/spark/commit/daf76ed592ed82aa4b390b444c4669ae65c9b355).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix I...

2018-09-10 Thread wangyum

GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22387

[SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIntoHiveDirCommand 
output schema in Parquet issue

## What changes were proposed in this pull request?

Backport https://github.com/apache/spark/pull/22359 to branch-2.3.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-25313-FOLLOW-UP-branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22387


commit a7b857c69fa20615108413d6f17a87978ca44ae2
Author: Yuming Wang 
Date:   2018-09-11T02:02:55Z

[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema in 
Parquet issue




---


[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-10 Thread rednaxelafx

Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/22355#discussion_r216525084
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala
 ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+
+
+/**
+ * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified
+ * expressions.
+ *
+ * @param expressions a sequence of expressions that determine the value of each column of the
+ *                    output row.
+ */
+class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection {
+  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
+    this(expressions.map(BindReferences.bindReference(_, inputSchema)))
--- End diff --

use `toBoundExpr`?


---


[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-10 Thread rednaxelafx

Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/22355#discussion_r216524666
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
 ---
@@ -86,24 +86,12 @@ package object expressions  {
   }
 
   /**
-   * Converts a [[InternalRow]] to another Row given a sequence of expression that define each
-   * column of the new row. If the schema of the input row is specified, then the given expression
-   * will be bound to that schema.
-   *
-   * In contrast to a normal projection, a MutableProjection reuses the same underlying row object
-   * each time an input row is added.  This significantly reduces the cost of calculating the
-   * projection, but means that it is not safe to hold on to a reference to a [[InternalRow]] after
-   * `next()` has been called on the [[Iterator]] that produced it. Instead, the user must call
-   * `InternalRow.copy()` and hold on to the returned [[InternalRow]] before calling `next()`.
+   * A helper function to bound given expressions to an input schema.
--- End diff --

Spelling nitpick: s/bound/bind/


---


[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-10 Thread rednaxelafx

Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/22355#discussion_r216526434
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala
 ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+
+
+/**
+ * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified
+ * expressions.
+ *
+ * @param expressions a sequence of expressions that determine the value of each column of the
+ *                    output row.
+ */
+class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection {
+  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
+    this(expressions.map(BindReferences.bindReference(_, inputSchema)))
+
+  private[this] val buffer = new Array[Any](expressions.size)
+
+  override def initialize(partitionIndex: Int): Unit = {
+    expressions.foreach(_.foreach {
+      case n: Nondeterministic => n.initialize(partitionIndex)
+      case _ =>
+    })
+  }
+
+  private[this] val exprArray = expressions.toArray
+  private[this] var mutableRow: InternalRow = new GenericInternalRow(exprArray.length)
+  def currentValue: InternalRow = mutableRow
+
+  override def target(row: InternalRow): MutableProjection = {
+    mutableRow = row
+    this
+  }
+
+  override def apply(input: InternalRow): InternalRow = {
+    var i = 0
+    while (i < exprArray.length) {
+      // Store the result into buffer first, to make the projection atomic (needed by aggregation)
+      buffer(i) = exprArray(i).eval(input)
+      i += 1
+    }
+    i = 0
+    while (i < exprArray.length) {
+      mutableRow(i) = buffer(i)
+      i += 1
+    }
+    mutableRow
+  }
+}
--- End diff --

+1 on the check for `NoOp`s.
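
One way such a check might look is sketched below as a self-contained model. It does not use Spark's classes: `Expr`, `Col`, `NoOp`, and `Projection` here are simplified, hypothetical stand-ins, and the assumption is that a `NoOp` slot means "leave the target value untouched".

```scala
object NoOpSketch {
  sealed trait Expr { def eval(input: Array[Any]): Any }

  final case class Col(ordinal: Int) extends Expr {
    def eval(input: Array[Any]): Any = input(ordinal)
  }

  // Placeholder meaning "leave this output slot untouched"; it must not be evaluated.
  case object NoOp extends Expr {
    def eval(input: Array[Any]): Any =
      throw new UnsupportedOperationException("NoOp cannot be evaluated")
  }

  final class Projection(exprs: Seq[Expr]) {
    // Keep only evaluable expressions, remembering their output positions.
    private val valid: Array[(Expr, Int)] =
      exprs.zipWithIndex.filter { case (e, _) => e != NoOp }.toArray
    private val buffer = new Array[Any](exprs.size)

    def apply(input: Array[Any], target: Array[Any]): Array[Any] = {
      // Phase 1: evaluate everything into the buffer before writing any slot,
      // in case the target aliases the input.
      valid.foreach { case (e, i) => buffer(i) = e.eval(input) }
      // Phase 2: copy only the slots that were actually evaluated.
      valid.foreach { case (_, i) => target(i) = buffer(i) }
      target
    }
  }
}
```

Filtering once in the constructor keeps the per-row loop free of pattern matching, at the cost of an extra array of index pairs.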


---


[GitHub] spark pull request #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Supp...

2018-09-10 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20999#discussion_r216525719
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -293,6 +293,28 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
 }
   }
 
+  /**
+   * Create a partition specification map with filters.
+   */
+  override def visitDropPartitionSpec(
+      ctx: DropPartitionSpecContext): Seq[Expression] = {
+    withOrigin(ctx) {
+      ctx.dropPartitionVal().asScala.map { pFilter =>
+        if (pFilter.identifier() == null || pFilter.constant() == null ||
+            pFilter.comparisonOperator() == null) {
+          throw new ParseException(s"Invalid partition spec: ${pFilter.getText}", ctx)
+        }
+        // We cannot use UnresolvedAttribute because resolution is performed after Analysis, when
+        // running the command. The type is not relevant, it is replaced during the real resolution
+        val partition =
+          AttributeReference(pFilter.identifier().getText, StringType)()
--- End diff --

Yeah, looks good to me. But I'm not sure which one is the right approach, so 
we'd better wait for other reviewers' comments here, too. cc: @gatorsmile 
@viirya 


---


[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22379
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22379
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95901/
Test FAILed.


---


[GitHub] spark pull request #22364: [SPARK-25379][SQL] Improve AttributeSet and Colum...

2018-09-10 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22364#discussion_r216525223
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala
 ---
@@ -39,10 +41,15 @@ object AttributeSet {
 
   /** Constructs a new [[AttributeSet]] given a sequence of [[Expression 
Expressions]]. */
   def apply(baseSet: Iterable[Expression]): AttributeSet = {
-    new AttributeSet(
-      baseSet
-        .flatMap(_.references)
-        .map(new AttributeEquals(_)).toSet)
--- End diff --

Thanks!


---


[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22379
  
**[Test build #95901 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95901/testReport)**
 for PR 22379 at commit 
[`d2bfd94`](https://github.com/apache/spark/commit/d2bfd9430f05d006accdecb6a62ed659fbd6a2f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22337
  
**[Test build #95909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95909/testReport)**
 for PR 22337 at commit 
[`2d9e34a`](https://github.com/apache/spark/commit/2d9e34abdae13efa1fac9e906f331cdd04105e82).


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22337
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22337
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2992/
Test PASSed.


---


[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #4333 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4333/testReport)**
 for PR 22192 at commit 
[`447c5e5`](https://github.com/apache/spark/commit/447c5e5974ca2a176026e63518a7a6cf29b78008).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22337
  
retest this please


---


[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22365
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95900/
Test PASSed.


---


[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22365
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22365
  
**[Test build #95900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95900/testReport)**
 for PR 22365 at commit 
[`e85175e`](https://github.com/apache/spark/commit/e85175e18e95d7751748d4615792579375859786).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #22341: [SPARK-24889][Core] Update block info when unpers...

2018-09-10 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22341#discussion_r216523989
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -646,7 +647,47 @@ private[spark] class AppStatusListener(
   }
 
   override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
-    liveRDDs.remove(event.rddId)
+    liveRDDs.remove(event.rddId).foreach { liveRDD =>
+      val executorsToUpdate = new HashSet[LiveExecutor]()
--- End diff --

Right. But it would be nice to avoid the hash set if possible. The less 
stuff listeners have to do, the better.


---


[GitHub] spark pull request #22341: [SPARK-24889][Core] Update block info when unpers...

2018-09-10 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22341#discussion_r216523824
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -538,6 +538,14 @@ private class LiveRDD(val info: RDDInfo) extends LiveEntity {
 distributions.get(exec.executorId)
   }
 
+  def getPartitions(): Map[String, LiveRDDPartition] = {
+    partitions.toMap
--- End diff --

Sure, it's an internal API. Listener code needs to avoid unnecessary work, 
such as copying data, to avoid issues with dropped events.
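
The trade-off can be sketched with a small, self-contained model. The names below (`LiveRddModel`, `partitionsCopy`, `partitionSizes`) are hypothetical and only stand in for the real `LiveRDD`; the sketch assumes callers only need to iterate over the current entries.

```scala
import scala.collection.mutable

object LiveRddSketch {
  final class LiveRddModel {
    private val partitions = mutable.HashMap.empty[String, Long]

    def update(blockName: String, size: Long): Unit = partitions(blockName) = size

    // Builds a new immutable map with every entry on each call.
    def partitionsCopy: Map[String, Long] = partitions.toMap

    // Exposes the current values without building a new map;
    // callers should iterate immediately and not hold on to the result.
    def partitionSizes: Iterable[Long] = partitions.values
  }
}
```

The copy is safer to hand out, but it costs time proportional to the number of partitions on every call, which is the kind of work the comment suggests keeping out of listener code.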


---


[GitHub] spark issue #22384: [SPARK-25398][CORE][MESOS] Minor bugs from comparing unr...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22384
  
**[Test build #4334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4334/testReport)**
 for PR 22384 at commit 
[`9e70b62`](https://github.com/apache/spark/commit/9e70b625992310a44880a9e42f6fead6c2068dc7).


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22337
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95890/
Test FAILed.


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22337
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...

2018-09-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22337
  
**[Test build #95890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95890/testReport)**
 for PR 22337 at commit 
[`2d9e34a`](https://github.com/apache/spark/commit/2d9e34abdae13efa1fac9e906f331cdd04105e82).
 * This patch **fails from timeout after a configured wait of \`400m\`**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest `
  * `class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions `
  * `class StreamingDeduplicationSuite extends StateStoreMetricsTest `


---


[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22367
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95899/
Test PASSed.


---


[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22367
  
Merged build finished. Test PASSed.


---

