[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request:

https://github.com/apache/spark/pull/14783#discussion_r77763275
  
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
 is_sparkR_shell <- function() {
   grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)
 }
+
+# rbind a list of rows with raw (binary) columns
+#
+# @param inputData a list of rows, with each row a list
+# @return data.frame with raw columns as lists
+rbindRaws <- function(inputData){
+  row1 <- inputData[[1]]
+  rawcolumns <- ("raw" == sapply(row1, class))
+
+  listmatrix <- do.call(rbind, inputData)
--- End diff --

Since everything in `inputData` is a list, this goes straight to the top 
of the hierarchy, the same as if you called `rbind(list1, list2, ...)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14960
  
Yeap, I quickly fixed and re-ran :). Thanks!





[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14783#discussion_r77763061
  
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
 is_sparkR_shell <- function() {
   grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)
 }
+
+# rbind a list of rows with raw (binary) columns
+#
+# @param inputData a list of rows, with each row a list
+# @return data.frame with raw columns as lists
+rbindRaws <- function(inputData){
+  row1 <- inputData[[1]]
+  rawcolumns <- ("raw" == sapply(row1, class))
+
+  listmatrix <- do.call(rbind, inputData)
--- End diff --

Ah I see - the types are inside the `listmatrix`. Thanks @clarkfitzg for 
clarifying. Let us know once you have added the test for a single column of raw 
as well.





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77762907
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) 
extends DataType with Seq[Stru
   }
 
   /**
+   * Extracts the [[StructField]] with the given name recursively.
+   *
+   * @throws IllegalArgumentException if the parent field's type is not 
StructType
+   */
+  def getFieldRecursively(name: String): StructField = {
--- End diff --

I think there's another way to solve this problem; it may be better to 
generate the final StructType in FileSourceStrategy. I'll try it and open 
another PR later.





[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...

2016-09-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/14960
  
seems to fail to build:
```
[INFO] Compiling 468 Scala sources and 74 Java sources to 
C:\projects\spark\core\target\scala-2.11\classes...
[ERROR] 
C:\projects\spark\core\src\main\scala\org\apache\spark\SparkContext.scala:995: 
type mismatch;
 found   : org.apache.spark.SparkConf
 required: org.apache.hadoop.conf.Configuration
[ERROR] FileSystem.getLocal(conf)
```





[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14783#discussion_r77762689
  
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
 is_sparkR_shell <- function() {
   grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)
 }
+
+# rbind a list of rows with raw (binary) columns
+#
+# @param inputData a list of rows, with each row a list
+# @return data.frame with raw columns as lists
+rbindRaws <- function(inputData){
+  row1 <- inputData[[1]]
+  rawcolumns <- ("raw" == sapply(row1, class))
+
+  listmatrix <- do.call(rbind, inputData)
--- End diff --

I think the correct class is maintained:
```
> sapply(listmatrix, class)
[1] "integer"   "integer"   "raw"   "raw"   "character" "character"
> sapply(listmatrix, typeof)
[1] "integer"   "integer"   "raw"   "raw"   "character" "character"
```





[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14783#discussion_r77762250
  
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
 is_sparkR_shell <- function() {
   grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)
 }
+
+# rbind a list of rows with raw (binary) columns
+#
+# @param inputData a list of rows, with each row a list
+# @return data.frame with raw columns as lists
+rbindRaws <- function(inputData){
+  row1 <- inputData[[1]]
+  rawcolumns <- ("raw" == sapply(row1, class))
+
+  listmatrix <- do.call(rbind, inputData)
--- End diff --

I was looking at 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html specifically 
the section `Value` which says

```
The type of a matrix result determined from the highest type of any of the 
inputs in the hierarchy raw < logical < integer < double < complex < character 
< list .
```
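For reference, a minimal base-R sketch (not code from this PR) showing where that promotion ends up here: since each row is itself a list, `rbind` promotes to the highest type, `list`, and the result is a matrix with storage mode `list`, so each cell keeps its own class:

```r
b <- serialize(1:10, NULL)                       # a raw vector
rows <- list(list(1L, b, "a"), list(2L, b, "b")) # mixed-type rows

m <- do.call(rbind, rows)  # promoted all the way up to type list

print(typeof(m))              # "list" -- a "list matrix"
print(sapply(m[1, ], class))  # "integer" "raw" "character"
```

So no column is coerced to character; the raw vectors survive intact inside the cells.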





[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14960
  
**[Test build #65026 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65026/consoleFull)**
 for PR 14960 at commit 
[`41aaaf1`](https://github.com/apache/spark/commit/41aaaf127e949af7563024c1584567a177295409).





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77762149
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
 }
   }
 
+  test("SPARK-4502 parquet nested fields pruning") {
+// Schema of "test-data/nested-array-struct.parquet":
+//root
+//|-- primitive: integer (nullable = true)
+//|-- myComplex: array (nullable = true)
+//||-- element: struct (containsNull = true)
+//|||-- id: integer (nullable = true)
+//|||-- repeatedMessage: array (nullable = true)
+//||||-- element: struct (containsNull = true)
+//|||||-- someId: integer (nullable = true)
+val df = 
readResourceParquetFile("test-data/nested-array-struct.parquet")
--- End diff --

Ah, I missed that. Sorry.





[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14960
  
I re-ran the test after this commit: 
https://ci.appveyor.com/project/HyukjinKwon/spark/build/81-SPARK-17339-fix-r

Let's wait and see :)





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77761938
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/TempViewManager.scala
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import javax.annotation.concurrent.GuardedBy
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.AnalysisException
+import 
org.apache.spark.sql.catalyst.analysis.TempViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.StringUtils
+
+
+/**
+ * A thread-safe manager for a list of temp views, providing atomic 
operations to manage temp views.
--- End diff --

In the description of `TempViewManager`, could we mention that the name of a 
temp view is always case sensitive, and that the caller is responsible for 
handling case-related issues?





[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14960
  
@sarutak Ah, I will do this here. Thanks!





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77761381
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) 
extends DataType with Seq[Stru
   }
 
   /**
+   * Extracts the [[StructField]] with the given name recursively.
+   *
+   * @throws IllegalArgumentException if the parent field's type is not 
StructType
+   */
+  def getFieldRecursively(name: String): StructField = {
--- End diff --

I think I understood how it works. My point is, this is a Parquet-specific 
problem, not related to the Catalyst module. I don't see any reason this 
method should be exposed.

I believe we can do this without modifying the column names (not even 
temporarily).



---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and use Pat...

2016-09-06 Thread sarutak
Github user sarutak commented on the issue:

https://github.com/apache/spark/pull/14960
  
I found we can replace `FileSystem.get` in `SparkContext#hadoopFile` and 
`SparkContext#newAPIHadoopFile` with `FileSystem.getLocal`, as in 
`SparkContext#hadoopRDD`; once they are replaced, we need not discuss the 
case of a comma-separated file list.

@HyukjinKwon You can replace them in this PR, or leave it for later.





[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14623
  
**[Test build #65025 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65025/consoleFull)**
 for PR 14623 at commit 
[`0a28fd6`](https://github.com/apache/spark/commit/0a28fd6d559f36a3ec68cd4c195db5ebf568e67b).





[GitHub] spark issue #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should handle th...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14527
  
**[Test build #65024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65024/consoleFull)**
 for PR 14527 at commit 
[`0970781`](https://github.com/apache/spark/commit/0970781b5d3b92fee0546d4bb9cb6a029fb9888e).





[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request:

https://github.com/apache/spark/pull/14783#discussion_r77760776
  
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
 is_sparkR_shell <- function() {
   grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)
 }
+
+# rbind a list of rows with raw (binary) columns
+#
+# @param inputData a list of rows, with each row a list
+# @return data.frame with raw columns as lists
+rbindRaws <- function(inputData){
+  row1 <- inputData[[1]]
+  rawcolumns <- ("raw" == sapply(row1, class))
+
+  listmatrix <- do.call(rbind, inputData)
--- End diff --

```
> b = serialize(1:10, NULL)
> inputData = list(list(1L, b, 'a'), list(2L, b, 'b'))  # Mixed data types
> listmatrix <- do.call(rbind, inputData)
> listmatrix
 [,1] [,2]   [,3]
[1,] 1Raw,62 "a"
[2,] 2Raw,62 "b"
> class(listmatrix)
[1] "matrix"
> typeof(listmatrix)
[1] "list"
> is.character(listmatrix)
[1] FALSE
```

A little unusual: it's a list matrix, hence the name. Which docs are you 
referring to?

The test that's in here now does test for mixed columns, but it doesn't 
test for a single column of raws. I'll add that now.





[GitHub] spark issue #14991: [SPARK-17427][SQL] function SIZE should return -1 when p...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14991
  
**[Test build #65021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65021/consoleFull)**
 for PR 14991 at commit 
[`1ccbe6b`](https://github.com/apache/spark/commit/1ccbe6bd41b1e60ea62a157771d4b3ca37f8678f).





[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14426
  
**[Test build #65023 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65023/consoleFull)**
 for PR 14426 at commit 
[`0d19e28`](https://github.com/apache/spark/commit/0d19e28a8c53c83b3ca45ef3498f5faf9894c11c).





[GitHub] spark issue #14990: [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid O...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14990
  
**[Test build #65022 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65022/consoleFull)**
 for PR 14990 at commit 
[`33983e5`](https://github.com/apache/spark/commit/33983e5771f5d00dc3d5a97adfa23003e76f94c2).





[GitHub] spark pull request #14991: [SPARK-17427][SQL] function SIZE should return -1...

2016-09-06 Thread adrian-wang
GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/14991

[SPARK-17427][SQL] function SIZE should return -1 when parameter is null

## What changes were proposed in this pull request?

`select size(null)` returns -1 in Hive. In order to be compatible, we 
should return `-1`.
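The intended semantics can be sketched (in Python, purely as an illustration, not Spark's implementation) as a null-safe size function:

```python
def size(collection):
    """Hive-compatible size: a NULL collection yields -1, not NULL or an error."""
    if collection is None:
        return -1
    return len(collection)

print(size(None))       # -1
print(size([1, 2, 3]))  # 3
```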


## How was this patch tested?

unit test in `CollectionFunctionsSuite` and `DataFrameFunctionsSuite`.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14991.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14991


commit 1ccbe6bd41b1e60ea62a157771d4b3ca37f8678f
Author: Daoyuan Wang 
Date:   2016-09-07T04:52:58Z

size(null)=-1







[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77760397
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) 
extends DataType with Seq[Stru
   }
 
   /**
+   * Extracts the [[StructField]] with the given name recursively.
+   *
+   * @throws IllegalArgumentException if the parent field's type is not 
StructType
+   */
+  def getFieldRecursively(name: String): StructField = {
--- End diff --

The marking of nested fields is just temporary data; it is eventually 
converted to a pruned StructType and passed to 
`org.apache.spark.sql.parquet.row.requested_schema`.





[GitHub] spark issue #14962: [SPARK-17402][SQL] separate the management of temp views...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14962
  
Found a common bug in the following ALTER TABLE commands:
```
| ALTER TABLE tableIdentifier (partitionSpec)?
SET SERDE STRING (WITH SERDEPROPERTIES tablePropertyList)? 
#setTableSerDe
| ALTER TABLE tableIdentifier (partitionSpec)?
SET SERDEPROPERTIES tablePropertyList  
#setTableSerDe
| ALTER TABLE tableIdentifier ADD (IF NOT EXISTS)?
partitionSpecLocation+ 
#addTablePartition
| ALTER VIEW tableIdentifier ADD (IF NOT EXISTS)?
partitionSpec+ 
#addTablePartition
| ALTER TABLE tableIdentifier
from=partitionSpec RENAME TO to=partitionSpec  
#renameTablePartition
| ALTER TABLE tableIdentifier
DROP (IF EXISTS)? partitionSpec (',' partitionSpec)* PURGE?
#dropTablePartitions
| ALTER VIEW tableIdentifier
DROP (IF EXISTS)? partitionSpec (',' partitionSpec)*   
#dropTablePartitions
| ALTER TABLE tableIdentifier partitionSpec? SET locationSpec  
#setTableLocation
| ALTER TABLE tableIdentifier RECOVER PARTITIONS   
#recoverPartitions
```

We need to throw an exception when the tableType is `VIEW`. This bug is not 
introduced by this PR. Should we fix it here, or create a separate PR?





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77760264
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
 }
   }
 
+  test("SPARK-4502 parquet nested fields pruning") {
+// Schema of "test-data/nested-array-struct.parquet":
+//root
+//|-- primitive: integer (nullable = true)
+//|-- myComplex: array (nullable = true)
+//||-- element: struct (containsNull = true)
+//|||-- id: integer (nullable = true)
+//|||-- repeatedMessage: array (nullable = true)
+//||||-- element: struct (containsNull = true)
+//|||||-- someId: integer (nullable = true)
+val df = 
readResourceParquetFile("test-data/nested-array-struct.parquet")
--- End diff --


https://github.com/apache/spark/blob/master/sql/core/src/test/resources/test-data/nested-array-struct.parquet
I reuse this file to test nested struct pruning in Parquet; it lives in 
sql/core/src/test/resources/test-data/.





[GitHub] spark issue #14990: [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid O...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14990
  
**[Test build #65020 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65020/consoleFull)**
 for PR 14990 at commit 
[`284d780`](https://github.com/apache/spark/commit/284d780c446b3f7cb59f8a2f34c522d90bc43fe1).





[GitHub] spark pull request #14990: [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to ...

2016-09-06 Thread clockfly
GitHub user clockfly opened a pull request:

https://github.com/apache/spark/pull/14990

[SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid OOM when converting 
unknown fields to JSON

## What changes were proposed in this pull request?

This PR is a follow-up of SPARK-17356. The current implementation of 
`TreeNode.toJSON` recursively converts all fields to JSON, even if the field 
is of type `Seq` or `Map`. This may trigger an out-of-memory exception in 
cases like:

1. The `Seq` or `Map` can be very big. Converting it to JSON may take a huge 
amount of memory, which may trigger an out-of-memory error.
2. Some user-space input may also be propagated to the plan. The user-space 
input can be of arbitrary type, and may also be self-referencing. Trying to 
print user-space input to JSON may trigger an out-of-memory error or a stack 
overflow error.

For a real example, please check the Jira description of SPARK-17426.

In this PR, we refactor `TreeNode.toJSON` so that we only convert a field to 
a JSON string if the field is of a safe type.
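The guard can be sketched like this (in Python with hypothetical names; the actual change is in Scala on Catalyst's `TreeNode`): only whitelisted types are serialized, and oversized, unknown, or deeply nested (possibly self-referencing) values are summarized instead.

```python
import json

# Hypothetical whitelist of "safe" field types (illustration only).
SAFE_TYPES = (bool, int, float, str, type(None))

def safe_to_json_value(value, max_elements=100, max_depth=3):
    """Render a field as a JSON-serializable value only if it is safe.

    Oversized collections, arbitrary user types, and deep structures
    are replaced by a summary string, so conversion cannot blow up
    memory or the stack.
    """
    if isinstance(value, SAFE_TYPES):
        return value
    if max_depth > 0 and isinstance(value, (list, tuple)) \
            and len(value) <= max_elements:
        return [safe_to_json_value(v, max_elements, max_depth - 1)
                for v in value]
    if max_depth > 0 and isinstance(value, dict) \
            and len(value) <= max_elements:
        return {str(k): safe_to_json_value(v, max_elements, max_depth - 1)
                for k, v in value.items()}
    # Unknown, oversized, or too deep: summarize instead of recursing.
    return "<unserializable: %s>" % type(value).__name__

node = {
    "nodeName": "Filter",
    "numChildren": 1,
    "bigSeq": list(range(10000)),  # too big: summarized, not expanded
    "userInput": object(),         # arbitrary user type: summarized
}
print(json.dumps({k: safe_to_json_value(v) for k, v in node.items()}))
```

The depth cap is what defuses self-referencing input: a list that contains itself bottoms out at the summary string instead of recursing forever.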

## How was this patch tested?

Unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/clockfly/spark json_oom2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14990


commit 284d780c446b3f7cb59f8a2f34c522d90bc43fe1
Author: Sean Zhong 
Date:   2016-09-07T03:49:23Z

json oom







[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14988
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65018/
Test PASSed.





[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14988
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14988
  
**[Test build #65018 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65018/consoleFull)**
 for PR 14988 at commit 
[`d9ba28d`](https://github.com/apache/spark/commit/d9ba28d2cbd7823324f7dc02fb1072fa71d2450a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-09-06 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/12436
  
@sitalkedia I was thinking about this over the weekend and I'm not sure 
this is the right approach.  I suspect it might be better to re-use the same 
task set manager for the new stage.  This copying of information is confusing 
and I'm concerned it will be bug-prone in the future.  Did you consider that 
approach?

Also, separately from which approach is used, how do you deal with the 
following scenario: suppose map task 1 loses its output (e.g., the reducer where that 
task is located dies).  Now, suppose reduce task A gets a fetch failure for map 
task 1, triggering map task 1 to be re-run.  Meanwhile, reduce task B is still 
running.  Now the re-run map task 1 completes and the scheduler launches the 
reduce phase again.  Suppose after that happens, task B fails (this is the old 
task B, that started before the fetch failure) because it can't get the data 
from map task 1, but that's because it still has the old location for map task 
1.  My understanding is that, with the current code, that would cause the map 
stage to get re-triggered again, but really, reduce task B should be re-started 
with the correct location for the output from map 1.





[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r77758918
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -137,6 +138,17 @@ class KMeansModel private[ml] (
   @Since("1.6.0")
   override def write: MLWriter = new KMeansModel.KMeansModelWriter(this)
 
+  override def hashCode(): Int = {
+(Array(this.getClass, uid) ++ clusterCenters)
--- End diff --

@yinxusen Correct me if I'm wrong, but I believe you override the equals 
method because the params are checked for equality in the read/write tests. 
Just thinking ahead, we will have to do this for every model we use as an 
initial model. We can avoid this by adding some handling inside the read/write 
params test, and then checking the initial model equality for read/write inside 
the `checkModelData` method. I guess I'd prefer not to randomly override some 
models' equals methods and not others, especially since the reasoning behind it 
won't be clear. What do you think?





[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14116
  
**[Test build #65019 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65019/consoleFull)**
 for PR 14116 at commit 
[`d107721`](https://github.com/apache/spark/commit/d107721ffe1a83d7081b846db80cb4b787d79d7d).





[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14957
  
Also, it seems you might need to update your PR description. The last 
commit you just pushed seems to behave differently from what the PR description 
says. In addition, you may want to fix the title of this PR to make it complete 
(without `...`) if you'd like to keep this PR open.





[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14957
  
Could you please check that the related tests pass locally? It seems this 
affects all other data sources.

Also, I am not sure of the approach here. Marking nested fields by 
modifying column names does not look like a good idea to me.





[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-09-06 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14783
  
Sorry for the delay @clarkfitzg - the code change looks pretty good to me. 
I just had one question about mixed-type columns. 





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77757859
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
 }
   }
 
+  test("SPARK-4502 parquet nested fields pruning") {
+// Schema of "test-data/nested-array-struct.parquet":
+//root
+//|-- primitive: integer (nullable = true)
+//|-- myComplex: array (nullable = true)
+//||-- element: struct (containsNull = true)
+//|||-- id: integer (nullable = true)
+//|||-- repeatedMessage: array (nullable = true)
+//||||-- element: struct (containsNull = true)
+//|||||-- someId: integer (nullable = true)
+val df = 
readResourceParquetFile("test-data/nested-array-struct.parquet")
+df.createOrReplaceTempView("tmp_table")
+// normal test
+val query1 = "select primitive,myComplex[0].id from tmp_table"
+val result1 = sql(query1)
+withSQLConf(SQLConf.PARQUET_NEST_COLUMN_PRUNING.key -> "true") {
+  checkAnswer(sql(query1), result1)
--- End diff --

Does this really test if the nested fields are pruned? I think this test 
will pass regardless of the newly added option.





[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14783#discussion_r77757807
  
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
 is_sparkR_shell <- function() {
   grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)
 }
+
+# rbind a list of rows with raw (binary) columns
+#
+# @param inputData a list of rows, with each row a list
+# @return data.frame with raw columns as lists
+rbindRaws <- function(inputData){
+  row1 <- inputData[[1]]
+  rawcolumns <- ("raw" == sapply(row1, class))
+
+  listmatrix <- do.call(rbind, inputData)
--- End diff --

Do you know what happens if we have a mixed set of columns here? i.e. say 
one column with "raw", one with "integer" and one with "character" -- from 
reading some docs it looks like everything is converted to create a `character` 
matrix when we use `rbind`. 

I think we have two choices if that's the case:
(a) we apply the type conversions after `rbind` 
(b) we only call this method when all columns are `raw`





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77757667
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -280,6 +280,29 @@ case class StructType(fields: Array[StructField]) 
extends DataType with Seq[Stru
   }
 
   /**
+   * Extracts the [[StructField]] with the given name recursively.
+   *
+   * @throws IllegalArgumentException if the parent field's type is not 
StructType
+   */
+  def getFieldRecursively(name: String): StructField = {
--- End diff --

Isn't this a Parquet-specific problem? I wonder whether adding this method 
here is appropriate.

Also, I am not too sure it is appropriate to mark nested fields by 
modifying field names with a special character.





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77757611
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -571,6 +571,44 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
 }
   }
 
+  test("SPARK-4502 parquet nested fields pruning") {
+// Schema of "test-data/nested-array-struct.parquet":
+//root
+//|-- primitive: integer (nullable = true)
+//|-- myComplex: array (nullable = true)
+//||-- element: struct (containsNull = true)
+//|||-- id: integer (nullable = true)
+//|||-- repeatedMessage: array (nullable = true)
+//||||-- element: struct (containsNull = true)
+//|||||-- someId: integer (nullable = true)
+val df = 
readResourceParquetFile("test-data/nested-array-struct.parquet")
--- End diff --

It seems we don't have this file in this PR. So running tests will fail.





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77757552
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
 ---
@@ -97,7 +98,16 @@ object FileSourceStrategy extends Strategy with Logging {
 dataColumns
   .filter(requiredAttributes.contains)
   .filterNot(partitionColumns.contains)
-  val outputSchema = readDataColumns.toStructType
+  val outputSchema = if 
(fsRelation.sqlContext.conf.isParquetNestColumnPruning) {
--- End diff --

It will affect all other data sources. I am not sure all the tests related 
to this will pass.





[GitHub] spark pull request #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14912#discussion_r77757275
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
 ---
@@ -171,6 +172,27 @@ class FilterPushdownSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("push down filters that are combined") {
+// The following predicate ('a === 2 || 'a === 3) && ('c > 10 || 'a 
=== 2)
+// will be simplified as ('a == 2) || ('c > 10 && 'a == 3).
+// ('a === 2 || 'a === 3) can be pushed down. But the simplified one 
can't.
--- End diff --

You are right. It is only triggered when there are adjoining Filters. So in 
the above example, the predicate `(a == 2 || a == 3)` will not be pushed down 
when there is no `.where(c > 10)`.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77756736
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -159,12 +171,13 @@ case class AlterTableRenameCommand(
   override def run(sparkSession: SparkSession): Seq[Row] = {
 val catalog = sparkSession.sessionState.catalog
 DDLUtils.verifyAlterTableType(catalog, oldName, isView)
-// If this is a temp view, just rename the view.
-// Otherwise, if this is a real table, we also need to uncache and 
invalidate the table.
-val isTemporary = catalog.isTemporaryTable(oldName)
-if (isTemporary) {
-  catalog.renameTable(oldName, newName)
-} else {
+
+// If the old table name contains database part, we should rename a 
metastore table directly,
+// otherwise, try to rename a temp view first, if that not exists, 
rename a metastore table.
+val renameMetastoreTable =
+  oldName.database.isDefined || !catalog.renameTempView(oldName.table, 
newName)
--- End diff --

Here, we also need to check if it is a VIEW before trying to rename the temp view.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77756537
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -95,12 +95,12 @@ class SQLViewSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleton {
   e = intercept[AnalysisException] {
 sql(s"""LOAD DATA LOCAL INPATH "$testData" INTO TABLE $viewName""")
   }.getMessage
-  assert(e.contains(s"Target table in LOAD DATA cannot be temporary: 
`$viewName`"))
+  assert(e.contains(s"Target table in LOAD DATA does not exist: 
`$viewName`"))
--- End diff --


https://github.com/apache/spark/blob/c0ae6bc6ea38909730fad36e653d3c7ab0a84b44/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L218-L223

Before this PR, `tableExists` checked the temp table, but 
`getTableMetadataOption` did not. Thus, instead of changing the test 
case, we need to change the implementation of `LoadDataCommand`. 





[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14912
  
The CNF exponential-expansion issue was an important concern in previous 
work. Actually, you can see that this patch doesn't produce a full CNF for a 
predicate. I use `splitDisjunctivePredicates` to obtain the disjunctive 
predicates and convert them to conjunctive form. The conversion here is not 
recursive, which I think should prevent exponential explosion. Of course it is 
a compromise and can't benefit all predicates, but I suspect that complex 
predicates needing a complete CNF conversion are rarely used.
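The non-recursive, one-level conversion described above can be sketched as follows (a simplified model with string atoms, not Catalyst code; `distribute_or_over_and` is an illustrative name). Each disjunctive branch is a list of conjuncts; distributing OR over AND once yields conjunctive clauses, and a clause that mentions only one column can then be pushed down:

```python
from itertools import product

def distribute_or_over_and(branches):
    """One-level distribution of OR over AND.

    branches: list of disjunctive branches, each a list (conjunction)
    of atomic predicates.  (A1 ^ A2) v (B1 ^ B2) becomes the clauses
    (A1 v B1) ^ (A1 v B2) ^ (A2 v B1) ^ (A2 v B2).  The clause count is
    the product of the branch sizes, which is why a *full* recursive CNF
    conversion can blow up; doing it only once is the compromise.
    """
    return list(product(*branches))

# The simplified predicate from this PR: (a == 2) v (c > 10 ^ a == 3).
clauses = distribute_or_over_and([["a == 2"], ["c > 10", "a == 3"]])
# One resulting clause, (a == 2 v a == 3), references only column `a`
# and is therefore eligible for push-down below the join/filter.
```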






[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-09-06 Thread sitalkedia
Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/12436
  
@davies - Thanks for looking into this. Updated the PR description with 
details of the change. Let me know if the approach seems reasonable, and I will 
work on rebasing the change against the latest master. 





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77756261
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -246,33 +246,23 @@ class SessionCatalog(
   }
 
   /**
-   * Retrieve the metadata of an existing metastore table.
-   * If no database is specified, assume the table is in the current 
database.
-   * If the specified table is not found in the database then a 
[[NoSuchTableException]] is thrown.
+   * Retrieve the metadata of an existing metastore table/view.
+   * If no database is specified, assume the table/view is in the current 
database.
+   * If the specified table/view is not found in the database then a 
[[NoSuchTableException]] is
+   * thrown.
*/
   def getTableMetadata(name: TableIdentifier): CatalogTable = {
 val db = 
formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
 val table = formatTableName(name.table)
-val tid = TableIdentifier(table)
-if (isTemporaryTable(name)) {
-  CatalogTable(
-identifier = tid,
-tableType = CatalogTableType.VIEW,
-storage = CatalogStorageFormat.empty,
-schema = tempTables(table).output.toStructType,
-properties = Map(),
-viewText = None)
-} else {
-  requireDbExists(db)
-  requireTableExists(TableIdentifier(table, Some(db)))
-  externalCatalog.getTable(db, table)
-}
+requireDbExists(db)
+requireTableExists(TableIdentifier(table, Some(db)))
+externalCatalog.getTable(db, table)
   }
 
   /**
-   * Retrieve the metadata of an existing metastore table.
+   * Retrieve the metadata of an existing metastore table/view.
* If no database is specified, assume the table is in the current 
database.
-   * If the specified table is not found in the database then return None 
if it doesn't exist.
+   * If the specified table/view is not found in the database then return 
None if it doesn't exist.
*/
   def getTableMetadataOption(name: TableIdentifier): Option[CatalogTable] 
= {
--- End diff --

`getTableMetadataOption` does not check the temp view, but 
`getTableMetadata` does check it... We might have more bugs...
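The inconsistency being pointed out can be shown with a toy Python model (illustrative names only, not the real `SessionCatalog` API): one lookup consults temp views while its `Option`-returning sibling does not, so the same name resolves for one API and not the other.

```python
class SessionCatalog:
    """Toy model of the lookup inconsistency discussed above."""

    def __init__(self):
        self.temp_views = {}  # view name -> metadata
        self.metastore = {}   # table name -> metadata

    def get_table_metadata(self, name):
        # Checks temp views first, then the metastore; raises if absent.
        if name in self.temp_views:
            return self.temp_views[name]
        if name not in self.metastore:
            raise KeyError(name)
        return self.metastore[name]

    def get_table_metadata_option(self, name):
        # Does NOT check temp views -- so the two lookups can disagree.
        return self.metastore.get(name)
```

For a temp view `v`, `get_table_metadata("v")` succeeds while `get_table_metadata_option("v")` returns nothing, which is exactly the kind of divergence that breeds bugs in callers.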





[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14988
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14988
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65017/
Test PASSed.





[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14988
  
**[Test build #65017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65017/consoleFull)**
 for PR 14988 at commit 
[`8e537a1`](https://github.com/apache/spark/commit/8e537a161560a6d717a40d8aae44b1973dda9695).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14762: [SPARK-16962][CORE][SQL] Fix misaligned record ac...

2016-09-06 Thread sumansomasundar
Github user sumansomasundar commented on a diff in the pull request:

https://github.com/apache/spark/pull/14762#discussion_r77755718
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java 
---
@@ -47,13 +47,20 @@ public static int roundNumberOfBytesToNearestWord(int 
numBytes) {
   public static boolean arrayEquals(
   Object leftBase, long leftOffset, Object rightBase, long 
rightOffset, final long length) {
 int i = 0;
-while (i <= length - 8) {
-  if (Platform.getLong(leftBase, leftOffset + i) !=
-Platform.getLong(rightBase, rightOffset + i)) {
-return false;
-  }
-  i += 8;
-}
+
+  // This attempts to speed up the memcmp type of operation, but there 
is no way
+  // to guarantee that the offsets will be on a word boundary in order 
to use
+  // Platform.getLong
--- End diff --

It can still be used, but only if both leftOffset AND rightOffset start on 
proper word boundaries. By checking for those two conditions on every call, we 
would lose the advantage gained by this block.
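The structure under discussion (an 8-byte fast path with a byte-wise tail) can be modelled in Python, purely as a sketch of the control flow -- the real code uses `Platform.getLong`, and the alignment concern above is about whether those 8-byte reads are legal at arbitrary offsets on platforms such as SPARC:

```python
def array_equals(left: bytes, right: bytes) -> bool:
    """Compare two byte arrays 8 bytes at a time, then byte by byte.

    Mirrors the shape of ByteArrayMethods.arrayEquals: the 8-byte
    chunks stand in for Platform.getLong reads, which on some
    architectures are only safe when the offsets are word-aligned.
    """
    if len(left) != len(right):
        return False
    i, n = 0, len(left)
    while i + 8 <= n:          # fast path: one comparison per 8 bytes
        if left[i:i + 8] != right[i:i + 8]:
            return False
        i += 8
    while i < n:               # slow path: remaining tail bytes
        if left[i] != right[i]:
            return False
        i += 1
    return True
```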





[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...

2016-09-06 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/14834
  
| numClasses | isMultinomial | coefficientMatrix size |
| --- | :-: | --: |
| 3+ | true | 3+ x numFeatures |
| 2 | true | 2 x numFeatures |
| 2 | false | 1 x numFeatures |

The current behavior is as follows:
* If it is binary classification trained with multinomial family, then we 
store `2 x numFeatures` coefficients in a matrix. We will predict with this 
matrix (i.e. we do not convert to `1 x numFeatures`). 
* If it is binary classification trained with binomial family, then we 
store `1 x numFeatures` (i.e. these coefficients are pivoted) and we use a 
`DenseVector` instead of a matrix for prediction.

The coefficients really are stored in an array. There is always a 
`coefficientMatrix`, which is backed by that array and in some cases has only 1 
row. For the binomial family, we also have a `coefficients` vector backed by 
the same array as the matrix, and we use that vector for prediction in the 
binomial case.

Hopefully that clears it up. I don't think it's necessary to convert the 
case of a multinomial family with binary classification to `1 x numFeatures` for 
prediction, since it won't be a regression and users would have to explicitly 
specify that family (hopefully knowing the consequences of that choice).

I also vote for Option 2 in the original description. We can avoid any 
regressions with past versions and the implementation isn't too messy.
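
The storage scheme described above — one flat coefficient array backing both a matrix view and, in the binomial case, a vector view — can be sketched as follows (a simplified model, not MLlib's actual classes; `matrix_view` is a hypothetical helper):

```python
num_features = 3

def matrix_view(flat, rows, cols):
    """View a flat coefficient array as a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

# Binomial family: 1 x numFeatures; prediction uses the vector (flat) view.
binomial_flat = [0.1, -0.2, 0.3]
assert matrix_view(binomial_flat, 1, num_features) == [[0.1, -0.2, 0.3]]

# Multinomial family on binary data: 2 x numFeatures; prediction uses the matrix.
multinomial_flat = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
coef_matrix = matrix_view(multinomial_flat, 2, num_features)
assert len(coef_matrix) == 2
```

Both views share one backing array, which is why no conversion to `1 x numFeatures` is needed for the multinomial-on-binary case.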





[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14931
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65016/
Test PASSed.





[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14931
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14931
  
**[Test build #65016 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65016/consoleFull)**
 for PR 14931 at commit 
[`a62289e`](https://github.com/apache/spark/commit/a62289ebd47c7a91c3e8659bf13b1d940499dccb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14912
  
hmm, looks like there was previous work regarding CNF, but none of it was 
actually merged. @gatorsmile Thanks for the context.





[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...

2016-09-06 Thread watermen
Github user watermen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14988#discussion_r77754923
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
 ---
@@ -164,4 +164,11 @@ case class HiveTableScanExec(
   }
 
   override def output: Seq[Attribute] = attributes
+
+  override def sameResult(plan: SparkPlan): Boolean = plan match {
--- End diff --

`left.cleanArgs == right.cleanArgs` in the default `sameResult` returns false, 
because `equals` in `MetastoreRelation` compares the 
output (`AttributeReference`s), and the `exprId`s differ. We need to erase the 
exprId.
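
The exprId mismatch described above can be illustrated with a small sketch (hypothetical data structures, not Catalyst's actual ones): two scans of the same table differ only in their generated expression ids, so the comparison must erase the ids first:

```python
def same_result(attrs_left, attrs_right):
    """Compare two attribute lists while erasing (canonicalizing) exprIds."""
    def erase_ids(attrs):
        return [(name, dtype) for (name, dtype, _expr_id) in attrs]
    return erase_ids(attrs_left) == erase_ids(attrs_right)

# Two scans of the same table: identical except for the generated exprIds.
scan1 = [("key", "int", 30), ("value", "string", 31)]
scan2 = [("key", "int", 34), ("value", "string", 35)]
assert scan1 != scan2             # naive equality fails on the exprIds
assert same_result(scan1, scan2)  # canonicalized comparison succeeds
```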





[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14988
  
**[Test build #65018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65018/consoleFull)**
 for PR 14988 at commit 
[`d9ba28d`](https://github.com/apache/spark/commit/d9ba28d2cbd7823324f7dc02fb1072fa71d2450a).





[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14912
  
@viirya Could you please wait for the CNF predicate normalization rule? 
@liancheng @yjshen did a few related work before. See 
https://github.com/apache/spark/pull/10444 and 
https://github.com/apache/spark/pull/8200. 

Let us also collect input from @ioana-delaney @nsyca. They did a lot 
of related work in the past 10+ years. We need a good design for CNF 
normalization, which can benefit the other optimizer rules.





[GitHub] spark issue #14989: [MINOR][SQL] Fixing the typo in unit test

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14989
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14989: [MINOR][SQL] Fixing the typo in unit test

2016-09-06 Thread vundela
GitHub user vundela opened a pull request:

https://github.com/apache/spark/pull/14989

[MINOR][SQL] Fixing the typo in unit test

## What changes were proposed in this pull request?

Fixing the typo in the unit test of CodeGenerationSuite.scala


## How was this patch tested?
Ran the unit test after fixing the typo and it passes




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vundela/spark typo_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14989


commit 0a96ac233dc06a985f56741019dc69a9e869596a
Author: Srinivasa Reddy Vundela 
Date:   2016-09-07T02:50:49Z

[MINOR][SQL] Fixing the typo in unit test







[GitHub] spark issue #14887: [SPARK-17321][YARN] YARN shuffle service should use good...

2016-09-06 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/14887
  
@zhaoyunjiong , the fix you made may introduce a situation where recovery 
data exists in multiple directories. I'm not sure whether this will 
introduce recovery issues or other problems, since the recovery data may no 
longer be consistent.

IMO I think here based on SPARK-14963, we could change to enable Spark's 
shuffle service recovery as a configuration:

1. If it is not enabled, then Spark will not persist data into leveldb; in 
that case the YARN shuffle service can still serve requests but loses the 
ability to recover.
2. If it is enabled, then the user should guarantee the recovery path is 
reliable, because the recovery path is also crucial for NM recovery.
3. Also, this configuration should be consistent with NM's recovery-enabled 
configuration.
4. If this shuffle service is running on a lower version of Hadoop where 
there's no NM recovery
* If Spark's shuffle service recovery is enabled, refer to 2.
* If it is not enabled, then refer to 1.

Just my two cents; I may have missed some parts. Basically, to solve 
your problem (also considering recovery), I think it might be better to make 
Spark's shuffle recovery mechanism configurable.
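
The proposed configuration matrix can be sketched as a small decision function (the flag names and outcome strings are hypothetical summaries of the four cases above, not actual Spark or YARN settings):

```python
def shuffle_recovery_mode(spark_recovery_enabled, nm_recovery_enabled):
    """Summarize which recovery behavior applies for a given flag combination."""
    if not spark_recovery_enabled:
        # Case 1: service still works, but no state is persisted to leveldb.
        return "serve only, no recovery state persisted"
    if nm_recovery_enabled:
        # Case 3: stay consistent with NM's own recovery configuration.
        return "persist to NM recovery path"
    # Case 2 / 4: user must guarantee a reliable recovery path themselves.
    return "persist to user-guaranteed reliable path"

assert shuffle_recovery_mode(False, True) == "serve only, no recovery state persisted"
assert shuffle_recovery_mode(True, True) == "persist to NM recovery path"
```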






[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77753115
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -189,31 +189,39 @@ case class DropTableCommand(
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
 val catalog = sparkSession.sessionState.catalog
-if (!catalog.tableExists(tableName)) {
-  if (!ifExists) {
-val objectName = if (isView) "View" else "Table"
-throw new AnalysisException(s"$objectName to drop '$tableName' 
does not exist")
-  }
-} else {
-  // If the command DROP VIEW is to drop a table or DROP TABLE is to 
drop a view
-  // issue an exception.
-  catalog.getTableMetadataOption(tableName).map(_.tableType match {
-case CatalogTableType.VIEW if !isView =>
-  throw new AnalysisException(
-"Cannot drop a view with DROP TABLE. Please use DROP VIEW 
instead")
-case o if o != CatalogTableType.VIEW && isView =>
-  throw new AnalysisException(
-s"Cannot drop a table with DROP VIEW. Please use DROP TABLE 
instead")
-case _ =>
-  })
-  try {
-sparkSession.sharedState.cacheManager.uncacheQuery(
-  sparkSession.table(tableName.quotedString))
-  } catch {
-case NonFatal(e) => log.warn(e.toString, e)
+
+// If the table name contains database part, we should drop a 
metastore table directly,
+// otherwise, try to drop a temp view first, if that not exist, drop 
metastore table.
+val dropMetastoreTable =
+  tableName.database.isDefined || 
!catalog.dropTempView(tableName.table)
--- End diff --

`Drop Table` is unable to drop a temp view, right? 
```SQL
spark.range(10).createTempView("tempView")
sql("DESC tempView").show()
sql("DROP TABLE tempView")
sql("DESC tempView").show()
```
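
The resolution order being discussed can be sketched as follows (a simplified model of the PR's proposed logic, not Spark's actual catalog API): an unqualified name tries the temp-view registry first and falls back to the metastore, while a database-qualified name goes straight to the metastore:

```python
def drop_table(name, temp_views, metastore, database=None):
    """Drop a table following the proposed resolution order."""
    if database is None and name in temp_views:
        # Unqualified name: a temp view of the same name shadows the table.
        del temp_views[name]
        return "dropped temp view"
    if (database, name) in metastore:
        del metastore[(database, name)]
        return "dropped metastore table"
    raise ValueError(f"Table '{name}' does not exist")

temp_views = {"tempView": object()}
metastore = {(None, "tempView"): object()}
assert drop_table("tempView", temp_views, metastore) == "dropped temp view"
assert drop_table("tempView", temp_views, metastore) == "dropped metastore table"
```

Under this order, the SQL example above would indeed drop the temp view — which is exactly the behavior change being questioned.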





[GitHub] spark pull request #14957: [SPARK-4502][SQL]Support parquet nested struct pr...

2016-09-06 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/14957#discussion_r77753006
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -259,8 +259,23 @@ case class StructType(fields: Array[StructField]) 
extends DataType with Seq[Stru
* @throws IllegalArgumentException if a field with the given name does 
not exist
*/
   def apply(name: String): StructField = {
-nameToField.getOrElse(name,
-  throw new IllegalArgumentException(s"""Field "$name" does not 
exist."""))
+if (name.contains('.')) {
--- End diff --

@HyukjinKwon Thanks for your review. Mixing the recursive get with the 
default apply has this problem; I fixed it in the next patch and used ',', which 
is an invalid character in a Parquet schema.
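
The ambiguity with '.' in field names can be sketched like this (a toy model of dotted-name resolution against a nested schema, not Spark's actual `StructType` implementation):

```python
def resolve(schema, name):
    """Resolve a dotted name like "a.b" against a nested struct schema.

    schema: dict mapping field name -> type string, or a nested dict
    for a struct field. A literal '.' inside a field name would make
    this lookup ambiguous, which is the problem discussed above.
    """
    head, _, rest = name.partition(".")
    if head not in schema:
        raise KeyError(f'Field "{head}" does not exist.')
    field = schema[head]
    if not rest:
        return field
    if not isinstance(field, dict):
        raise KeyError(f'Field "{head}" is not a struct.')
    return resolve(field, rest)

schema = {"a": {"b": "int", "c": "string"}, "d": "double"}
assert resolve(schema, "a.b") == "int"
assert resolve(schema, "d") == "double"
```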





[GitHub] spark pull request #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization...

2016-09-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14987





[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...

2016-09-06 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14960#discussion_r77751910
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 
 // This is a hack to enforce loading hdfs-site.xml.
 // See SPARK-11227 for details.
-FileSystem.get(new URI(path), hadoopConfiguration)
+FileSystem.get(new Path(path).toUri, hadoopConfiguration)
--- End diff --

Yeah, I'm not sure what part of the URI we are using here. If it's just the 
scheme and authority, then I think it's fine to use them from the first path. FWIW 
there is a method in Hadoop to parse comma-separated path strings, but it's 
private [1].

IMHO this problem existed even before this PR, so I'm fine not fixing it 
here if that's okay with @sarutak 

[1] 
https://hadoop.apache.org/docs/r2.7.1/api/src-html/org/apache/hadoop/mapred/FileInputFormat.html#line.467
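
The concern can be sketched as follows — a minimal model (assumed behavior, not Hadoop's actual private parser, which additionally handles escaped commas) of splitting a comma-separated path string and taking only the first path's scheme and authority:

```python
from urllib.parse import urlparse

def first_path_scheme_authority(paths):
    """Split a comma-separated path string and return the (scheme, authority)
    of the first path, which would pick the FileSystem to load."""
    first = paths.split(",")[0].strip()
    parsed = urlparse(first)
    return parsed.scheme, parsed.netloc

assert first_path_scheme_authority("hdfs://nn:8020/a,hdfs://nn:8020/b") == ("hdfs", "nn:8020")
assert first_path_scheme_authority("/local/a,/local/b") == ("", "")
```

Note the failure mode implied by the discussion: if later paths in the list use a different scheme or authority, only the first one decides which filesystem configuration is loaded.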





[GitHub] spark issue #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization issues...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14987
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization issues...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14987
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65015/
Test PASSed.





[GitHub] spark issue #14987: [SPARK-17372][SQL][STREAMING] Avoid serialization issues...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14987
  
**[Test build #65015 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65015/consoleFull)**
 for PR 14987 at commit 
[`9bcbb08`](https://github.com/apache/spark/commit/9bcbb087d2935657a30eb9bc6b52ea6fbed65edf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-09-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14847
  
/cc @cloud-fan @rxin @davies for reviewing this. Thanks.





[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...

2016-09-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14850
  
also backport it to 2.0





[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...

2016-09-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14988#discussion_r77750354
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
 ---
@@ -164,4 +164,11 @@ case class HiveTableScanExec(
   }
 
   override def output: Seq[Attribute] = attributes
+
+  override def sameResult(plan: SparkPlan): Boolean = plan match {
--- End diff --

why the default one doesn't work?





[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14988
  
**[Test build #65017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65017/consoleFull)**
 for PR 14988 at commit 
[`8e537a1`](https://github.com/apache/spark/commit/8e537a161560a6d717a40d8aae44b1973dda9695).





[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...

2016-09-06 Thread watermen
GitHub user watermen opened a pull request:

https://github.com/apache/spark/pull/14988

[SPARK-17425][SQL] Override sameResult in HiveTableScanExec to make 
ReusedExchange work in text format table

## What changes were proposed in this pull request?
This PR overrides `sameResult` in `HiveTableScanExec` to make 
`ReusedExchange` work for text-format tables.

## How was this patch tested?
# SQL
```sql
SELECT * FROM src t1
JOIN src t2 ON t1.key = t2.key
JOIN src t3 ON t1.key = t3.key;
```

# Before
```
== Physical Plan ==
*BroadcastHashJoin [key#30], [key#34], Inner, BuildRight
:- *BroadcastHashJoin [key#30], [key#32], Inner, BuildRight
:  :- *Filter isnotnull(key#30)
:  :  +- HiveTableScan [key#30, value#31], MetastoreRelation default, src
:  +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint)))
: +- *Filter isnotnull(key#32)
:+- HiveTableScan [key#32, value#33], MetastoreRelation default, src
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint)))
   +- *Filter isnotnull(key#34)
  +- HiveTableScan [key#34, value#35], MetastoreRelation default, src
```

# After
```
== Physical Plan ==
*BroadcastHashJoin [key#2], [key#6], Inner, BuildRight
:- *BroadcastHashJoin [key#2], [key#4], Inner, BuildRight
:  :- *Filter isnotnull(key#2)
:  :  +- HiveTableScan [key#2, value#3], MetastoreRelation default, src
:  +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint)))
: +- *Filter isnotnull(key#4)
:+- HiveTableScan [key#4, value#5], MetastoreRelation default, src
+- ReusedExchange [key#6, value#7], BroadcastExchange 
HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
```

cc: @davies @cloud-fan


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/watermen/spark SPARK-17425

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14988


commit 8e537a161560a6d717a40d8aae44b1973dda9695
Author: Yadong Qi 
Date:   2016-09-07T01:26:46Z

Override sameResult in HiveTableScanExec.







[GitHub] spark issue #14958: [SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6

2016-09-06 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14958
  
```
Using `mvn` from path: 
/home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Spark's published dependencies DO NOT MATCH the manifest file 
(dev/spark-deps).
To update the manifest file, run './dev/test-dependencies.sh 
--replace-manifest'.
diff --git a/dev/deps/spark-deps-hadoop-1 b/dev/pr-deps/spark-deps-hadoop-1
index dd5a6dc..a97b10c 100644
--- a/dev/deps/spark-deps-hadoop-1
+++ b/dev/pr-deps/spark-deps-hadoop-1
@@ -143,7 +143,7 @@ servlet-api-2.5.jar
 slf4j-api-1.7.10.jar
 slf4j-log4j12-1.7.10.jar
 snappy-0.2.jar
-snappy-java-1.1.2.1.jar
+snappy-java-1.1.2.6.jar
 spire-macros_2.10-0.7.4.jar
 spire_2.10-0.7.4.jar
 stax-api-1.0.1.jar
Using `mvn` from path: 
/home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Build step 'Execute shell' marked build as failure
Finished: FAILURE
```

Can you take a look at the 1.6 build 
(https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-1.6-lint/262/console)?
 It seems the 1.6 build is broken by this PR.





[GitHub] spark pull request #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-06 Thread srinathshankar
Github user srinathshankar commented on a diff in the pull request:

https://github.com/apache/spark/pull/14912#discussion_r77748668
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
 ---
@@ -171,6 +172,27 @@ class FilterPushdownSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("push down filters that are combined") {
+// The following predicate ('a === 2 || 'a === 3) && ('c > 10 || 'a 
=== 2)
+// will be simplified as ('a == 2) || ('c > 10 && 'a == 3).
+// ('a === 2 || 'a === 3) can be pushed down. But the simplified one 
can't.
--- End diff --

I agree with you that we should respect the interaction between 
CombineFilters, PushDownPredicates, and other rules. I do think it's important 
that CNF conversion run before any of the push-down/reordering rules, and the 
simplification rules should run afterwards. 
My concern with rolling this into CombineFilters is that it doesn't get 
triggered unless there are adjoining Filter nodes. In the example you have:
val originalQuery = testRelation
  .select('a, 'b, ('c + 1) as 'cc)
  .groupBy('a)('a, count('cc) as 'c)
  .where('c > 10)
  .where(('a === 2) || ('c > 10 && 'a === 3))

I think that (a == 2 || a == 3) should get pushed down even if you don't have 
".where (c > 10)", but I'm not sure that it will be, since toCNF is in 
CombineFilters. Could you confirm?
My suggestion is that toCNF warrants a separate rule -- for example when 
you're doing joins, and you have
select * from A inner join C on (A.a1 = C.c1) where A.a2 = 2 || (C.c2 = 10 
&& A.a2 = 3),
you want (A.a2 = 2 || A.a2 = 3) pushed down into A





[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-09-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14809





[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...

2016-09-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14809
  
thanks for the review, merging to master!





[GitHub] spark pull request #10225: [SPARK-12196][Core] Store/retrieve blocks from di...

2016-09-06 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/10225#discussion_r77748327
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ---
@@ -136,7 +136,9 @@ private[spark] class IndexShuffleBlockResolver(
   shuffleId: Int,
   mapId: Int,
   lengths: Array[Long],
-  dataTmp: File): Unit = {
--- End diff --

Do we have to change the code in this function?





[GitHub] spark issue #14985: [SPARK-17396][core] Share the task support between Union...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14985
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65012/
Test FAILed.





[GitHub] spark issue #14985: [SPARK-17396][core] Share the task support between Union...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14985
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14985: [SPARK-17396][core] Share the task support between Union...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14985
  
**[Test build #65012 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65012/consoleFull)**
 for PR 14985 at commit 
[`89065fd`](https://github.com/apache/spark/commit/89065fd08cb2eb1e492571ce5980daa8f059a820).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-09-06 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/14712
  
@yhuai @hvanhovell @cloud-fan Sorry for the late response, I was out of 
the office for two days.
@gatorsmile Thanks for fixing it!





[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14960#discussion_r77747489
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 
 // This is a hack to enforce loading hdfs-site.xml.
 // See SPARK-11227 for details.
-FileSystem.get(new URI(path), hadoopConfiguration)
+FileSystem.get(new Path(path).toUri, hadoopConfiguration)
--- End diff --

cc - @sarutak WDYT? is my understanding correct?





[GitHub] spark issue #14984: [SPARK-17296][SQL] Simplify parser join processing [BACK...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14984
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65013/
Test PASSed.





[GitHub] spark issue #10970: [SPARK-13067][SQL] workaround for a weird scala reflecti...

2016-09-06 Thread atronchi
Github user atronchi commented on the issue:

https://github.com/apache/spark/pull/10970
  
The solution mentioned in [SPARK-17424] by @rdblue fixes this issue.





[GitHub] spark issue #14984: [SPARK-17296][SQL] Simplify parser join processing [BACK...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14984
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14984: [SPARK-17296][SQL] Simplify parser join processing [BACK...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14984
  
**[Test build #65013 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65013/consoleFull)**
 for PR 14984 at commit 
[`cc74334`](https://github.com/apache/spark/commit/cc743345de45b7367509cde74098de0cedfac9a9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14960#discussion_r77747323
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 
 // This is a hack to enforce loading hdfs-site.xml.
 // See SPARK-11227 for details.
-FileSystem.get(new URI(path), hadoopConfiguration)
+FileSystem.get(new Path(path).toUri, hadoopConfiguration)
--- End diff --

As it is known to be hacky and ugly, maybe we can split this out into 
a separate issue (although I am hesitant to suggest this)?





[GitHub] spark pull request #14960: [SPARK-17339][SPARKR][CORE] Fix some R tests and ...

2016-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14960#discussion_r77747258
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -992,7 +992,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 
 // This is a hack to enforce loading hdfs-site.xml.
 // See SPARK-11227 for details.
-FileSystem.get(new URI(path), hadoopConfiguration)
+FileSystem.get(new Path(path).toUri, hadoopConfiguration)
--- End diff --

Hm.. I didn't know it supports comma-separated paths. BTW, we can still use 
`spark.sparkContext.textFile(..)` though. 
I took a look and it seems okay (but it's ugly and hacky).

If the first given path is okay, it seems to work fine. Only `getScheme` and 
`getAuthority` are looked at in `FileSystem.get(..)` (I tracked down 
`FileSystem.get(..)` and the related function calls).

So, if the first path is correct, it seems `getAuthority` and `getScheme` 
give the correct values to look up a file system.

For example, the path `http://localhost:8080/a/b,http://localhost:8081/c/d` 
parses the URI as below:

![2016-09-07 10 19 11](https://cloud.githubusercontent.com/assets/6477701/18296462/d213126c-74e4-11e6-9859-e68e2d6f58cb.png)
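
For what it's worth, the same "first path wins" split can be illustrated with Python's `urllib.parse` (this is only an analogy for the URI parsing shown above, not Hadoop's actual `Path` parser): the scheme and authority come from the first URL, while the comma and everything after it land in the path component.

```python
from urllib.parse import urlparse

# A comma-separated "path" like the one above: the authority ends at the
# first '/', so only the first URL contributes scheme and netloc.
u = urlparse("http://localhost:8080/a/b,http://localhost:8081/c/d")
print(u.scheme)   # http
print(u.netloc)   # localhost:8080
print(u.path)     # /a/b,http://localhost:8081/c/d
```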






[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-09-06 Thread clarkfitzg
Github user clarkfitzg commented on the issue:

https://github.com/apache/spark/pull/14783
  
I'm presenting something related to this on Thursday; it would be nice to 
tell the audience this patch made it in. Can I do anything to help this along?





[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14931
  
**[Test build #65016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65016/consoleFull)**
 for PR 14931 at commit 
[`a62289e`](https://github.com/apache/spark/commit/a62289ebd47c7a91c3e8659bf13b1d940499dccb).





[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14702
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65011/
Test PASSed.





[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14702
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14931: [SPARK-17370] Shuffle service files not invalidat...

2016-09-06 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14931#discussion_r77746289
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
 ---
@@ -153,7 +153,7 @@ private[spark] class StandaloneSchedulerBackend(
   override def executorRemoved(fullId: String, message: String, 
exitStatus: Option[Int]) {
 val reason: ExecutorLossReason = exitStatus match {
   case Some(code) => ExecutorExited(code, exitCausedByApp = true, 
message)
-  case None => SlaveLost(message)
+  case None => SlaveLost(message, workerLost = true /* worker loss 
event from master */)
--- End diff --

Went with propagating just `workerLost` explicitly all the way from the 
master, since ExecutorState is private to deploy.





[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14702
  
**[Test build #65011 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65011/consoleFull)**
 for PR 14702 at commit 
[`9afbd5e`](https://github.com/apache/spark/commit/9afbd5e2d2b08087596dc5d575935e4894b390bc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-06 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r77745578
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -72,9 +72,7 @@ class SessionCatalog(
 this(externalCatalog, new SimpleFunctionRegistry, new 
SimpleCatalystConf(true))
   }
 
-  /** List of temporary tables, mapping from table name to their logical 
plan. */
-  @GuardedBy("this")
-  protected val tempTables = new mutable.HashMap[String, LogicalPlan]
+  private val tempViews = new TempViewManager
--- End diff --

Since the goal of this PR is to add some view-related APIs, I think the 
refactoring using `TempViewManager` is not the major goal?





[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14816
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...

2016-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14816
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65014/
Test PASSed.





[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14816
  
**[Test build #65014 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65014/consoleFull)**
 for PR 14816 at commit 
[`8b57886`](https://github.com/apache/spark/commit/8b57886c0489c759f0308a7b104f5b058204cdcd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14931: [SPARK-17370] Shuffle service files not invalidat...

2016-09-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14931#discussion_r77745305
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
 ---
@@ -153,7 +153,7 @@ private[spark] class StandaloneSchedulerBackend(
   override def executorRemoved(fullId: String, message: String, 
exitStatus: Option[Int]) {
 val reason: ExecutorLossReason = exitStatus match {
   case Some(code) => ExecutorExited(code, exitCausedByApp = true, 
message)
-  case None => SlaveLost(message)
+  case None => SlaveLost(message, workerLost = true /* worker loss 
event from master */)
--- End diff --

This assumes that `exitStatus == None` implies that a worker was lost, but 
there are some corner-cases where this isn't necessarily true (e.g. if an 
executor kill fails). Looking through both the 1.6.x and 2.0.x code, it appears 
that `ExecutorState.LOST` is used exclusively for denoting whole-worker loss, 
so I think that we should check that state here instead of assuming `true`. 
Other than that minor corner-case, this patch looks good to me, so I'll merge 
once we fix this.
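
To illustrate the suggestion (a hypothetical Python sketch, not the actual Scala code — the state names and return shape are assumptions for illustration): key the worker-loss flag off the reported executor state rather than assuming worker loss whenever the exit status is missing.

```python
# Hypothetical sketch of the suggested fix: only flag workerLost when the
# master actually reported state LOST, instead of treating every missing
# exit status as a whole-worker loss.
def loss_reason(exit_status, executor_state):
    if exit_status is not None:
        return ("ExecutorExited", exit_status, True)   # exitCausedByApp
    worker_lost = executor_state == "LOST"             # not assumed True
    return ("SlaveLost", None, worker_lost)

print(loss_reason(137, "EXITED"))   # executor exited with a code
print(loss_reason(None, "LOST"))    # whole worker lost
print(loss_reason(None, "KILLED"))  # no exit code, but not a worker loss
```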




