[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21821

**[Test build #93319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93319/testReport)** for PR 21821 at commit [`9edc28f`](https://github.com/apache/spark/commit/9edc28fdcb7261f01db716f65e723668a493327e).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syn...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21813
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21813 Thanks! Merged to master.
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user eatoncys commented on the issue: https://github.com/apache/spark/pull/21823 @cloud-fan Casting 'Key' to lower case is done by the ResolveReferences rule:

![image](https://user-images.githubusercontent.com/26834091/42987332-7798ba3e-8c2b-11e8-9bed-d8be2ec7dad7.png)
[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21049 @gatorsmile Sure.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20057 **[Test build #93322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93322/testReport)** for PR 20057 at commit [`bc75051`](https://github.com/apache/spark/commit/bc75051f5f4a47ef045c93bd933b2f95635100ad).
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21813 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93313/
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21813 Merged build finished. Test PASSed.
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21823

> so the sql analyzer gets the field 'key' from the original table 'src', which is lowercase.

Shouldn't we always do it?
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21813

**[Test build #93313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93313/testReport)** for PR 21813 at commit [`2ecf3e1`](https://github.com/apache/spark/commit/2ecf3e183e51f13ec9c92692d29854f01eb327c5).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21049 @dilipbiswal Can you take this over?
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20057 retest this please
[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r203949498

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -105,7 +105,12 @@ object JdbcUtils extends Logging {
     val statement = conn.createStatement
     try {
       statement.setQueryTimeout(options.queryTimeout)
-      statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      if (options.isCascadeTruncate.isDefined) {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table,
+          options.isCascadeTruncate))
+      } else {
+        statement.executeUpdate(dialect.getTruncateQuery(options.table))
+      }
--- End diff --

+1 to the above comment of @dongjoon-hyun
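For context, a minimal sketch of the simplification dongjoon-hyun's comment points at: resolve the cascade flag once and collapse the branch. The `orElse` fallback to the dialect default is an assumption about the intended semantics.

```scala
// Hypothetical simplification; assumes falling back to the dialect's
// isCascadingTruncateTable matches the old single-argument behavior.
val cascade = options.isCascadeTruncate.orElse(dialect.isCascadingTruncateTable)
statement.executeUpdate(dialect.getTruncateQuery(options.table, cascade))
```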
[GitHub] spark pull request #21805: [SPARK-24850][SQL] fix str representation of Cach...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21805#discussion_r203949905

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -206,4 +206,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
     // first time use, load cache
     checkDataset(df5, Row(10))
   }
+
+  test("SPARK-24850 InMemoryRelation string representation does not include cached plan") {
+    val dummyQueryExecution = spark.range(0, 1).toDF().queryExecution
+    val inMemoryRelation = InMemoryRelation(
+      true,
+      1000,
+      StorageLevel.MEMORY_ONLY,
+      dummyQueryExecution.sparkPlan,
+      Some("test-relation"),
+      dummyQueryExecution.logical)
+
+    assert(!inMemoryRelation.simpleString.contains(dummyQueryExecution.sparkPlan.toString))
+    assert(inMemoryRelation.simpleString.contains(
+      "CachedRDDBuilder(true, 1000, StorageLevel(memory, deserialized, 1 replicas))"))
--- End diff --

How about just comparing explain output results like the query in this pr description?
[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20057#discussion_r203948972

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -120,11 +121,26 @@ abstract class JdbcDialect extends Serializable {
    * The SQL query that should be used to truncate a table. Dialects can override this method to
    * return a query that is suitable for a particular database. For PostgreSQL, for instance,
    * a different query is used to prevent "TRUNCATE" affecting other tables.
-   * @param table The name of the table.
+   * @param table The table to truncate
    * @return The SQL query to use for truncating a table
    */
   @Since("2.3.0")
   def getTruncateQuery(table: String): String = {
+    getTruncateQuery(table, isCascadingTruncateTable)
+  }
+
+  /**
+   * The SQL query that should be used to truncate a table. Dialects can override this method to
+   * return a query that is suitable for a particular database. For PostgreSQL, for instance,
+   * a different query is used to prevent "TRUNCATE" affecting other tables.
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation
+   * @return The SQL query to use for truncating a table
+   */
+  @Since("2.4.0")
+  def getTruncateQuery(
+    table: String,
+    cascade: Option[Boolean] = isCascadingTruncateTable): String = {
--- End diff --

Nit: indent.
[GitHub] spark pull request #21804: [SPARK-24268][SQL] Use datatype.catalogString in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21804
[GitHub] spark issue #21804: [SPARK-24268][SQL] Use datatype.catalogString in error m...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21804 LGTM. Thanks! Merged to master.
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21815 LGTM
[GitHub] spark pull request #21815: [SPARK-23731][SQL] Make FileSourceScanExec canoni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21815#discussion_r203947912

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/FileSourceScanExecSuite.scala ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileSourceScanExecSuite extends SharedSQLContext {
+  test("FileSourceScanExec should be canonicalizable on executor side") {
--- End diff --

I'd like to put this test in `QueryPlanSuite`, with name `SPARK-: query plans can be serialized and deserialized`. In the test we don't need to trigger a job, just call `spark.env.serializer` to serialize and deserialize the `FileSourceScanExec`
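A rough sketch of the relocated test cloud-fan describes; how the scan node is obtained and the use of `SparkEnv.get.serializer` are assumptions for illustration, not the final patch.

```scala
// Sketch for QueryPlanSuite: round-trip a FileSourceScanExec through the
// configured serializer without triggering a job.
test("SPARK-23731: query plans can be serialized and deserialized") {
  withTempPath { path =>
    spark.range(1).write.parquet(path.getAbsolutePath)
    val scan = spark.read.parquet(path.getAbsolutePath)
      .queryExecution.executedPlan
      .collectFirst { case s: FileSourceScanExec => s }
      .get
    val serializer = SparkEnv.get.serializer.newInstance()
    val readBack = serializer.deserialize[FileSourceScanExec](serializer.serialize(scan))
    // Canonicalizing the deserialized copy must not throw (the executor-side case).
    readBack.canonicalized
  }
}
```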
[GitHub] spark pull request #21804: [SPARK-24268][SQL] Use datatype.catalogString in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21804#discussion_r203947486

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ---
@@ -104,7 +104,7 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
   override def transformSchema(schema: StructType): StructType = {
     val inputType = schema($(inputCol)).dataType
     require(inputType.isInstanceOf[ArrayType],
-      s"The input column must be ArrayType, but got $inputType.")
+      s"The input column must be ${ArrayType.simpleString}, but got ${inputType.catalogString}.")
--- End diff --

Let us change all?
[GitHub] spark pull request #21804: [SPARK-24268][SQL] Use datatype.catalogString in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21804#discussion_r203947449

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -208,8 +208,9 @@ class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transforme
       require(dataType.isInstanceOf[NumericType] || dataType.isInstanceOf[StringType] ||
         dataType.isInstanceOf[BooleanType],
-        s"FeatureHasher requires columns to be of NumericType, BooleanType or StringType. " +
-          s"Column $fieldName was $dataType")
+        s"FeatureHasher requires columns to be of ${NumericType.simpleString}, " +
--- End diff --

Let us change all?
[GitHub] spark issue #21798: [SPARK-24836][SQL] New option for Avro datasource - igno...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21798 LGTM except for one comment
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Merged build finished. Test PASSed.
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93312/
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21823

**[Test build #93312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93312/testReport)** for PR 21823 at commit [`2b2a5a3`](https://github.com/apache/spark/commit/2b2a5a33ed58ce07fd2515eb01e80acbedeb8b2a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93314/
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21772 Merged build finished. Test PASSed.
[GitHub] spark pull request #21798: [SPARK-24836][SQL] New option for Avro datasource...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21798#discussion_r203946680

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -275,10 +273,12 @@ private[avro] object AvroFileFormat {
     }
   }

-  def ignoreFilesWithoutExtensions(conf: Configuration): Boolean = {
-    // Files without .avro extensions are not ignored by default
-    val defaultValue = false
+  def ignoreExtension(conf: Configuration, options: AvroOptions): Boolean = {
--- End diff --

Can we move this to `object AvroOptions`?
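A sketch of what that move might look like; whether `AvroOptions` exposes the option as an `Option[Boolean]`, and the exact precedence against the deprecated Hadoop flag, are assumptions.

```scala
import org.apache.hadoop.conf.Configuration

private[avro] object AvroOptions {
  // Hypothetical: prefer the explicit data source option, otherwise fall back
  // to the deprecated Hadoop flag (files without .avro extensions are read by
  // default, matching the old defaultValue = false above).
  def ignoreExtension(conf: Configuration, options: AvroOptions): Boolean = {
    val legacyFlag = "avro.mapred.ignore.inputs.without.extension"
    options.ignoreExtension.getOrElse(!conf.getBoolean(legacyFlag, false))
  }
}
```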
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21815 cc @gengliangwang @cloud-fan This needs a careful review.
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203945945

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -36,4 +40,27 @@ package object avro {
     @scala.annotation.varargs
     def avro(sources: String*): DataFrame = reader.format("avro").load(sources: _*)
   }
+
+  /**
+   * Converts a binary column of avro format into its corresponding catalyst value. The specified
+   * schema must match the read data, otherwise the behavior is undefined: it may fail or return
+   * arbitrary result.
+   *
+   * @param data the binary column.
+   * @param avroType the avro type.
+   */
+  @Experimental
+  def from_avro(data: Column, avroType: Schema): Column = {
--- End diff --

I am actually +1 with @HyukjinKwon. To make everyone happy, I will add another override API, but I don't think we should have more.
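The extra override gengliangwang mentions would presumably take the schema as a JSON string; a hedged sketch (parameter name assumed):

```scala
// Assumed shape: parse the Avro JSON schema string and delegate to the
// Schema-based overload above.
@Experimental
def from_avro(data: Column, avroTypeJson: String): Column = {
  from_avro(data, new Schema.Parser().parse(avroTypeJson))
}
```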
[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21775 Merged build finished. Test PASSed.
[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93311/
[GitHub] spark pull request #21805: [SPARK-24850][SQL] fix str representation of Cach...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21805#discussion_r203945646

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -206,4 +206,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
     // first time use, load cache
     checkDataset(df5, Row(10))
   }
+
+  test("SPARK-24850 InMemoryRelation string representation does not include cached plan") {
+    val dummyQueryExecution = spark.range(0, 1).toDF().queryExecution
+    val inMemoryRelation = InMemoryRelation(
+      true,
+      1000,
+      StorageLevel.MEMORY_ONLY,
+      dummyQueryExecution.sparkPlan,
+      Some("test-relation"),
+      dummyQueryExecution.logical)
+
+    assert(!inMemoryRelation.simpleString.contains(dummyQueryExecution.sparkPlan.toString))
+    assert(inMemoryRelation.simpleString.contains(
+      "CachedRDDBuilder(true, 1000, StorageLevel(memory, deserialized, 1 replicas))"))
--- End diff --

Or we might not need the batch size in the plan.
[GitHub] spark pull request #21805: [SPARK-24850][SQL] fix str representation of Cach...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21805#discussion_r203945605

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -206,4 +206,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
     // first time use, load cache
     checkDataset(df5, Row(10))
   }
+
+  test("SPARK-24850 InMemoryRelation string representation does not include cached plan") {
+    val dummyQueryExecution = spark.range(0, 1).toDF().queryExecution
+    val inMemoryRelation = InMemoryRelation(
+      true,
+      1000,
+      StorageLevel.MEMORY_ONLY,
+      dummyQueryExecution.sparkPlan,
+      Some("test-relation"),
+      dummyQueryExecution.logical)
+
+    assert(!inMemoryRelation.simpleString.contains(dummyQueryExecution.sparkPlan.toString))
+    assert(inMemoryRelation.simpleString.contains(
+      "CachedRDDBuilder(true, 1000, StorageLevel(memory, deserialized, 1 replicas))"))
--- End diff --

`true` and `1000` look confusing to end users. Can we improve it?
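One hypothetical way to make the string self-describing; the field names below are assumptions for illustration:

```scala
// Illustrative only; assumes the builder's first two fields are the
// compression flag and the in-memory batch size.
override def simpleString: String =
  s"CachedRDDBuilder(useCompression=$useCompression, batchSize=$batchSize, $storageLevel)"
```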
[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21775

**[Test build #93311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93311/testReport)** for PR 21775 at commit [`b527fdc`](https://github.com/apache/spark/commit/b527fdc5919296ffa12e1be54367b9132ecee61e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203945436

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -36,4 +40,27 @@ package object avro {
     @scala.annotation.varargs
     def avro(sources: String*): DataFrame = reader.format("avro").load(sources: _*)
   }
+
+  /**
+   * Converts a binary column of avro format into its corresponding catalyst value. The specified
+   * schema must match the read data, otherwise the behavior is undefined: it may fail or return
+   * arbitrary result.
+   *
+   * @param data the binary column.
+   * @param avroType the avro type.
+   */
+  @Experimental
+  def from_avro(data: Column, avroType: Schema): Column = {
--- End diff --

Ah sorry, I thought you were talking about the `data` parameter. Yes, for the `avroType` parameter, we should have a string version.
[GitHub] spark issue #21805: [SPARK-24850][SQL] fix str representation of CachedRDDBu...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21805 ok to test
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203943047

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.avro.generic.GenericDatumReader
+import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
+
+import org.apache.spark.sql.avro.{AvroDeserializer, SchemaConverters, SerializableSchema}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
+
+case class AvroDataToCatalyst(child: Expression, avroType: SerializableSchema)
+  extends UnaryExpression with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
+
+  override lazy val dataType: DataType =
+    SchemaConverters.toSqlType(avroType.value).dataType
+
+  override def nullable: Boolean = true
+
+  @transient private lazy val reader = new GenericDatumReader[Any](avroType.value)
+
+  @transient private lazy val deserializer = new AvroDeserializer(avroType.value, dataType)
+
+  @transient private var decoder: BinaryDecoder = _
+
+  @transient private var result: Any = _
+
+  override def nullSafeEval(input: Any): Any = {
+    val binary = input.asInstanceOf[Array[Byte]]
+    decoder = DecoderFactory.get().binaryDecoder(binary, 0, binary.length, decoder)
+    result = reader.read(result, decoder)
+    deserializer.deserialize(result)
+  }
+
+  override def simpleString: String = {
+    s"from_avro(${child.sql}, ${dataType.simpleString})"
--- End diff --

It's not used in `sql`, we can override `sql` here and use the untruncated version.
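Concretely, the suggestion seems to be along these lines; using `catalogString` as the untruncated form is an assumption:

```scala
// Keep the short form for plan trees, expose the full type in SQL text.
override def simpleString: String =
  s"from_avro(${child.sql}, ${dataType.simpleString})"

override def sql: String =
  s"from_avro(${child.sql}, ${dataType.catalogString})"
```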
[GitHub] spark pull request #21817: [SPARK-24861][SS][test] create corrected temp dir...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21817
[GitHub] spark issue #21817: [SPARK-24861][SS][test] create corrected temp directorie...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21817 thanks, merging to master!
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1156/
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93320/testReport)** for PR 21822 at commit [`83ffa51`](https://github.com/apache/spark/commit/83ffa51f4b165152dea214be4d73dd518d742a56).
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20146 @HyukjinKwon Thanks. Closed and re-opened now.
[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/20146

[SPARK-11215][ML] Add multiple columns support to StringIndexer

## What changes were proposed in this pull request?

This takes over #19621 to add multi-column support to StringIndexer.

## How was this patch tested?

Added tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-11215

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20146.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20146

commit bb990f1a6511d8ce20f4fff254dfe0ff43262a10 (Liang-Chi Hsieh, 2018-01-03T03:51:59Z): Add multi-column support to StringIndexer.
commit 26cc94bb335cf0ba3bcdbc2b78effd447026792c (Liang-Chi Hsieh, 2018-01-07T01:42:28Z): Fix glm test.
commit 540c364d2a70ecd6ee5b92fadedc5e9b85026d2c (Liang-Chi Hsieh, 2018-01-16T08:20:19Z): Merge remote-tracking branch 'upstream/master' into SPARK-11215
commit 18acbbf7b70b87c75ba62be863580fe9accc23b4 (Liang-Chi Hsieh, 2018-01-24T12:03:26Z): Improve test cases.
commit b884fb5c0ce1e627390d08d8425721ea8e4d (Liang-Chi Hsieh, 2018-01-27T00:58:06Z): Merge remote-tracking branch 'upstream/master' into SPARK-11215
commit 76ff7bf6054a687abd9fc16c8044020a5454d95f (Liang-Chi Hsieh, 2018-04-19T11:27:21Z): Merge remote-tracking branch 'upstream/master' into SPARK-11215
commit 50af02eaccce7cecb7c3093d5bc14675ca860c22 (Liang-Chi Hsieh, 2018-04-19T11:30:46Z): Change from 2.3 to 2.4.
commit c1be2c7e28ebdfed580577a108d2f254834caed7 (Liang-Chi Hsieh, 2018-04-23T10:15:49Z): Address comments.
commit ed35d875414ba3cf8751a77463f61665e9c373b0 (Liang-Chi Hsieh, 2018-04-23T14:00:16Z): Address comment.
commit a1dcfda85243a1e2210177f2acfb78821c539b17 (Liang-Chi Hsieh, 2018-04-24T06:41:07Z): Use SQL Aggregator for counting string labels.
commit c1685228f7ec7f4904bca67efaee70498b9894c8 (Liang-Chi Hsieh, 2018-04-25T13:28:24Z): Merge remote-tracking branch 'upstream/master' into SPARK-11215
commit a6551b02a10428d66e0dadcfcb5a8da3798ec814 (Liang-Chi Hsieh, 2018-04-26T04:13:09Z): Drop NA values for both frequency and alphabet order types.
commit c003bd3d6c58cf19249ff0ba9dd10140971d655c (Liang-Chi Hsieh, 2018-07-18T18:58:21Z): Merge remote-tracking branch 'upstream/master' into SPARK-11215
[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/20146
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21821 Merged build finished. Test PASSed.
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203938957

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -36,4 +40,27 @@ package object avro {
     @scala.annotation.varargs
     def avro(sources: String*): DataFrame = reader.format("avro").load(sources: _*)
   }
+
+  /**
+   * Converts a binary column of avro format into its corresponding catalyst value. The specified
+   * schema must match the read data, otherwise the behavior is undefined: it may fail or return
+   * arbitrary result.
+   *
--- End diff --

+1
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203938964

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.avro.generic.GenericDatumReader
+import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
+
+import org.apache.spark.sql.avro.{AvroDeserializer, SchemaConverters, SerializableSchema}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
+
+case class AvroDataToCatalyst(child: Expression, avroType: SerializableSchema)
+  extends UnaryExpression with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
+
+  override lazy val dataType: DataType =
+    SchemaConverters.toSqlType(avroType.value).dataType
+
+  override def nullable: Boolean = true
+
+  @transient private lazy val reader = new GenericDatumReader[Any](avroType.value)
+
+  @transient private lazy val deserializer = new AvroDeserializer(avroType.value, dataType)
+
+  @transient private var decoder: BinaryDecoder = _
+
+  @transient private var result: Any = _
+
+  override def nullSafeEval(input: Any): Any = {
+    val binary = input.asInstanceOf[Array[Byte]]
+    decoder = DecoderFactory.get().binaryDecoder(binary, 0, binary.length, decoder)
+    result = reader.read(result, decoder)
+    deserializer.deserialize(result)
+  }
+
+  override def simpleString: String = {
+    s"from_avro(${child.sql}, ${dataType.simpleString})"
--- End diff --

But this is being used in `sql` though. Do we prefer the truncated string form in `sql` too?
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203938936

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -36,4 +40,27 @@ package object avro {
     @scala.annotation.varargs
     def avro(sources: String*): DataFrame = reader.format("avro").load(sources: _*)
   }
+
+  /**
+   * Converts a binary column of avro format into its corresponding catalyst value. The specified
+   * schema must match the read data, otherwise the behavior is undefined: it may fail or return
+   * arbitrary result.
+   *
+   * @param data the binary column.
+   * @param avroType the avro type.
+   */
+  @Experimental
+  def from_avro(data: Column, avroType: Schema): Column = {
--- End diff --

`Column` can support arbitrary expressions. Personally I don't like these string-based overloads... Since it's in an external module, I think we can only add a python/R/SQL version when we have a persistent UDF API.
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21821 **[Test build #93319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93319/testReport)** for PR 21821 at commit [`9edc28f`](https://github.com/apache/spark/commit/9edc28fdcb7261f01db716f65e723668a493327e).
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203938730

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.avro.generic.GenericDatumReader
+import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
+
+import org.apache.spark.sql.avro.{AvroDeserializer, SchemaConverters, SerializableSchema}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
+
+case class AvroDataToCatalyst(child: Expression, avroType: SerializableSchema)
+  extends UnaryExpression with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
+
+  override lazy val dataType: DataType =
+    SchemaConverters.toSqlType(avroType.value).dataType
+
+  override def nullable: Boolean = true
+
+  @transient private lazy val reader = new GenericDatumReader[Any](avroType.value)
+
+  @transient private lazy val deserializer = new AvroDeserializer(avroType.value, dataType)
+
+  @transient private var decoder: BinaryDecoder = _
+
+  @transient private var result: Any = _
+
+  override def nullSafeEval(input: Any): Any = {
+    val binary = input.asInstanceOf[Array[Byte]]
+    decoder = DecoderFactory.get().binaryDecoder(binary, 0, binary.length, decoder)
+    result = reader.read(result, decoder)
+    deserializer.deserialize(result)
+  }
+
+  override def simpleString: String = {
+    s"from_avro(${child.sql}, ${dataType.simpleString})"
--- End diff --

IIRC `simpleString` will be used in the plan string and should not be too long.
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21823 Do you know why it's happening? It's super weird that `select key from src_cache where positiveNum = 1` can hit the cache but `select key from src_cache` cannot.
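A hypothetical repro distilled from the two queries above; the CACHE TABLE setup is an assumption about the PR's scenario:

```scala
// Assumed setup: a cached table whose defining query uses a mixed-case column.
spark.sql("CACHE TABLE src_cache AS SELECT key, positiveNum FROM src")
spark.sql("SELECT key FROM src_cache WHERE positiveNum = 1").explain() // hits InMemoryRelation
spark.sql("SELECT key FROM src_cache").explain() // reportedly misses the cache
```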
[GitHub] spark issue #21533: [SPARK-24195][Core] Ignore the files with "local" scheme...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/21533 Thanks everyone for your help!
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21320 @HyukjinKwon We plan to merge this highly desirable feature into Spark 2.4 release.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21733 Now I'd like to propose changing the default behavior to the new path while keeping backward compatibility, so I've applied that to this patch. I'm still open to shipping it as an advanced option, as in the first approach, and happy to roll back once we decide to go that way.
[GitHub] spark issue #21820: [SPARK-24868][PYTHON]add sequence function in Python
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21820 Merged build finished. Test PASSed.
[GitHub] spark issue #21820: [SPARK-24868][PYTHON]add sequence function in Python
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21820

**[Test build #93317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93317/testReport)** for PR 21820 at commit [`725c1b7`](https://github.com/apache/spark/commit/725c1b74d13309a567698b24150d1e7e06662769).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21824 also cc @cloud-fan @gatorsmile
[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21802 cc @cloud-fan @gatorsmile
[GitHub] spark pull request #21809: [SPARK-24851] : Map a Stage ID to it's Associated...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21809#discussion_r203936074

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -94,6 +94,13 @@ private[spark] class AppStatusStore(
     }.toSeq
   }
+
+  def getJobIdsAssociatedWithStage(stageId: Int): Seq[Set[Int]] = {
--- End diff --

We don't need to fetch all the stage attempts; just the first one is enough to get all the jobIds.
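A sketch of the narrower lookup being suggested; the `StageDataWrapper` key layout and the `asOption` helper are assumptions for illustration:

```scala
// Hypothetical: read only attempt 0 of the stage instead of listing all attempts.
def getJobIdsAssociatedWithStage(stageId: Int): Set[Int] = {
  asOption(store.read(classOf[StageDataWrapper], Array(stageId, 0)))
    .map(_.jobIds)
    .getOrElse(Set.empty)
}
```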
[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r203934987

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -486,6 +486,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       }
     }
   }
+  test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
+    withTempPath { path =>
+      Seq((1, 1, 1)).toDF("i", "part1", "part2")
--- End diff --

Can we simplify the test? Ideally we only need one partition column: write some initial data to the table, then do an overwrite with partitionOverwriteMode=dynamic, and another overwrite with partitionOverwriteMode=static.
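A sketch of the simplified test being requested; values are illustrative, and the expected answers assume dynamic mode keeps untouched partitions while static mode drops them:

```scala
withTempPath { path =>
  val dir = path.getAbsolutePath
  Seq((1, 1), (2, 2)).toDF("i", "part").write.partitionBy("part").parquet(dir)

  // Dynamic mode: only partitions present in the new data are replaced.
  Seq((3, 1)).toDF("i", "part").write
    .mode("overwrite").option("partitionOverwriteMode", "dynamic")
    .partitionBy("part").parquet(dir)
  checkAnswer(spark.read.parquet(dir), Row(3, 1) :: Row(2, 2) :: Nil)

  // Static mode: the whole table is replaced.
  Seq((4, 1)).toDF("i", "part").write
    .mode("overwrite").option("partitionOverwriteMode", "static")
    .partitionBy("part").parquet(dir)
  checkAnswer(spark.read.parquet(dir), Row(4, 1) :: Nil)
}
```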
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21824 **[Test build #93316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93316/testReport)** for PR 21824 at commit [`644a30d`](https://github.com/apache/spark/commit/644a30d32e390f39f25515bb251e120e69502f27).
[GitHub] spark issue #21820: [SPARK-24868][PYTHON]add sequence function in Python
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21820 **[Test build #93317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93317/testReport)** for PR 21820 at commit [`725c1b7`](https://github.com/apache/spark/commit/725c1b74d13309a567698b24150d1e7e06662769).
[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r203934743

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -91,8 +92,12 @@ case class InsertIntoHadoopFsRelationCommand(
     val pathExists = fs.exists(qualifiedOutputPath)

-    val enableDynamicOverwrite =
-      sparkSession.sessionState.conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
+    val parameters = CaseInsensitiveMap(options)
+
+    val enableDynamicOverwrite = parameters.get("partitionOverwriteMode")
+      .map(mode => PartitionOverwriteMode.withName(mode.toUpperCase))
+      .getOrElse(sparkSession.sessionState.conf.partitionOverwriteMode) ==
+      PartitionOverwriteMode.DYNAMIC
--- End diff --

nit: this is too long

```
val partitionOverwriteMode = parameters.get("partitionOverwriteMode")
  .map(mode => PartitionOverwriteMode.withName(mode.toUpperCase))
  .getOrElse(sparkSession.sessionState.conf.partitionOverwriteMode)
val enableDynamicOverwrite = partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
```
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21824 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1153/
[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r203934766

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -486,6 +486,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       }
     }
   }
+  test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
--- End diff --

nit: add a blank line before this new test case
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21824 Merged build finished. Test PASSed.
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21824 cc @mn-mikke @bersprockets
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r203934633

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
@@ -288,6 +310,27 @@ private[parquet] object ParquetReadSupport {
     }
   }

+  /**
+   * Computes the structural intersection between two Parquet group types.
+   */
+  private def intersectParquetGroups(
+      groupType1: GroupType, groupType2: GroupType): Option[GroupType] = {
+    val fields =
+      groupType1.getFields.asScala
+        .filter(field => groupType2.containsField(field.getName))
+        .flatMap {
+          case field1: GroupType =>
--- End diff --

`field1` -> `field`.
[GitHub] spark pull request #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/21824

[SPARK-24871][SQL] Refactor Concat and MapConcat to avoid creating concatenator object for each row.

## What changes were proposed in this pull request?

Refactor `Concat` and `MapConcat` to:

- avoid creating a concatenator object for each row.
- make `Concat` handle `containsNull` properly.
- make `Concat` shortcut if a `null` child is found.

## How was this patch tested?

Added some tests and existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-24871/refactor_concat_mapconcat

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21824.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21824

commit c4d4b28756d93ecc5c1e9a50e705e2a9cb4b8460 (Takuya UESHIN, 2018-07-19T08:32:15Z): Refactor `Concat` and `MapConcat` to avoid creating concatenator for each row.
commit 644a30d32e390f39f25515bb251e120e69502f27 (Takuya UESHIN, 2018-07-19T10:38:20Z): Make `Concat` shortcut if null child is found.
[GitHub] spark pull request #21820: [SPARK-24868][PYTHON]add sequence function in Pyt...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21820#discussion_r203934505

--- Diff: python/pyspark/sql/functions.py ---
@@ -2551,6 +2551,27 @@ def map_concat(*cols):
     return Column(jc)

+@since(2.4)
+def sequence(start, stop, step=None):
+    """
+    Generate a sequence of integers from start to stop, incrementing by step.
+    If step is not set, incrementing by 1 if start is less than or equal to stop, otherwise -1.
+
+    >>> df1 = spark.createDataFrame([(-2, 2)], ('C1', 'C2'))
+    >>> df1.select(sequence('C1', 'C2').alias('r')).collect()
+    [Row(r=[-2, -1, 0, 1, 2])]
+    >>> df2 = spark.createDataFrame([(4, -4, -2)], ('C1', 'C2', 'C3'))
--- End diff --

It will throw `java.lang.IllegalArgumentException`. There is a check in `collectionOperations.scala`:

```
require(
  (step > num.zero && start <= stop)
    || (step < num.zero && start >= stop)
    || (step == num.zero && start == stop),
  s"Illegal sequence boundaries: $start to $stop by $step")
```
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r203934281

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
@@ -47,16 +47,25 @@ import org.apache.spark.sql.types._
  *
  * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]]
  * to [[prepareForRead()]], but use a private `var` for simplicity.
+ *
+ * @param parquetMrCompatibility support reading with parquet-mr or Spark's built-in Parquet reader
  */
-private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone])
+private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone],
+    parquetMrCompatibility: Boolean)
   extends ReadSupport[UnsafeRow] with Logging {
   private var catalystRequestedSchema: StructType = _

+  /**
+   * Construct a [[ParquetReadSupport]] with [[convertTz]] set to [[None]] and
+   * [[parquetMrCompatibility]] set to [[false]].
+   *
+   * We need a zero-arg constructor for SpecificParquetRecordReaderBase. But that is only
+   * used in the vectorized reader, where we get the convertTz value directly, and the value here
+   * is ignored. Further, we set [[parquetMrCompatibility]] to [[false]] as this constructor is only
+   * called by the Spark reader.
--- End diff --

re: https://github.com/apache/spark/pull/21320/files/cb858f202e49d69f2044681e37f982dc10676296#r199631341

Actually, it doesn't look clear to me either. What does the flag indicate? Do you mean the normal Parquet reader vs. the vectorized Parquet reader?
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r203933423

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala ---
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.scalatest.BeforeAndAfterAll
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.NamedExpression
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types._
+
+// scalastyle:off line.size.limit
+class SelectedFieldSuite extends SparkFunSuite with BeforeAndAfterAll {
+  // The test schema as a tree string, i.e. `schema.treeString`
+  // root
+  //  |-- col1: string (nullable = false)
+  //  |-- col2: struct (nullable = true)
+  //  |    |-- field1: integer (nullable = true)
+  //  |    |-- field2: array (nullable = true)
+  //  |    |    |-- element: integer (containsNull = false)
+  //  |    |-- field3: array (nullable = false)
+  //  |    |    |-- element: struct (containsNull = true)
+  //  |    |    |    |-- subfield1: integer (nullable = true)
+  //  |    |    |    |-- subfield2: integer (nullable = true)
+  //  |    |    |    |-- subfield3: array (nullable = true)
+  //  |    |    |    |    |-- element: integer (containsNull = true)
+  //  |    |-- field4: map (nullable = true)
+  //  |    |    |-- key: string
+  //  |    |    |-- value: struct (valueContainsNull = false)
+  //  |    |    |    |-- subfield1: integer (nullable = true)
+  //  |    |    |    |-- subfield2: array (nullable = true)
+  //  |    |    |    |    |-- element: integer (containsNull = false)
+  //  |    |-- field5: array (nullable = false)
+  //  |    |    |-- element: struct (containsNull = true)
+  //  |    |    |    |-- subfield1: struct (nullable = false)
+  //  |    |    |    |    |-- subsubfield1: integer (nullable = true)
+  //  |    |    |    |    |-- subsubfield2: integer (nullable = true)
+  //  |    |    |    |-- subfield2: struct (nullable = true)
+  //  |    |    |    |    |-- subsubfield1: struct (nullable = true)
+  //  |    |    |    |    |    |-- subsubsubfield1: string (nullable = true)
+  //  |    |    |    |    |-- subsubfield2: integer (nullable = true)
+  //  |    |-- field6: struct (nullable = true)
+  //  |    |    |-- subfield1: string (nullable = false)
+  //  |    |    |-- subfield2: string (nullable = true)
+  //  |    |-- field7: struct (nullable = true)
+  //  |    |    |-- subfield1: struct (nullable = true)
+  //  |    |    |    |-- subsubfield1: integer (nullable = true)
+  //  |    |    |    |-- subsubfield2: integer (nullable = true)
+  //  |    |-- field8: map (nullable = true)
+  //  |    |    |-- key: string
+  //  |    |    |-- value: array (valueContainsNull = false)
+  //  |    |    |    |-- element: struct (containsNull = true)
+  //  |    |    |    |    |-- subfield1: integer (nullable = true)
+  //  |    |    |    |    |-- subfield2: array (nullable = true)
+  //  |    |    |    |    |    |-- element: integer (containsNull = false)
+  //  |    |-- field9: map (nullable = true)
+  //  |    |    |-- key: string
+  //  |    |    |-- value: integer (valueContainsNull = false)
+  //  |-- col3: array (nullable = false)
+  //  |    |-- element: struct (containsNull = false)
+  //  |    |    |-- field1: struct (nullable = true)
+  //  |    |    |    |-- subfield1: integer (nullable = false)
+  //  |    |    |    |-- subfield2: integer (nullable = true)
+  //  |    |    |-- field2: map (nullable = true)
+  //  |    |    |    |-- key: string
+  //  |    |    |    |-- value: integer (valueConta
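The `schema.treeString` notation quoted in that comment is just the printable rendering of a nested `StructType`. A minimal sketch of how such a schema is built and printed, as a stand-alone example rather than code from the PR (`TreeStringDemo` is a made-up name):

```scala
import org.apache.spark.sql.types._

object TreeStringDemo extends App {
  // A small nested schema in the spirit of the suite's test schema:
  // a non-nullable string column plus a struct with an int and an array field.
  val schema = StructType(
    StructField("col1", StringType, nullable = false) ::
    StructField("col2", StructType(
      StructField("field1", IntegerType) ::
      StructField("field2", ArrayType(IntegerType, containsNull = false)) :: Nil
    )) :: Nil)

  // Prints the same "root / |--" tree notation used in the suite's comment.
  println(schema.treeString)
}
```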
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r203933307

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.execution.FileSchemaPruningTest
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSQLContext
+
+class ParquetSchemaPruningSuite
+    extends QueryTest
+    with ParquetTest
+    with FileSchemaPruningTest
+    with SharedSQLContext {
--- End diff --

Can we just simply:

```scala
override def beforeEach(): Unit = {
  super.beforeEach()
  spark.conf.set(SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key, "true")
}

override def afterEach(): Unit = {
  try {
    spark.conf.unset(SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key)
  } finally {
    super.afterEach()
  }
}
```

without the complicated hierarchy?

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
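The suggestion above is the standard ScalaTest set-in-`beforeEach`, restore-in-`afterEach` pattern. A minimal self-contained sketch of the same pattern under stated assumptions: the `conf` object below is a hypothetical stand-in for `spark.conf`, not Spark's API, and the config key is made up.

```scala
import scala.collection.mutable

import org.scalatest.{BeforeAndAfterEach, FunSuite}

// Hypothetical stand-in for spark.conf, so the sketch runs without Spark.
object conf {
  val settings = mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = settings(key) = value
  def unset(key: String): Unit = { settings.remove(key); () }
}

class ExampleSuite extends FunSuite with BeforeAndAfterEach {
  override def beforeEach(): Unit = {
    super.beforeEach()
    conf.set("nested.schema.pruning.enabled", "true")
  }

  override def afterEach(): Unit = {
    // Unset first, but guarantee the parent's teardown still runs.
    try conf.unset("nested.schema.pruning.enabled")
    finally super.afterEach()
  }

  test("flag is set for the duration of each test") {
    assert(conf.settings("nested.schema.pruning.enabled") == "true")
  }
}
```

The `try { unset } finally { super.afterEach() }` ordering is the important design choice: even if restoring the config throws, the parent suite's cleanup is not skipped.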
[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21653 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93302/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21733 **[Test build #93315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93315/testReport)** for PR 21733 at commit [`977428c`](https://github.com/apache/spark/commit/977428cb35a6fc0a9fa7a0ca1a51e39a94447a01). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21653 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r203931755

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that projects an expression over a given schema. Data types,
+ * field indexes and field counts of complex type extractors and attributes
+ * are adjusted to fit the schema. All other expressions are left as-is. This
+ * class is motivated by columnar nested schema pruning.
+ */
+case class ProjectionOverSchema(schema: StructType) {
--- End diff --

This still looks weird that we place this under `catalyst` since we currently only use it under `execution`.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
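The "Scala extractor" the scaladoc refers to is the `unapply` method on the case class, which lets an instance be used as a pattern in a `match`. A minimal self-contained sketch of that extractor style: the tiny `Expr` ADT and `ProjectionOver` below are made up, standing in for Catalyst expressions, not the PR's actual class.

```scala
// Sketch of the extractor style used by ProjectionOverSchema: a value with an
// `unapply` that rewrites matching nodes and declines (returns None) for the rest.
sealed trait Expr
case class Attr(name: String, dataType: String) extends Expr
case class GetField(child: Expr, field: String) extends Expr

case class ProjectionOver(schema: Map[String, String]) {
  // Some(rewritten) when the expression fits the schema, None otherwise, so a
  // `case proj(e)` pattern only fires on expressions the projection can adjust.
  def unapply(expr: Expr): Option[Expr] = expr match {
    case a @ Attr(name, _) if schema.contains(name) =>
      Some(a.copy(dataType = schema(name)))
    case GetField(child, field) =>
      unapply(child).map(GetField(_, field))
    case _ => None
  }
}

object ExtractorDemo extends App {
  val proj = ProjectionOver(Map("col1" -> "string"))
  val rewritten = Attr("col1", "unknown") match {
    case proj(e) => e    // extractor fired: data type adjusted to the schema
    case other   => other
  }
  println(rewritten)     // Attr(col1,string)
}
```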
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21813 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93304/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21813 **[Test build #93304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93304/testReport)** for PR 21813 at commit [`e0c57f7`](https://github.com/apache/spark/commit/e0c57f73ec4c3e24e4af107cb457c9b1c13f4174). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r203931060

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that projects an expression over a given schema. Data types,
+ * field indexes and field counts of complex type extractors and attributes
+ * are adjusted to fit the schema. All other expressions are left as-is. This
+ * class is motivated by columnar nested schema pruning.
+ */
+case class ProjectionOverSchema(schema: StructType) {
+  private val fieldNames = schema.fieldNames.toSet
+
+  def unapply(expr: Expression): Option[Expression] = getProjection(expr)
+
+  private def getProjection(expr: Expression): Option[Expression] =
+    expr match {
+      case a @ AttributeReference(name, _, _, _) if (fieldNames.contains(name)) =>
+        Some(a.copy(dataType = schema(name).dataType)(a.exprId, a.qualifier))
+      case GetArrayItem(child, arrayItemOrdinal) =>
+        getProjection(child).map {
+          case projection =>
+            GetArrayItem(projection, arrayItemOrdinal)
--- End diff --

nit:

```
getProjection(child).map { projection =>
  GetArrayItem(projection, arrayItemOrdinal)
}
```

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r203931030

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that projects an expression over a given schema. Data types,
+ * field indexes and field counts of complex type extractors and attributes
+ * are adjusted to fit the schema. All other expressions are left as-is. This
+ * class is motivated by columnar nested schema pruning.
+ */
+case class ProjectionOverSchema(schema: StructType) {
+  private val fieldNames = schema.fieldNames.toSet
+
+  def unapply(expr: Expression): Option[Expression] = getProjection(expr)
+
+  private def getProjection(expr: Expression): Option[Expression] =
+    expr match {
+      case a @ AttributeReference(name, _, _, _) if (fieldNames.contains(name)) =>
+        Some(a.copy(dataType = schema(name).dataType)(a.exprId, a.qualifier))
+      case GetArrayItem(child, arrayItemOrdinal) =>
+        getProjection(child).map {
+          case projection =>
+            GetArrayItem(projection, arrayItemOrdinal)
+        }
+      case GetArrayStructFields(child, StructField(name, _, _, _), _, numFields, containsNull) =>
+        getProjection(child).map(p => (p, p.dataType)).map {
+          case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            GetArrayStructFields(projection,
+              projSchema(name), projSchema.fieldIndex(name), projSchema.size, containsNull)
+        }
+      case GetMapValue(child, key) =>
+        getProjection(child).map {
+          case projection =>
--- End diff --

nit:

```scala
getProjection(child).map { projection =>
  GetMapValue(projection, key)
}
```

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
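Both nits in this review are the same cleanup: a one-case pattern-match literal `{ case projection => ... }` compiles, but when the pattern is a bare identifier that can never fail to match, the plain lambda is the idiomatic spelling. A tiny stand-alone illustration (`Demo` is a made-up wrapper):

```scala
object Demo extends App {
  val xs = List(1, 2, 3)

  // Partial-function literal: `case` only buys something when the pattern
  // can destructure or refuse the input, which a bare identifier never does.
  val viaCase = xs.map { case x => x + 1 }

  // Plain lambda: same behavior, less syntax, the preferred form here.
  val viaLambda = xs.map { x => x + 1 }

  assert(viaCase == viaLambda)
}
```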
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93310/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93310/testReport)** for PR 21822 at commit [`738e99c`](https://github.com/apache/spark/commit/738e99c615f45cfb6e5882a822dafcb8472b78ea). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21474: [SPARK-24297][CORE] Fetch-to-disk by default for > 2gb
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21474 I will take a look at this sometime, but don't block on me if it is urgent. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21533: [SPARK-24195][Core] Ignore the files with "local"...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21533 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21533: [SPARK-24195][Core] Ignore the files with "local" scheme...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21533 Merging to master branch. Thanks all! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21533: [SPARK-24195][Core] Ignore the files with "local" scheme...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21533 SGTM too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93309/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21821 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21772 **[Test build #93314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93314/testReport)** for PR 21772 at commit [`f67ff4d`](https://github.com/apache/spark/commit/f67ff4d91764e67f811ebfb7c02ab466153a5b02). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21772 Jenkins, test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21820: [SPARK-24868][PYTHON]add sequence function in Pyt...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21820#discussion_r203924999

--- Diff: python/pyspark/sql/functions.py ---
@@ -2551,6 +2551,27 @@ def map_concat(*cols):
     return Column(jc)
 
 
+@since(2.4)
+def sequence(start, stop, step=None):
+    """
+    Generate a sequence of integers from `start` to `stop`, incrementing by `step`.
+    If `step` is not set, it defaults to 1 when `start` is less than or equal to
+    `stop`, and to -1 otherwise.
+
+    >>> df1 = spark.createDataFrame([(-2, 2)], ('C1', 'C2'))
+    >>> df1.select(sequence('C1', 'C2').alias('r')).collect()
+    [Row(r=[-2, -1, 0, 1, 2])]
+    >>> df2 = spark.createDataFrame([(4, -4, -2)], ('C1', 'C2', 'C3'))
--- End diff --

What happens if the `step` is positive in this case?

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
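For context on the question: the docstring defines the default step as 1 when `start <= stop` and -1 otherwise. A small Scala model of just that documented rule, a sketch of the semantics rather than PySpark's implementation (`SequenceDemo` and `defaultStep` are made-up names; note that this toy model silently yields an empty result for a wrong-sign step, and what Spark itself does in that case is exactly what the reviewer is asking):

```scala
object SequenceDemo extends App {
  // Documented default: step is 1 for an ascending range, -1 for a descending one.
  def defaultStep(start: Long, stop: Long): Long =
    if (start <= stop) 1L else -1L

  def sequence(start: Long, stop: Long, step: Option[Long] = None): Seq[Long] = {
    val s = step.getOrElse(defaultStep(start, stop))
    start to stop by s // NumericRange walks in either direction given a matching sign
  }

  // Mirrors the two doctest examples in the diff above.
  assert(sequence(-2, 2) == Seq(-2L, -1L, 0L, 1L, 2L))
  assert(sequence(4, -4, Some(-2L)) == Seq(4L, 2L, 0L, -2L, -4L))
}
```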
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21813 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21813: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21813 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1152/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r203924609

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
         // Simplify the predicates before validating any unsupported correlation patterns
         // in the plan.
-        BooleanSimplification(sub).foreachUp {
+        // TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

@rxin From what I remember, Reynold, most of this logic was housed in the Analyzer before we moved it to the Optimizer. In the old code we walked the plan after simplifying the predicates, and the comment used to read "Simplify the predicates before pulling them out." I just retained that semantics.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
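For readers wondering what `BooleanSimplification` contributes before the correlation-pattern walk: folding away trivially true or false branches leaves fewer predicate shapes for the subsequent pattern matching to handle. A self-contained toy version of that kind of rewrite, using a made-up `Pred` ADT rather than Catalyst's rule or expression types:

```scala
object BoolSimplifyDemo extends App {
  sealed trait Pred
  case object True extends Pred
  case object False extends Pred
  case class Leaf(name: String) extends Pred
  case class And(l: Pred, r: Pred) extends Pred
  case class Or(l: Pred, r: Pred) extends Pred

  // Bottom-up fold of trivial true/false branches into canonical predicates.
  def simplify(p: Pred): Pred = p match {
    case And(l, r) => (simplify(l), simplify(r)) match {
      case (False, _) | (_, False) => False
      case (True, x)               => x
      case (x, True)               => x
      case (x, y)                  => And(x, y)
    }
    case Or(l, r) => (simplify(l), simplify(r)) match {
      case (True, _) | (_, True) => True
      case (False, x)            => x
      case (x, False)            => x
      case (x, y)                => Or(x, y)
    }
    case leaf => leaf
  }

  // e.g. (a AND true) OR false simplifies to just a:
  assert(simplify(Or(And(Leaf("a"), True), False)) == Leaf("a"))
}
```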
[GitHub] spark issue #21767: SPARK-24804 There are duplicate words in the test title ...
Github user httfighter commented on the issue: https://github.com/apache/spark/pull/21767 Thank you for your comments, @srowen @HyukjinKwon @wangyum. I will try to contribute more valuable issues! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21746: [SPARK-24699] [SS]Make watermarks work with Trigger.Once...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21746 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21746: [SPARK-24699] [SS]Make watermarks work with Trigger.Once...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93300/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org