[GitHub] spark pull request #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13756#discussion_r69541474
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -248,4 +281,15 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
       case _ => // OK
     }
   }
+
+  private def checkDuplicates(columnNames: Seq[String], columnType: String): Unit = {
+    val duplicateColumns = columnNames.groupBy { name =>
+      if (conf.caseSensitiveAnalysis) name else name.toLowerCase }.collect {
+      case (x, ys) if ys.length > 1 => s"`$x`"
+    }
+    if (duplicateColumns.nonEmpty) {
+      throw new AnalysisException(
+        s"Found duplicate column(s) in $columnType: ${duplicateColumns.mkString(", ")}")
--- End diff --

Nit: duplicate => duplicated


[GitHub] spark pull request #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13756#discussion_r69541158
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -206,7 +207,39 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         // The relation in l is not an InsertableRelation.
         failAnalysis(s"$l does not allow insertion.")
 
+      case c: CreateTableCommand =>
+        val allColNamesInSchema = c.table.schema.map(_.name)
+        val colNames = allColNamesInSchema.diff(c.table.partitionColumnNames)
+        val partitionColumnNames = c.table.partitionColumnNames
+        // Duplicates are not allowed in partitionBy
+        // Todo: when bucketBy and sortBy are supported, we also need to ban the duplication.
+        checkDuplicates(partitionColumnNames, "Partition")
+        // Ensuring whether no duplicate name is used in table definition
+        checkDuplicates(colNames, s"table definition of ${c.table.identifier}")
+        // For non-data-source tables, partition columns must not be part of the schema
+        val badPartCols = partitionColumnNames.toSet.intersect(colNames.toSet)
--- End diff --

Again, this line is using a case-sensitive comparison.
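
A hedged sketch of a case-insensitive variant of that intersection, reusing the `conf.caseSensitiveAnalysis` flag that `checkDuplicates` already consults (names are illustrative, not the PR's final code):

```scala
// Sketch only: normalize names before intersecting when the analysis is case-insensitive.
def normalize(name: String): String =
  if (conf.caseSensitiveAnalysis) name else name.toLowerCase

val badPartCols =
  partitionColumnNames.map(normalize).toSet.intersect(colNames.map(normalize).toSet)
```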


[GitHub] spark pull request #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13756#discussion_r69540936
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -248,4 +281,15 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
       case _ => // OK
     }
   }
+
+  private def checkDuplicates(columnNames: Seq[String], columnType: String): Unit = {
+    val duplicateColumns = columnNames.groupBy { name =>
+      if (conf.caseSensitiveAnalysis) name else name.toLowerCase }.collect {
+      case (x, ys) if ys.length > 1 => s"`$x`"
+    }
--- End diff --

Code style within this block looks weird.
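
One possible reformatting that keeps the same logic (a sketch only; the exact layout is up to the PR author):

```scala
private def checkDuplicates(columnNames: Seq[String], columnType: String): Unit = {
  val duplicateColumns = columnNames
    .groupBy(name => if (conf.caseSensitiveAnalysis) name else name.toLowerCase)
    .collect { case (x, ys) if ys.length > 1 => s"`$x`" }

  if (duplicateColumns.nonEmpty) {
    throw new AnalysisException(
      s"Found duplicate column(s) in $columnType: ${duplicateColumns.mkString(", ")}")
  }
}
```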


[GitHub] spark pull request #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13756#discussion_r69540754
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -206,7 +207,39 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         // The relation in l is not an InsertableRelation.
         failAnalysis(s"$l does not allow insertion.")
 
+      case c: CreateTableCommand =>
+        val allColNamesInSchema = c.table.schema.map(_.name)
+        val colNames = allColNamesInSchema.diff(c.table.partitionColumnNames)
--- End diff --

Is it safe to do case sensitive comparison here?


[GitHub] spark pull request #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13756#discussion_r69528941
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -867,8 +865,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }
 
   /**
-   * Create a table, returning either a [[CreateTableCommand]] or a
-   * [[CreateHiveTableAsSelectLogicalPlan]].
+   * Create a table, returning either a [[CreateTableCommand]], a
--- End diff --

Nit: Remove "either".


[GitHub] spark issue #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and binary tes...

2016-07-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13389
  
LGTM, merging to master. Thanks!


[GitHub] spark pull request #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and bin...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13389#discussion_r69527639
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala ---
@@ -150,7 +150,8 @@ private[parquet] class CatalystWriteSupport extends WriteSupport[InternalRow] wi
 
       case StringType =>
         (row: SpecializedGetters, ordinal: Int) =>
-          recordConsumer.addBinary(Binary.fromByteArray(row.getUTF8String(ordinal).getBytes))
+          recordConsumer.addBinary(
+            Binary.fromReusedByteArray(row.getUTF8String(ordinal).getBytes))
--- End diff --

`UTF8String` itself is immutable, but the underlying buffer it points to 
can be mutable. I'd vote for using `Binary.fromReusedByteArray` here.


[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14044
  
Merged to master. Thanks!


[GitHub] spark pull request #14012: [SPARK-16343][SQL] Improve the PushDownPredicate ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14012#discussion_r69522538
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1106,12 +1106,15 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper {
     // Push [[Filter]] operators through [[Window]] operators. Parts of the predicate that can be
     // pushed beneath must satisfy the following two conditions:
     // 1. All the expressions are part of window partitioning key. The expressions can be compound.
-    // 2. Deterministic
+    // 2. Deterministic.
+    // 3. Placed before any non-deterministic predicates.
     case filter @ Filter(condition, w: Window)
         if w.partitionSpec.forall(_.isInstanceOf[AttributeReference]) =>
       val partitionAttrs = AttributeSet(w.partitionSpec.flatMap(_.references))
+      var isPredicatePushdownAble = true
       val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition { cond =>
-        cond.references.subsetOf(partitionAttrs) && cond.deterministic &&
+        isPredicatePushdownAble = isPredicatePushdownAble && cond.deterministic
+        isPredicatePushdownAble && cond.references.subsetOf(partitionAttrs) &&
--- End diff --

And we should move the `partitionAttrs.forall(_.isInstanceOf[Attribute])` 
predicate out of the closure.
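
A minimal sketch of the hoisting being suggested (the val name is made up for illustration):

```scala
// This check does not depend on `cond`, so it can be computed once here,
// outside the partition { cond => ... } closure, and referenced from inside it.
val partitionKeysAreAttributes = partitionAttrs.forall(_.isInstanceOf[Attribute])
```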


[GitHub] spark pull request #14012: [SPARK-16343][SQL] Improve the PushDownPredicate ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14012#discussion_r69522366
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1106,12 +1106,15 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper {
     // Push [[Filter]] operators through [[Window]] operators. Parts of the predicate that can be
     // pushed beneath must satisfy the following two conditions:
     // 1. All the expressions are part of window partitioning key. The expressions can be compound.
-    // 2. Deterministic
+    // 2. Deterministic.
+    // 3. Placed before any non-deterministic predicates.
     case filter @ Filter(condition, w: Window)
         if w.partitionSpec.forall(_.isInstanceOf[AttributeReference]) =>
       val partitionAttrs = AttributeSet(w.partitionSpec.flatMap(_.references))
+      var isPredicatePushdownAble = true
       val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition { cond =>
-        cond.references.subsetOf(partitionAttrs) && cond.deterministic &&
+        isPredicatePushdownAble = isPredicatePushdownAble && cond.deterministic
+        isPredicatePushdownAble && cond.references.subsetOf(partitionAttrs) &&
--- End diff --

The following can be easier to read:

```scala
val (candidates, containingNonDeterministic) =
  splitConjunctivePredicates(condition).span(_.deterministic)

val (pushDown, rest) = candidates.partition { cond =>
  cond.references.subsetOf(partitionAttrs) && partitionAttrs.forall(_.isInstanceOf[Attribute])
}

val stayUp = rest ++ containingNonDeterministic
```
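
A tiny standalone illustration of why `span` also enforces condition 3, with strings standing in for predicates (not code from the PR):

```scala
// span splits at the first element failing the predicate, so every conjunct at or after
// the first non-deterministic one stays up, preserving the original evaluation order.
val conjuncts = Seq("a = 1", "rand() < 0.5 (ND)", "b = 2")
val (candidates, containingNonDeterministic) = conjuncts.span(c => !c.contains("(ND)"))
// candidates == Seq("a = 1")
// containingNonDeterministic == Seq("rand() < 0.5 (ND)", "b = 2")
```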



[GitHub] spark issue #14012: [SPARK-16343][SQL] Improve the PushDownPredicate rule to...

2016-07-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14012
  
add to whitelist


[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14038#discussion_r69520473
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -437,11 +442,26 @@ private[sql] object HadoopFsRelation extends Logging {
       accessTime: Long,
       blockLocations: Array[FakeBlockLocation])
 
+  private[sql] def mergePathFilter(
+      filter1: Option[PathFilter], filter2: Option[PathFilter]): Path => Boolean = {
+    (filter1, filter2) match {
+      case (Some(f1), Some(f2)) =>
+        (path: Path) => f1.accept(path) && f2.accept(path)
+      case (Some(f1), None) =>
+        (path: Path) => f1.accept(path)
+      case (None, Some(f2)) =>
+        (path: Path) => f2.accept(path)
+      case (None, None) =>
+        (path: Path) => true
+    }
--- End diff --

This can be more concise:

```scala
(filter1 ++ filter2).reduceOption { (f1, f2) =>
  (path: Path) => f1.accept(path) && f2.accept(path)
}.getOrElse {
  (path: Path) => true
}
```
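
For readers unfamiliar with the `Option ++ Option` trick, here is a self-contained sketch of the same idea with `PathFilter` stubbed out as a plain trait (illustrative only, not the PR's code):

```scala
trait PathFilter { def accept(path: String): Boolean }

def mergePathFilter(f1: Option[PathFilter], f2: Option[PathFilter]): String => Boolean = {
  // Option ++ Option yields an Iterable with 0, 1, or 2 filters; reduceOption combines
  // whatever is present into a single filter.
  val combined: Option[PathFilter] = (f1 ++ f2).reduceOption { (a, b) =>
    new PathFilter { def accept(path: String): Boolean = a.accept(path) && b.accept(path) }
  }
  // An empty Option means "no filter configured", i.e. accept everything.
  path => combined.forall(_.accept(path))
}

val endsWithAvro = new PathFilter { def accept(path: String): Boolean = path.endsWith(".avro") }
assert(mergePathFilter(Some(endsWithAvro), None)("part-00000.avro"))
assert(mergePathFilter(None, None)("anything"))
```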


[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...

2016-07-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14038
  
Left some comments, the overall structure looks good. Thanks!


[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14038#discussion_r69520156
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }
 
 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

Oh I see: parallel file listing may filter input files on the executor side.


[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14038#discussion_r69519931
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }
 
 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

I probably missed something here, but why does it have to be `Serializable`?


[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14038#discussion_r69519846
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -230,6 +236,15 @@ trait FileFormat {
   }
 
   /**
+   * Return a `SerializablePathFilter` class to filter qualified files for this format.
+   */
+  def getPathFilter(): SerializablePathFilter = {
--- End diff --

What are the semantics of this method's return value? It seems it should never return a null filter, since it defaults to an "accept all" filter. If that's true, it's unnecessary to wrap the returned filters in `Option` elsewhere in this PR.
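
If that reading is right, the default could simply be an explicit "accept all" filter, which lets callers drop the `Option` wrapping (hypothetical sketch, assuming `org.apache.hadoop.fs.Path` is in scope):

```scala
def getPathFilter(): SerializablePathFilter = new SerializablePathFilter {
  // Accept-all default: formats that do not care about file names keep every path.
  override def accept(path: Path): Boolean = true
}
```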


[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14038#discussion_r69519641
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }
 
 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

Extending `PathFilter` makes the internal implementation a little easier, but I'd prefer to avoid depending on Hadoop classes/interfaces in Spark SQL's public interfaces whenever possible.
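
One hypothetical Hadoop-free shape for the public hook (purely illustrative, not something this PR proposes verbatim):

```scala
// Expose only plain Strings and Serializable in the public API; adapt to Hadoop's
// PathFilter internally where the actual listing happens.
trait FileFilter extends Serializable {
  def accept(pathString: String): Boolean
}
```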


[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...

2016-07-05 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14038#discussion_r69518421
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -230,6 +236,15 @@ trait FileFormat {
   }
 
   /**
+   * Return a `SerializablePathFilter` class to filter qualified files for this format.
+   */
+  def getPathFilter(): SerializablePathFilter = {
--- End diff --

Shall we add either the data source options map or the Hadoop conf as an argument of this method?

For example, the Avro data source may filter out all input files whose names don't end with ".avro" when the Hadoop conf "avro.mapred.ignore.inputs.without.extension" is set to true. This is consistent with the default behavior of `AvroInputFormat`.
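
A hypothetical sketch of what such a conf-aware filter could look like for the Avro case (the helper name and the default value are made up for illustration):

```scala
import org.apache.hadoop.conf.Configuration

def avroPathFilter(hadoopConf: Configuration): String => Boolean = {
  // Mirror AvroInputFormat's behavior: optionally skip files without the ".avro" extension.
  val ignoreWithoutExtension =
    hadoopConf.getBoolean("avro.mapred.ignore.inputs.without.extension", true)
  path => !ignoreWithoutExtension || path.endsWith(".avro")
}
```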


[GitHub] spark issue #12972: [SPARK-15198][SQL] Support for pushing down filters for ...

2016-07-04 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/12972
  
LGTM, merging to master. Sorry for leaving this PR for so long...


[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14044
  
LGTM pending Jenkins.


[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14044
  
Agree with @hvanhovell. Analysis should never take this long for such a simple query. We should avoid duplicated analysis work, but fixing the performance issue(s) within the analyzer seems more fruitful.


[GitHub] spark issue #13818: [SPARK-15968][SQL] Nonempty partitioned metastore tables...

2016-07-04 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13818
  
Shall we also have this in branch-2.0? This seems to be a pretty serious 
bug. cc @rxin.


[GitHub] spark issue #13818: [SPARK-15968][SQL] Nonempty partitioned metastore tables...

2016-07-04 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13818
  
LGTM except for minor styling issues. Thanks!


[GitHub] spark pull request #13818: [SPARK-15968][SQL] Nonempty partitioned metastore...

2016-07-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13818#discussion_r69448268
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala ---
@@ -425,6 +425,28 @@ class ParquetMetastoreSuite extends ParquetPartitioningTest {
 }
   }
 
+  test("SPARK-15968: nonempty partitioned metastore Parquet table lookup 
should use cached " +
+  "relation") {
+withTable("partitioned") {
+  sql(
+s"""CREATE TABLE partitioned (
+   | key INT,
+   | value STRING
+   |)
+   |PARTITIONED BY (part INT)
+   |STORED AS PARQUET
+   """.stripMargin)
+  sql("INSERT INTO TABLE partitioned PARTITION(part=0) SELECT 1 as 
key, 'one' as value")
+
+  // First lookup fills the cache
+  val r1 = collectHadoopFsRelation (table("partitioned"))
+  // Second lookup should reuse the cache
+  val r2 = collectHadoopFsRelation (table("partitioned"))
--- End diff --

Nit: Remove space before `(`.


[GitHub] spark pull request #13818: [SPARK-15968][SQL] Nonempty partitioned metastore...

2016-07-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13818#discussion_r69448259
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala ---
@@ -425,6 +425,28 @@ class ParquetMetastoreSuite extends ParquetPartitioningTest {
 }
   }
 
+  test("SPARK-15968: nonempty partitioned metastore Parquet table lookup 
should use cached " +
+  "relation") {
+withTable("partitioned") {
+  sql(
+s"""CREATE TABLE partitioned (
+   | key INT,
+   | value STRING
+   |)
+   |PARTITIONED BY (part INT)
+   |STORED AS PARQUET
+   """.stripMargin)
+  sql("INSERT INTO TABLE partitioned PARTITION(part=0) SELECT 1 as 
key, 'one' as value")
+
+  // First lookup fills the cache
+  val r1 = collectHadoopFsRelation (table("partitioned"))
--- End diff --

Nit: Remove space before `(`.


[GitHub] spark pull request #13818: [SPARK-15968][SQL] Nonempty partitioned metastore...

2016-07-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13818#discussion_r69448203
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala ---
@@ -425,6 +425,28 @@ class ParquetMetastoreSuite extends ParquetPartitioningTest {
 }
   }
 
+  test("SPARK-15968: nonempty partitioned metastore Parquet table lookup 
should use cached " +
+  "relation") {
+withTable("partitioned") {
+  sql(
+s"""CREATE TABLE partitioned (
+   | key INT,
+   | value STRING
+   |)
+   |PARTITIONED BY (part INT)
+   |STORED AS PARQUET
+   """.stripMargin)
--- End diff --

Nit: Indentation is off here.
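
For reference, one common way this block is laid out in Spark tests (a whitespace-only sketch; no behavior change intended):

```scala
sql(
  s"""
     |CREATE TABLE partitioned (
     |  key INT,
     |  value STRING
     |)
     |PARTITIONED BY (part INT)
     |STORED AS PARQUET
   """.stripMargin)
```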


[GitHub] spark issue #14025: [DOC][SQL] update out-of-date code snippets using SQLCon...

2016-07-04 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14025
  
@tdas Would you please help review the streaming example code changes? Thanks!


[GitHub] spark pull request #14025: [DOC][SQL] update out-of-date code snippets using...

2016-07-04 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14025#discussion_r69415295
  
--- Diff: docs/configuration.md ---
@@ -1564,8 +1564,8 @@ spark.sql("SET -v").show(n=200, truncate=False)
 
 
 {% highlight r %}
-# sqlContext is an existing sqlContext.
-properties <- sql(sqlContext, "SET -v")
+# spark is an existing SparkSession.
+properties <- sql(spark, "SET -v")
--- End diff --

I believe that with the most recent SparkR API you no longer need to pass in the `spark` parameter to run a SQL statement.


[GitHub] spark issue #14025: [WIP][DOC] update out-of-date code snippets using SQLCon...

2016-07-03 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14025
  
@WeichenXu123 Is this ready for review? If yes, please remove the WIP tag 
in the PR description.


[GitHub] spark issue #14009: [SPARK-16311][SQL] Metadata refresh should work on tempo...

2016-07-01 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14009
  
LGTM except for a minor styling issue.


[GitHub] spark pull request #14009: [SPARK-16311][SQL] Metadata refresh should work o...

2016-07-01 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14009#discussion_r69316671
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/MetadataCacheSuite.scala ---
@@ -77,12 +77,12 @@ class MetadataCacheSuite extends QueryTest with SharedSQLContext {
 sql("select count(*) from view_refresh").first()
   }
   assert(e.getMessage.contains("FileNotFoundException"))
-  assert(e.getMessage.contains("refresh()"))
+  assert(e.getMessage.contains("REFRESH"))
 
   // Refresh and we should be able to read it again.
   spark.catalog.refreshTable("view_refresh")
   val newCount = sql("select count(*) from view_refresh").first().getLong(0)
   assert(newCount > 0 && newCount < 100)
-}
+}}
--- End diff --

This style is pretty weird...


[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `PropagateEmptyRelation` optimize...

2016-07-01 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13906
  
Merged to master.

@cloud-fan Sorry that I didn't notice your comment while merging it. We can address it in a follow-up PR.


[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...

2016-07-01 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14013
  
@rdblue Verified that parquet-avro also suffers from this issue. Filed 
[PARQUET-651][1] to track it.

[1]: https://issues.apache.org/jira/browse/PARQUET-651


[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...

2016-07-01 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14013
  
@rdblue Would you mind helping review this one? My initial investigation suggests that parquet-avro probably suffers from the same issue. I will file a parquet-mr JIRA ticket soon.


[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-01 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14014
  
cc @yhuai 


[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...

2016-07-01 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/14014

[SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element"

## What changes were proposed in this pull request?

This PR ports #14013 to master and branch-2.0.

## How was this patch tested?

See #14013.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-16344-for-master-and-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14014


commit 3bfe45fe8b81f44141b737df6b292f12cd37d06a
Author: Cheng Lian <l...@databricks.com>
Date:   2016-07-01T11:32:52Z

Fixes SPARK-16344




[GitHub] spark issue #14013: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-01 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14013
  
cc @yhuai 


[GitHub] spark pull request #14013: [SPARK-16344][SQL] Decoding Parquet array of stru...

2016-07-01 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14013#discussion_r69283877
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala ---
@@ -481,13 +481,106 @@ private[parquet] class CatalystRowConverter(
  */
 // scalastyle:on
 private def isElementType(
-parquetRepeatedType: Type, catalystElementType: DataType, parentName: String): Boolean = {
+parquetRepeatedType: Type, catalystElementType: DataType, parent: GroupType): Boolean = {
+
+  def isStandardListLayout(t: GroupType): Boolean =
+Option(parent.getOriginalType) == Some(LIST) &&
+  t.getFieldCount == 1 &&
+  t.getName == "list" &&
+  t.getFieldName(0) == "element"
+
   (parquetRepeatedType, catalystElementType) match {
-case (t: PrimitiveType, _) => true
-case (t: GroupType, _) if t.getFieldCount > 1 => true
-case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == "array" => true
-case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == parentName + "_tuple" => true
-case (t: GroupType, StructType(Array(f))) if f.name == t.getFieldName(0) => true
+case (t: PrimitiveType, _) =>
+  // For legacy 2-level list types with primitive element type, e.g.:
+  //
+  //// List (nullable list, non-null elements)
+  //optional group my_list (LIST) {
+  //  repeated int32 element;
+  //}
+  true
+
+case (t: GroupType, _) if t.getFieldCount > 1 =>
+  // For legacy 2-level list types whose element type is a group type with 2 or more fields,
+  // e.g.:
+  //
+  //// List<Tuple<String, Integer>> (nullable list, non-null elements)
+  //optional group my_list (LIST) {
+  //  repeated group element {
+  //required binary str (UTF8);
+  //required int32 num;
+  //  };
+  //}
+  true
+
+case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == "array" =>
+  // For Parquet data generated by parquet-thrift, e.g.:
+  //
+  //// List<OneTuple> (nullable list, non-null elements)
+  //optional group my_list (LIST) {
+  //  repeated group my_list_tuple {
+  //required binary str (UTF8);
+  //  };
+  //}
+  true
+
+case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == parent + "_tuple" =>
+  // For Parquet data generated by parquet-thrift, e.g.:
+  //
+  //// List<OneTuple> (nullable list, non-null elements)
+  //optional group my_list (LIST) {
+  //  repeated group my_list_tuple {
+  //required binary str (UTF8);
+  //  };
+  //}
+  true
+
+case (t: GroupType, _) if isStandardListLayout(t) =>
+  // For standard 3-level list types, e.g.:
+  //
+  //// List (list nullable, elements non-null)
+  //optional group my_list (LIST) {
+  //  repeated group list {
+  //required binary element (UTF8);
+  //  }
+  //}
+  //
+  // This case branch must appear before the next one. See comments of the next case branch
+  // for details.
+  false
--- End diff --

This case branch is essential for the bug fix. Basically, it matches the 
standard 3-level layout first before trying to match the legacy 2-level layout, 
so that the "element" syntactic group in Parquet LIST won't be mistaken for the 
"element" field in the nested struct.


[GitHub] spark pull request #14013: [SPARK-16344][SQL] Decoding Parquet array of stru...

2016-07-01 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/14013

[SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element"

## What changes were proposed in this pull request?

Please refer to [SPARK-16344][1] for details about this issue.

## How was this patch tested?

New test case added in `ParquetQuerySuite`.

[1]: https://issues.apache.org/jira/browse/SPARK-16344

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-16344-parquet-schema-corner-case

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14013.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14013


commit 9620b48d463ed2f2a8ede7397420050dc1e7d832
Author: Cheng Lian <l...@databricks.com>
Date:   2016-07-01T10:52:29Z

Fixes SPARK-16344




[GitHub] spark pull request #14006: [SPARK-13015][MLlib][DOC] Replace example code in...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14006#discussion_r69242197
  
--- Diff: docs/_plugins/include_example.rb ---
@@ -85,20 +85,20 @@ def select_lines(code)
 .select { |l, i| l.include? "$example off#{@snippet_label}$" }
 .map { |l, i| i }
 
-  raise "Start indices amount is not equal to end indices amount, see 
#{@file}." \
+  raise "Start indices amount is not equal to end indices amount, see 
#{@file}, #{@snippet_label}." \
--- End diff --

I'd prefer

```
... see #{@file} [labeled=#{@snippet_label}].
```

Otherwise the label itself might be mistaken for a file path.


[GitHub] spark pull request #13974: [SPARK-16296][SQL] add null check for key when cr...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13974#discussion_r69231758
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala ---
@@ -19,6 +19,11 @@ package org.apache.spark.sql.catalyst.util
 
 class ArrayBasedMapData(val keyArray: ArrayData, val valueArray: ArrayData) extends MapData {
   require(keyArray.numElements() == valueArray.numElements())
+  for (i <- 0 until keyArray.numElements()) {
--- End diff --

Let's use a `while`-loop to replace `for` since Scala `for` is super slow.
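
A minimal sketch of the `while`-loop form (assuming the null check uses `ArrayData.isNullAt`; the exception and message are illustrative):

```scala
var i = 0
while (i < keyArray.numElements()) {
  // Reject null map keys up front instead of failing later during lookup.
  if (keyArray.isNullAt(i)) {
    throw new IllegalArgumentException("Cannot use null as map key!")
  }
  i += 1
}
```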


[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13558
  
LGTM, merging to master and branch-2.0.


[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13906
  
LGTM except for those comments @cloud-fan brought up. Thanks for working on 
this!


[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69140767
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlanSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class CollapseEmptyPlanSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+val batches =
+  Batch("CollapseEmptyPlan", Once,
+CombineUnions,
+ReplaceDistinctWithAggregate,
+ReplaceExceptWithAntiJoin,
+ReplaceIntersectWithSemiJoin,
+PushDownPredicate,
+PruneFilters,
+CollapseEmptyPlan) :: Nil
+  }
+
+  object OptimizeWithoutCollapseEmptyPlan extends RuleExecutor[LogicalPlan] {
+val batches =
+  Batch("CollapseEmptyPlan", Once,
+CombineUnions,
+ReplaceDistinctWithAggregate,
+ReplaceExceptWithAntiJoin,
+ReplaceIntersectWithSemiJoin,
+PushDownPredicate,
+PruneFilters) :: Nil
+  }
+
+  val testRelation1 = LocalRelation.fromExternalRows(Seq('a.int), data = Seq(Row(1)))
+  val testRelation2 = LocalRelation.fromExternalRows(Seq('b.int), data = Seq(Row(1)))
+  val testRelation3 = LocalRelation.fromExternalRows(Seq('c.int), data = Seq(Row(1)))
+
+  test("Binary Logical Plans - Collapse empty union") {
+val query = testRelation1
+  .where(false)
+  .union(testRelation2.where(false))
+
+val optimized = Optimize.execute(query.analyze)
+val correctAnswer = LocalRelation('a.int)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("Binary Logical Plans - Collapse joins") {
--- End diff --

Well, I guess the verbosity is necessary since it guarantees that we don't propagate empty relations incorrectly for certain types of joins (e.g. an outer join with only one empty child).


[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69140206
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.DeclarativeAggregate
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by [[PruneFilters]].
+ * 1. Binary(or Higher)-node Logical Plans
+ *- Union with all empty children.
+ *- Join with one or two empty children (including Intersect/Except).
+ * 2. Unary-node Logical Plans
+ *- Project/Filter/Sample/Join/Limit/Repartition with all empty children.
+ *- Aggregate with all empty children and without DeclarativeAggregate expressions like COUNT.
+ *- Generate(Explode) with all empty children. Others like Hive UDTF may return results.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = plan match {
+case p: LocalRelation => p.data.isEmpty
+case _ => false
+  }
+
+  private def isDeclarativeAggregate(e: Expression): Boolean = e match {
+case _: DeclarativeAggregate => true
+case _: LeafExpression => false
+case other => other.children.forall(isDeclarativeAggregate)
+  }
+
+  private def empty(plan: LogicalPlan) = LocalRelation(plan.output, data = Seq.empty)
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p: Union if p.children.nonEmpty && p.children.forall(isEmptyLocalRelation) =>
+  empty(p)
+
+case p @ Join(_, _, joinType, _) if p.children.exists(isEmptyLocalRelation) => joinType match {
+  case Inner => empty(p)
+  case LeftOuter | LeftSemi | LeftAnti if isEmptyLocalRelation(p.left) => empty(p)
+  case RightOuter if isEmptyLocalRelation(p.right) => empty(p)
+  case FullOuter if p.children.forall(isEmptyLocalRelation) => empty(p)
+  case _ => p
+}
--- End diff --

Sorry that I didn't notice the comment above. I think that should be enough.


[GitHub] spark issue #13994: [BUILD] Fix version in poms related to kafka-0-10

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13994
  
Merging to branch-2.0.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69111075
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlanSuite.scala
 ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class CollapseEmptyPlanSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+val batches =
+  Batch("CollapseEmptyPlan", Once,
+CombineUnions,
+ReplaceDistinctWithAggregate,
+ReplaceExceptWithAntiJoin,
+ReplaceIntersectWithSemiJoin,
+PushDownPredicate,
+PruneFilters,
+CollapseEmptyPlan) :: Nil
+  }
+
+  object OptimizeWithoutCollapseEmptyPlan extends 
RuleExecutor[LogicalPlan] {
+val batches =
+  Batch("CollapseEmptyPlan", Once,
+CombineUnions,
+ReplaceDistinctWithAggregate,
+ReplaceExceptWithAntiJoin,
+ReplaceIntersectWithSemiJoin,
+PushDownPredicate,
+PruneFilters) :: Nil
+  }
+
+  val testRelation1 = LocalRelation.fromExternalRows(Seq('a.int), data = 
Seq(Row(1)))
+  val testRelation2 = LocalRelation.fromExternalRows(Seq('b.int), data = 
Seq(Row(1)))
+  val testRelation3 = LocalRelation.fromExternalRows(Seq('c.int), data = 
Seq(Row(1)))
+
+  test("Binary(or Higher) Logical Plans - Collapse empty union") {
--- End diff --

Remove "(or Higher)".





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69110745
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.DeclarativeAggregate
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. Binary(or Higher)-node Logical Plans
+ *- Union with all empty children.
+ *- Join with one or two empty children (including Intersect/Except).
+ * 2. Unary-node Logical Plans
+ *- Project/Filter/Sample/Join/Limit/Repartition with all empty 
children.
+ *- Aggregate with all empty children and without DeclarativeAggregate 
expressions like COUNT.
+ *- Generate(Explode) with all empty children. Others like Hive UDTF 
may return results.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = plan 
match {
+case p: LocalRelation => p.data.isEmpty
+case _ => false
+  }
+
+  private def isDeclarativeAggregate(e: Expression): Boolean = e match {
+case _: DeclarativeAggregate => true
+case _: LeafExpression => false
+case other => other.children.forall(isDeclarativeAggregate)
+  }
+
+  private def empty(plan: LogicalPlan) = LocalRelation(plan.output, data = 
Seq.empty)
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p: Union if p.children.nonEmpty && 
p.children.forall(isEmptyLocalRelation) =>
+  empty(p)
+
+case p @ Join(_, _, joinType, _) if 
p.children.exists(isEmptyLocalRelation) => joinType match {
+  case Inner => empty(p)
+  case LeftOuter | LeftSemi | LeftAnti if isEmptyLocalRelation(p.left) 
=> empty(p)
+  case RightOuter if isEmptyLocalRelation(p.right) => empty(p)
+  case FullOuter if p.children.forall(isEmptyLocalRelation) => empty(p)
+  case _ => p
+}
+
+case p: UnaryNode if p.children.nonEmpty && 
p.children.forall(isEmptyLocalRelation) => p match {
+  case _: Project | _: Filter | _: Sample | _: Sort | _: GlobalLimit | 
_: LocalLimit |
+   _: Repartition | _: RepartitionByExpression => empty(p)
+  case Aggregate(_, ae, _) if !ae.exists(isDeclarativeAggregate) => 
empty(p)
+  case Generate(_ : Explode, _, _, _, _, _) => empty(p)
--- End diff --

Let's add a comment here to explain why we must special-case `Aggregate` and `Generate`.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69109823
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.DeclarativeAggregate
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. Binary(or Higher)-node Logical Plans
+ *- Union with all empty children.
+ *- Join with one or two empty children (including Intersect/Except).
+ * 2. Unary-node Logical Plans
+ *- Project/Filter/Sample/Join/Limit/Repartition with all empty 
children.
+ *- Aggregate with all empty children and without DeclarativeAggregate 
expressions like COUNT.
+ *- Generate(Explode) with all empty children. Others like Hive UDTF 
may return results.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = plan 
match {
+case p: LocalRelation => p.data.isEmpty
+case _ => false
+  }
+
+  private def isDeclarativeAggregate(e: Expression): Boolean = e match {
+case _: DeclarativeAggregate => true
+case _: LeafExpression => false
+case other => other.children.forall(isDeclarativeAggregate)
+  }
+
+  private def empty(plan: LogicalPlan) = LocalRelation(plan.output, data = 
Seq.empty)
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p: Union if p.children.nonEmpty && 
p.children.forall(isEmptyLocalRelation) =>
+  empty(p)
+
+case p @ Join(_, _, joinType, _) if 
p.children.exists(isEmptyLocalRelation) => joinType match {
+  case Inner => empty(p)
+  case LeftOuter | LeftSemi | LeftAnti if isEmptyLocalRelation(p.left) 
=> empty(p)
+  case RightOuter if isEmptyLocalRelation(p.right) => empty(p)
+  case FullOuter if p.children.forall(isEmptyLocalRelation) => empty(p)
+  case _ => p
+}
--- End diff --

Could you please add a comment noting that `Intersect` is also covered here? At first I 
didn't realize that we had already translated `Intersect` into joins.
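
As a quick, self-contained sanity check of the semantics this relies on (assuming a 
local `SparkSession`; this is only an illustration, not part of the patch):

```scala
import org.apache.spark.sql.SparkSession

// Intersecting with an empty relation yields an empty result, so collapsing the
// rewritten plan into an empty LocalRelation preserves query results.
val spark = SparkSession.builder().master("local").appName("intersect-empty").getOrCreate()
val nonEmpty = spark.range(5)
val empty = spark.range(0)
nonEmpty.intersect(empty).collect()   // Array() -- no rows
spark.stop()
```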





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69109611
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.DeclarativeAggregate
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. Binary(or Higher)-node Logical Plans
+ *- Union with all empty children.
+ *- Join with one or two empty children (including Intersect/Except).
+ * 2. Unary-node Logical Plans
+ *- Project/Filter/Sample/Join/Limit/Repartition with all empty 
children.
+ *- Aggregate with all empty children and without DeclarativeAggregate 
expressions like COUNT.
+ *- Generate(Explode) with all empty children. Others like Hive UDTF 
may return results.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = plan 
match {
+case p: LocalRelation => p.data.isEmpty
+case _ => false
+  }
+
+  private def isDeclarativeAggregate(e: Expression): Boolean = e match {
+case _: DeclarativeAggregate => true
+case _: LeafExpression => false
+case other => other.children.forall(isDeclarativeAggregate)
+  }
--- End diff --

I'd replace this method with:

```scala
def containsAggregateExpression(e: Expression): Boolean = {
  e.collectFirst { case _: AggregateFunction => () }.isDefined
}
```
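
For what it's worth, a small self-contained check of what that helper classifies, 
restated here with the Catalyst expression DSL (illustrative only, not the PR's 
final code):

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction

def containsAggregateExpression(e: Expression): Boolean =
  e.collectFirst { case _: AggregateFunction => () }.isDefined

containsAggregateExpression(count('a))   // true  -- COUNT must not be collapsed away
containsAggregateExpression('a + 1)      // false -- a plain expression over empty input is safe
```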





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69109340
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.DeclarativeAggregate
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. Binary(or Higher)-node Logical Plans
+ *- Union with all empty children.
+ *- Join with one or two empty children (including Intersect/Except).
+ * 2. Unary-node Logical Plans
+ *- Project/Filter/Sample/Join/Limit/Repartition with all empty 
children.
+ *- Aggregate with all empty children and without DeclarativeAggregate 
expressions like COUNT.
+ *- Generate(Explode) with all empty children. Others like Hive UDTF 
may return results.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = plan 
match {
+case p: LocalRelation => p.data.isEmpty
+case _ => false
+  }
+
+  private def isDeclarativeAggregate(e: Expression): Boolean = e match {
+case _: DeclarativeAggregate => true
--- End diff --

Why `DeclarativeAggregate` rather than `AggregateFunction`? `AggregateFunction` also 
covers `ImperativeAggregate` implementations like `ScalaUDAF`, which should be handled 
here as well.





[GitHub] spark issue #13992: [SPARK-12177][TEST] Removed test to avoid compilation is...

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13992
  
Merging to master and branch-2.0. Thanks!





[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13989
  
In general, I think reconstructing the DataFrame/Dataset or using `REFRESH TABLE` may be 
a better way to solve the problem this PR addresses. Did I miss some context here?





[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13989
  
One concern of mine is that the analyzed plan, optimized plan, and executed (physical) 
plan stored in `QueryExecution` are all lazy vals, which means they won't be re-optimized 
or re-planned after the metadata of the corresponding logical plan is refreshed.

Say we constructed a DataFrame `df` to join a small table `A` and a large table `B`. 
After calling `df.write.parquet(...)`, the analyzed, optimized, and executed plans of 
`df` are all computed. Since `A` is small, the planner may decide to broadcast it, and 
this decision is reflected in the physical plan.

Next, we add a bunch of files into the directory where table `A` lives to make it much 
larger, then call `df.refresh()` to refresh the logical plan. Now, if we call 
`df.write.parquet(...)` again, the query may well crash, since the physical plan is not 
refreshed and still assumes that `A` should be broadcast.
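
For concreteness, a schematic version of this scenario (paths are placeholders, and it 
assumes a `SparkSession` named `spark` plus the `refresh()` API proposed in this PR):

```scala
// Schematic only: the input paths are placeholders.
val a = spark.read.parquet("/data/small_table_A")   // small at this point
val b = spark.read.parquet("/data/large_table_B")
val df = a.join(b, "key")

df.write.parquet("/tmp/out1")   // lazy analyzed/optimized/physical plans are computed; A may be broadcast

// ... an external process adds many files under /data/small_table_A here ...

df.refresh()                    // refreshes metadata of the logical plan only
df.write.parquet("/tmp/out2")   // the cached lazy physical plan may still broadcast A
```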





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69090469
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2307,6 +2307,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Refreshes the metadata and data cached in Spark for data associated 
with this Dataset.
+   * An example use case is to invalidate the file system metadata cached 
by Spark, when the
+   * underlying files have been updated by an external process.
+   *
+   * @group action
+   * @since 2.0.0
+   */
+  def refresh(): Unit = {
+unpersist(false)
--- End diff --

Actually we can and should call `unpersist`, but we should also call 
`persist()`/`cache()` again, so that the Dataset is lazily re-cached with the correct 
data the next time it gets executed. I guess that's also what @gatorsmile meant.
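
A rough sketch of this suggestion, written as the body of the proposed 
`Dataset.refresh()` (placement and ordering are illustrative, not the final 
implementation):

```scala
def refresh(): Unit = {
  // Drop the stale cached data first.
  unpersist(blocking = false)
  // Refresh file-system metadata cached in the logical plan, recursively.
  logicalPlan.refresh()
  // Re-register for caching so the Dataset is lazily re-cached with fresh data
  // the next time it is executed.
  persist()
}
```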





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69089371
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p @ Join(_, _, Inner, _) if 
p.children.exists(isEmptyLocalRelation) =>
+  LocalRelation(p.output, data = Seq.empty)
+
+case p: LogicalPlan if p.children.nonEmpty && 
p.children.forall(isEmptyLocalRelation) =>
+  p match {
+case _: Project | _: Generate | _: Filter | _: Sample | _: Join |
+ _: Sort | _: GlobalLimit | _: LocalLimit | _: Union | _: 
Repartition =>
+  LocalRelation(p.output, data = Seq.empty)
+case Aggregate(ge, ae, _) if ae.forall(ge.contains(_)) =>
--- End diff --

Yea. The following predicate should work:

```scala
ae.forall(_.collectFirst { case _: AggregateExpression => () }.isEmpty)
```

(Though it should probably go into a separate method.)
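
For reference, a sketch of how that predicate could be pulled into its own helper (the 
name and placement are hypothetical):

```scala
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression

// True when none of the aggregate-list expressions contains an actual aggregate
// function call, i.e. when collapsing the empty Aggregate cannot change results.
def hasNoAggregateExpression(aggregateExprs: Seq[NamedExpression]): Boolean =
  aggregateExprs.forall(_.collectFirst { case _: AggregateExpression => () }.isEmpty)
```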





[GitHub] spark issue #13992: [SPARK-12177][TEST] Removed test to avoid compilation is...

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13992
  
LGTM pending Jenkins.





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69083486
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
+   */
+  def refresh(): Unit = children.foreach(_.refresh())
--- End diff --

`children.foreach(_.refresh())` isn't tail recursive. Using non-tail recursion here is 
probably fine since this isn't a critical path. As @petermaxlee said, we already use 
recursion to handle `TreeNode`s everywhere, under the assumption that such trees are 
relatively shallow. We do need to pay attention when using recursion in other places, 
though.





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-30 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69081636
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
--- End diff --

"Refreshes" instead of "Invalidates"?





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
Other example snippets in the SQL programming guide will be updated in 
follow-up PRs.





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
Thanks for the review!





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13906
  
@cloud-fan Yea, that's a good point.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69065541
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p @ Join(_, _, Inner, _) if 
p.children.exists(isEmptyLocalRelation) =>
--- End diff --

Yea, we can also add `Intersect`.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69065425
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
--- End diff --

```scala
plan match {
  case p: LocalRelation => p.data.isEmpty
  case _ => false
}
```





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13906
  
My feeling is that this optimization rule is mostly useful for binary plan nodes like 
inner join and intersection, where we can avoid scanning the output of the non-empty 
side.

On the other hand, for unary plan nodes, it firstly doesn't bring much performance 
benefit, especially when whole-stage codegen is enabled; secondly, there are non-obvious 
and tricky corner cases, like `Aggregate` and `Generate`.

That said, although this patch is not a big one, it does introduce non-trivial 
complexity. For example, I didn't immediately realize why `Aggregate` must be 
special-cased (`COUNT(x)` may return 0 for empty input). The `Generate` case is even 
trickier.

So my suggestion is to only implement this rule for inner join and intersection, which 
are much simpler to handle. What do you think?
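
As a side note, the `Aggregate` corner case mentioned above is easy to see directly 
(assuming a local `SparkSession`; illustration only):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("empty-agg").getOrCreate()
val emptyDs = spark.range(0)

// A global aggregate over an empty input still returns exactly one row,
// so collapsing it into an empty LocalRelation would change the result.
emptyDs.selectExpr("count(*)").collect()   // Array([0]) -- one row containing 0

// A plain projection over an empty input really is empty and is safe to collapse.
emptyDs.selectExpr("id").collect()         // Array()

spark.stop()
```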





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69064054
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p @ Join(_, _, Inner, _) if 
p.children.exists(isEmptyLocalRelation) =>
+  LocalRelation(p.output, data = Seq.empty)
+
+case p: LogicalPlan if p.children.nonEmpty && 
p.children.forall(isEmptyLocalRelation) =>
+  p match {
+case _: Project | _: Generate | _: Filter | _: Sample | _: Join |
--- End diff --

Actually `Generate` can't be included here. Our `Generate` also supports Hive-style 
UDTFs, which have weird semantics: for a UDTF `f`, after all rows have been processed, 
`f.close()` is called, and *more rows can be generated* within `f.close()`. This means a 
UDTF may generate one or more rows even if the underlying input is empty.

See [here][1] and PR #5338 for more details.

[1]: https://github.com/apache/spark/pull/5383/files
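
To visualize the semantics described above, here is a purely schematic trait 
(explicitly not Hive's actual UDTF interface) showing why such a generator can emit 
rows even when its input is empty:

```scala
import org.apache.spark.sql.catalyst.InternalRow

// Schematic only: a generator with Hive-UDTF-like semantics has two ways to emit
// rows, and the second one can fire even if no input row was ever processed.
trait UdtfLikeGenerator {
  def process(row: InternalRow): Iterator[InternalRow]  // rows emitted per input row
  def close(): Iterator[InternalRow]                    // extra rows emitted at the very end
}
```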


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
@yinxusen Thanks!





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
@yinxusen @mengxr Actually, I found that overlapped labelling is far easier than I 
expected earlier... so I did it in the last commit.

Made the following experiment to illustrate the effect:

```python
# $SPARK_HOME/examples/src/main/python/test.py

# $example on:foo$
import used.by.foo
# $example on:bar$
import common.stuff
# $example off:foo$
import used.by.bar
# $example off:bar$
```

Liquid template:

```
{% include_example foo python/test.py %}

{% include_example bar python/test.py %}
```

Screenshot:

https://cloud.githubusercontent.com/assets/230655/16474910/1490d872-3ead-11e6-9225-6cc35b55f365.png






[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
@yinxusen Thanks for the review! Also discussed with @mengxr. IIUC, overlapped labels 
are most useful for handling imports, since sometimes we may want to include one import 
line in multiple example snippet blocks. Regarding this issue, I have two concerns:

1. It's a rare case, and we can probably work around it while authoring the 
documentation. Thus I guess we can do it in a follow-up PR.
2. Instead of overlapped labels, I'd prefer a multi-label approach:

   ```scala
   // $example on:A$
   import org.examples.A
   // $example off:A$
   // $example on:A:B$
   import org.examples.common._
   // $example off:A:B$
   // $example on:B$
   import org.examples.B
   // $example off:B$
   ```

   This approach achieves the same goal, and is much simpler to implement 
than handling overlapped scoping.






[GitHub] spark issue #13846: [SPARK-16134][SQL] optimizer rules for typed filter

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13846
  
Reverted the commit on branch-2.0.





[GitHub] spark issue #13846: [SPARK-16134][SQL] optimizer rules for typed filter

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13846
  
LGTM, merged to master.

(Also merged to branch-2.0 by mistake, will revert it ASAP. Sorry for the 
trouble.)





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
@yinxusen Could you please help review this one since you're the original 
author of this plugin?





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
Also cc @yhuai and @rxin.

The background here is that I'm going to extract snippets from actual 
Scala/Java/Python/R source files rather than hard-code them in the SQL 
programming guide, so that they are easier to maintain.





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
@mengxr Could you please help review this PR? Thanks!





[GitHub] spark pull request #13972: [SPARK-16294][SQL] Labelling support for the incl...

2016-06-29 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/13972

[SPARK-16294][SQL] Labelling support for the include_example Jekyll plugin

## What changes were proposed in this pull request?

This PR adds labelling support for the `include_example` Jekyll plugin, so 
that we may split a single source file into multiple line blocks with different 
labels, and include them in multiple code snippets in the generated HTML page.

## How was this patch tested?

Manually tested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark include-example-with-labels

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13972.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13972


commit 71307508928a0e0706c1665df9951f909bf7b214
Author: Cheng Lian <l...@databricks.com>
Date:   2016-06-29T10:37:32Z

Add labelling support for include_example Jekyll plugin

commit 93f33986e2d6c31200c53f4887893a275948a65a
Author: Cheng Lian <l...@databricks.com>
Date:   2016-06-29T11:39:30Z

Update SQL programming guide and example code to illustrate the new 
labelling feature







[GitHub] spark issue #13968: [SPARK-16291][SQL] CheckAnalysis should capture nested a...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13968
  
cc @yhuai @cloud-fan @clockfly 





[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13893
  
@jiangxb1987 Please feel free to create a new JIRA ticket and PR for this, 
thanks!





[GitHub] spark pull request #13968: [SPARK-16291][SQL] CheckAnalysis should capture n...

2016-06-29 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/13968

[SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions 
that reference no input attributes

## What changes were proposed in this pull request?

`MAX(COUNT(*))` is invalid since an aggregate expression can't be nested within another 
aggregate expression. This case should be caught during analysis, but somehow sneaks 
through to runtime.

The reason is that when checking aggregate expressions in `CheckAnalysis`, one checking 
branch treats all expressions that reference no input attributes as valid. However, 
`MAX(COUNT(*))` is translated into `MAX(COUNT(1))` during analysis and thus also 
references no input attributes.

This PR fixes this issue by removing the aforementioned branch.
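
For illustration, a hypothetical reproduction (assuming a `SparkSession` named `spark`; 
the temp view exists only for this example):

```scala
spark.range(10).createOrReplaceTempView("t")

// After COUNT(*) is rewritten to COUNT(1), the nested aggregate references no
// input attributes, which is exactly the case the removed branch let through.
spark.sql("SELECT MAX(COUNT(*)) FROM t").show()
// Before this fix: the invalid nesting sneaks past CheckAnalysis and fails only at runtime.
// With this fix: an AnalysisException is raised during analysis.
```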

## How was this patch tested?

New test case added in `AnalysisErrorSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark 
spark-16291-nested-agg-functions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13968.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13968


commit 18e74855968fe404dea86a49e03c689c01b1b737
Author: Cheng Lian <l...@databricks.com>
Date:   2016-06-29T08:00:20Z

CheckAnalysis should capture nested aggregate functions that reference no 
input attributes.







[GitHub] spark issue #13835: [SPARK-16100][SQL] fix bug when use Map as the buffer ty...

2016-06-28 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13835
  
LGTM, merging to master and branch-2.0.





[GitHub] spark issue #13933: [SPARK-16236] [SQL] Add Path Option back to Load API in ...

2016-06-28 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13933
  
LGTM.





[GitHub] spark pull request #13720: [SPARK-16004] [SQL] Correctly display "Last Acces...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13720#discussion_r68855824
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -522,7 +523,7 @@ case class DescribeTableCommand(table: TableIdentifier, 
isExtended: Boolean, isF
 
   private def describeSchema(schema: Seq[CatalogColumn], buffer: 
ArrayBuffer[Row]): Unit = {
 schema.foreach { column =>
-  append(buffer, column.name, column.dataType.toLowerCase, 
column.comment.orNull)
+  append(buffer, column.name, column.dataType.toLowerCase, 
column.comment.getOrElse(""))
--- End diff --

Agreed, a column without any comment (null) is different from a column whose comment is 
an empty string.





[GitHub] spark pull request #13720: [SPARK-16004] [SQL] Correctly display "Last Acces...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13720#discussion_r68855663
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -180,7 +180,8 @@ case class CatalogTable(
   Seq(s"Table: ${identifier.quotedString}",
 if (owner.nonEmpty) s"Owner: $owner" else "",
 s"Created: ${new Date(createTime).toString}",
-s"Last Access: ${new Date(lastAccessTime).toString}",
+"Last Access: " +
+  (if (lastAccessTime == -1) "UNKNOWN" else new 
Date(lastAccessTime).toString),
--- End diff --

Empty string looks fine to me.





[GitHub] spark pull request #13846: [SPARK-16134][SQL] optimizer rules for typed filt...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13846#discussion_r68753881
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TypedFilterOptimizationSuite.scala
 ---
@@ -23,54 +23,111 @@ import 
org.apache.spark.sql.catalyst.analysis.UnresolvedDeserializer
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.encoders.{encoderFor, 
ExpressionEncoder}
+import org.apache.spark.sql.catalyst.expressions.{BoundReference, 
ReferenceToExpressions}
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
 import org.apache.spark.sql.catalyst.rules.RuleExecutor
-import org.apache.spark.sql.types.BooleanType
+import org.apache.spark.sql.types.{BooleanType, ObjectType}
 
 class TypedFilterOptimizationSuite extends PlanTest {
   object Optimize extends RuleExecutor[LogicalPlan] {
 val batches =
   Batch("EliminateSerialization", FixedPoint(50),
 EliminateSerialization) ::
-  Batch("EmbedSerializerInFilter", FixedPoint(50),
-EmbedSerializerInFilter) :: Nil
+  Batch("CombineTypedFilters", FixedPoint(50),
+CombineTypedFilters) :: Nil
   }
 
   implicit private def productEncoder[T <: Product : TypeTag] = 
ExpressionEncoder[T]()
 
-  test("back to back filter") {
+  test("filter after serialize") {
 val input = LocalRelation('_1.int, '_2.int)
-val f1 = (i: (Int, Int)) => i._1 > 0
-val f2 = (i: (Int, Int)) => i._2 > 0
+val f = (i: (Int, Int)) => i._1 > 0
 
-val query = input.filter(f1).filter(f2).analyze
+val query = input
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)]
+  .filter(f).analyze
 
 val optimized = Optimize.execute(query)
 
-val expected = input.deserialize[(Int, Int)]
-  .where(callFunction(f1, BooleanType, 'obj))
-  .select('obj.as("obj"))
-  .where(callFunction(f2, BooleanType, 'obj))
+val expected = input
+  .deserialize[(Int, Int)]
+  .where(callFunction(f, BooleanType, 'obj))
   .serialize[(Int, Int)].analyze
 
 comparePlans(optimized, expected)
   }
 
-  // TODO: Remove this after we completely fix SPARK-15632 by adding 
optimization rules
-  // for typed filters.
-  ignore("embed deserializer in typed filter condition if there is only 
one filter") {
+  test("filter after serialize with object change") {
+val input = LocalRelation('_1.int, '_2.int)
+val f = (i: OtherTuple) => i._1 > 0
+
+val query = input
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)]
+  .filter(f).analyze
+val optimized = Optimize.execute(query)
+comparePlans(optimized, query)
+  }
+
+  test("filter before deserialize") {
 val input = LocalRelation('_1.int, '_2.int)
 val f = (i: (Int, Int)) => i._1 > 0
 
-val query = input.filter(f).analyze
+val query = input
+  .filter(f)
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)].analyze
+
+val optimized = Optimize.execute(query)
+
+val expected = input
+  .deserialize[(Int, Int)]
+  .where(callFunction(f, BooleanType, 'obj))
+  .serialize[(Int, Int)].analyze
+
+comparePlans(optimized, expected)
+  }
+
+  test("filter before deserialize with object change") {
+val input = LocalRelation('_1.int, '_2.int)
+val f = (i: OtherTuple) => i._1 > 0
+
+val query = input
+  .filter(f)
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)].analyze
+val optimized = Optimize.execute(query)
+comparePlans(optimized, query)
+  }
+
+  test("back to back filter") {
--- End diff --

Nit: "back to back filters with the same object type"





[GitHub] spark pull request #13846: [SPARK-16134][SQL] optimizer rules for typed filt...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13846#discussion_r68753763
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TypedFilterOptimizationSuite.scala
 ---
@@ -23,54 +23,111 @@ import 
org.apache.spark.sql.catalyst.analysis.UnresolvedDeserializer
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.encoders.{encoderFor, 
ExpressionEncoder}
+import org.apache.spark.sql.catalyst.expressions.{BoundReference, 
ReferenceToExpressions}
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
 import org.apache.spark.sql.catalyst.rules.RuleExecutor
-import org.apache.spark.sql.types.BooleanType
+import org.apache.spark.sql.types.{BooleanType, ObjectType}
 
 class TypedFilterOptimizationSuite extends PlanTest {
   object Optimize extends RuleExecutor[LogicalPlan] {
 val batches =
   Batch("EliminateSerialization", FixedPoint(50),
 EliminateSerialization) ::
-  Batch("EmbedSerializerInFilter", FixedPoint(50),
-EmbedSerializerInFilter) :: Nil
+  Batch("CombineTypedFilters", FixedPoint(50),
+CombineTypedFilters) :: Nil
   }
 
   implicit private def productEncoder[T <: Product : TypeTag] = 
ExpressionEncoder[T]()
 
-  test("back to back filter") {
+  test("filter after serialize") {
 val input = LocalRelation('_1.int, '_2.int)
-val f1 = (i: (Int, Int)) => i._1 > 0
-val f2 = (i: (Int, Int)) => i._2 > 0
+val f = (i: (Int, Int)) => i._1 > 0
 
-val query = input.filter(f1).filter(f2).analyze
+val query = input
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)]
+  .filter(f).analyze
 
 val optimized = Optimize.execute(query)
 
-val expected = input.deserialize[(Int, Int)]
-  .where(callFunction(f1, BooleanType, 'obj))
-  .select('obj.as("obj"))
-  .where(callFunction(f2, BooleanType, 'obj))
+val expected = input
+  .deserialize[(Int, Int)]
+  .where(callFunction(f, BooleanType, 'obj))
   .serialize[(Int, Int)].analyze
 
 comparePlans(optimized, expected)
   }
 
-  // TODO: Remove this after we completely fix SPARK-15632 by adding 
optimization rules
-  // for typed filters.
-  ignore("embed deserializer in typed filter condition if there is only 
one filter") {
+  test("filter after serialize with object change") {
+val input = LocalRelation('_1.int, '_2.int)
+val f = (i: OtherTuple) => i._1 > 0
+
+val query = input
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)]
+  .filter(f).analyze
+val optimized = Optimize.execute(query)
+comparePlans(optimized, query)
+  }
+
+  test("filter before deserialize") {
 val input = LocalRelation('_1.int, '_2.int)
 val f = (i: (Int, Int)) => i._1 > 0
 
-val query = input.filter(f).analyze
+val query = input
+  .filter(f)
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)].analyze
+
+val optimized = Optimize.execute(query)
+
+val expected = input
+  .deserialize[(Int, Int)]
+  .where(callFunction(f, BooleanType, 'obj))
+  .serialize[(Int, Int)].analyze
+
+comparePlans(optimized, expected)
+  }
+
+  test("filter before deserialize with object change") {
+val input = LocalRelation('_1.int, '_2.int)
+val f = (i: OtherTuple) => i._1 > 0
+
+val query = input
+  .filter(f)
+  .deserialize[(Int, Int)]
+  .serialize[(Int, Int)].analyze
+val optimized = Optimize.execute(query)
+comparePlans(optimized, query)
+  }
+
+  test("back to back filter") {
+val input = LocalRelation('_1.int, '_2.int)
+val f1 = (i: (Int, Int)) => i._1 > 0
+val f2 = (i: (Int, Int)) => i._2 > 0
+
+val query = input.filter(f1).filter(f2).analyze
 
 val optimized = Optimize.execute(query)
 
 val deserializer = UnresolvedDeserializer(encoderFor[(Int, 
Int)].deserializer)
-val condition = callFunction(f, BooleanType, deserializer)
-val expected = input.where(condition).select('_1.as("_1"), 
'_2.as("_2")).analyze
+val boundReference = BoundReference(0, ObjectType(classOf[(Int, 
Int)]), nullable = false)
+val callFunc1 = callFunction(f1, BooleanType, boundReference)
+

[GitHub] spark pull request #13846: [SPARK-16134][SQL] optimizer rules for typed filt...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13846#discussion_r68751774
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
 ---
@@ -167,6 +169,43 @@ case class MapElements(
 outputObjAttr: Attribute,
 child: LogicalPlan) extends ObjectConsumer with ObjectProducer
 
+object TypedFilter {
+  def apply[T : Encoder](func: AnyRef, child: LogicalPlan): TypedFilter = {
+TypedFilter(func, UnresolvedDeserializer(encoderFor[T].deserializer), 
child)
+  }
+}
+
+/**
+ * A relation produced by applying `func` to each element of the `child` 
and filter them by the
+ * resulting boolean value.
+ *
+ * This is logically equal to a normal [[Filter]] operator whose condition 
expression is decoding
+ * the input row to object and apply the given function with decoded 
object. However we need the
+ * encapsulation of [[TypedFilter]] to make the concept more clear and 
make it easier to write
+ * optimizer rules.
+ */
+case class TypedFilter(
+func: AnyRef,
+deserializer: Expression,
+child: LogicalPlan) extends UnaryNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  def withObject(obj: LogicalPlan): Filter = {
--- End diff --

How about renaming this method to `withObjectProducerChild`?





[GitHub] spark pull request #13846: [SPARK-16134][SQL] optimizer rules for typed filt...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13846#discussion_r68751775
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
 ---
@@ -167,6 +169,43 @@ case class MapElements(
 outputObjAttr: Attribute,
 child: LogicalPlan) extends ObjectConsumer with ObjectProducer
 
+object TypedFilter {
+  def apply[T : Encoder](func: AnyRef, child: LogicalPlan): TypedFilter = {
+TypedFilter(func, UnresolvedDeserializer(encoderFor[T].deserializer), 
child)
+  }
+}
+
+/**
+ * A relation produced by applying `func` to each element of the `child` 
and filter them by the
+ * resulting boolean value.
+ *
+ * This is logically equal to a normal [[Filter]] operator whose condition 
expression is decoding
+ * the input row to object and apply the given function with decoded 
object. However we need the
+ * encapsulation of [[TypedFilter]] to make the concept more clear and 
make it easier to write
+ * optimizer rules.
+ */
+case class TypedFilter(
+func: AnyRef,
+deserializer: Expression,
+child: LogicalPlan) extends UnaryNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  def withObject(obj: LogicalPlan): Filter = {
+assert(obj.output.length == 1)
+Filter(getCondition(obj.output.head), obj)
+  }
+
+  def getCondition(input: Expression): Expression = {
--- End diff --

How about renaming it to `typedCondition`?





[GitHub] spark pull request #13846: [SPARK-16134][SQL] optimizer rules for typed filt...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13846#discussion_r68750076
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala
 ---
@@ -45,6 +45,7 @@ case class ReferenceToExpressions(result: Expression, 
children: Seq[Expression])
 var maxOrdinal = -1
 result foreach {
   case b: BoundReference if b.ordinal > maxOrdinal => maxOrdinal = 
b.ordinal
+  case _ =>
--- End diff --

So this is actually a bug fix? Before this change we could only use a single `BoundReference` as `result`, right?
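
A plain-Scala sketch of the failure mode being asked about (`Expr`, `Bound`, and `Other` are hypothetical stand-ins for Catalyst expression nodes): a pattern-matching block passed to `foreach` throws a `MatchError` for any element no case covers, so the added wildcard case is what lets other expression nodes pass through.

```scala
// Sketch only: the types below are stand-ins, not Catalyst classes.
sealed trait Expr
case class Bound(ordinal: Int) extends Expr
case class Other(name: String) extends Expr

var maxOrdinal = -1
Seq(Bound(2), Other("x"), Bound(1)).foreach {
  case b: Bound if b.ordinal > maxOrdinal => maxOrdinal = b.ordinal
  case _ => // without this line, Other("x") (or Bound(1), which fails the guard) throws MatchError
}
// maxOrdinal is now 2
```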





[GitHub] spark pull request #13846: [SPARK-16134][SQL] optimizer rules for typed filt...

2016-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13846#discussion_r68750055
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1637,55 +1654,31 @@ case class GetCurrentDatabase(sessionCatalog: 
SessionCatalog) extends Rule[Logic
 }
 
 /**
- * Typed [[Filter]] is by default surrounded by a [[DeserializeToObject]] 
beneath it and a
- * [[SerializeFromObject]] above it.  If these serializations can't be 
eliminated, we should embed
- * the deserializer in filter condition to save the extra serialization at 
last.
+ * Combines all adjacent [[TypedFilter]]s, which operate on same type 
object in condition, into a
+ * single [[Filter]].
  */
-object EmbedSerializerInFilter extends Rule[LogicalPlan] {
+object CombineTypedFilters extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-case s @ SerializeFromObject(_, Filter(condition, d: 
DeserializeToObject))
-  // SPARK-15632: Conceptually, filter operator should never introduce 
schema change. This
-  // optimization rule also relies on this assumption. However, 
Dataset typed filter operator
-  // does introduce schema changes in some cases. Thus, we only enable 
this optimization when
-  //
-  //  1. either input and output schemata are exactly the same, or
-  //  2. both input and output schemata are single-field schema and 
share the same type.
-  //
-  // The 2nd case is included because encoders for primitive types 
always have only a single
-  // field with hard-coded field name "value".
-  // TODO Cleans this up after fixing SPARK-15632.
-  if s.schema == d.child.schema || samePrimitiveType(s.schema, 
d.child.schema) =>
-
-  val numObjects = condition.collect {
-case a: Attribute if a == d.output.head => a
-  }.length
-
-  if (numObjects > 1) {
-// If the filter condition references the object more than one 
times, we should not embed
-// deserializer in it as the deserialization will happen many 
times and slow down the
-// execution.
-// TODO: we can still embed it if we can make sure subexpression 
elimination works here.
-s
+case t @ TypedFilter(_, deserializer, child) =>
+  val filters = collectTypedFiltersOnSameTypeObj(child, 
deserializer.dataType, ArrayBuffer(t))
+  if (filters.length > 1) {
+val objHolder = BoundReference(0, deserializer.dataType, nullable 
= false)
+val condition = filters.map(_.getCondition(objHolder)).reduce(And)
+Filter(ReferenceToExpressions(condition, deserializer :: Nil), 
filters.last.child)
   } else {
-val newCondition = condition transform {
-  case a: Attribute if a == d.output.head => d.deserializer
-}
-val filter = Filter(newCondition, d.child)
-
-// Adds an extra Project here, to preserve the output expr id of 
`SerializeFromObject`.
-// We will remove it later in RemoveAliasOnlyProject rule.
-val objAttrs = filter.output.zip(s.output).map { case (fout, sout) 
=>
-  Alias(fout, fout.name)(exprId = sout.exprId)
-}
-Project(objAttrs, filter)
+t
   }
   }
 
-  def samePrimitiveType(lhs: StructType, rhs: StructType): Boolean = {
-(lhs, rhs) match {
-  case (StructType(Array(f1)), StructType(Array(f2))) => f1.dataType 
== f2.dataType
-  case _ => false
-}
+  @tailrec
+  private def collectTypedFiltersOnSameTypeObj(
+  plan: LogicalPlan,
+  objType: DataType,
+  filters: ArrayBuffer[TypedFilter]): Array[TypedFilter] = plan match {
+case t: TypedFilter if t.deserializer.dataType == objType =>
+  filters += t
--- End diff --

Shall we prepend rather than append the found filters here? Otherwise the filter predicates will be evaluated in reverse order after being combined. It would also be nice to add a comment about this.
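
A plain-Scala sketch of the ordering concern (the predicate names `f1`..`f3` are hypothetical): the rule collects filters from the top of the plan downwards, so appending and then reducing with `And` puts the top-most predicate first; prepending while collecting (modelled below by reversing) preserves the plan's original bottom-up evaluation order.

```scala
// f1 is the filter closest to the data and should be evaluated first after combining.
val collectedTopDown = Seq("f3", "f2", "f1")

// Appending keeps the top-down collection order:
collectedTopDown.reduce((a, b) => s"And($a, $b)")          // "And(And(f3, f2), f1)"

// Prepending while collecting (here simulated by reversing) restores the original order:
collectedTopDown.reverse.reduce((a, b) => s"And($a, $b)")  // "And(And(f1, f2), f3)"
```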





[GitHub] spark issue #13918: [SPARK-16221][SQL] Redirect Parquet JUL logger via SLF4J...

2016-06-27 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13918
  
Thanks, merged to master.

@rxin Shall we have this in branch-2.0 at this stage?





[GitHub] spark issue #13913: [SPARK-10591][SQL][TEST] Add a testcase to ensure if `ch...

2016-06-27 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13913
  
Thanks! Merged to master and branch-2.0.





[GitHub] spark issue #13918: [SPARK-16221][SQL] Redirect Parquet JUL logger via SLF4J...

2016-06-27 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13918
  
LGTM except for one minor comment. Thanks for fixing this annoying issue!





[GitHub] spark pull request #13918: [SPARK-16221][SQL] Redirect Parquet JUL logger vi...

2016-06-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13918#discussion_r68557487
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -129,6 +129,8 @@ private[sql] class ParquetFileFormat
   conf.setBoolean(ParquetOutputFormat.ENABLE_JOB_SUMMARY, false)
 }
 
+ParquetFileFormat.redirectParquetLogs
--- End diff --

Let's add `()`, since this method is declared with `()` and is invoked only for its side effect.
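
A small illustration of the convention behind the nit (the object name below is a hypothetical stand-in): a method that exists only for its side effect is declared with empty parens, and keeping the parens at the call site makes that explicit.

```scala
object ParquetLogRedirector {                       // hypothetical stand-in
  private var redirected = false

  // Declared with () because it is invoked only for its side effect.
  def redirectParquetLogs(): Unit = {
    if (!redirected) {
      redirected = true
      // ...install the JUL -> SLF4J redirection here...
    }
  }
}

ParquetLogRedirector.redirectParquetLogs()          // called with (), matching the declaration
```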





[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r68556327
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala 
---
@@ -141,6 +141,14 @@ class PruningSuite extends HiveComparisonTest with 
BeforeAndAfter {
   Seq("2008-04-08", "11"),
   Seq("2008-04-09", "11")))
 
+  createPruningTest("Partition pruning - with nondeterministic fields",
+"SELECT value, hr FROM srcpart1 WHERE ds = '2008-04-08' AND rand(7) < 
1",
--- End diff --

Good question. Technically, I think we can't push down any predicate that appears after a non-deterministic predicate. Otherwise the number of input rows the non-deterministic predicate sees may change, leading to wrong query results.
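
A plain-Scala sketch (not Catalyst; the data is made up) of why such reordering is unsafe: a seeded random predicate depends on how many rows it sees and in which order, so evaluating a deterministic predicate before it changes which rows survive.

```scala
import scala.util.Random

val rows = Seq(("2008-04-08", 1), ("2008-04-09", 2), ("2008-04-08", 3), ("2008-04-09", 4))

def randPredicate(seed: Long): ((String, Int)) => Boolean = {
  val r = new Random(seed)
  _ => r.nextDouble() < 0.5          // non-deterministic: result depends on call order
}

// Original order: the non-deterministic predicate sees the full row stream.
val original = rows.filter(randPredicate(7)).filter(_._1 == "2008-04-08")

// "Pushed down" order: the partition predicate changes the rows the random predicate
// sees, so the surviving rows can differ from `original`.
val pushed = rows.filter(_._1 == "2008-04-08").filter(randPredicate(7))
```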





[GitHub] spark issue #13889: [SQL][minor] Simplify data source predicate filter trans...

2016-06-24 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13889
  
LGTM





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352349
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with 
SQLTestUtils with TestHiveSingl
   sql("DROP TABLE IF EXISTS createAndInsertTest")
 }
   }
+
+  test("SPARK-13709: reading partitioned Avro table with nested schema") {
+withTempDir { dir =>
+  val path = dir.getCanonicalPath
+  val tableName = "spark_13709"
+  val tempTableName = "spark_13709_temp"
+
+  new File(path, tableName).mkdir()
+  new File(path, tempTableName).mkdir()
+
+  val avroSchema =
+"""{
+  |  "name": "test_record",
+  |  "type": "record",
+  |  "fields": [ {
+  |"name": "f0",
+  |"type": "int"
+  |  }, {
+  |"name": "f1",
+  |"type": {
+  |  "type": "record",
+  |  "name": "inner",
+  |  "fields": [ {
+  |"name": "f10",
+  |"type": "int"
+  |  }, {
+  |"name": "f11",
+  |"type": "double"
+  |  } ]
+  |}
+  |  } ]
+  |}
+""".stripMargin
+
+  withTable(tableName, tempTableName) {
+// Creates the external partitioned Avro table to be tested.
+sql(
+  s"""CREATE EXTERNAL TABLE $tableName
+ |PARTITIONED BY (ds STRING)
+ |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+ |STORED AS
+ |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+ |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+ |LOCATION '$path/$tableName'
+ |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+   """.stripMargin
+)
+
+// Creates an temporary Avro table used to prepare testing Avro 
file.
+sql(
+  s"""CREATE EXTERNAL TABLE $tempTableName
+ |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+ |STORED AS
+ |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+ |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+ |LOCATION '$path/$tempTableName'
+ |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+   """.stripMargin
+)
+
+// Generates Avro data.
+sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 
2.5)")
+
+// Adds generated Avro data as a new partition to the testing 
table.
+sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION 
'$path/$tempTableName'")
+
+checkAnswer(
+  sql(s"SELECT * FROM $tableName"),
--- End diff --

Yea, sure.





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68352258
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with 
SQLTestUtils with TestHiveSingl
   sql("DROP TABLE IF EXISTS createAndInsertTest")
 }
   }
+
+  test("SPARK-13709: reading partitioned Avro table with nested schema") {
+withTempDir { dir =>
+  val path = dir.getCanonicalPath
+  val tableName = "spark_13709"
+  val tempTableName = "spark_13709_temp"
+
+  new File(path, tableName).mkdir()
+  new File(path, tempTableName).mkdir()
+
+  val avroSchema =
+"""{
+  |  "name": "test_record",
+  |  "type": "record",
+  |  "fields": [ {
+  |"name": "f0",
+  |"type": "int"
+  |  }, {
+  |"name": "f1",
+  |"type": {
+  |  "type": "record",
+  |  "name": "inner",
+  |  "fields": [ {
+  |"name": "f10",
+  |"type": "int"
+  |  }, {
+  |"name": "f11",
+  |"type": "double"
+  |  } ]
+  |}
+  |  } ]
+  |}
+""".stripMargin
+
+  withTable(tableName, tempTableName) {
+// Creates the external partitioned Avro table to be tested.
+sql(
+  s"""CREATE EXTERNAL TABLE $tableName
+ |PARTITIONED BY (ds STRING)
+ |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+ |STORED AS
+ |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+ |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+ |LOCATION '$path/$tableName'
+ |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+   """.stripMargin
+)
+
+// Creates an temporary Avro table used to prepare testing Avro 
file.
+sql(
+  s"""CREATE EXTERNAL TABLE $tempTableName
+ |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+ |STORED AS
+ |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+ |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+ |LOCATION '$path/$tempTableName'
+ |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+   """.stripMargin
+)
+
+// Generates Avro data.
+sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 
2.5)")
+
+// Adds generated Avro data as a new partition to the testing 
table.
+sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION 
'$path/$tempTableName'")
+
+checkAnswer(
+  sql(s"SELECT * FROM $tableName"),
--- End diff --

Yea, when reading data from a partition, the Avro deserializer needs to know the Avro schema defined in the table properties (`avro.schema.literal`). However, we originally initialized the deserializer using only the partition properties, which don't contain `avro.schema.literal`. This PR fixes that by merging the two sets of properties.
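
A minimal sketch of the merge described above (`mergeProps` is a hypothetical helper, not the actual `TableReader` code), assuming partition-level properties should win where both levels define a key while table-wide entries such as `avro.schema.literal` remain visible:

```scala
import java.util.Properties
import scala.collection.JavaConverters._

def mergeProps(tableProps: Properties, partProps: Properties): Properties = {
  val merged = new Properties()
  // Table-wide properties (e.g. avro.schema.literal) first...
  tableProps.asScala.foreach { case (k, v) => merged.setProperty(k, v) }
  // ...then partition-level properties, which override where both are set.
  partProps.asScala.foreach { case (k, v) => merged.setProperty(k, v) }
  merged
}
```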





[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...

2016-06-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13865#discussion_r68335825
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -230,10 +234,21 @@ class HadoopTableReader(
   // Fill all partition keys to the given MutableRow object
   fillPartitionKeys(partValues, mutableRow)
 
+  val tableProperties = relation.tableDesc.getProperties
--- End diff --

Extracted into a local variable for serialization, so the closure doesn't have to capture the whole relation.
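
A sketch of the pattern (hypothetical names, not the actual `HadoopTableReader` code): referencing a field inside a closure captures the enclosing object, which may not be serializable; copying it into a local `val` first means only that value is captured and shipped to executors.

```scala
import java.util.Properties

class TableReaderLike(relationProps: Properties) {       // the reader itself is not Serializable
  def buildClosure(): Int => Int = {
    val tableProperties = relationProps                  // local copy, captured by value
    partitionId => partitionId + tableProperties.size()  // the closure no longer needs `this`
  }
}
```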





[GitHub] spark issue #13865: [SPARK-13709][SQL] Initialize deserializer with both tab...

2016-06-23 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13865
  
This is ready for review.

cc @yhuai @cloud-fan @clockfly 





[GitHub] spark issue #13870: [SPARK-16165][SQL] Fix the update logic for InMemoryTabl...

2016-06-23 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13870
  
LGTM, merging to master and branch-2.0. Thanks!





[GitHub] spark issue #13871: [SPARK-16163] [SQL] Cache the statistics for logical pla...

2016-06-23 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13871
  
LGTM except for the compilation error.




