[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14034





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70194607
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -660,18 +660,51 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("limit") {
 checkAnswer(
-  sql("SELECT * FROM testData LIMIT 10"),
+  sql("SELECT * FROM testData LIMIT 9 + 1"),
   testData.take(10).toSeq)
 
 checkAnswer(
-  sql("SELECT * FROM arrayData LIMIT 1"),
+  sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
   arrayData.collect().take(1).map(Row.fromTuple).toSeq)
 
 checkAnswer(
   sql("SELECT * FROM mapData LIMIT 1"),
   mapData.collect().take(1).map(Row.fromTuple).toSeq)
   }
 
+  test("non-foldable expressions in LIMIT") {
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM testData LIMIT key > 3")
+}.getMessage
+assert(e.contains("The argument to the LIMIT clause must evaluate to a constant value. " +
+  "Limit:(testdata.`key` > 3)"))
+  }
+
+  test("Limit: unable to evaluate and cast expressions in limit clauses to 
Int") {
--- End diff --

Yeah, will change it. Thanks!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70194599
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesGlobalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
+
+val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
+  l.statistics.sizeInBytes
+}
+assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesLocalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: ${sizesLocalLimit.head}")
+  }
+}
+  }
+
+  test("estimates the size of a limit 0 on outer join") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  val df1 = spark.table("test")
+  val df2 = spark.table("test").limit(0)
+  val df = df1.join(df2, Seq("k"), "left")
+
+  val sizes = df.queryExecution.analyzed.collect { case g: Join =>
+g.statistics.sizeInBytes
+  }
+
+  assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
--- End diff --

Sure





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70194595
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesGlobalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
--- End diff --

: ) Forgot to change it. 





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183976
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesGlobalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
+
+val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
+  l.statistics.sizeInBytes
+}
+assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesLocalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: ${sizesLocalLimit.head}")
+  }
+}
+  }
+
+  test("estimates the size of a limit 0 on outer join") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  val df1 = spark.table("test")
+  val df2 = spark.table("test").limit(0)
+  val df = df1.join(df2, Seq("k"), "left")
+
+  val sizes = df.queryExecution.analyzed.collect { case g: Join =>
+g.statistics.sizeInBytes
+  }
+
+  assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
--- End diff --

how about `number of Join nodes is wrong`





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183958
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
--- End diff --

this may be more readable:
```
limitExpr match {
  case e if !e.foldable => fail(...)
  case e if e.dataType != IntegerType => fail(s"the limit expression must be int type, but got ${e.dataType.simpleString}")
  case e if e.eval() < 0 => fail(...)
  case e =>
}
```
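
For reference, a self-contained sketch of the suggested rewrite, assuming the `failAnalysis` helper available in `CheckAnalysis` and reusing the error wording quoted in the diff above (a sketch, not the PR's final wording):

```scala
private def checkLimitClause(limitExpr: Expression): Unit = {
  limitExpr match {
    case e if !e.foldable => failAnalysis(
      s"The argument to the LIMIT clause must evaluate to a constant value. Limit:${e.sql}")
    case e if e.dataType != IntegerType => failAnalysis(
      s"The limit expression must be int type, but got ${e.dataType.simpleString}")
    case e if e.eval().asInstanceOf[Int] < 0 => failAnalysis(
      s"number_rows in limit clause must be equal to or greater than 0. number_rows:${e.eval()}")
    case _ => // OK: a foldable, non-negative integer expression
  }
}
```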





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183882
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesGlobalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
--- End diff --

why we hardcode `24` here?





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183835
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -660,18 +660,51 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("limit") {
 checkAnswer(
-  sql("SELECT * FROM testData LIMIT 10"),
+  sql("SELECT * FROM testData LIMIT 9 + 1"),
   testData.take(10).toSeq)
 
 checkAnswer(
-  sql("SELECT * FROM arrayData LIMIT 1"),
+  sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
   arrayData.collect().take(1).map(Row.fromTuple).toSeq)
 
 checkAnswer(
   sql("SELECT * FROM mapData LIMIT 1"),
   mapData.collect().take(1).map(Row.fromTuple).toSeq)
   }
 
+  test("non-foldable expressions in LIMIT") {
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM testData LIMIT key > 3")
+}.getMessage
+assert(e.contains("The argument to the LIMIT clause must evaluate to a constant value. " +
+  "Limit:(testdata.`key` > 3)"))
+  }
+
+  test("Limit: unable to evaluate and cast expressions in limit clauses to 
Int") {
--- End diff --

We should also update the test name; we don't try to cast the limit expression to int type.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183276
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
+  failAnalysis(
+"The argument to the LIMIT clause must evaluate to a constant 
value. " +
+s"Limit:${limitExpr.sql}")
+}
+limitExpr.eval() match {
+  case o: Int if o >= 0 => // OK
+  case o: Int => failAnalysis(
+s"number_rows in limit clause must be equal to or greater than 0. 
number_rows:$o")
+  case o => failAnalysis(
+s"""number_rows in limit clause cannot be cast to 
integer:\"$o\".""")
--- End diff --

Thanks!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70178966
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
+  failAnalysis(
+"The argument to the LIMIT clause must evaluate to a constant 
value. " +
+s"Limit:${limitExpr.sql}")
+}
+limitExpr.eval() match {
+  case o: Int if o >= 0 => // OK
+  case o: Int => failAnalysis(
+s"number_rows in limit clause must be equal to or greater than 0. 
number_rows:$o")
+  case o => failAnalysis(
+s"""number_rows in limit clause cannot be cast to 
integer:\"$o\".""")
--- End diff --

`cannot be cast to integer` -> `must be integer`? e.g. byte is castable to int.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70167835
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
 }
 
 /**
+ * Converts foldable numeric expressions to integers in [[GlobalLimit]] and [[LocalLimit]] operators
+ */
+object ResolveLimits extends Rule[LogicalPlan] {
--- End diff --

Sure, let me revert it back. Thanks!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70166579
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
 }
 
 /**
+ * Converts foldable numeric expressions to integers in [[GlobalLimit]] and [[LocalLimit]] operators
+ */
+object ResolveLimits extends Rule[LogicalPlan] {
--- End diff --

We can create a JIRA first and discuss with others to decide if it's useful to support all integral types in limit.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70166568
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
 }
 
 /**
+ * Converts foldable numeric expressions to integers in [[GlobalLimit]] and [[LocalLimit]] operators
+ */
+object ResolveLimits extends Rule[LogicalPlan] {
--- End diff --

Thanks for trying this out! Yea, looks like it's doable. Let's remove it first and do it in follow-ups, to make this PR surgical.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70156160
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
 }
 
 /**
+ * Converts foldable numeric expressions to integers of [[GlobalLimit]] and [[LocalLimit]] operators
+ */
+object ResolveLimits extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case g @ GlobalLimit(limitExpr, _) if limitExpr.foldable && isNumeric(limitExpr.eval()) =>
+  g.copy(limitExpr = Literal(limitExpr.eval().asInstanceOf[Number].intValue(), IntegerType))
--- End diff --

Another way is like
```Scala
g.copy(limitExpr = Literal(Cast(limitExpr, IntegerType).eval(), IntegerType))
```
Not sure which one is better here.
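
For context, a sketch of the full rule with both variants side by side; `isNumeric` is assumed here to be a small helper that checks whether the folded value is a `java.lang.Number` (per the discussion quoted earlier in this archive, the rule was ultimately removed from this PR and deferred to a follow-up):

```scala
object ResolveLimits extends Rule[LogicalPlan] {
  // Assumed helper: true if the evaluated value is numeric (Byte, Short, Int, Long, ...).
  private def isNumeric(value: Any): Boolean = value.isInstanceOf[Number]

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case g @ GlobalLimit(limitExpr, _) if limitExpr.foldable && isNumeric(limitExpr.eval()) =>
      // Variant 1 (as in the diff): evaluate, then narrow through java.lang.Number.
      g.copy(limitExpr = Literal(limitExpr.eval().asInstanceOf[Number].intValue(), IntegerType))
    case l @ LocalLimit(limitExpr, _) if limitExpr.foldable && isNumeric(limitExpr.eval()) =>
      // Variant 2 (suggested above): let Catalyst's Cast handle the numeric conversion.
      l.copy(limitExpr = Literal(Cast(limitExpr, IntegerType).eval(), IntegerType))
  }
}
```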





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70118783
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
+  failAnalysis(
+"The argument to the LIMIT clause must evaluate to a constant 
value. " +
+s"Limit:${limitExpr.sql}")
+}
+limitExpr.eval() match {
+  case o: Int if o >= 0 => // OK
+  case o: Int => failAnalysis(
+s"number_rows in limit clause must be equal to or greater than 0. 
number_rows:$o")
+  case o => failAnalysis(
--- End diff --

Tried to do it in different ways, but the best way I found is to introduce an Analyzer rule.

We have one rule in the Optimizer and one rule in the Planner. Both assume the `limitExpr` is an `Integer` type. It looks ugly to convert the type everywhere. To support different numeric types, we need to convert these foldable numeric values to integers in the `Analyzer`, I think. Let me upload a fix first. You can judge whether this is a good way or not. Thanks!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70109475
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
+  failAnalysis(
+"The argument to the LIMIT clause must evaluate to a constant 
value. " +
+s"Limit:${limitExpr.sql}")
+}
+limitExpr.eval() match {
+  case o: Int if o >= 0 => // OK
+  case o: Int => failAnalysis(
+s"number_rows in limit clause must be equal to or greater than 0. 
number_rows:$o")
+  case o => failAnalysis(
--- End diff --

You know, we can do it, but we need to fix the other parts. Let me try it. Thanks!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70035305
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
+  failAnalysis(
+"The argument to the LIMIT clause must evaluate to a constant 
value. " +
+s"Limit:${limitExpr.sql}")
+}
+limitExpr.eval() match {
+  case o: Int if o >= 0 => // OK
+  case o: Int => failAnalysis(
+s"number_rows in limit clause must be equal to or greater than 0. 
number_rows:$o")
+  case o => failAnalysis(
--- End diff --

do we allow other integral type? e.g. byte, long, etc.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70033419
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -660,18 +660,39 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("limit") {
 checkAnswer(
-  sql("SELECT * FROM testData LIMIT 10"),
+  sql("SELECT * FROM testData LIMIT 9 + 1"),
   testData.take(10).toSeq)
 
 checkAnswer(
-  sql("SELECT * FROM arrayData LIMIT 1"),
+  sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
   arrayData.collect().take(1).map(Row.fromTuple).toSeq)
 
 checkAnswer(
   sql("SELECT * FROM mapData LIMIT 1"),
   mapData.collect().take(1).map(Row.fromTuple).toSeq)
   }
 
+  test("non-foldable expressions in LIMIT") {
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM testData LIMIT key > 3")
--- End diff --

A good question! : ) Now, the exception we issued is not good:
```
java.lang.Boolean cannot be cast to java.lang.Integer
java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
```

Let me fix it and throw a more reasonable exception:
```
number_rows in limit clause cannot be cast to integer:true;
```
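
The ClassCastException arises because the operators evaluate the limit with a blind cast (`limitExpr.eval().asInstanceOf[Int]`, as in the diffs quoted elsewhere in this thread). A minimal illustration of the before/after behavior, assuming the `failAnalysis` helper:

```scala
val limitExpr: Expression = Literal(true)  // what `LIMIT true` folds to

// Before: a blind cast throws java.lang.ClassCastException at planning time.
// limitExpr.eval().asInstanceOf[Int]

// After: pattern-matching on the evaluated value yields the friendlier error.
limitExpr.eval() match {
  case n: Int => // OK
  case other  => failAnalysis(s"number_rows in limit clause cannot be cast to integer:$other")
}
```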





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70029842
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -660,18 +660,39 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("limit") {
 checkAnswer(
-  sql("SELECT * FROM testData LIMIT 10"),
+  sql("SELECT * FROM testData LIMIT 9 + 1"),
   testData.take(10).toSeq)
 
 checkAnswer(
-  sql("SELECT * FROM arrayData LIMIT 1"),
+  sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
   arrayData.collect().take(1).map(Row.fromTuple).toSeq)
 
 checkAnswer(
   sql("SELECT * FROM mapData LIMIT 1"),
   mapData.collect().take(1).map(Row.fromTuple).toSeq)
   }
 
+  test("non-foldable expressions in LIMIT") {
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM testData LIMIT key > 3")
--- End diff --

what will happen if the type is wrong? e.g. `LIMIT true`





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70028525
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -660,7 +660,12 @@ case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryN
   }
   override lazy val statistics: Statistics = {
 val limit = limitExpr.eval().asInstanceOf[Int]
-val sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
+var sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
+if (sizeInBytes == 0) {
--- End diff --

Thanks! Let me fix it.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70028351
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
--- End diff --

uh, I see. Thank you! 





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70025602
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
--- End diff --

I'm saying maybe we can declare Limit as `case class GlobalLimit(limit: Int, child: LogicalPlan)`, but it's not a small change; we could think about it later.
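
For illustration only, the hypothetical node shape being floated here (explicitly not what this PR does) would bake the already-validated value into the operator:

```scala
// Hypothetical: the analyzer would fold and validate the expression once,
// and every downstream rule could rely on `limit` being a plain non-negative Int.
case class GlobalLimit(limit: Int, child: LogicalPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output
}
```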





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70024905
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -660,7 +660,12 @@ case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryN
   }
   override lazy val statistics: Statistics = {
 val limit = limitExpr.eval().asInstanceOf[Int]
-val sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
+var sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
+if (sizeInBytes == 0) {
--- End diff --

I think it's a special case for limit. How about we make it more clear? e.g.
```
val sizeInBytes = if (limit == 0) {
  1
} else {
  ...
}
```
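
Spelled out against the `statistics` override quoted in the diff above, the suggestion would read roughly as follows (a sketch; the point is that a `LIMIT 0` still reports a minimal non-zero size, so size estimates that multiply children's sizes, such as a join's, do not collapse to zero):

```scala
override lazy val statistics: Statistics = {
  val limit = limitExpr.eval().asInstanceOf[Int]
  val sizeInBytes = if (limit == 0) {
    // sizeInBytes can't be zero, or plans above this node that multiply
    // their children's sizes would also estimate zero.
    1
  } else {
    (limit: Long) * output.map(a => a.dataType.defaultSize).sum
  }
  Statistics(sizeInBytes = sizeInBytes)
}
```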





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69599120
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesGlobalLimit(0).equals(BigInt(expected)),
+  s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit(0)}")
+
+val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
+  l.statistics.sizeInBytes
+}
+assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesLocalLimit(0).equals(BigInt(expected)),
+  s"expected exact size 24 for table 'test', got: ${sizesLocalLimit(0)}")
+  }
+}
+  }
+
+  test("estimates the size of a limit 0 on outer join") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  val df1 = spark.table("test")
+  val df2 = spark.table("test").limit(0)
+  val df = df1.join(df2, Seq("k"), "left")
+
+  val sizes = df.queryExecution.analyzed.collect { case g: Join =>
+g.statistics.sizeInBytes
+  }
+
+  assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+  assert(sizes(0).equals(BigInt(96)),
--- End diff --

This is just following what the other test cases did. Sure, we can change all of them. Let me do it. Thanks!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69599041
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
--- End diff --

Are you saying we should change the SQL parser? You know, if so, we also need to change the `TABLESAMPLE n ROWS`.

FYI, even Hive does not support foldable expressions.
```
hive> select * from t1 limit 1 + 1;
FAILED: ParseException line 1:25 missing EOF at '+' near '1'
hive> select * from t1 limit (1+1);
FAILED: ParseException line 1:23 extraneous input '(' expecting Number near '1'
```
I found that Impala supports it. 

http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_limit.html





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69586217
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
--- End diff --

If it must be foldable, can we just use `int` as the limit instead of an expression?





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69528969
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesGlobalLimit(0).equals(BigInt(expected)),
+  s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit(0)}")
+
+val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
+  l.statistics.sizeInBytes
+}
+assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+assert(sizesLocalLimit(0).equals(BigInt(expected)),
+  s"expected exact size 24 for table 'test', got: ${sizesLocalLimit(0)}")
+  }
+}
+  }
+
+  test("estimates the size of a limit 0 on outer join") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  val df1 = spark.table("test")
+  val df2 = spark.table("test").limit(0)
+  val df = df1.join(df2, Seq("k"), "left")
+
+  val sizes = df.queryExecution.analyzed.collect { case g: Join =>
+g.statistics.sizeInBytes
+  }
+
+  assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
+  assert(sizes(0).equals(BigInt(96)),
--- End diff --

Why do you use `equals`? Would `===` not work here?





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69499913
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

Let me do it in this PR. Thank you for your review! : ) 





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69499892
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

We do not support non-foldable limit clauses.

https://github.com/apache/spark/blob/d063898bebaaf4ec2aad24c3ac70aabdbf97a190/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L67-L89

https://github.com/apache/spark/blob/d063898bebaaf4ec2aad24c3ac70aabdbf97a190/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L398-L401

But... we do not issue an exception if users do it. Thus, the error we get is strange:
```
assertion failed: No plan for GlobalLimit (_nondeterministic#203 > 0.2)
+- Project [key#11, value#12, rand(-1441968339187861415) AS _nondeterministic#203]
   +- LocalLimit (_nondeterministic#202 > 0.2)
      +- Project [key#11, value#12, rand(-1308350387169017676) AS _nondeterministic#202]
         +- LogicalRDD [key#11, value#12]

java.lang.AssertionError: assertion failed: No plan for GlobalLimit (_nondeterministic#203 > 0.2)
+- Project [key#11, value#12, rand(-1441968339187861415) AS _nondeterministic#203]
   +- LocalLimit (_nondeterministic#202 > 0.2)
      +- Project [key#11, value#12, rand(-1308350387169017676) AS _nondeterministic#202]
         +- LogicalRDD [key#11, value#12]
```





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69499666
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

- Oracle: http://docs.oracle.com/javadb/10.5.3.0/ref/rrefsqljoffsetfetch.html
- DB2 z/OS: https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_fetchfirstclause.html
- MySQL: http://dev.mysql.com/doc/refman/5.7/en/select.html
- PostgreSQL: https://www.postgresql.org/docs/8.1/static/queries-limit.html

It sounds like nobody supports it.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69498807
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

ah, but it's still foldable. Is it possible it's non-foldable?





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69498258
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

Nope. Users can input an expression here. For example, 

https://github.com/apache/spark/blob/e5d703bca85c65ce329b1e202283cfa35d109146/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L234





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69498054
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

is the limit expression guaranteed to be literal?





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69494036
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -251,6 +251,22 @@ trait CheckAnalysis extends PredicateHelper {
 s"but one table has '${firstError.output.length}' columns 
and another table has " +
 s"'${s.children.head.output.length}' columns")
 
+  case l: GlobalLimit =>
+val numRows = l.limitExpr.eval().asInstanceOf[Int]
+if (numRows < 0) {
+  failAnalysis(
+s"number_rows in limit clause must be equal to or greater 
than 0. " +
+  s"number_rows:$numRows")
+}
+
+  case l: LocalLimit =>
--- End diff --

I do not think we can merge them, but, yeah, we can create a local function for it. Thank you!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69488260
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -251,6 +251,22 @@ trait CheckAnalysis extends PredicateHelper {
 s"but one table has '${firstError.output.length}' columns 
and another table has " +
 s"'${s.children.head.output.length}' columns")
 
+  case l: GlobalLimit =>
+val numRows = l.limitExpr.eval().asInstanceOf[Int]
+if (numRows < 0) {
+  failAnalysis(
+s"number_rows in limit clause must be equal to or greater 
than 0. " +
+  s"number_rows:$numRows")
+}
+
+  case l: LocalLimit =>
--- End diff --

What do you think about merging the two cases into `case l @ (_: LocalLimit | _: GlobalLimit) =>` to remove the duplication (or at least introducing a local method)?
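
A sketch of the local-method variant, which is roughly where the PR landed per the `checkLimitClause` helper quoted in the newer messages above:

```scala
// Shared validation lives in one place; both limit operators dispatch to it.
case GlobalLimit(limitExpr, _) => checkLimitClause(limitExpr)
case LocalLimit(limitExpr, _) => checkLimitClause(limitExpr)
```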

