[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20851


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176971232
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -129,6 +154,10 @@ private[parquet] object ParquetFilters {
    case BinaryType =>
      (n: String, v: Any) =>
        FilterApi.gt(binaryColumn(n), Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]]))
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
--- End diff --

We should refactor this so that adding a new data type doesn't require touching so many places. This can be done later.
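
For context, a minimal sketch of one possible shape for such a refactor (illustrative only; the object and method names here are hypothetical, not from this PR) is to centralize the value conversion so every predicate builder (`eq`, `notEq`, `lt`, ...) shares it:

```scala
import java.sql.Date
import org.apache.spark.sql.catalyst.util.DateTimeUtils

// Hypothetical sketch: route every supported type through one conversion so
// that teaching the filter builders about a new data type touches one place.
object ParquetValueConversions {
  def toParquetValue(v: Any): Any = v match {
    case d: Date => Int.box(DateTimeUtils.fromJavaDate(d))
    case other   => other
  }
}
```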


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176971146
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -50,6 +59,10 @@ private[parquet] object ParquetFilters {
      (n: String, v: Any) => FilterApi.eq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
--- End diff --

Sorry, I was wrong. I took a look at how these dates get created, in 
`DataSourceStrategy.translateFilter`: they are actually created via 
`DateTimeUtils.toJavaDate` without a timezone, which means we should not use 
a timezone here either.
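
A minimal sketch of the round-trip being described, assuming the Spark 2.3-era `DateTimeUtils` API (illustrative, not part of the PR):

```scala
import org.apache.spark.sql.catalyst.util.DateTimeUtils

object DateRoundTrip extends App {
  // DataSourceStrategy.translateFilter hands the source a java.sql.Date:
  val days = 17609                               // internal DateType value: days since epoch
  val javaDate = DateTimeUtils.toJavaDate(days)  // no timezone parameter involved
  // ...so the filter side should recover the Int with the plain inverse:
  assert(DateTimeUtils.fromJavaDate(javaDate) == days)
}
```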


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176966288
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

@dongjoon-hyun, we are investigating using this feature in some of eBay's 
queries; if its performance is good, it will benefit us a lot.
As per our discussion, I will turn it on by default. Thanks very much!


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176948516
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

Great! Thank you for confirmation.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176946273
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

This is not a bug fix, so we will not backport it to branch-2.3.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176922300
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

If you think so, +1. :)

BTW, based on the Apache Spark way of doing things, I assume that this will 
not land on `branch-2.3` with `spark.sql.parquet.filterPushdown.date=true`.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176897485
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

I think it's common to turn a new feature on by default if there is no 
known regression, and to turn it off if we find a regression later.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176762766
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

@yucai. The reason is that `spark.sql.orc.filterPushdown` is still `false` 
in Spark 2.3 while `spark.sql.parquet.filterPushdown` is `true`; we don't 
know whether this is safe or not.

Anyway, we have six or more months before Apache Spark 2.4. We can enable this 
in the `master` branch temporarily, for testing purposes only, and we can 
disable it at the last moment of the 2.4 release, as we did with the ORC conf, 
if there is some issue.

BTW, do you use this a lot in production at your company?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176676405
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

I am fine either way.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-23 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176660685
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

@dongjoon-hyun and @HyukjinKwon suggested that such a configuration had better 
start with "false", but ORC actually supports Date push-down by default. 
Any ideas?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176636849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

An internal, by-default-false conf usually means the feature is not available 
to users...
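
For reference, `.internal()` only hides a conf from the user-facing documentation; it can still be set by name (a sketch assuming a `SparkSession` called `spark`):

```scala
// The conf stays settable even though it is marked internal:
spark.conf.set("spark.sql.parquet.filterPushdown.date", "true")
// or equivalently in SQL:
spark.sql("SET spark.sql.parquet.filterPushdown.date=true")
```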


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176633578
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
--- End diff --

Yes, it should be an internal conf. In the Spark 3.0 release, we will revisit 
all the internal confs and remove the unnecessary ones.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176317120
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,48 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class StringToDate(s: String) {
+      def date: Date = Date.valueOf(s)
+    }
+
+    val data = Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21")
+
+    withParquetDataFrame(data.map(i => Tuple1(i.date))) { implicit df =>
+      checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+      checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], data.map(i => Row.apply(i.date)))
+
+      checkFilterPredicate('_1 === "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+      checkFilterPredicate('_1 <=> "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+      checkFilterPredicate('_1 =!= "2018-03-18".date, classOf[NotEq[_]],
+        Seq("2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+      checkFilterPredicate('_1 < "2018-03-19".date, classOf[Lt[_]], "2018-03-18".date)
+      checkFilterPredicate('_1 > "2018-03-20".date, classOf[Gt[_]], "2018-03-21".date)
+      checkFilterPredicate('_1 <= "2018-03-18".date, classOf[LtEq[_]], "2018-03-18".date)
+      checkFilterPredicate('_1 >= "2018-03-21".date, classOf[GtEq[_]], "2018-03-21".date)
+
+      checkFilterPredicate(Literal("2018-03-18".date) === '_1, classOf[Eq[_]], "2018-03-18".date)
+      checkFilterPredicate(Literal("2018-03-18".date) <=> '_1, classOf[Eq[_]], "2018-03-18".date)
+      checkFilterPredicate(Literal("2018-03-19".date) > '_1, classOf[Lt[_]], "2018-03-18".date)
+      checkFilterPredicate(Literal("2018-03-20".date) < '_1, classOf[Gt[_]], "2018-03-21".date)
+      checkFilterPredicate(Literal("2018-03-18".date) >= '_1, classOf[LtEq[_]], "2018-03-18".date)
+      checkFilterPredicate(Literal("2018-03-21".date) <= '_1, classOf[GtEq[_]], "2018-03-21".date)
+
+      checkFilterPredicate(!('_1 < "2018-03-21".date), classOf[GtEq[_]], "2018-03-21".date)
+      checkFilterPredicate(
+        '_1 < "2018-03-19".date || '_1 > "2018-03-20".date,
+        classOf[Operators.Or],
+        Seq(Row("2018-03-18".date), Row("2018-03-21".date)))
+    }
--- End diff --

This is different from what @HyukjinKwon gave you. :)


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176310447
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,49 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class StringToDate(s: String) {
+      def date: Date = Date.valueOf(s)
+    }
+
+    withParquetDataFrame(
+      Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Tuple1(i.date))) {
+      implicit df =>
+        checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+        checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]],
+          Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 === "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 <=> "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 =!= "2018-03-18".date, classOf[NotEq[_]],
+          Seq("2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 < "2018-03-19".date, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 > "2018-03-20".date, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate('_1 <= "2018-03-18".date, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 >= "2018-03-21".date, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(Literal("2018-03-18".date) === '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-18".date) <=> '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-19".date) > '_1, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-20".date) < '_1, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate(Literal("2018-03-18".date) >= '_1, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-21".date) <= '_1, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(!('_1 < "2018-03-21".date), classOf[GtEq[_]], "2018-03-21".date)
+        checkFilterPredicate(
+          '_1 < "2018-03-19".date || '_1 > "2018-03-20".date,
+          classOf[Operators.Or],
+          Seq(Row("2018-03-18".date), Row("2018-03-21".date)))
+    }
--- End diff --

Thanks, @HyukjinKwon! That's better than what I thought.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176308130
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(true)
--- End diff --

+1


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176308097
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,49 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class StringToDate(s: String) {
+      def date: Date = Date.valueOf(s)
+    }
+
+    withParquetDataFrame(
+      Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Tuple1(i.date))) {
+      implicit df =>
+        checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+        checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]],
+          Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 === "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 <=> "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 =!= "2018-03-18".date, classOf[NotEq[_]],
+          Seq("2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 < "2018-03-19".date, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 > "2018-03-20".date, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate('_1 <= "2018-03-18".date, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 >= "2018-03-21".date, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(Literal("2018-03-18".date) === '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-18".date) <=> '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-19".date) > '_1, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-20".date) < '_1, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate(Literal("2018-03-18".date) >= '_1, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-21".date) <= '_1, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(!('_1 < "2018-03-21".date), classOf[GtEq[_]], "2018-03-21".date)
+        checkFilterPredicate(
+          '_1 < "2018-03-19".date || '_1 > "2018-03-20".date,
+          classOf[Operators.Or],
+          Seq(Row("2018-03-18".date), Row("2018-03-21".date)))
+    }
--- End diff --

```scala
  test("filter pushdown - date") {
    implicit class StringToDate(s: String) {
      def date: Date = Date.valueOf(s)
    }

    val data = Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Tuple1(i.date))

    withParquetDataFrame(data) { implicit df =>
      checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
      checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], data)

      checkFilterPredicate('_1 === "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
      checkFilterPredicate('_1 <=> "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
      checkFilterPredicate('_1 =!= "2018-03-18".date, classOf[NotEq[_]],
        Seq("2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))

      checkFilterPredicate('_1 < "2018-03-19".date, classOf[Lt[_]], "2018-03-18".date)
      checkFilterPredicate('_1 > "2018-03-20".date, classOf[Gt[_]], "2018-03-21".date)
      checkFilterPredicate('_1 <= "2018-03-18".date, classOf[LtEq[_]], "2018-03-18".date)
      checkFilterPredicate('_1 >= "2018-03-21".date, classOf[GtEq[_]], "2018-03-21".date)

      checkFilterPredicate(Literal("2018-03-18".date) === '_1, classOf[Eq[_]], "2018-03-18".date)
      checkFilterPredicate(Literal("2018-03-18".date) <=> '_1, classOf[Eq[_]], "2018-03-18".date)
      checkFilterPredicate(Literal("2018-03-19".date) > '_1, classOf[Lt[_]], "2018-03-18".date)
      checkFilterPredicate(Literal("2018-03-20".date) < '_1, classOf[Gt[_]], "2018-03-21".date)
      checkFilterPredicate(Literal("2018-03-18".date) >= '_1, classOf[LtEq[_]], "2018-03-18".date)
      checkFilterPredicate(Literal("2018-03-21".date) <= '_1, classOf[GtEq[_]], "2018-03-21".date)

      checkFilterPredicate(!('_1 < "2018-03-21".date), classOf[GtEq[_]], "2018-03-21".date)
      checkFilterPredicate(
        '_1 < "2018-03-19".date || '_1 > "2018-03-20".date,
        classOf[Operators.Or],
        Seq(Row("2018-03-18".date), Row("2018-03-21".date)))
    }
  }
```


---

[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176306730
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,49 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class StringToDate(s: String) {
+      def date: Date = Date.valueOf(s)
+    }
+
+    withParquetDataFrame(
+      Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Tuple1(i.date))) {
+      implicit df =>
+        checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+        checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]],
+          Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 === "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 <=> "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 =!= "2018-03-18".date, classOf[NotEq[_]],
+          Seq("2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 < "2018-03-19".date, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 > "2018-03-20".date, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate('_1 <= "2018-03-18".date, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 >= "2018-03-21".date, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(Literal("2018-03-18".date) === '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-18".date) <=> '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-19".date) > '_1, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-20".date) < '_1, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate(Literal("2018-03-18".date) >= '_1, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-21".date) <= '_1, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(!('_1 < "2018-03-21".date), classOf[GtEq[_]], "2018-03-21".date)
+        checkFilterPredicate(
+          '_1 < "2018-03-19".date || '_1 > "2018-03-20".date,
+          classOf[Operators.Or],
+          Seq(Row("2018-03-18".date), Row("2018-03-21".date)))
+    }
--- End diff --

Could you kindly show me how to improve this? Thanks!


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176304057
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,49 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class StringToDate(s: String) {
+      def date: Date = Date.valueOf(s)
+    }
+
+    withParquetDataFrame(
+      Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Tuple1(i.date))) {
+      implicit df =>
+        checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+        checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]],
+          Seq("2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 === "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 <=> "2018-03-18".date, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 =!= "2018-03-18".date, classOf[NotEq[_]],
+          Seq("2018-03-19", "2018-03-20", "2018-03-21").map(i => Row.apply(i.date)))
+
+        checkFilterPredicate('_1 < "2018-03-19".date, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 > "2018-03-20".date, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate('_1 <= "2018-03-18".date, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate('_1 >= "2018-03-21".date, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(Literal("2018-03-18".date) === '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-18".date) <=> '_1, classOf[Eq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-19".date) > '_1, classOf[Lt[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-20".date) < '_1, classOf[Gt[_]], "2018-03-21".date)
+        checkFilterPredicate(Literal("2018-03-18".date) >= '_1, classOf[LtEq[_]], "2018-03-18".date)
+        checkFilterPredicate(Literal("2018-03-21".date) <= '_1, classOf[GtEq[_]], "2018-03-21".date)
+
+        checkFilterPredicate(!('_1 < "2018-03-21".date), classOf[GtEq[_]], "2018-03-21".date)
+        checkFilterPredicate(
+          '_1 < "2018-03-19".date || '_1 > "2018-03-20".date,
+          classOf[Operators.Or],
+          Seq(Row("2018-03-18".date), Row("2018-03-21".date)))
+    }
--- End diff --

nit. Indentation.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176302671
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -50,6 +59,11 @@ private[parquet] object ParquetFilters {
      (n: String, v: Any) => FilterApi.eq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map(date => dateToDays(date.asInstanceOf[java.sql.Date]).asInstanceOf[Integer])
--- End diff --

Since we imported `Date` at line 20, we can remove `java.sql`.
```scala
-        Option(v).map(date => dateToDays(date.asInstanceOf[java.sql.Date]).asInstanceOf[Integer])
+        Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer])
```


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176302394
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(true)
--- End diff --

For this kind of thing, we had better start with `false`.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175990680
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

The string one is more readable to me.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175989972
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

As per @cloud-fan's suggestion, I have changed `1.d` to `1.date`. Which one is 
preferred?
1. `"2017-08-18".d`
2. `1.date`
3. `"2017-08-18".date`


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175987083
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

Yup, I think this one is better than the current one.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175986514
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

I think `"2017-08-19".d` is at least better than `1.d`.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175975025
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
+      def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1))
+    }
+
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df =>
+      checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+      checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], (1 to 4).map(i => Row.apply(i.d)))
+
+      checkFilterPredicate('_1 === 1.d, classOf[Eq[_]], 1.d)
--- End diff --

Got it, thanks very much for the explanation.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175970994
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
+      def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1))
+    }
+
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df =>
+      checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+      checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], (1 to 4).map(i => Row.apply(i.d)))
+
+      checkFilterPredicate('_1 === 1.d, classOf[Eq[_]], 1.d)
--- End diff --

`b` might be OK, but `d` is confusing, as it's also a suffix for double literals.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175962888
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
+      def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1))
+    }
+
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df =>
+      checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+      checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], (1 to 4).map(i => Row.apply(i.d)))
+
+      checkFilterPredicate('_1 === 1.d, classOf[Eq[_]], 1.d)
--- End diff --

I agree `1.date` is better, but binary uses `1.b`; should we keep the same 
pattern as it?
```
  test("filter pushdown - binary") {
    implicit class IntToBinary(int: Int) {
      def b: Array[Byte] = int.toString.getBytes(StandardCharsets.UTF_8)
    }
```


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175854222
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -50,6 +52,12 @@ private[parquet] object ParquetFilters {
      (n: String, v: Any) => FilterApi.eq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map { d =>
+          DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date]).asInstanceOf[Integer]
--- End diff --

I think we should do 
`DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date], SQLConf.get.sessionLocalTimeZone).asInstanceOf[Integer]`.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175853868
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -50,6 +52,12 @@ private[parquet] object ParquetFilters {
      (n: String, v: Any) => FilterApi.eq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map { d =>
+          DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date]).asInstanceOf[Integer]
--- End diff --

Shall we respect the session-local timezone here?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175851942
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +315,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
+      def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1))
+    }
+
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df =>
+      checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+      checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], (1 to 4).map(i => Row.apply(i.d)))
+
+      checkFilterPredicate('_1 === 1.d, classOf[Eq[_]], 1.d)
--- End diff --

`1.d` is weird, can we name it `1.date`?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175839248
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

OK, do you mean this way? Looks like we need a few more words :).
```
implicit class StringToDate(s: String) {
  def d: Date = Date.valueOf(s)
}

withParquetDataFrame(
  Seq("2017-08-18", "2017-08-19", "2017-08-20", "2017-08-21").map(i => Tuple1(i.d))) {
  implicit df =>
    checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
    checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]],
      Seq("2017-08-18", "2017-08-19", "2017-08-20", "2017-08-21").map(i => Row.apply(i.d)))

    checkFilterPredicate('_1 === "2017-08-18".d, classOf[Eq[_]], "2017-08-18".d)
    checkFilterPredicate('_1 =!= "2017-08-18".d, classOf[NotEq[_]],
      Seq("2017-08-19", "2017-08-20", "2017-08-21").map(i => Row.apply(i.d)))
```


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175687977
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

Either way looks fine, but I usually stick to what the other code around the 
fix does.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-20 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175684322
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

How about this way? It is from ORC's date push down.
```
  test("filter pushdown - date") {
    val dates = Seq("2017-08-18", "2017-08-19", "2017-08-20", "2017-08-21").map { day =>
      Date.valueOf(day)
    }
    withOrcDataFrame(dates.map(Tuple1(_))) { implicit df =>
      checkFilterPredicate('_1.isNull, PredicateLeaf.Operator.IS_NULL)

      checkFilterPredicate('_1 === dates(0), PredicateLeaf.Operator.EQUALS)
      checkFilterPredicate('_1 <=> dates(0), PredicateLeaf.Operator.NULL_SAFE_EQUALS)

      checkFilterPredicate('_1 < dates(1), PredicateLeaf.Operator.LESS_THAN)
      checkFilterPredicate('_1 > dates(2), PredicateLeaf.Operator.LESS_THAN_EQUALS)
      checkFilterPredicate('_1 <= dates(0), PredicateLeaf.Operator.LESS_THAN_EQUALS)
      checkFilterPredicate('_1 >= dates(3), PredicateLeaf.Operator.LESS_THAN)

      checkFilterPredicate(Literal(dates(0)) === '_1, PredicateLeaf.Operator.EQUALS)
      checkFilterPredicate(Literal(dates(0)) <=> '_1, PredicateLeaf.Operator.NULL_SAFE_EQUALS)
      checkFilterPredicate(Literal(dates(1)) > '_1, PredicateLeaf.Operator.LESS_THAN)
      checkFilterPredicate(Literal(dates(2)) < '_1, PredicateLeaf.Operator.LESS_THAN_EQUALS)
      checkFilterPredicate(Literal(dates(0)) >= '_1, PredicateLeaf.Operator.LESS_THAN_EQUALS)
      checkFilterPredicate(Literal(dates(3)) <= '_1, PredicateLeaf.Operator.LESS_THAN)
    }
  }
```


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175651639
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -72,6 +82,14 @@ private[parquet] object ParquetFilters {
      (n: String, v: Any) => FilterApi.notEq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => {
--- End diff --

nit:

```
(n: String, v: Any) => FilterApi.notEq(
  intColumn(n),
  Option(v).map { d =>
    DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date]).asInstanceOf[Integer]
  }.orNull)
```


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175649724
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -76,7 +77,9 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
      expected: Seq[Row]): Unit = {
    val output = predicate.collect { case a: Attribute => a }.distinct
 
-    withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
+    withSQLConf(
+      SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
+      SQLConf.PARQUET_FILTER_PUSHDOWN_DATE_ENABLED.key -> "true") {
--- End diff --

nit: 
```scala
withSQLConf(
  SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
  SQLConf.PARQUET_FILTER_PUSHDOWN_DATE_ENABLED.key -> "true",
  SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
```


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175650228
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
--- End diff --

Shall we pass a string here and convert it into a date?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175342992
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
--- End diff --

@gatorsmile any idea?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread manku-timma
Github user manku-timma commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175327686
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -72,6 +82,15 @@ private[parquet] object ParquetFilters {
      (n: String, v: Any) => FilterApi.notEq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => {
+        FilterApi.notEq(
+          intColumn(n),
+          Option(v).map { date =>
--- End diff --

These three lines are repeated several times in this change; please refactor 
them into a new method.
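
A minimal sketch of the kind of helper being requested; the name `dateToIntegerOrNull` is hypothetical (the merged PR settled on a `dateToDays` helper):

```scala
import java.sql.Date
import org.apache.spark.sql.catalyst.util.DateTimeUtils

object ParquetDateConversion {
  // Hypothetical helper consolidating the repeated null-safe conversion; each
  // FilterApi.eq/notEq/... call site would then just pass dateToIntegerOrNull(v).
  def dateToIntegerOrNull(v: Any): Integer =
    Option(v).map { date =>
      DateTimeUtils.fromJavaDate(date.asInstanceOf[Date]).asInstanceOf[Integer]
    }.orNull
}
```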


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175317152
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -50,6 +51,15 @@ private[parquet] object ParquetFilters {
      (n: String, v: Any) => FilterApi.eq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => {
+        FilterApi.eq(
+          intColumn(n),
+          Option(v).map { date =>
+            val days = date.asInstanceOf[java.sql.Date].getTime / (24 * 60 * 60 * 1000)
+            days.toInt.asInstanceOf[Integer]
--- End diff --

Use `DateTimeUtils.fromJavaDate` here and below?
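
For illustration, a sketch of why the manual millisecond division can differ from `DateTimeUtils.fromJavaDate` (assuming the Spark 2.3-era API): plain division truncates toward zero, and `getTime` embeds the JVM default-timezone offset of the local midnight:

```scala
import java.sql.Date
import org.apache.spark.sql.catalyst.util.DateTimeUtils

object DivisionVsFromJavaDate extends App {
  val date = Date.valueOf("1969-12-31") // a pre-epoch date

  // Truncates toward zero and ignores the default-timezone offset:
  val manualDays = (date.getTime / (24 * 60 * 60 * 1000)).toInt

  // Handles both concerns:
  val properDays = DateTimeUtils.fromJavaDate(date)

  // Depending on the JVM's default timezone, these may disagree by a day:
  println(s"manual=$manualDays fromJavaDate=$properDays")
}
```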


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175320790
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
--- End diff --

Should be an internal conf, i.e., `.internal()`?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175293032
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -148,6 +193,15 @@ private[parquet] object ParquetFilters {
    case BinaryType =>
      (n: String, v: Any) =>
        FilterApi.gtEq(binaryColumn(n), Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]]))
+    case DateType =>
--- End diff --

I have added them; kindly help review.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175293026
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +314,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     }
   }
 
+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
+      def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1))
+    }
+
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df =>
--- End diff --

Could you kindly give me some examples of the kind of boundary tests you mean? 
I checked Parquet integer push-down and ORC date-type push-down, and it seems 
I have covered all their tests.


---
