[GitHub] spark pull request #23004: [SPARK-26004][SQL] InMemoryTable support StartsWi...

2018-11-19 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/23004#discussion_r234857747
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -237,6 +237,13 @@ case class InMemoryTableScanExec(
   if list.forall(ExtractableLiteral.unapply(_).isDefined) && list.nonEmpty =>
   list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] &&
     l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _)
+
+case StartsWith(a: AttributeReference, ExtractableLiteral(l)) =>
--- End diff --

Added to the PR description.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23004: [SPARK-26004][SQL] InMemoryTable support StartsWi...

2018-11-13 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/23004#discussion_r233272718
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -237,6 +237,13 @@ case class InMemoryTableScanExec(
   if list.forall(ExtractableLiteral.unapply(_).isDefined) && list.nonEmpty =>
   list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] &&
     l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _)
+
+case StartsWith(a: AttributeReference, ExtractableLiteral(l)) =>
+  statsFor(a).lowerBound.substr(0, Length(l)) <= l &&
+    l <= statsFor(a).upperBound.substr(0, Length(l))
+case StartsWith(ExtractableLiteral(l), a: AttributeReference) =>
--- End diff --

Good question. The last case should be removed; `DataSourceStrategy` has the
same logic:
https://github.com/apache/spark/blob/3d6b68b030ee85a0f639dd8e9b68aedf5f27b46f/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L512-L513





[GitHub] spark pull request #23004: [SPARK-26004][SQL] InMemoryTable support StartsWi...

2018-11-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23004#discussion_r233033597
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -237,6 +237,13 @@ case class InMemoryTableScanExec(
   if list.forall(ExtractableLiteral.unapply(_).isDefined) && list.nonEmpty =>
   list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] &&
     l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _)
+
+case StartsWith(a: AttributeReference, ExtractableLiteral(l)) =>
+  statsFor(a).lowerBound.substr(0, Length(l)) <= l &&
+    l <= statsFor(a).upperBound.substr(0, Length(l))
+case StartsWith(ExtractableLiteral(l), a: AttributeReference) =>
--- End diff --

Same question.





[GitHub] spark pull request #23004: [SPARK-26004][SQL] InMemoryTable support StartsWi...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23004#discussion_r233012392
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -237,6 +237,13 @@ case class InMemoryTableScanExec(
   if list.forall(ExtractableLiteral.unapply(_).isDefined) && list.nonEmpty =>
   list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] &&
     l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _)
+
+case StartsWith(a: AttributeReference, ExtractableLiteral(l)) =>
+  statsFor(a).lowerBound.substr(0, Length(l)) <= l &&
+    l <= statsFor(a).upperBound.substr(0, Length(l))
+case StartsWith(ExtractableLiteral(l), a: AttributeReference) =>
--- End diff --

BTW, `a.startsWith(b)` and `b.startsWith(a)` are not the same, so why are they
handled the same way here?
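
To make the concern concrete, here is a minimal sketch over plain Scala strings. It assumes the reversed case reused the same bounds check as the first case, which the truncated diff does not show; `boundsCheck` and `StartsWithDirection` are hypothetical names, and `String.take` stands in for `substr`.

```scala
object StartsWithDirection {
  // Hypothetical plain-string version of the bounds check in the diff:
  // truncate both bounds to the literal's length, then compare.
  def boundsCheck(lower: String, upper: String, literal: String): Boolean = {
    val n = literal.length
    lower.take(n) <= literal && literal <= upper.take(n)
  }

  def main(args: Array[String]): Unit = {
    // The two directions are different predicates.
    println("abc".startsWith("ab")) // true
    println("ab".startsWith("abc")) // false

    // Reversed predicate: literal.startsWith(columnValue).
    // A batch holding only the value "ab" has lower = upper = "ab",
    // and it does contain a match for the literal "abc" ...
    println("abc".startsWith("ab")) // true
    // ... yet the bounds check says the batch can be skipped.
    println(boundsCheck("ab", "ab", "abc")) // false
  }
}
```

Under that assumption, the check would skip a batch that contains a matching row, so it is only safe for the `column.startsWith(literal)` direction.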





[GitHub] spark pull request #23004: [SPARK-26004][SQL] InMemoryTable support StartsWi...

2018-11-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23004#discussion_r232945864
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -237,6 +237,13 @@ case class InMemoryTableScanExec(
   if list.forall(ExtractableLiteral.unapply(_).isDefined) && list.nonEmpty =>
   list.map(l => statsFor(a).lowerBound <= l.asInstanceOf[Literal] &&
     l.asInstanceOf[Literal] <= statsFor(a).upperBound).reduce(_ || _)
+
+case StartsWith(a: AttributeReference, ExtractableLiteral(l)) =>
--- End diff --

Can you add a comment to explain it?





[GitHub] spark pull request #23004: [SPARK-26004][SQL] InMemoryTable support StartsWi...

2018-11-10 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/23004

[SPARK-26004][SQL] InMemoryTable support StartsWith predicate push down

## What changes were proposed in this pull request?

[SPARK-24638](https://issues.apache.org/jira/browse/SPARK-24638) added
`StartsWith` predicate push down for Parquet files.
`InMemoryTable` can support this feature as well.
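
As an illustration of the idea, the following is a minimal sketch of statistics-based pruning for a prefix filter, written over plain Scala strings rather than Spark's expression tree. `BatchStats`, `StartsWithPruning` and `mightContainPrefix` are hypothetical names, and `String` ordering is used in place of Spark's own string comparison, so treat it as an approximation of the check in the diff rather than the actual implementation.

```scala
// Hypothetical stand-in for the per-batch min/max statistics of a string column.
final case class BatchStats(lowerBound: String, upperBound: String)

object StartsWithPruning {
  // Returns true when the batch might hold a value starting with `prefix`
  // and therefore has to be scanned; false means it can be skipped.
  def mightContainPrefix(stats: BatchStats, prefix: String): Boolean = {
    val n = prefix.length
    // Truncate both bounds to the prefix length (take is safe on shorter strings).
    val lo = stats.lowerBound.take(n)
    val hi = stats.upperBound.take(n)
    lo <= prefix && prefix <= hi
  }

  def main(args: Array[String]): Unit = {
    // Values "1000".."1999" can start with "10": the batch is kept.
    println(mightContainPrefix(BatchStats("1000", "1999"), "10")) // true
    // Values "2000".."2999" cannot start with "10": the batch is skipped.
    println(mightContainPrefix(BatchStats("2000", "2999"), "10")) // false
  }
}
```

The check is conservative: truncation preserves lexicographic order, so any batch that holds a matching value passes it, while a batch that passes may still contain no match and is simply filtered row by row afterwards.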


## How was this patch tested?

Unit tests and benchmark tests.

Benchmark test results:
```
Pushdown benchmark for StringStartsWith

Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
StringStartsWith filter: (value like '10%'):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------
InMemoryTable Vectorized                                12068 / 14198          1.3         767.3       1.0X
InMemoryTable Vectorized (Pushdown)                       5457 / 8662          2.9         347.0       2.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
StringStartsWith filter: (value like '1000%'):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------
InMemoryTable Vectorized                                  5246 / 5355          3.0         333.5       1.0X
InMemoryTable Vectorized (Pushdown)                       2185 / 2346          7.2         138.9       2.4X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
StringStartsWith filter: (value like '786432%'):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------
InMemoryTable Vectorized                                  5112 / 5312          3.1         325.0       1.0X
InMemoryTable Vectorized (Pushdown)                       2292 / 2522          6.9         145.7       2.2X
```



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-26004

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23004.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23004


commit 7bbdb0713056f387e49cf3921a226554e9af5557
Author: Yuming Wang 
Date:   2018-11-11T03:56:36Z

InMemoryTable support StartsWith predicate push down



