[jira] [Comment Edited] (SPARK-19638) Filter pushdown not working for struct fields

2017-02-18 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873237#comment-15873237
 ] 

Takeshi Yamamuro edited comment on SPARK-19638 at 2/18/17 4:25 PM:
---

I found that this ticket duplicates SPARK-17636, so I'll close it as a duplicate.


was (Author: maropu):
I found that this ticket duplicates SPARK-17636, so I'll close it as a duplicate. Thanks.

> Filter pushdown not working for struct fields
> ---------------------------------------------
>
> Key: SPARK-19638
> URL: https://issues.apache.org/jira/browse/SPARK-19638
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Nick Dimiduk
>
> Working with a dataset containing struct fields, with debug logging enabled 
> in the ES connector, I'm seeing the following behavior. The DataFrame is 
> created over the ES connector, and the schema is then extended with a couple 
> of column aliases, such as:
> {noformat}
> df.withColumn("f2", df("foo"))
> {noformat}
> Queries against those alias columns work as expected for fields that are 
> not struct members.
> {noformat}
> scala> df.withColumn("f2", df("foo")).where("f2 == '1'").limit(0).show
> 17/02/16 15:06:49 DEBUG DataSource: Pushing down filters 
> [IsNotNull(foo),EqualTo(foo,1)]
> 17/02/16 15:06:49 TRACE DataSource: Transformed filters into DSL 
> [{"exists":{"field":"foo"}},{"match":{"foo":"1"}}]
> {noformat}
> However, when the same query is tried with an alias over a struct field, no 
> filters are pushed down.
> {noformat}
> scala> df.withColumn("bar_baz", df("bar.baz")).where("bar_baz == 
> '1'").limit(1).show
> {noformat}
> In fact, this is the case even when no alias is used at all.
> {noformat}
> scala> df.where("bar.baz == '1'").limit(1).show
> {noformat}
> Basically, pushdown for structs doesn't work at all.
> Maybe this is specific to the ES connector?
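A connector-independent way to confirm what is (or is not) being pushed down is to look at the scan node of the physical plan rather than the connector's debug logs. A minimal sketch, assuming a Spark 2.1 spark-shell and reusing the column names from the report above (the exact plan text varies by data source):
{code}
// foo and bar.baz are the columns from the report above; nothing else is assumed.
df.withColumn("f2", df("foo")).where("f2 == '1'").explain(true)
// the scan node should list the translated source filters, e.g.
//   PushedFilters: [IsNotNull(foo), EqualTo(foo,1)]

df.where("bar.baz == '1'").explain(true)
// here no source filters are listed; the predicate stays in a Filter node
// above the scan, i.e. it is evaluated by Spark after the rows are fetched
{code}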





[jira] [Comment Edited] (SPARK-19638) Filter pushdown not working for struct fields

2017-02-17 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873021#comment-15873021
 ] 

Takeshi Yamamuro edited comment on SPARK-19638 at 2/18/17 6:52 AM:
---

Ah, I see what you mean, and you're right: in that case, Catalyst does not push down such 
a condition.
{code}
scala> Seq((1, ("a", 0.1)), (2, ("b", 0.3))).toDF("a", "b").write.parquet("/Users/maropu/Desktop/data")

scala> val df = spark.read.load("/Users/maropu/Desktop/data")
df: org.apache.spark.sql.DataFrame = [a: int, b: struct<_1: string, _2: double>]

scala> df.where($"a" === 1).explain
== Physical Plan ==
*Project [a#108, b#109]
+- *Filter (isnotnull(a#108) && (a#108 = 1))
   +- *FileScan parquet [a#108,b#109] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Desktop/data], PartitionFilters: [], PushedFilters: [IsNotNull(a), EqualTo(a,1)], ReadSchema: struct<a:int,b:struct<_1:string,_2:double>>

scala> df.where($"b._1" === "b").explain
== Physical Plan ==
*Filter (b#109._1 = b)
+- *FileScan parquet [a#108,b#109] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Desktop/data], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:int,b:struct<_1:string,_2:double>>
{code}
BTW, this is not a bug but an improvement (hence the "Improvement" Issue Type), because 
this kind of query does not return incorrect results.
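
For reference, one workaround sketch (not the fix this ticket asks for): as of 2.1 the source filter API identifies columns by a flat name string, so a predicate on a nested field can only be pushed if that field is also written out as a top-level column. The /tmp path and the column name b_1 below are made up for illustration, and the sketch assumes the same spark-shell session as the snippet above:
{code}
// Materialize the nested field as an ordinary top-level column at write time,
// so an equality predicate on it can be translated into a source filter.
val src = spark.read.load("/Users/maropu/Desktop/data")
src.select($"a", $"b", $"b._1".as("b_1"))
  .write.parquet("/tmp/data_flat")   // illustrative path

val flat = spark.read.parquet("/tmp/data_flat")
flat.where($"b_1" === "b").explain
// the FileScan node should now show something like
//   PushedFilters: [IsNotNull(b_1), EqualTo(b_1,b)]
{code}
Whether that is practical depends on controlling the written layout; for the ES case in the original report it would mean mapping the nested field to a top-level one as well.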


was (Author: maropu):
Ah, I see what you mean, and you're right: in that case, Catalyst does not push down such 
a condition.
{code}
scala> Seq((1, ("a", 0.1)), (2, ("b", 0.3))).toDF("a", "b").write.parquet("/Users/maropu/Desktop/data")
scala> val df = spark.read.load("/Users/maropu/Desktop/data")
df: org.apache.spark.sql.DataFrame = [a: int, b: struct<_1: string, _2: double>]
scala> df.where($"a" === 1).explain
== Physical Plan ==
*Project [a#108, b#109]
+- *Filter (isnotnull(a#108) && (a#108 = 1))
   +- *FileScan parquet [a#108,b#109] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Desktop/data], PartitionFilters: [], PushedFilters: [IsNotNull(a), EqualTo(a,1)], ReadSchema: struct<a:int,b:struct<_1:string,_2:double>>
scala> df.where($"b._1" === "b").explain
== Physical Plan ==
*Filter (b#109._1 = b)
+- *FileScan parquet [a#108,b#109] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Desktop/data], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:int,b:struct<_1:string,_2:double>>
{code}
BTW, this is not a bug but an improvement (hence the "Improvement" Issue Type), because 
this kind of query does not return incorrect results.


