[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-04-21 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/10502


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-213244434
  
Let me close this for now. Please let me know although this is closed. I 
will reopen this when I start to work on this again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-189130651
  
yea will do. Thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-189119103
  
@yhuai No problem. Then, please inform me when I am supposed to do 
something else.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-189116088
  
@HyukjinKwon Thank you for working on it. We have been actively improving 
the efficiency of code-gen and unsafe parquet reader. For the long term, 
letting parquet to evaluate filters like `a=10` row by row is not as efficient 
as using the code gen version of Spark filter. We do plan to let unsafe parquet 
reader to evaluate very cheap predicates like `IsNotNull` and `IsNull` for 
every row. However, before it happens, can we hold off the work on this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-189100946
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-189100947
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52014/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-189100796
  
**[Test build #52014 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52014/consoleFull)**
 for PR 10502 at commit 
[`0e149da`](https://github.com/apache/spark/commit/0e149dad44d1787f30122795961a97fd44c6761c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-189080574
  
**[Test build #52014 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52014/consoleFull)**
 for PR 10502 at commit 
[`0e149da`](https://github.com/apache/spark/commit/0e149dad44d1787f30122795961a97fd44c6761c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-182688789
  
@yhuai Would you look through this please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-182686261
  
**[Test build #51075 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51075/consoleFull)**
 for PR 10502 at commit 
[`ad757be`](https://github.com/apache/spark/commit/ad757beb5922350a7db556203d05a2c84f7f7fe1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-182686749
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51075/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-182686745
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-182672430
  
**[Test build #51075 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51075/consoleFull)**
 for PR 10502 at commit 
[`ad757be`](https://github.com/apache/spark/commit/ad757beb5922350a7db556203d05a2c84f7f7fe1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-182670583
  
@yhuai Would you look through this please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-180324662
  
I will resolve the conflict on Monday.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170468892
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170470378
  
**[Test build #49122 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49122/consoleFull)**
 for PR 10502 at commit 
[`dfc7506`](https://github.com/apache/spark/commit/dfc7506fc579bcd4539f2ab26aebaa802ed5dc17).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170467935
  
**[Test build #49115 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49115/consoleFull)**
 for PR 10502 at commit 
[`dfc7506`](https://github.com/apache/spark/commit/dfc7506fc579bcd4539f2ab26aebaa802ed5dc17).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170501213
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49115/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170501209
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170500927
  
**[Test build #49115 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49115/consoleFull)**
 for PR 10502 at commit 
[`dfc7506`](https://github.com/apache/spark/commit/dfc7506fc579bcd4539f2ab26aebaa802ed5dc17).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170542317
  
**[Test build #49122 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49122/consoleFull)**
 for PR 10502 at commit 
[`dfc7506`](https://github.com/apache/spark/commit/dfc7506fc579bcd4539f2ab26aebaa802ed5dc17).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170542473
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49122/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-170542471
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-30 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48591978
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -208,11 +210,30 @@ private[sql] object ParquetFilters {
   }
 
   /**
+   *  Return referenced columns in [[sources.Filter]].
+   */
+  def referencedColumns(schema: StructType, predicate: sources.Filter): 
Array[String] = {
--- End diff --

Agreed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-30 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48592707
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -208,11 +210,30 @@ private[sql] object ParquetFilters {
   }
 
   /**
+   *  Return referenced columns in [[sources.Filter]].
+   */
+  def referencedColumns(schema: StructType, predicate: sources.Filter): 
Array[String] = {
+val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
+val referencedColumns = ArrayBuffer.empty[String]
+def getDataTypeOf(name: String): DataType = {
+  referencedColumns += name
+  dataTypeOf(name)
+}
+createParquetFilter(getDataTypeOf, predicate)
--- End diff --

Understood though, I think easy-to-read is better.
This entry is a little hacky and I was not clearly sure what it made.
Anyway, we should follow comitters comments.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-30 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48593431
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala
 ---
@@ -288,20 +293,28 @@ private[sql] class ParquetRelation(
 }
   }
 
+  override def unhandledFilters(filters: Array[Filter]): Array[Filter] = {
+// The unsafe row RecordReader does not support row by row filtering 
so for this case
+// it should wrap this with Spark-side filtering.
+val enableUnsafeRowParquetReader =
+  
sqlContext.getConf(SQLConf.PARQUET_UNSAFE_ROW_RECORD_READER_ENABLED.key).toBoolean
--- End diff --

Since `SQLConf.PARQUET_UNSAFE_ROW_RECORD_READER_ENABLED` is internally true 
by default, this `unhandledFilters` always returns `filters` as it is. So, this 
optimization is little meaningful to end users.
IMO if `UnsafeRowParquetRecordReader` supports row-by-row filtering, then 
we implement this.
Though?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48593890
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala
 ---
@@ -288,20 +293,28 @@ private[sql] class ParquetRelation(
 }
   }
 
+  override def unhandledFilters(filters: Array[Filter]): Array[Filter] = {
+// The unsafe row RecordReader does not support row by row filtering 
so for this case
+// it should wrap this with Spark-side filtering.
+val enableUnsafeRowParquetReader =
+  
sqlContext.getConf(SQLConf.PARQUET_UNSAFE_ROW_RECORD_READER_ENABLED.key).toBoolean
--- End diff --

Hm.. I think we can implement this for now and then simply change the 
condition when `UnsafeRowParquetRecordReader` supports filtering row by row.

The option for Parquet filter push down was also false by default but it 
was implemented. I don't think a default value can deicde if a feature should 
be implemented or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48603240
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -208,11 +210,30 @@ private[sql] object ParquetFilters {
   }
 
   /**
+   *  Return referenced columns in [[sources.Filter]].
+   */
+  def referencedColumns(schema: StructType, predicate: sources.Filter): 
Array[String] = {
+val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
+val referencedColumns = ArrayBuffer.empty[String]
+def getDataTypeOf(name: String): DataType = {
+  referencedColumns += name
+  dataTypeOf(name)
+}
+createParquetFilter(getDataTypeOf, predicate)
--- End diff --

For me I think I would not be sure why `createFilter` retuens filters with 
referenced columns. I see `OrcFilters.createFilter` returns only filters and I 
do not think `Parquet.createFilter` returning multiple objects is eazy to read.

If you think we don't have to run the function twice  and the 
implementation is hacky, please suggest possible implementation.

 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167835640
  
@HyukjinKwon Thank you for the PR? Can you post some benchmarking results 
(with your testing code)? It will be good to have these numbers to help others 
understand if it can provide benefit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167895015
  
@yhuai Sure. I will try!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48590244
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -208,11 +210,30 @@ private[sql] object ParquetFilters {
   }
 
   /**
+   *  Return referenced columns in [[sources.Filter]].
+   */
+  def referencedColumns(schema: StructType, predicate: sources.Filter): 
Array[String] = {
+val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
+val referencedColumns = ArrayBuffer.empty[String]
+def getDataTypeOf(name: String): DataType = {
+  referencedColumns += name
+  dataTypeOf(name)
+}
+createParquetFilter(getDataTypeOf, predicate)
--- End diff --

We need this here? ISTM that `ParquetFilters.createFilter' calls 
`createParquetFilter' again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48590599
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -208,11 +210,30 @@ private[sql] object ParquetFilters {
   }
 
   /**
+   *  Return referenced columns in [[sources.Filter]].
+   */
+  def referencedColumns(schema: StructType, predicate: sources.Filter): 
Array[String] = {
--- End diff --

Oh, yes it looks so. I think I might also have to change `createFilter()` 
in that way because I just followed up in the way of `createFilter()` for this 
function because both `createFilter()` and `referencedColumns()` are called in 
the same places. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167933518
  
# Benchmark (Removed Spark-side Filter)

## Motivation

This PR simplifies the query plans for Parquet files by stripping 
duplicated Spark-side filtering,

from:

```
== Physical Plan ==
Filter (a#8 = 2)
  +- Scan ParquetRelation[a#8] InputPaths: 
file:/private/var/folders/9j/gf_c342d7d150mwrxvkqnc18gn/T/spark-ef271ec6-95e1-43ae-9b3e-1d4dae6f69c3/part=1,
 PushedFilters: [EqualTo(a,2)]
```

to :

```
== Physical Plan ==
Scan ParquetRelation[a#8] InputPaths: 
file:/private/var/folders/9j/gf_c342d7d150mwrxvkqnc18gn/T/spark-ef271ec6-95e1-43ae-9b3e-1d4dae6f69c3/part=1,
 PushedFilters: [EqualTo(a,2)]
```

However, in terms of performance, it is unkown if there is benefit. So, 
this benchmark was performed.
Simply, several quries were executed with duplicated Spark-side filtering 
and without it.

## Environment

- Machine: MacBook Pro Retina
- CPU: 4
- Memory: 8G

## Method

In this benchmark, 
 - The filters that Parquet can generate were composed with 9 simple 
quries. `IS NULL`, `IS NOT NULL`, `=`, `!=`, `<=>`, `<`, `>`, `<=` and `>=` 
operators were tested here. Each query was executed 5 times and calculated the 
average times.
 - The [commit right before this 
PR](https://github.com/apache/spark/commit/73862a1) was used against this PR. 
Namely, The tests were performed with Spark-side filtering and without 
Spark-side filtering.
 - In order to test cleanly, the target Parquet file was uncompressed and 
in order to enable Parquet filtering row by row, 
`spark.sql.parquet.enableUnsafeRowRecordReader` was disabled.


## Dataset
 Raw Data
- [TPC-H](http://www.tpc.org/tpch/) Lineitem Table created with factor 1 
([generate data](https://github.com/rxin/TPC-H-Hive/tree/master/dbgen)) 
- Size : 724.66 MB

 Create Target Parquet File
- `case` class for Lineitem table
```scala
case class Lineitem(l_orderkey: Int,
l_partkey: Int,
l_suppkey: Int,
l_linenumber: Int,
l_quantity: Float,
l_extendedprice: Float,
l_discount: Float,
l_tax: Float,
l_returnflag: String,
l_linestatus: String,
l_shipdate: String,
l_commitdate: String,
l_receiptdate: String,
l_shipinstruct: String,
l_shipmode: String,
l_comment: String)
```

- Create Parquet file
```scala
import sqlContext.implicits._
var conf = new SparkConf()
conf.setAppName("Test").setMaster("local")
conf.set("spark.sql.parquet.compression.codec", "uncompressed")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sc.textFile("lineitem.tbl").map(_.split('|')).map { v =>
  Lineitem(
v(0).trim.toInt,
v(1).trim.toInt,
v(2).trim.toInt,
v(3).trim.toInt,
v(4).trim.toFloat,
v(5).trim.toFloat,
v(6).trim.toFloat,
v(7).trim.toFloat,
v(8),
v(9),
v(10),
v(11),
v(12),
v(13),
v(14),
v(15))
}.toDF()
df.save("lineitem", "parquet")
```

 Parquet file

```
-rw-r--r--  hyukjinkwon hyukjinkwon 0 B _SUCCESS
-rw-r--r--  hyukjinkwon hyukjinkwon 1.54 KB _common_metadata
-rw-r--r--  hyukjinkwon hyukjinkwon 13.72 KB_metadata
-rw-r--r--  hyukjinkwon hyukjinkwon 52.16 MB
part-r-0-2aa739b2-6194-47db-9dd5-461905cea976.parquet
-rw-r--r--  hyukjinkwon hyukjinkwon 51.74 MB
part-r-1-2aa739b2-6194-47db-9dd5-461905cea976.parquet
-rw-r--r--  hyukjinkwon hyukjinkwon 51.97 MB
part-r-2-2aa739b2-6194-47db-9dd5-461905cea976.parquet
-rw-r--r--  hyukjinkwon hyukjinkwon 51.71 MB
part-r-3-2aa739b2-6194-47db-9dd5-461905cea976.parquet
-rw-r--r--  hyukjinkwon hyukjinkwon 51.89 MB
part-r-4-2aa739b2-6194-47db-9dd5-461905cea976.parquet
-rw-r--r--  hyukjinkwon hyukjinkwon 34.25 MB
part-r-5-2aa739b2-6194-47db-9dd5-461905cea976.parquet
```

### Test Codes

- Function to measure time
```scala
def time[A](f: => A) = {
  val s = System.nanoTime
  val ret = f
  println("time: "+(System.nanoTime-s)/1e6+"ms")
  ret
}
```

- Configuration and `SQLContext`
```scala
var conf = new SparkConf()
conf.setAppName("Test").setMaster("local")
conf.set("spark.sql.parquet.enableUnsafeRowRecordReader", "false")
val sc = new SparkContext(conf)
val 

[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167933544
  
@yhuai Would you look through this please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48589974
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -208,11 +210,30 @@ private[sql] object ParquetFilters {
   }
 
   /**
+   *  Return referenced columns in [[sources.Filter]].
+   */
+  def referencedColumns(schema: StructType, predicate: sources.Filter): 
Array[String] = {
--- End diff --

Better to add `private[parquet]`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/10502#discussion_r48590538
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -208,11 +210,30 @@ private[sql] object ParquetFilters {
   }
 
   /**
+   *  Return referenced columns in [[sources.Filter]].
+   */
+  def referencedColumns(schema: StructType, predicate: sources.Filter): 
Array[String] = {
+val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
+val referencedColumns = ArrayBuffer.empty[String]
+def getDataTypeOf(name: String): DataType = {
+  referencedColumns += name
+  dataTypeOf(name)
+}
+createParquetFilter(getDataTypeOf, predicate)
--- End diff --

`createParquetFilter ` needs to be called to have the list of referenced 
columns while creating filters.

If you meant `ParquetFilters.createFilter` should be called once as 
returning both the columns and filter, I originally wanted to do so but I 
realised calling twice like this would reduce the changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167750147
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48405/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167750144
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167750071
  
**[Test build #48405 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48405/consoleFull)**
 for PR 10502 at commit 
[`2ad8182`](https://github.com/apache/spark/commit/2ad81822100229318d1d80796edb92916a10d820).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167750403
  
cc @yhuai @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/10502

[SPARK-12355][SQL] Implement unhandledFilter interface for Parquet

https://issues.apache.org/jira/browse/SPARK-12355
This is similar with https://github.com/apache/spark/pull/10427.

As discussed here https://github.com/apache/spark/pull/10221, this PR 
implemented `unhandledFilter` to remove duplicated Spark-side filtering.

In case of Parquet, the columns referenced in pushed down filters should be 
given to `org.apache.spark.sql.parquet.row.requested_schema` whereas general 
datasources such as JDBC do not require the columns.

However, the current interface of `DataSourceStrategy` removes the columns 
only referenced in pushed down filters. Therefore, this PR resolved this 
problem by manually generating the columns referenced in pushed down filters.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-12355

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10502


commit a55ad54a820f095e5116df05979f169c0fe8e0cf
Author: hyukjinkwon 
Date:   2015-12-29T04:34:53Z

Implement unhandled filters for Parquet

commit cf331a453c3f99ee40ee5ca6f5029dadee3d07f6
Author: hyukjinkwon 
Date:   2015-12-29T04:36:42Z

Correct existing tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167719205
  
**[Test build #48396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48396/consoleFull)**
 for PR 10502 at commit 
[`cf331a4`](https://github.com/apache/spark/commit/cf331a453c3f99ee40ee5ca6f5029dadee3d07f6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167720172
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48396/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167720147
  
**[Test build #48396 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48396/consoleFull)**
 for PR 10502 at commit 
[`cf331a4`](https://github.com/apache/spark/commit/cf331a453c3f99ee40ee5ca6f5029dadee3d07f6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167720165
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167722257
  
The test is failed from wrong results from Parquet.

The test result was below:
```
== Physical Plan ==
Scan ParquetRelation[_1#4] InputPaths: 
file:/private/var/folders/9j/gf_c342d7d150mwrxvkqnc18gn/T/spark-2fa8f75a-8ac1-4273-afcd-f271529a24a7,
 PushedFilters: [GreaterThan(_1,5)]
```
```
== Results ==
!== Correct Answer - 4 ==   == Spark Answer - 5 ==
![6][5]
![7][6]
![8][7]
![9][8]
!   [9]
```

As can be seen, `GreaterThan(_1,5)` is pushed down correctly but the 
results includes `5`.
I am looking into this deeper.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167729051
  
I see. `UnsafeRowParquetRecordReader` at Parquet does not support filter 
record by record but just block. So, even with `=` operator produces the same 
results below. 

```
+---+
| _1|
+---+
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```

I think I should disable `unhandledFilters` when 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167733162
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167733908
  
**[Test build #48403 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48403/consoleFull)**
 for PR 10502 at commit 
[`8c376af`](https://github.com/apache/spark/commit/8c376af467b939d3687e24f7d0e1ccf4e0ea5fca).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167736506
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48403/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167736500
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167736372
  
**[Test build #48403 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48403/consoleFull)**
 for PR 10502 at commit 
[`8c376af`](https://github.com/apache/spark/commit/8c376af467b939d3687e24f7d0e1ccf4e0ea5fca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167737190
  
**[Test build #48402 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48402/consoleFull)**
 for PR 10502 at commit 
[`8c376af`](https://github.com/apache/spark/commit/8c376af467b939d3687e24f7d0e1ccf4e0ea5fca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167737265
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48402/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167737261
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167734521
  
**[Test build #48402 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48402/consoleFull)**
 for PR 10502 at commit 
[`8c376af`](https://github.com/apache/spark/commit/8c376af467b939d3687e24f7d0e1ccf4e0ea5fca).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10502#issuecomment-167738930
  
**[Test build #48405 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48405/consoleFull)**
 for PR 10502 at commit 
[`2ad8182`](https://github.com/apache/spark/commit/2ad81822100229318d1d80796edb92916a10d820).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org