[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-16 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@liancheng Thanks! I didn't notice that. I will rerun the benchmark. I've 
re-submitted this PR at #13701.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-16 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13371
  
@viirya One problem in your new benchmark code is that `1 << 50` is 
actually very small since it's an `Int`:

```
scala> 1 << 50
res0: Int = 262144
```

Anyway, `1L << 50`, which is 1PB, might be too large a value for such a 
microbenchmark :)

So the generated Parquet file probably contains only a single row group. I 
guess that's why the numbers are quite close whether or not you enable row 
group filter push-down.
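
To make the shift behaviour above concrete, here is a minimal sketch in plain Scala. The object and method names (`ShiftDemo`, `intShift`, `longShift`) are made up for illustration and are not part of the PR. On the JVM, only the low 5 bits of the shift count are used for an `Int` and the low 6 bits for a `Long`:

```scala
object ShiftDemo {
  // For an Int, only the low 5 bits of the shift count are used,
  // so `1 << 50` is evaluated as `1 << (50 & 31)`, i.e. `1 << 18`.
  def intShift(n: Int): Int = 1 << n

  // For a Long, the low 6 bits of the shift count are used, so 50 is kept as-is.
  def longShift(n: Int): Long = 1L << n

  def main(args: Array[String]): Unit = {
    println(intShift(50))  // 262144, the same as intShift(18)
    println(longShift(50)) // 1125899906842624
  }
}
```

With `N = 262144` rows, the benchmark table is small, which is consistent with the guess that the file holds a single row group.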





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-15 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/13371
  
Yea. Since this one was closed by asfgit, I am not sure you can reopen it.

On Wed, Jun 15, 2016 at 7:39 PM -0700, "Liang-Chi Hsieh" wrote:

> @yhuai ok. Do you mean I need to create a new PR for this?




[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@yhuai ok. Do you mean I need to create a new PR for this?





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-14 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/13371
  
Can you add results showing that there are skipped row groups with this 
change (and before this patch all row groups are loaded)?

For those results, let's also put them in the description of the new PR.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@liancheng 

I reran the benchmark, this time excluding the time spent writing the Parquet 
file:

    test("Benchmark for Parquet") {
      val N = 1 << 50
      withParquetTable((0 until N).map(i => (101, i)), "t") {
        val benchmark = new Benchmark("Parquet reader", N)
        benchmark.addCase("reading Parquet file", 10) { iter =>
          sql("SELECT _1 FROM t where t._1 < 100").collect()
        }
        benchmark.run()
      }
    }

By default, `withParquetTable` runs the tests for both the vectorized and 
non-vectorized readers. I only let it run the vectorized reader.

After this patch:

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17 on Linux 3.13.0-57-generic
    Westmere E56xx/L56xx/X56xx (Nehalem-C)
    Parquet reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------
    reading Parquet file            76 /   88          3.4         291.0       1.0X

Before this patch:

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17 on Linux 3.13.0-57-generic
    Westmere E56xx/L56xx/X56xx (Nehalem-C)
    Parquet reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------
    reading Parquet file            81 /   91          3.2         310.2       1.0X

Next, I ran the benchmark for the non-pushdown case, using the same benchmark 
code but with the pushdown configuration disabled.

After this patch:

    Parquet reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------
    reading Parquet file            80 /   95          3.3         306.5       1.0X

Before this patch:

    Parquet reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------
    reading Parquet file            80 /  103          3.3         306.7       1.0X

For the non-pushdown case, the results suggest that this patch doesn't affect 
the normal code path.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@liancheng Got it.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13371
  
Reverted from master and branch-2.0.

@viirya For the benchmark, there are two things:

1. The benchmark also includes the time spent writing the Parquet file, so the 
real number should be much better than the posted one.
2. We should also benchmark cases where no filters are pushed down, to 
verify that this patch doesn't affect the normal code path.
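
The first point can be sketched as a tiny timing helper in plain Scala. This is only an illustration under stated assumptions: `TimingSketch` and `bestTimeNanos` are hypothetical names, and this is not Spark's `Benchmark` class. The setup step (standing in for writing the Parquet file) runs once outside the timed region, and only the body (standing in for the scan query) is measured:

```scala
object TimingSketch {
  // Runs `setup` once outside the timed region, then times only `body`.
  // Returns the best wall-clock time in nanoseconds over `iters` runs.
  def bestTimeNanos[A](iters: Int)(setup: () => A)(body: A => Unit): Long = {
    require(iters > 0, "iters must be positive")
    val input = setup() // not timed: stands in for writing the Parquet file
    (0 until iters).map { _ =>
      val start = System.nanoTime()
      body(input) // timed: stands in for running the scan query
      System.nanoTime() - start
    }.min
  }
}
```

For the second point, the same helper could be called twice, once with a body whose filter can skip data and once with a body whose filter cannot, and the two results compared.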





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@rxin One thing that needs to be explained: because we have only one 
configuration to control filter push-down, it affects both row-based filter 
push-down and this row-group filter push-down.

The benchmark I posted above was run against this patch and the master 
branch separately. It does include the time to write the Parquet data; 
I will change that. Is this kind of benchmark sufficient?





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13371
  
And once we have more data, it might make sense to merge this in 2.0!






[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13371
  
To be clearer, please write a proper benchmark that reads data when 
filter push-down is not useful, to check whether this regresses performance 
for the non-push-down case. Also make sure the benchmark does not include the 
time it takes to write the Parquet data.






[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13371
  
I just talked to @liancheng offline. I don't think we should've merged this 
until we have verified there is no performance regression, and we definitely 
shouldn't have merged this in 2.0.

@liancheng can you revert this from both master and branch-2.0?

@viirya can you run some parquet scan benchmark and make sure this does not 
result in perf regression?





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13371
  
@yhuai We used to support row group level filter push-down before 
refactoring `HadoopFsRelation` into `FileFormat`, but lost it (by accident I 
guess) after the refactoring. So now we only have row group level filtering 
when the vectorized reader is not used, [see here][1].

And yes, both `ParquetInputFormat` and `ParquetRecordReader` do row group 
level filtering.

This LGTM. Thanks for fixing it! Merging to master and 2.0.

[1]: https://github.com/apache/spark/blob/54f758b5fc60ecb0da6b191939a72ef5829be38c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L371-L378





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@yhuai Your step 3 may not work. We are going to filter the row groups for 
each parquet file to read in `VectorizedParquetRecordReader`. I think we don't 
do anything regarding creating splits?





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@yhuai Parquet also does this filtering in ParquetRecordReader 
(https://github.com/apache/parquet-mr/blob/4b1ff8f4b9dfa0ccb064ef286cf2953bfb2c492d/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java#L178) 
and ParquetReader 
(https://github.com/apache/parquet-mr/blob/4b1ff8f4b9dfa0ccb064ef286cf2953bfb2c492d/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetReader.java#L145).

In Spark, we also did this in SpecificParquetRecordReaderBase 
(https://github.com/apache/spark/blob/f958c1c3e292aba98d283637606890f353a9836c/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L103).

I've tested it manually, but it would be good to have a formal test case 
for it, as you said. I will try to add one later, maybe when I come back to 
work in a few days...






[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/13371
  
@viirya I took a look at Parquet's code. It seems Parquet only evaluates row 
group level filters when generating splits 
(https://github.com/apache/parquet-mr/blob/apache-parquet-1.7.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L673). 
With FileSourceStrategy in Spark, I am not sure we will actually filter out 
unneeded row groups as expected. Can you take a look? Also, it would be 
great if you could add a test to make sure that we actually skip unneeded 
row groups. This test can be created as follows.

1. We first write a Parquet file containing multiple row groups. Also, 
let's say that there is a column `c` and those row groups have disjoint ranges 
of `c`'s values.
2. We write a query with a filter on `c` and make sure that this query 
only needs a subset of the row groups.
3. We verify that we only create splits for the needed row groups.
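
The idea behind these three steps can be sketched without Spark or Parquet at all. The following is a toy model in plain Scala, not the actual reader code: `RowGroupSkipSketch`, `RowGroup`, and the helper names are hypothetical, and a row group is reduced to the min/max statistics of column `c`:

```scala
object RowGroupSkipSketch {
  // Hypothetical model of a row group: only the min/max statistics of column `c`.
  final case class RowGroup(id: Int, minC: Int, maxC: Int)

  // Step 1: row groups with disjoint ranges of `c`, e.g. [0, 99], [100, 199], ...
  def disjointGroups(numGroups: Int, rowsPerGroup: Int): Seq[RowGroup] =
    (0 until numGroups).map { g =>
      RowGroup(g, g * rowsPerGroup, (g + 1) * rowsPerGroup - 1)
    }

  // Step 2: for the filter `c < bound`, a group can contain matching rows
  // only if its minimum value is below the bound.
  def mayMatchLessThan(group: RowGroup, bound: Int): Boolean =
    group.minC < bound

  // Step 3: ids of the row groups a reader would still have to load.
  def groupsToRead(groups: Seq[RowGroup], bound: Int): Seq[Int] =
    groups.filter(mayMatchLessThan(_, bound)).map(_.id)
}
```

For example, with four groups of 100 rows each, the filter `c < 150` leaves only groups 0 and 1 to read. A real test would assert the equivalent property on the row groups actually loaded from a written Parquet file.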





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13371
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13371
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60256/
Test PASSed.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13371
  
**[Test build #60256 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60256/consoleFull)**
 for PR 13371 at commit 
[`077f7f8`](https://github.com/apache/spark/commit/077f7f8813a76d38c8a6d898ec54e401c91b6014).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13371
  
**[Test build #60256 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60256/consoleFull)**
 for PR 13371 at commit 
[`077f7f8`](https://github.com/apache/spark/commit/077f7f8813a76d38c8a6d898ec54e401c91b6014).





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
The description is updated.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
retest this please.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
It is not really a bug fix, because everything still works without this 
filter push-down. It is a performance fix. I will update the 
description.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13371
  
Is this a bug fix or performance fix? Sorry I don't really understand after 
reading your description.






[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13371
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13371
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60246/
Test FAILed.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13371
  
**[Test build #60246 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60246/consoleFull)**
 for PR 13371 at commit 
[`077f7f8`](https://github.com/apache/spark/commit/077f7f8813a76d38c8a6d898ec54e401c91b6014).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13371
  
**[Test build #60246 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60246/consoleFull)**
 for PR 13371 at commit 
[`077f7f8`](https://github.com/apache/spark/commit/077f7f8813a76d38c8a6d898ec54e401c91b6014).





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
ping @yhuai @rxin @cloud-fan 





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
cc @cloud-fan too.





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
cc @rxin Can you also take a look at this? It has been waiting for a while 
too. Thanks!





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
ping @yhuai again





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
ping @yhuai I've addressed the comments. Please take a look again. Thanks! 





[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-02 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
@yhuai I've run a simple benchmark as follows:

    test("Benchmark for Parquet") {
      val N = 1 << 20

      val benchmark = new Benchmark("Parquet reader", N)
      benchmark.addCase("reading Parquet file", 1) { iter =>
        withParquetTable((0 until N).map(i => (101, i)), "t") {
          sql("SELECT _1 FROM t where t._1 < 100").show()
        }
      }
      benchmark.run()
    }

Before this patch:

    Parquet reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------
    reading Parquet file         34225 / 34225          0.0       32639.5       1.0X

After this patch:

    Parquet reader:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------
    reading Parquet file         31350 / 31350          0.0       29897.6       1.0X



