[ https://issues.apache.org/jira/browse/SPARK-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202604#comment-15202604 ]

Liang-Chi Hsieh edited comment on SPARK-13908 at 3/19/16 7:32 AM:
------------------------------------------------------------------

Rethinking this issue, I think it should not be related to pushdown of the limit.
Because the latest CollectLimit takes only a few rows (here, only 1 row) from the
iterator of data, it should not scan all the data.
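
To illustrate the expectation: a Scala iterator is lazy, so taking one row should
pull only one element from the underlying scan. A minimal sketch of that behavior
(plain Scala, not the actual CollectLimit code):

{noformat}
// Count how many elements the "scan" actually produces.
var produced = 0
val rows = Iterator.tabulate(1000000) { i => produced += 1; i }

// Taking 1 row forces exactly one element; the rest is never generated.
val first = rows.take(1).toList
println(first)     // List(0)
println(produced)  // 1
{noformat}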


> Limit not pushed down
> ---------------------
>
>                 Key: SPARK-13908
>                 URL: https://issues.apache.org/jira/browse/SPARK-13908
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Spark compiled from git with commit 53ba6d6
>            Reporter: Luca Bruno
>              Labels: performance
>
> Hello,
> I'm doing a simple query like this on a single parquet file:
> {noformat}
> SELECT *
> FROM someparquet
> LIMIT 1
> {noformat}
> The someparquet table is just a parquet file read and registered as a
> temporary table.
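> For reference, the setup is roughly the following (spark-shell Scala; the path
> here is a placeholder):
> {noformat}
> // hypothetical path; the real table is a single parquet file on HDFS
> val someparquet = sqlContext.read.parquet("hdfs://.../some.parquet")
> someparquet.registerTempTable("someparquet")
> sqlContext.sql("SELECT * FROM someparquet LIMIT 1").show()
> {noformat}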
> The query takes as much time (minutes) as it would to scan all the records,
> instead of just taking the first record.
> Using parquet-tools head is instead very fast (seconds), so I guess this is a
> missed optimization opportunity in Spark.
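> The parquet-tools comparison was along these lines (exact flags may vary with
> the parquet-tools version):
> {noformat}
> # prints only the first record, without a full scan
> parquet-tools head -n 1 some.parquet
> {noformat}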
> The physical plan is the following:
> {noformat}
> == Physical Plan ==
> CollectLimit 1
> +- WholeStageCodegen
>    :  +- Scan ParquetFormat part: struct<>, data: struct<........>[...] InputPaths: hdfs://...
> {noformat}


