[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112052#comment-15112052 ]

Maciej Bryński commented on SPARK-12843:
----------------------------------------

Let's assume that I have a big table:
200 partitions and a few billion records.

I want to get only 100 records, so I use the query
{code}
select * from table limit 100
{code}
Every partition has more than 100 records, so scanning just one of them should be enough.
Unfortunately, Spark scans all of them and only then returns the result.
That's my issue.
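
To make this concrete, here is a minimal reproduction sketch (the table name is hypothetical; Spark 1.6-era API):
{code}
// Reproduction sketch (hypothetical table name "big_table"; Spark 1.6-era API).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext

val df = sqlContext.sql("SELECT * FROM big_table LIMIT 100")

// Print the physical plan: the scan under the Limit operator is not
// partition-bounded, which is what this ticket is about.
df.explain(true)

// Running the query launches a task for every partition (visible in the
// Spark UI), even though a single partition could satisfy LIMIT 100.
df.collect()
{code}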


> Spark should avoid scanning all partitions when limit is set
> ------------------------------------------------------------
>
>                 Key: SPARK-12843
>                 URL: https://issues.apache.org/jira/browse/SPARK-12843
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Maciej Bryński
>
> SQL Query:
> {code}
> select * from table limit 100
> {code}
> forces Spark to scan every partition even when enough data is available at the 
> beginning of the scan.
> This behaviour should be avoided: the scan should stop as soon as enough data 
> has been collected.
> Is this related to [SPARK-9850]?
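
As a workaround sketch (not from the ticket, and assuming the same hypothetical table as above): RDD.take(n) scans partitions incrementally, starting with one partition and reading more only if needed, so dropping to the RDD API avoids launching a task per partition up front:
{code}
// Workaround sketch: RDD.take(100) first scans a single partition and only
// reads further partitions if fewer than 100 rows were found, unlike the
// LIMIT plan described in this ticket.
val first100 = sqlContext.table("big_table").rdd.take(100)
{code}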


