[ 
https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110709#comment-15110709
 ] 

dileep edited comment on SPARK-12843 at 1/21/16 2:57 PM:
---------------------------------------------------------

Please see the code snippet below. We need to make use of the DataFrame caching mechanism:
DataFrame teenagers = sqlContext.sql("SELECT * FROM people limit 1");
teenagers.cache();

This gives a significant improvement for the select query: subsequent select
queries read from the cache instead of re-scanning the entire data.
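As a rough illustration of why caching helps, here is a minimal, Spark-free sketch of the same idea (the `CachedQuery`/`cache` names are hypothetical, not Spark API): the first call materialises the query result, and every later call reuses it instead of re-running the scan.

```java
import java.util.List;
import java.util.function.Supplier;

// Minimal memoizing supplier, analogous in spirit to DataFrame.cache():
// the first get() materialises the result; later calls reuse it.
public class CachedQuery {
    public static <T> Supplier<T> cache(Supplier<T> query) {
        return new Supplier<T>() {
            private T result;
            private boolean computed;
            @Override public synchronized T get() {
                if (!computed) { result = query.get(); computed = true; }
                return result;
            }
        };
    }

    public static void main(String[] args) {
        int[] scans = {0};
        Supplier<List<Integer>> q =
                cache(() -> { scans[0]++; return List.of(1, 2, 3); });
        q.get(); q.get(); q.get();
        System.out.println(scans[0]); // 1: the "table" was scanned only once
    }
}
```

The analogy is loose: Spark's `cache()` only marks the DataFrame for caching, and the data is actually materialised on the first action.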





> Spark should avoid scanning all partitions when limit is set
> ------------------------------------------------------------
>
>                 Key: SPARK-12843
>                 URL: https://issues.apache.org/jira/browse/SPARK-12843
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Maciej BryƄski
>
> SQL Query:
> {code}
> select * from table limit 100
> {code}
> forces Spark to scan all partitions even when enough rows are available at the
> beginning of the scan.
> This behaviour should be avoided: the scan should stop once enough data has
> been collected.
> Is it related to: [SPARK-9850] ?
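For intuition, the behaviour the report asks for mirrors how Java's sequential streams short-circuit on `limit()`: only as many source elements are pulled as the limit requires. A minimal, Spark-free sketch (the class name and counter are hypothetical):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LimitShortCircuit {
    public static void main(String[] args) {
        AtomicInteger scanned = new AtomicInteger();
        // Simulate a 1,000,000-row "table" scan; limit(100) stops early.
        List<Integer> rows = IntStream.range(0, 1_000_000)
                .peek(i -> scanned.incrementAndGet()) // count rows actually read
                .boxed()
                .limit(100)
                .collect(Collectors.toList());
        System.out.println(rows.size());   // 100
        System.out.println(scanned.get()); // 100, not 1,000,000
    }
}
```

This is the scan behaviour the issue requests from Spark: stop pulling partitions once the limit is satisfied.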



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
