This is an interesting point.

As far as I know, in practically any RDBMS (Oracle, SAP, etc.), the LIMIT
clause applies to the collection part of the result set.

The result set is computed fully by the query, which may involve multiple
joins on multiple underlying tables.

Pushing the LIMIT down to each underlying table does not make sense in
general and is not industry standard, AFAIK.
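To make the point concrete, here is a minimal sketch in plain Python (not Spark; the tables and values are made up) showing why a per-table limit applied before a join can drop rows that a limit on the joined result would keep:

```python
# Hypothetical data illustrating why pushing LIMIT below a join is unsafe.
orders = [(1, "o1"), (2, "o2"), (3, "o3")]   # (customer_id, order_id)
customers = [(2, "alice"), (3, "bob")]        # (customer_id, name)

# Correct semantics: join first, then take the limit from the joined result.
joined = [(o, name) for (cid, o) in orders
          for (cid2, name) in customers if cid == cid2]
limited = joined[:1]   # first row of the joined result

# Naive per-table limit: LIMIT 1 on each input before joining loses rows.
wrong = [(o, name) for (cid, o) in orders[:1]
         for (cid2, name) in customers[:1] if cid == cid2]
# wrong is empty: order o1 has no match among the limited customer rows.
```

So the limit has to be applied to the joined result, not to the inputs.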

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 06:48, Michael Armbrust <mich...@databricks.com>
wrote:

> - dev + user
>
> Can you give more info about the query?  Maybe a full explain()?  Are you
> using a datasource like JDBC?  The API does not currently push down limits,
> but the documentation talks about how you can use a query instead of a
> table if that is what you are looking to do.
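As a hedged sketch of the workaround Michael mentions, one can wrap the limited query as the JDBC "table" so that the database applies the LIMIT itself; the table name `A` and the limit of 500 come from the thread's example, and the JDBC URL is a made-up placeholder:

```python
# Build the subquery that the database itself will limit.
limit = 500
pushed = f"(SELECT * FROM A LIMIT {limit}) AS t"  # alias required by most databases

# Hypothetical PySpark usage (commented out so the sketch stands alone):
# df = (spark.read.format("jdbc")
#       .option("url", "jdbc:postgresql://host/db")
#       .option("dbtable", pushed)
#       .load())
```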
>
> On Mon, Oct 24, 2016 at 5:40 AM, Liz Bai <liz...@icloud.com> wrote:
>
>> Hi all,
>>
>> Let me clarify the problem:
>>
>> Suppose we have a simple table `A` with 100,000,000 records.
>>
>> Problem:
>> When we execute the SQL query `select * from A limit 500`, it scans
>> through all 100,000,000 records. The expected behaviour is that the
>> engine stops scanning once 500 records have been found.
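The early-stop behaviour Liz expects can be sketched with a lazy generator in plain Python (an illustration of the semantics, not of Spark internals; the counter records how many records the "scan" actually touched):

```python
import itertools

# A lazy "scan" over an unbounded record source; `scanned` counts how many
# records were actually touched.
scanned = 0

def scan():
    global scanned
    for i in itertools.count(1):
        scanned += 1
        yield i

# Taking the first 500 rows should consume exactly 500 records from the
# source, not the whole table.
first_500 = list(itertools.islice(scan(), 500))
```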
>>
>> Detailed observation:
>> We found that there are “GlobalLimit / LocalLimit” physical operators in
>> https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
>> but during query plan generation, GlobalLimit / LocalLimit are not
>> applied to the generated plan.
>>
>> Could you please help us inspect this LIMIT problem?
>> Thanks.
>>
>> Best,
>> Liz
>>
>> On 23 Oct 2016, at 10:11 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>>
>> Hi, Liz,
>>
>> CollectLimit means `Take the first `limit` elements and collect them to a
>> single partition.`
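Xiao's description can be illustrated with a toy sketch in plain Python (not Spark code; the partition contents are made up): each partition is limited locally, then the first `limit` rows overall are collected into a single "partition":

```python
# Three made-up partitions of rows.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
limit = 4

# LocalLimit: take at most `limit` rows from each partition.
local = [p[:limit] for p in partitions]

# GlobalLimit / collect: concatenate the locally limited partitions and
# keep the first `limit` rows overall.
globally_limited = [row for p in local for row in p][:limit]
```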
>>
>> Thanks,
>>
>> Xiao
>>
>> 2016-10-23 5:21 GMT-07:00 Ran Bai <liz...@icloud.com>:
>>
>>> Hi all,
>>>
>>> I found that the runtime for a query with or without the “LIMIT” keyword
>>> is the same. We looked into it and found that there actually are
>>> “GlobalLimit / LocalLimit” operators in the logical plan, but no
>>> corresponding physical plan. Is this a bug or something else? Attached
>>> are the logical and physical plans when running "SELECT * FROM seq LIMIT 1".
>>>
>>>
>>> More specifically, we expected an early stop upon getting adequate
>>> results.
>>> Thanks so much.
>>>
>>> Best,
>>> Liz
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
