>
> So my other question - for aggregation with the "group by" clause, we
> return an aggregated row which is computed from a group of rows - with my
> current implementation, it is approximated by counting the size of the
> largest row in that group - I think it is the safest and simplest
> approximation - wdyt?


I feel that there is something that was not discussed here. Depending on
the projections being used, the storage engine can return rows that are
much larger than the actual rows returned to the user. Therefore, the size
of the page loaded internally will only reliably match the size of the page
returned to the user when the full row is queried without transformation.
In all other cases the difference can be really significant. For a GROUP BY
query doing a count(*), the suggested approach will return a page size that
is totally off from what was requested.
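
To put rough numbers on this, here is a minimal sketch in plain Python. It is
not Cassandra code: `StoredRow`, `approx_group_size`, and all the byte counts
are hypothetical, chosen only to illustrate the gap between what the
largest-row approximation would account for and what a count(*) actually
sends back (a single 8-byte bigint, ignoring protocol framing).

```python
from dataclasses import dataclass


@dataclass
class StoredRow:
    key: str
    payload_bytes: int  # hypothetical serialized size of the full stored row


def approx_group_size(rows):
    """Largest-row approximation: charge the group the size of its biggest row."""
    return max(r.payload_bytes for r in rows)


def count_star_result_size():
    """A count(*) result is one bigint value: 8 bytes, whatever the group holds."""
    return 8


# One group of three stored rows with made-up sizes.
group = [StoredRow("a", 2_000), StoredRow("b", 50_000), StoredRow("c", 10_000)]

accounted = approx_group_size(group)   # bytes charged against the page size
returned = count_star_result_size()    # bytes the client actually receives
ratio = accounted // returned          # how far off the accounting is

print(f"accounted={accounted} returned={returned} ratio={ratio}x")
```

With these made-up sizes, the page would be considered full long before the
client has received anything close to the requested number of bytes.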

On Tue, 13 Jun 2023 at 07:00, Jacek Lewandowski <lewandowski.ja...@gmail.com>
wrote:

> Josh, that answers my question exactly; thank you.
>
> I will not implement limiting the result set in CQL (that is, by LIMIT
> clause) and stay with just paging. Whether the page size is defined in
> bytes or rows can be determined by a flag - there are many unused bits for
> that.
>
> So my other question - for aggregation with the "group by" clause, we
> return an aggregated row which is computed from a group of rows - with my
> current implementation, it is approximated by counting the size of the
> largest row in that group - I think it is the safest and simplest
> approximation - wdyt?
>
>
> On Mon, 12 Jun 2023 at 22:55, Josh McKenzie <jmcken...@apache.org> wrote:
>
>> As long as it is valid in the paging protocol to return a short page, but
>> still say “there are more pages”, I think that is fine to do that.
>>
>> Thankfully the v3-v5 spec all make it clear that clients need to respect
>> what the server has to say about there being more pages:
>> https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L1247-L1253
>>
>>   - Clients should not rely on the actual size of the result set returned to
>>     decide if there are more results to fetch or not. Instead, they should always
>>     check the Has_more_pages flag (unless they did not enable paging for the query
>>     obviously). Clients should also not assert that no result will have more than
>>     <result_page_size> results. While the current implementation always respects
>>     the exact value of <result_page_size>, we reserve the right to return
>>     slightly smaller or bigger pages in the future for performance reasons.
>>
>>
>> On Mon, Jun 12, 2023, at 3:19 PM, Jeremiah Jordan wrote:
>>
>> As long as it is valid in the paging protocol to return a short page, but
>> still say “there are more pages”, I think that is fine to do that.  For an
>> actual LIMIT that is part of the user query, I think the server must always
>> have returned all data that fits into the LIMIT when all pages have been
>> returned.
>>
>> -Jeremiah
>>
>> On Jun 12, 2023 at 12:56:14 PM, Josh McKenzie <jmcken...@apache.org>
>> wrote:
>>
>>
>> Yeah, my bad. I have paging on the brain. Seriously.
>>
>> I can't think of a use-case in which a LIMIT based on # bytes makes sense
>> from a user perspective.
>>
>> On Mon, Jun 12, 2023, at 1:35 PM, Jeff Jirsa wrote:
>>
>>
>>
>> On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>> If you have rows that vary significantly in their size, your latencies
>> could end up being pretty unpredictable using a LIMIT BY <row_count>. Being
>> able to specify a limit by bytes at the driver / API level would allow app
>> devs to get more deterministic results out of their interaction w/the DB if
>> they're looking to respond back to a client within a certain time frame and
>> / or determine next steps in the app (continue paging, stop, etc) based on
>> how long it took to get results back.
>>
>>
>> Are you talking about the page size or the LIMIT? Once the LIMIT is
>> reached there is no "continue paging". LIMIT is also at the CQL level, not
>> at the driver level.
>> I can totally understand the need for a page size in bytes, but not for a
>> LIMIT.
>>
>>
>> Would only ever EXPECT to see a page size in bytes, never a LIMIT
>> specifying bytes.
>>
>> I know the C-11745 ticket says LIMIT, too, but that feels very odd to me.
>>
>>
>>
>>
