Re: [DISCUSSION] Which pagination to use in the v2-API of DS

2024-03-13 Thread David Zollo
Adding another API that uses cursor-based pagination is a good choice.
good job!



Best Regards

---
Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
David
Linkedin: https://www.linkedin.com/in/davidzollo
Twitter: @WorkflowEasy 
---


On Tue, Mar 12, 2024 at 11:53 AM Xin Cheng  wrote:

> Thanks a lot for your reply. I agree with your point about performance
> issues. The amount of data in DS is indeed not very large, except for
> querying the task instance list.
>
> Also, cursor-based pagination does not have data conflict/loss issues. If
> the community thinks that this problem has little impact, I think v2-api
> can continue to maintain the original paging method. And for APIs that
> return a larger amount of data (e.g., task instance list), we can add
> another API that uses cursor-based pagination.
>
> Best Regards,
> Rick Cheng
>
> On 2024/03/08 03:30:45 David Zollo wrote:
> > I think offset-based pagination is more convenient for users,  usually
> > users don't have too many pages and it's convenient for users to use open
> > api.
> >
> > Cursor-Based Pagination is proper for much more data. but it's not a good
> > choice for scheduling
> >
> >
> >
> > Best Regards
> >
> > ---
> > Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
> > David
> > Linkedin: https://www.linkedin.com/in/davidzollo
> > Twitter: @WorkflowEasy 
> > ---
> >
> >
> > On Wed, Feb 21, 2024 at 10:49 AM Rick Cheng 
> wrote:
> >
> > > Hi, Community
> > >
> > > Recently, the community plans to reconstruct the API of DS in
> > > https://github.com/apache/dolphinscheduler/pull/15321/. The `listXXX`
> API
> > > generally has two methods for pagination: Offset-based pagination or
> > > Cursor-based (token-based) pagination. Currently, DS uses the
> offset-based
> > > pagination (e.g., pageSize and pageNo). When designing the v2 API, I
> think
> > > we need to discuss what paging method to use.
> > >
> > > ### Offset-based Pagination
> > > Users can see page numbers directly on the web UI and jump between
> > > different pages. But there are 2 problems with this approach:
> > > * Data conflicts and loss: If a data item is inserted or deleted during
> > > paging, users may see duplicate data items on the next page, or data
> item
> > > may be lost.
> > > * Performance issue: When the offset is large, query efficiency will
> become
> > > lower
> > >
> > > ### Cursor-based Pagination
> > > Instead of relying on numeric offsets, cursor-based pagination uses a
> > > unique identifier or token to mark the position in the dataset.
> Token-based
> > > pagination does not have data conflict/loss or performance issue. And
> cloud
> > > vendors (e.g., AWS / Azure / GCP) also use cursor-based pagination
> > > extensively.
> > >
> > > ### An example of Cursor-based Pagination
> > > * Assume table `records` is sorted using `id` and `update_time`
> > > * Assume that a page returns 10 data items, then take the `id` and
> > > `update_time` of the last data item as the cursor (usually encoded
> instead
> > > of returned to the user in plain text).
> > > * The user uses the cursor obtained from the previous query to
> initiate a
> > > second paging query, and DS decodes it to obtain the `last_data.id`
> and
> > > `last_data.update_time` of the last piece of data in the previous
> query.
> > > ```
> > > SELECT *
> > > FROM
> > > records
> > > WHERE
> > > user_id = xxx
> > > AND (
> > > update_time < last_data.update_time
> > > OR ( update_time = last_data.update_time AND id < last_data.id
> )
> > > )
> > > ORDER BY
> > > update_time DESC, id DESC
> > > LIMIT 10;
> > > ```
> > >
> > > I prefer to use cursor-based pagination in the v2 API of DS since it
> does
> > > not have data conflict/loss or performance issue. And I don't think
> users
> > > have a strong need to jump directly to a specific page. Any comments or
> > > suggestions are welcome.
> > >
> > > Ref:
> > > [1] [Offset vs Cursor-Based Pagination: Which is the Right Choice for
> Your
> > > Project?](
> > >
> > >
> https://medium.com/@oshiryaeva/offset-vs-cursor-based-pagination-which-is-the-right-choice-for-your-project-e46f65db062f
> > > )
> > > [2] [Offset vs Cursor-Based Pagination: Choosing the Best Approach](
> > >
> > >
> https://medium.com/@maryam-bit/offset-vs-cursor-based-pagination-choosing-the-best-approach-2e93702a118b
> > > )
> > >
> > >
> > > Best Regards,
> > > Rick Cheng
> > >
> >
>


Re: [DISCUSSION] Which pagination to use in the v2-API of DS

2024-03-11 Thread Xin Cheng
Thanks a lot for your reply. I agree with your point about performance issues. 
The amount of data in DS is indeed not very large, except for querying the task 
instance list.

Also, cursor-based pagination does not have data conflict/loss issues. If the 
community thinks that this problem has little impact, I think v2-api can 
continue to maintain the original paging method. And for APIs that return a 
larger amount of data (e.g., task instance list), we can add another API that 
uses cursor-based pagination.

Best Regards,
Rick Cheng

On 2024/03/08 03:30:45 David Zollo wrote:
> I think offset-based pagination is more convenient for users,  usually
> users don't have too many pages and it's convenient for users to use open
> api.
> 
> Cursor-Based Pagination is proper for much more data. but it's not a good
> choice for scheduling
> 
> 
> 
> Best Regards
> 
> ---
> Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
> David
> Linkedin: https://www.linkedin.com/in/davidzollo
> Twitter: @WorkflowEasy 
> ---
> 
> 
> On Wed, Feb 21, 2024 at 10:49 AM Rick Cheng  wrote:
> 
> > Hi, Community
> >
> > Recently, the community plans to reconstruct the API of DS in
> > https://github.com/apache/dolphinscheduler/pull/15321/. The `listXXX` API
> > generally has two methods for pagination: Offset-based pagination or
> > Cursor-based (token-based) pagination. Currently, DS uses the offset-based
> > pagination (e.g., pageSize and pageNo). When designing the v2 API, I think
> > we need to discuss what paging method to use.
> >
> > ### Offset-based Pagination
> > Users can see page numbers directly on the web UI and jump between
> > different pages. But there are 2 problems with this approach:
> > * Data conflicts and loss: If a data item is inserted or deleted during
> > paging, users may see duplicate data items on the next page, or data item
> > may be lost.
> > * Performance issue: When the offset is large, query efficiency will become
> > lower
> >
> > ### Cursor-based Pagination
> > Instead of relying on numeric offsets, cursor-based pagination uses a
> > unique identifier or token to mark the position in the dataset. Token-based
> > pagination does not have data conflict/loss or performance issue. And cloud
> > vendors (e.g., AWS / Azure / GCP) also use cursor-based pagination
> > extensively.
> >
> > ### An example of Cursor-based Pagination
> > * Assume table `records` is sorted using `id` and `update_time`
> > * Assume that a page returns 10 data items, then take the `id` and
> > `update_time` of the last data item as the cursor (usually encoded instead
> > of returned to the user in plain text).
> > * The user uses the cursor obtained from the previous query to initiate a
> > second paging query, and DS decodes it to obtain the `last_data.id` and
> > `last_data.update_time` of the last piece of data in the previous query.
> > ```
> > SELECT *
> > FROM
> > records
> > WHERE
> > user_id = xxx
> > AND (
> > update_time < last_data.update_time
> > OR ( update_time = last_data.update_time AND id < last_data.id )
> > )
> > ORDER BY
> > update_time DESC, id DESC
> > LIMIT 10;
> > ```
> >
> > I prefer to use cursor-based pagination in the v2 API of DS since it does
> > not have data conflict/loss or performance issue. And I don't think users
> > have a strong need to jump directly to a specific page. Any comments or
> > suggestions are welcome.
> >
> > Ref:
> > [1] [Offset vs Cursor-Based Pagination: Which is the Right Choice for Your
> > Project?](
> >
> > https://medium.com/@oshiryaeva/offset-vs-cursor-based-pagination-which-is-the-right-choice-for-your-project-e46f65db062f
> > )
> > [2] [Offset vs Cursor-Based Pagination: Choosing the Best Approach](
> >
> > https://medium.com/@maryam-bit/offset-vs-cursor-based-pagination-choosing-the-best-approach-2e93702a118b
> > )
> >
> >
> > Best Regards,
> > Rick Cheng
> >
> 


Re: [DISCUSSION] Which pagination to use in the v2-API of DS

2024-03-07 Thread David Zollo
I think offset-based pagination is more convenient for users,  usually
users don't have too many pages and it's convenient for users to use open
api.

Cursor-Based Pagination is proper for much more data. but it's not a good
choice for scheduling



Best Regards

---
Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
David
Linkedin: https://www.linkedin.com/in/davidzollo
Twitter: @WorkflowEasy 
---


On Wed, Feb 21, 2024 at 10:49 AM Rick Cheng  wrote:

> Hi, Community
>
> Recently, the community plans to reconstruct the API of DS in
> https://github.com/apache/dolphinscheduler/pull/15321/. The `listXXX` API
> generally has two methods for pagination: Offset-based pagination or
> Cursor-based (token-based) pagination. Currently, DS uses the offset-based
> pagination (e.g., pageSize and pageNo). When designing the v2 API, I think
> we need to discuss what paging method to use.
>
> ### Offset-based Pagination
> Users can see page numbers directly on the web UI and jump between
> different pages. But there are 2 problems with this approach:
> * Data conflicts and loss: If a data item is inserted or deleted during
> paging, users may see duplicate data items on the next page, or data item
> may be lost.
> * Performance issue: When the offset is large, query efficiency will become
> lower
>
> ### Cursor-based Pagination
> Instead of relying on numeric offsets, cursor-based pagination uses a
> unique identifier or token to mark the position in the dataset. Token-based
> pagination does not have data conflict/loss or performance issue. And cloud
> vendors (e.g., AWS / Azure / GCP) also use cursor-based pagination
> extensively.
>
> ### An example of Cursor-based Pagination
> * Assume table `records` is sorted using `id` and `update_time`
> * Assume that a page returns 10 data items, then take the `id` and
> `update_time` of the last data item as the cursor (usually encoded instead
> of returned to the user in plain text).
> * The user uses the cursor obtained from the previous query to initiate a
> second paging query, and DS decodes it to obtain the `last_data.id` and
> `last_data.update_time` of the last piece of data in the previous query.
> ```
> SELECT *
> FROM
> records
> WHERE
> user_id = xxx
> AND (
> update_time < last_data.update_time
> OR ( update_time = last_data.update_time AND id < last_data.id )
> )
> ORDER BY
> update_time DESC, id DESC
> LIMIT 10;
> ```
>
> I prefer to use cursor-based pagination in the v2 API of DS since it does
> not have data conflict/loss or performance issue. And I don't think users
> have a strong need to jump directly to a specific page. Any comments or
> suggestions are welcome.
>
> Ref:
> [1] [Offset vs Cursor-Based Pagination: Which is the Right Choice for Your
> Project?](
>
> https://medium.com/@oshiryaeva/offset-vs-cursor-based-pagination-which-is-the-right-choice-for-your-project-e46f65db062f
> )
> [2] [Offset vs Cursor-Based Pagination: Choosing the Best Approach](
>
> https://medium.com/@maryam-bit/offset-vs-cursor-based-pagination-choosing-the-best-approach-2e93702a118b
> )
>
>
> Best Regards,
> Rick Cheng
>