[
https://issues.apache.org/jira/browse/DRILL-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464891#comment-17464891
]
ASF GitHub Bot commented on DRILL-8092:
---------------------------------------
cgivre opened a new pull request #2414:
URL: https://github.com/apache/drill/pull/2414
# [DRILL-8092](https://issues.apache.org/jira/browse/DRILL-8092): Add Auto
Pagination to HTTP Storage Plugin
## Description
This PR adds the ability for Drill to query APIs that paginate their
results. For example, if an API limits you to 100 records per page, this
improvement allows Drill to execute multiple HTTP requests to retrieve the
complete dataset.
This update works in two ways: with a limit and without. In the event a
limit is pushed down, the new paginator object will generate the correct number
of URLs and BatchReaders, execute the queries and return the results.
Currently, this is executed in series, but in future work this could be
parallelized.
In the event a limit is not pushed down, the reader will keep generating
URLs and retrieving data until the row count of data returned is less than the
page size.
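The stopping rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's Java implementation: `fetch_all` and `fetch_page` are hypothetical names, and `fetch_page(offset, count)` stands in for a single HTTP request. It also folds the limit and no-limit cases into one loop for brevity, whereas the PR describes generating the URLs up front when a limit is pushed down.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for one HTTP request: returns up to `count` rows
# starting at `offset`.
FetchPage = Callable[[int, int], List[Dict]]


def fetch_all(fetch_page: FetchPage, page_size: int,
              limit: Optional[int] = None) -> List[Dict]:
    """Fetch pages in series until the limit is met or a short page arrives."""
    rows: List[Dict] = []
    offset = 0
    while limit is None or len(rows) < limit:
        # Never ask for more rows than are still needed to satisfy the limit.
        want = page_size if limit is None else min(page_size, limit - len(rows))
        page = fetch_page(offset, want)
        rows.extend(page)
        # A page shorter than requested means the API has no more data.
        if len(page) < want:
            break
        offset += len(page)
    return rows
```

Without a limit, `want` always equals `page_size`, so the loop ends exactly when a page comes back with fewer rows than the page size.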
## Documentation
(From README)
Remote APIs frequently implement some sort of pagination as a way of
limiting results. However, if you are performing bulk data analysis, it is
necessary to reassemble the data into one larger dataset. Drill's
auto-pagination features allow this to happen in the background, so that
the user will get clean data back.
To use a paginator, you simply have to configure the paginator in the
connection for the particular API.
## Offset Pagination
Offset pagination uses parameters similar to SQL's `LIMIT` and `OFFSET`.
For example, if you want 200 records and the maximum page size is 50
records, the offset paginator will break up your query into 4 requests as
shown below:
* myapi.com?limit=50&offset=0
* myapi.com?limit=50&offset=50
* myapi.com?limit=50&offset=100
* myapi.com?limit=50&offset=150
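As a rough illustration of how such a list of requests could be derived, here is a minimal Python sketch. The function name `offset_urls` and its parameters are hypothetical and are not part of Drill; the sketch also assumes every request asks for a full page, which may differ from what the plugin does on the last page.

```python
import math
from typing import List
from urllib.parse import urlencode


def offset_urls(base_url: str, total: int, max_page_size: int,
                limit_field: str = "limit",
                offset_field: str = "offset") -> List[str]:
    """Build one URL per page needed to cover `total` records."""
    page_count = math.ceil(total / max_page_size)
    urls = []
    for page in range(page_count):
        # Each request asks for a full page, starting where the last one ended.
        query = urlencode({limit_field: max_page_size,
                           offset_field: page * max_page_size})
        urls.append(f"{base_url}?{query}")
    return urls


for url in offset_urls("myapi.com", total=200, max_page_size=50):
    print(url)
```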
### Configuring Offset Pagination
To configure an offset paginator, simply add the following to the
configuration for your connection.
```json
"paginator": {
  "limitField": "<limit>",
  "offsetField": "<offset>",
  "maxPageSize": 100,
  "method": "OFFSET"
}
```
## Page Pagination
Page pagination is very similar to offset pagination, except that it uses a
page number instead of an `OFFSET`.
```json
"paginator": {
  "pageField": "page",
  "pageSizeField": "per_page",
  "maxPageSize": 100,
  "method": "PAGE"
}
```
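To make the difference from offset pagination concrete, the following Python sketch builds page-numbered URLs using the field names from the configuration above. The function `page_urls` is hypothetical, and the assumption that page numbering starts at 1 is not confirmed by the PR; some APIs start at 0.

```python
import math
from typing import List
from urllib.parse import urlencode


def page_urls(base_url: str, total: int, max_page_size: int,
              page_field: str = "page",
              page_size_field: str = "per_page",
              first_page: int = 1) -> List[str]:
    """Build one URL per page number needed to cover `total` records."""
    page_count = math.ceil(total / max_page_size)
    urls = []
    for i in range(page_count):
        # The page number advances by one per request; the page size is fixed.
        query = urlencode({page_field: first_page + i,
                           page_size_field: max_page_size})
        urls.append(f"{base_url}?{query}")
    return urls


for url in page_urls("myapi.com", total=200, max_page_size=50):
    print(url)
```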
## Testing
Added unit tests and tested manually.
> Add Auto Pagination to HTTP Storage Plugin
> ------------------------------------------
>
> Key: DRILL-8092
> URL: https://issues.apache.org/jira/browse/DRILL-8092
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.19.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.20.0
>
>
> See github