loserwang1024 opened a new issue, #2793: URL: https://github.com/apache/fluss/issues/2793
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.

### Motivation

### Problem

In Arrow, `table.to_batches()` returns a list of record batches that covers the whole table: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_batches

```python
>>> table.to_batches()[0].to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede
```

Iceberg likewise supports a scanner for a whole table:

```java
org.apache.iceberg.Scanner<Record> scanner = icebergTable.newScan().limit(100).build();
```

Fluss, however, only supports a batch scanner for a single table bucket. If a user wants to read a limited number of rows from a table, they have to enumerate every bucket (per partition, if the table is partitioned) and create one scanner per bucket, as follows:

```java
try (Connection connection = ConnectionFactory.createConnection(flussConfig);
        Table table = connection.getTable(tablePath);
        Admin flussAdmin = connection.getAdmin()) {
    TableInfo tableInfo = flussAdmin.getTableInfo(tablePath).get();
    int bucketCount = tableInfo.getNumBuckets();

    // Enumerate every bucket of the table.
    List<TableBucket> tableBuckets;
    if (tableInfo.isPartitioned()) {
        // Partitioned table: one set of buckets per partition.
        List<PartitionInfo> partitionInfos = flussAdmin.listPartitionInfos(tablePath).get();
        tableBuckets =
                partitionInfos.stream()
                        .flatMap(
                                partitionInfo ->
                                        IntStream.range(0, bucketCount)
                                                .mapToObj(
                                                        bucketId ->
                                                                new TableBucket(
                                                                        tableInfo.getTableId(),
                                                                        partitionInfo.getPartitionId(),
                                                                        bucketId)))
                        .collect(Collectors.toList());
    } else {
        tableBuckets =
                IntStream.range(0, bucketCount)
                        .mapToObj(bucketId -> new TableBucket(tableInfo.getTableId(), bucketId))
                        .collect(Collectors.toList());
    }

    // Create one batch scanner per bucket, then collect rows up to the limit.
    Scan scan = table.newScan().limit(limit).project(projectedFields);
    List<BatchScanner> scanners =
            tableBuckets.stream().map(scan::createBatchScanner).collect(Collectors.toList());
    List<InternalRow> scannedRows = BatchScanUtils.collectLimitedRows(scanners, limit);
}
```

### Solution

I recommend providing a batch scanner for the whole table:

```java
Table table = connection.getTable(tablePath);
BatchScanner batchScanner =
        table.newScan()
                .project(projectedFields)
                .limit(limit)
                .createBatchScanner();
```

### Anything else?

_No response_

### Willingness to contribute

- [ ] I'm willing to submit a PR!
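To illustrate what the table-level scanner proposed in the Solution section would have to do internally, here is a minimal, self-contained sketch. It is not Fluss code: `TableLevelLimitScan` and `collectLimited` are hypothetical names, and a plain `Iterator<T>` stands in for a per-bucket `BatchScanner`. It also makes no claim about how `BatchScanUtils.collectLimitedRows` is actually implemented; it only shows the idea of draining per-bucket sources in turn and stopping once the limit is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch, not part of the Fluss API. */
final class TableLevelLimitScan {

    /**
     * Reads from the per-bucket sources one after another and stops as soon
     * as {@code limit} rows have been collected. Each Iterator is an assumed
     * stand-in for one per-bucket scanner.
     */
    static <T> List<T> collectLimited(List<Iterator<T>> bucketSources, int limit) {
        List<T> rows = new ArrayList<>();
        for (Iterator<T> source : bucketSources) {
            // Pull rows from the current bucket until it is exhausted
            // or the overall limit is reached.
            while (rows.size() < limit && source.hasNext()) {
                rows.add(source.next());
            }
            if (rows.size() >= limit) {
                break; // no need to touch the remaining buckets
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two "buckets" holding 2 and 3 rows respectively.
        List<Iterator<String>> buckets =
                Arrays.asList(
                        Arrays.asList("a", "b").iterator(),
                        Arrays.asList("c", "d", "e").iterator());
        System.out.println(collectLimited(buckets, 3)); // prints [a, b, c]
    }
}
```

With such a wrapper living behind `createBatchScanner()`, callers would no longer need to enumerate partitions and buckets themselves.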
