snazy commented on code in PR #273: URL: https://github.com/apache/polaris/pull/273#discussion_r2023042696
########## polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/ReadEverythingPageToken.java: ########## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.core.persistence.pagination; + +import java.util.List; + +/** + * A {@link PageToken} implementation for readers who want to read everything. The behavior when + * using this token should be the same as when reading without a token. + */ +public class ReadEverythingPageToken extends PageToken { Review Comment: This mixes two different things: A "page token" is meant as an (opaque) piece of data to indicate where the next page result should start. "Read everything" is a filter criteria, not a page token. ########## polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PolarisPage.java: ########## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.core.persistence.pagination; + +import java.util.List; + +/** + * A wrapper for a {@link List} of data and a {@link PageToken} that can be used to continue the + * listing operation that generated that data. + */ +public class PolarisPage<T> { Review Comment: `Polaris` is superfluous - the type's already in an `org.apache.polaris` package, right? ########## extension/persistence/eclipselink/src/main/java/org/apache/polaris/extension/persistence/impl/eclipselink/PolarisEclipseLinkMetaStoreSessionImpl.java: ########## @@ -441,27 +445,57 @@ public List<EntityNameLookupRecord> lookupEntityActiveBatchInCurrentTxn( entity.getParentId(), entity.getName(), entity.getTypeCode(), - entity.getSubTypeCode())); + entity.getSubTypeCode()), + pageToken); } @Override - public @Nonnull <T> List<T> listEntitiesInCurrentTxn( + public @Nonnull <T> PolarisPage<T> listEntitiesInCurrentTxn( @Nonnull PolarisCallContext callCtx, long catalogId, long parentId, @Nonnull PolarisEntityType entityType, - int limit, @Nonnull Predicate<PolarisBaseEntity> entityFilter, - @Nonnull Function<PolarisBaseEntity, T> transformer) { - // full range scan under the parent for that type - return this.store - .lookupFullEntitiesActive(localSession.get(), catalogId, parentId, entityType) - .stream() - .map(ModelEntity::toEntity) - .filter(entityFilter) - .limit(limit) - .map(transformer) - .collect(Collectors.toList()); + @Nonnull Function<PolarisBaseEntity, T> transformer, + @Nonnull PageToken pageToken) { + List<T> data; + if (entityFilter.equals(Predicates.alwaysTrue())) { + // In this case, we can push the filter down into the query + data = + this.store + .lookupFullEntitiesActive( + localSession.get(), catalogId, parentId, entityType, pageToken) + .stream() + .map(ModelEntity::toEntity) + .filter(entityFilter) + .map(transformer) + .collect(Collectors.toList()); + } else { + // In this case, we cannot push the filter down into the query. We must therefore remove + // the page size limit from the PageToken and filter on the client side. + // TODO Implement a generic predicate that can be pushed down into different metastores + PageToken unlimitedPageSizeToken = pageToken.withPageSize(Integer.MAX_VALUE); + List<ModelEntity> rawData = + this.store.lookupFullEntitiesActive( + localSession.get(), catalogId, parentId, entityType, unlimitedPageSizeToken); Review Comment: I disagree that an inconsistency (duplicate and omitted) results is acceptable. ########## polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/PageToken.java: ########## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.core.persistence.pagination; + +import java.nio.charset.StandardCharsets; +import java.util.Arrays; +import java.util.Base64; +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +/** + * Represents a page token that can be used by operations like `listTables`. Clients that specify a + * `pageSize` (or a `pageToken`) may receive a `next-page-token` in the response, the content of + * which is a serialized PageToken. + * + * <p>By providing that in the next query's `pageToken`, the client can resume listing where they + * left off. If the client provides a `pageToken` or `pageSize` but `next-page-token` is null in the + * response, that means there is no more data to read. + */ +public abstract class PageToken { + + public int pageSize; Review Comment: I noticed this elsewhere as well. Page token is a an opaque server generated value - page size is a _hint_ from a client. Two very different things. ########## polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java: ########## @@ -183,4 +183,12 @@ protected FeatureConfiguration( "How many times to retry refreshing metadata when the previous error was retryable") .defaultValue(2) .buildFeatureConfiguration(); + + public static final PolarisConfiguration<Boolean> LIST_PAGINATION_ENABLED = Review Comment: > But we need a feature flag on the server side. What's the reason for that? Whether an Iceberg client supports pagination is clear from the request. ########## service/common/src/test/java/org/apache/polaris/service/persistence/pagination/PageTokenTest.java: ########## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.service.persistence.pagination; + +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.polaris.core.persistence.pagination.OffsetPageToken; +import org.apache.polaris.core.persistence.pagination.PageToken; +import org.apache.polaris.core.persistence.pagination.ReadEverythingPageToken; +import org.apache.polaris.jpa.models.EntityIdPageToken; +import org.apache.polaris.jpa.models.ModelEntity; +import org.assertj.core.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.MethodSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class PageTokenTest { Review Comment: I wish there'd be much better test coverage about all the code and use cases have are changed in this PR. ########## polaris-core/src/main/java/org/apache/polaris/core/persistence/dao/entity/ListEntitiesResult.java: ########## @@ -23,45 +23,67 @@ import jakarta.annotation.Nonnull; import jakarta.annotation.Nullable; import java.util.List; +import java.util.Optional; import org.apache.polaris.core.entity.EntityNameLookupRecord; +import org.apache.polaris.core.persistence.pagination.PageToken; +import org.apache.polaris.core.persistence.pagination.PolarisPage; /** the return the result for a list entities call */ public class ListEntitiesResult extends BaseResult { // null if not success. Else the list of entities being returned private final List<EntityNameLookupRecord> entities; + private final Optional<PageToken> pageTokenOpt; + + /** Create a {@link ListEntitiesResult} from a {@link PolarisPage} */ + public static ListEntitiesResult fromPolarisPage( + PolarisPage<EntityNameLookupRecord> polarisPage) { + return new ListEntitiesResult(polarisPage.data, Optional.ofNullable(polarisPage.pageToken)); + } /** * Constructor for an error * * @param errorCode error code, cannot be SUCCESS * @param extraInformation extra information */ - public ListEntitiesResult(@Nonnull ReturnStatus errorCode, @Nullable String extraInformation) { + public ListEntitiesResult( + @Nonnull ReturnStatus errorCode, + @Nullable String extraInformation, + @Nonnull Optional<PageToken> pageTokenOpt) { super(errorCode, extraInformation); this.entities = null; + this.pageTokenOpt = pageTokenOpt; } /** * Constructor for success * * @param entities list of entities being returned, implies success */ - public ListEntitiesResult(@Nonnull List<EntityNameLookupRecord> entities) { + public ListEntitiesResult( + @Nonnull List<EntityNameLookupRecord> entities, @Nullable Optional<PageToken> pageTokenOpt) { Review Comment: `@Nullable Optional<PageToken>` that's not good practice. `Optional`s should really always be non-null. ########## extension/persistence/eclipselink/src/main/java/org/apache/polaris/extension/persistence/impl/eclipselink/PolarisEclipseLinkMetaStoreSessionImpl.java: ########## @@ -441,27 +445,57 @@ public List<EntityNameLookupRecord> lookupEntityActiveBatchInCurrentTxn( entity.getParentId(), entity.getName(), entity.getTypeCode(), - entity.getSubTypeCode())); + entity.getSubTypeCode()), + pageToken); } @Override - public @Nonnull <T> List<T> listEntitiesInCurrentTxn( + public @Nonnull <T> PolarisPage<T> listEntitiesInCurrentTxn( @Nonnull PolarisCallContext callCtx, long catalogId, long parentId, @Nonnull PolarisEntityType entityType, - int limit, @Nonnull Predicate<PolarisBaseEntity> entityFilter, - @Nonnull Function<PolarisBaseEntity, T> transformer) { - // full range scan under the parent for that type - return this.store - .lookupFullEntitiesActive(localSession.get(), catalogId, parentId, entityType) - .stream() - .map(ModelEntity::toEntity) - .filter(entityFilter) - .limit(limit) - .map(transformer) - .collect(Collectors.toList()); + @Nonnull Function<PolarisBaseEntity, T> transformer, + @Nonnull PageToken pageToken) { + List<T> data; + if (entityFilter.equals(Predicates.alwaysTrue())) { Review Comment: Does `Predicate.alwaysTrue` return something that can be compared and will that hold true for future? `Predicate`s are rather functions/lambdas - those are not meant to be compared with anything. ########## polaris-core/src/main/java/org/apache/polaris/core/persistence/pagination/OffsetPageToken.java: ########## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.core.persistence.pagination; + +import java.util.List; + +/** + * A simple {@link PageToken} implementation that tracks the number of records returned. Entities + * are meant to be filtered during listing such that when a token with offset N is supplied, the + * first N records are omitted from the results. + */ +public class OffsetPageToken extends PageToken { Review Comment: This mechanism can lead to duplicate results and/or omitted results. The order of the entities returned is undefined - it can vary on every list() invocation. Even if the order is (somewhat?) deterministic, a no-longer existing entity in a previous page will lead to a missing result - and vice versa for an entity added "in between" a previous page will lead to duplicate results. ########## extension/persistence/jpa-model/src/main/java/org/apache/polaris/jpa/models/EntityIdPageToken.java: ########## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.jpa.models; + +import java.util.List; +import org.apache.polaris.core.entity.PolarisBaseEntity; +import org.apache.polaris.core.persistence.pagination.PageToken; + +/** + * A {@link PageToken} implementation that tracks the greatest ID from either {@link + * PolarisBaseEntity} or {@link ModelEntity} objects supplied in updates. Entities are meant to be + * filtered during listing such that only entities with and ID greater than the ID of the token are Review Comment: Noting: the order of values returned by database sequences is not necessarily strictly increasing. Aka: it is wrong to assume that each retrieved values from a database sequence is higher than any value retrieved before. ########## extension/persistence/eclipselink/src/main/java/org/apache/polaris/extension/persistence/impl/eclipselink/PolarisEclipseLinkMetaStoreSessionImpl.java: ########## @@ -441,27 +445,57 @@ public List<EntityNameLookupRecord> lookupEntityActiveBatchInCurrentTxn( entity.getParentId(), entity.getName(), entity.getTypeCode(), - entity.getSubTypeCode())); + entity.getSubTypeCode()), + pageToken); } @Override - public @Nonnull <T> List<T> listEntitiesInCurrentTxn( + public @Nonnull <T> PolarisPage<T> listEntitiesInCurrentTxn( @Nonnull PolarisCallContext callCtx, long catalogId, long parentId, @Nonnull PolarisEntityType entityType, - int limit, @Nonnull Predicate<PolarisBaseEntity> entityFilter, - @Nonnull Function<PolarisBaseEntity, T> transformer) { - // full range scan under the parent for that type - return this.store - .lookupFullEntitiesActive(localSession.get(), catalogId, parentId, entityType) - .stream() - .map(ModelEntity::toEntity) - .filter(entityFilter) - .limit(limit) - .map(transformer) - .collect(Collectors.toList()); + @Nonnull Function<PolarisBaseEntity, T> transformer, + @Nonnull PageToken pageToken) { + List<T> data; + if (entityFilter.equals(Predicates.alwaysTrue())) { + // In this case, we can push the filter down into the query + data = + this.store + .lookupFullEntitiesActive( + localSession.get(), catalogId, parentId, entityType, pageToken) + .stream() + .map(ModelEntity::toEntity) + .filter(entityFilter) + .map(transformer) + .collect(Collectors.toList()); + } else { + // In this case, we cannot push the filter down into the query. We must therefore remove + // the page size limit from the PageToken and filter on the client side. + // TODO Implement a generic predicate that can be pushed down into different metastores + PageToken unlimitedPageSizeToken = pageToken.withPageSize(Integer.MAX_VALUE); + List<ModelEntity> rawData = + this.store.lookupFullEntitiesActive( + localSession.get(), catalogId, parentId, entityType, unlimitedPageSizeToken); Review Comment: I also have a strong concern that there's more data processed here than absolutely necessary. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
