[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r342938668

## File path: server/src/main/java/org/apache/druid/indexing/overlord/Segments.java ##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.overlord;
+
+/**
+ * This enum is used as a parameter for several methods in {@link IndexerMetadataStorageCoordinator}, specifying whether
+ * only visible segments, or visible as well as overshadowed segments should be included in results.
+ *
+ * Visibility (and overshadowness - these terms are antonyms) may be defined on an interval (or a series of intervals).
+ * Consider the following example:
+ *
+ * |----| I
+ * |----| S'
+ * |-------| S
+ *
+ * Here, I denotes an interval in question, S and S' are segments. S' is newer (has a higher version) than S.
+ * Segment S is overshadowed (by S') on the interval I, though it's visible (non-overshadowed) outside of I: more
+ * specifically, it's visible on the interval [end of S', end of S].
+ *
+ * A segment is considered visible on a series of intervals if it's visible on any of the intervals in the series. A
+ * segment is considered (fully) overshadowed on a series of intervals if it's overshadowed (= non-visible) on all of
+ * the intervals in the series.
+ *
+ * If not specified otherwise, visibility (or overshadowness) should be assumed on the interval (-inf, +inf). This
+ * visibility may also be called "global" or "general" visibility.

Review comment:
Ok, removed the last sentence.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
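The visibility rules in the Javadoc quoted above can be illustrated with a small sketch. This is not Druid's implementation: the `Seg` class, the use of plain `long` timestamps with half-open `[start, end)` intervals, and the method names `isVisibleOn` and `isVisibleOnAny` are all hypothetical, chosen only to mirror the S / S' / I example. A segment is treated as visible on an interval if some part of it inside that interval is not covered by any higher-version segment.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and types are assumptions, not Druid's API.
final class VisibilitySketch {

    /** Hypothetical stand-in for a segment: half-open interval [start, end) plus a version. */
    static final class Seg {
        final String name;
        final long start;
        final long end;
        final int version;

        Seg(String name, long start, long end, int version) {
            this.name = name;
            this.start = start;
            this.end = end;
            this.version = version;
        }
    }

    /** True if some part of seg within [from, to) is not covered by a higher-version segment. */
    static boolean isVisibleOn(Seg seg, List<Seg> all, long from, long to) {
        long lo = Math.max(seg.start, from);
        long hi = Math.min(seg.end, to);
        if (lo >= hi) {
            return false; // the segment does not intersect the interval at all
        }
        // Collect the parts of [lo, hi) covered by newer segments.
        List<long[]> covers = new ArrayList<>();
        for (Seg other : all) {
            if (other.version > seg.version) {
                long s = Math.max(other.start, lo);
                long e = Math.min(other.end, hi);
                if (s < e) {
                    covers.add(new long[] {s, e});
                }
            }
        }
        covers.sort((a, b) -> Long.compare(a[0], b[0]));
        // Sweep left to right looking for an uncovered gap.
        long cursor = lo;
        for (long[] c : covers) {
            if (c[0] > cursor) {
                return true;
            }
            cursor = Math.max(cursor, c[1]);
        }
        return cursor < hi;
    }

    /** Visible on a series of intervals if visible on any interval in the series. */
    static boolean isVisibleOnAny(Seg seg, List<Seg> all, long[][] intervals) {
        for (long[] i : intervals) {
            if (isVisibleOn(seg, all, i[0], i[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Seg s = new Seg("S", 0, 8, 1);
        Seg sPrime = new Seg("S'", 0, 5, 2);
        List<Seg> all = new ArrayList<>();
        all.add(sPrime);
        all.add(s);
        System.out.println(isVisibleOn(s, all, 0, 5)); // false: overshadowed by S' on I
        System.out.println(isVisibleOn(s, all, 5, 8)); // true: visible past the end of S'
        System.out.println(isVisibleOn(s, all, Long.MIN_VALUE, Long.MAX_VALUE)); // true: "global" visibility
    }
}
```

With I = [0, 5), S' = [0, 5) at version 2 and S = [0, 8) at version 1, S comes out overshadowed on I but visible on (-inf, +inf), which matches the distinction the Javadoc draws between interval-scoped and global visibility.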
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r34128

## File path: server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java ##
@@ -36,41 +37,63 @@ public interface IndexerMetadataStorageCoordinator
 {
 /**
- * Get all segments which may include any data in the interval and are flagged as used.
+ * Get all segments which may include any data in the interval and are marked as used.
 *
- * @param dataSource The datasource to query
- * @param interval The interval for which all applicable and used datasources are requested. Start is inclusive, end is exclusive
- *
- * @return The DataSegments which include data in the requested interval. These segments may contain data outside the requested interval.
+ * The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in
+ * the collection only once.
 *
- * @throws IOException
+ * @param dataSource The datasource to query
+ * @param interval The interval for which all applicable and used datasources are requested. Start is inclusive,
+ * end is exclusive
+ * @param visibility Whether only visible or visible as well as overshadowed segments should be returned. The
+ * visibility is considered within the specified interval: that is, a segment which is globally

Review comment:
I didn't understand your question, but I added a more detailed description of visibility to the Javadoc for `Segments` and linked to it from the methods. See [this commit](https://github.com/apache/incubator-druid/pull/8564/commits/2fc7c5b5b0cf5bf4538d018010844c4fef812977).
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340660374

## File path: server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java ##
@@ -36,41 +37,63 @@ public interface IndexerMetadataStorageCoordinator
 {
 /**
- * Get all segments which may include any data in the interval and are flagged as used.
+ * Get all segments which may include any data in the interval and are marked as used.
 *
- * @param dataSource The datasource to query
- * @param interval The interval for which all applicable and used datasources are requested. Start is inclusive, end is exclusive
- *
- * @return The DataSegments which include data in the requested interval. These segments may contain data outside the requested interval.
+ * The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in
+ * the collection only once.
 *
- * @throws IOException
+ * @param dataSource The datasource to query
+ * @param interval The interval for which all applicable and used datasources are requested. Start is inclusive,
+ * end is exclusive
+ * @param visibility Whether only visible or visible as well as overshadowed segments should be returned. The
+ * visibility is considered within the specified interval: that is, a segment which is globally

Review comment:
Reworded to "visible outside of the specified interval(s)".
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340660112

## File path: indexing-service/src/test/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskTestBase.java ##
@@ -0,0 +1,404 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.seekablestream;
+
+import com.fasterxml.jackson.core.type.TypeReference;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Predicates;
+import com.google.common.base.Throwables;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.io.Files;
+import com.google.common.util.concurrent.ListenableFuture;
+import com.google.common.util.concurrent.ListeningExecutorService;
+import org.apache.druid.data.input.impl.DimensionsSpec;
+import org.apache.druid.data.input.impl.FloatDimensionSchema;
+import org.apache.druid.data.input.impl.JSONParseSpec;
+import org.apache.druid.data.input.impl.LongDimensionSchema;
+import org.apache.druid.data.input.impl.StringDimensionSchema;
+import org.apache.druid.data.input.impl.StringInputRowParser;
+import org.apache.druid.data.input.impl.TimestampSpec;
+import org.apache.druid.indexer.TaskStatus;
+import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReportData;
+import org.apache.druid.indexing.common.LockGranularity;
+import org.apache.druid.indexing.common.TaskReport;
+import org.apache.druid.indexing.common.TaskToolbox;
+import org.apache.druid.indexing.common.TaskToolboxFactory;
+import org.apache.druid.indexing.common.TestUtils;
+import org.apache.druid.indexing.common.task.Task;
+import org.apache.druid.indexing.common.task.Tasks;
+import org.apache.druid.indexing.overlord.IndexerMetadataStorageCoordinator;
+import org.apache.druid.indexing.overlord.Segments;
+import org.apache.druid.indexing.overlord.TaskLockbox;
+import org.apache.druid.indexing.overlord.TaskStorage;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.Intervals;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Comparators;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.java.util.common.parsers.JSONPathSpec;
+import org.apache.druid.metadata.EntryExistsException;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryPlus;
+import org.apache.druid.query.Result;
+import org.apache.druid.query.SegmentDescriptor;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.DoubleSumAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.timeseries.TimeseriesQuery;
+import org.apache.druid.query.timeseries.TimeseriesResultValue;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.IndexIO;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.DictionaryEncodedColumn;
+import org.apache.druid.segment.indexing.DataSchema;
+import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
+import org.apache.druid.segment.realtime.appenderator.AppenderatorImpl;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.utils.CompressionUtils;
+import org.assertj.core.api.Assertions;
+import org.easymock.EasyMockSupport;
+import org.joda.time.Interval;
+import org.junit.Assert;
+
+import java.io.File;
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340659783

## File path: server/src/main/java/org/apache/druid/indexing/overlord/Segments.java ##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.overlord;
+
+/**
+ * This enum is used as a parameter for several methods in {@link IndexerMetadataStorageCoordinator}, specifying whether
+ * only visible segments, or visible as well as overshadowed segments should be included in results.

Review comment:
I think this would be overspecification for the enum. Instead, I've added the adjective "published" to the descriptions of all related methods in `IndexerMetadataStorageCoordinator`.
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340659984

## File path: indexing-service/src/test/java/org/apache/druid/indexing/common/actions/SegmentListActionsTest.java ##
@@ -94,21 +95,18 @@ private DataSegment createSegment(Interval interval, String version)
 }
 @Test
- public void testSegmentListUsedAction()
+ public void testRetrieveUsedSegmentsAction()

Review comment:
Thanks, changed.
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340658231

## File path: core/src/main/java/org/apache/druid/timeline/Partitions.java ##
@@ -17,28 +17,16 @@
 * under the License.
 */
-package org.apache.druid.indexer.path;
-
-import org.apache.druid.timeline.DataSegment;
-import org.joda.time.Interval;
-
-import java.io.IOException;
-import java.util.List;
+package org.apache.druid.timeline;
 /**
+ * This enum is used a parameter for several methods in {@link VersionedIntervalTimeline}, specifying whether only
+ * complete partitions should be considered, or incomplete partitions as well.

Review comment:
Since `VersionedIntervalTimeline` treats completion status mechanically rather than semantically (that is, the implementation of `VersionedIntervalTimeline` would be the same regardless of what `isComplete()` semantically means), I just added a reference to the method. I think Javadoc should be added to the `isComplete()` method to explain illustratively what it means, but I don't feel confident about writing it. I've also opened the related issue #8788.
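To make "mechanically rather than semantically" concrete, here is a minimal sketch of a filter that consults `isComplete()` without knowing what completeness means. Everything in it is an assumption for illustration: the `Holder` interface, the `select` method, and the enum constant names `ONLY_COMPLETE` and `INCOMPLETE_OK` are not taken from the diff above and should not be read as the actual `VersionedIntervalTimeline` code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the VersionedIntervalTimeline implementation.
final class PartitionsSketch {

    /** Assumed constant names; the quoted diff does not show the enum body. */
    enum Partitions { ONLY_COMPLETE, INCOMPLETE_OK }

    /** Hypothetical stand-in for whatever object exposes isComplete(). */
    interface Holder {
        String id();

        boolean isComplete();
    }

    /**
     * Selects holders according to the mode. The filter only asks isComplete();
     * it would be written the same way whatever "complete" means to the caller.
     */
    static List<String> select(List<Holder> holders, Partitions mode) {
        List<String> out = new ArrayList<>();
        for (Holder h : holders) {
            if (mode == Partitions.INCOMPLETE_OK || h.isComplete()) {
                out.add(h.id());
            }
        }
        return out;
    }

    /** Convenience factory for the example. */
    static Holder holder(final String id, final boolean complete) {
        return new Holder() {
            @Override
            public String id() {
                return id;
            }

            @Override
            public boolean isComplete() {
                return complete;
            }
        };
    }
}
```

Under these assumptions, `select` with `ONLY_COMPLETE` drops every holder whose `isComplete()` is false, and `INCOMPLETE_OK` returns all of them; the meaning of completeness stays entirely with the `Holder` implementation.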
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340658583

## File path: server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java ##
@@ -177,6 +175,32 @@ public void start()
 final String dataSource,
 final List<Interval> intervals
 )
+ {
+Query<Map<String, Object>> sql = createUsedSegmentsSqlQueryForIntervals(handle, dataSource, intervals);
+
+try (final ResultIterator<byte[]> dbSegments = sql.map(ByteArrayMapper.FIRST).iterator()) {
+ return VersionedIntervalTimeline.forSegments(
+ Iterators.transform(dbSegments, payload -> JacksonUtils.readValue(jsonMapper, payload, DataSegment.class))
+ );
+}
+ }
+
+ private Collection<DataSegment> getAllUsedSegmentsForIntervalsWithHandle(
+ final Handle handle,
+ final String dataSource,
+ final List<Interval> intervals
+ )
+ {
+return createUsedSegmentsSqlQueryForIntervals(handle, dataSource, intervals)
+.map((index, r, ctx) -> JacksonUtils.readValue(jsonMapper, r.getBytes("payload"), DataSegment.class))
+.list();
+ }
+
+ private Query<Map<String, Object>> createUsedSegmentsSqlQueryForIntervals(

Review comment:
Added.
[GitHub] [incubator-druid] leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed o
leventov commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340658231

## File path: core/src/main/java/org/apache/druid/timeline/Partitions.java ##
@@ -17,28 +17,16 @@
 * under the License.
 */
-package org.apache.druid.indexer.path;
-
-import org.apache.druid.timeline.DataSegment;
-import org.joda.time.Interval;
-
-import java.io.IOException;
-import java.util.List;
+package org.apache.druid.timeline;
 /**
+ * This enum is used a parameter for several methods in {@link VersionedIntervalTimeline}, specifying whether only
+ * complete partitions should be considered, or incomplete partitions as well.

Review comment:
Since `VersionedIntervalTimeline` treats completion status mechanically rather than semantically (that is, the implementation of `VersionedIntervalTimeline` would be the same regardless of what `isComplete()` semantically means), I just reference the method. I think Javadoc should be added to the `isComplete()` method to explain illustratively what it means, but I don't feel confident about writing it. I've also opened the related issue #8788.