[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-11-05 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r342834195
 
 

 ##
 File path: 
server/src/main/java/org/apache/druid/indexing/overlord/Segments.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.overlord;
+
+/**
+ * This enum is used as a parameter for several methods in {@link IndexerMetadataStorageCoordinator}, specifying whether
+ * only visible segments, or visible as well as overshadowed segments should be included in results.
+ *
+ * Visibility (and overshadowness - these terms are antonyms) may be defined on an interval (or a series of intervals).
+ * Consider the following example:
+ *
+ * || I
+ *   || S'
+ *   |---| S
+ *
+ * Here, I denotes an interval in question, S and S' are segments. S' is newer (has a higher version) than S.
+ * Segment S is overshadowed (by S') on the interval I, though it's visible (non-overshadowed) outside of I: more
+ * specifically, it's visible on the interval [end of S', end of S].
+ *
+ * A segment is considered visible on a series of intervals if it's visible on any of the intervals in the series. A
+ * segment is considered (fully) overshadowed on a series of intervals if it's overshadowed (= non-visible) on all of
+ * the intervals in the series.
+ *
+ * If not specified otherwise, visibility (or overshadowness) should be assumed on the interval (-inf, +inf). This
+ * visibility may also be called "global" or "general" visibility.
 
 Review comment:
   This new doc looks nice. One question: do we need the concept of 
"global visibility"? It seems enough to say `If not specified otherwise, 
visibility (or overshadowness) should be assumed on the interval (-inf, +inf).`.
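
The javadoc's example with I, S and S' can be made concrete with a small Java sketch. This is only an illustration, not Druid's implementation: `VisibilitySketch`, `Range`, `Seg` and `isOvershadowedOn` are hypothetical names, intervals are modelled as half-open pairs of longs, only a single newer segment is considered, and partitions are ignored.

```java
public class VisibilitySketch {
    /** Half-open time range [start, end); a hypothetical stand-in for an interval. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        Range intersect(Range o) {
            return new Range(Math.max(start, o.start), Math.min(end, o.end));
        }

        boolean isEmpty() {
            return start >= end;
        }

        boolean contains(Range o) {
            return start <= o.start && o.end <= end;
        }
    }

    /** A segment reduced to its time range and an integer version (an assumption of this sketch). */
    static final class Seg {
        final Range range;
        final int version;

        Seg(Range range, int version) {
            this.range = range;
            this.version = version;
        }
    }

    /**
     * True if the part of s that falls inside 'on' is non-empty and
     * entirely covered by a segment with a higher version.
     */
    static boolean isOvershadowedOn(Seg s, Seg newer, Range on) {
        Range part = s.range.intersect(on);
        return !part.isEmpty()
            && newer.version > s.version
            && newer.range.contains(part);
    }

    public static void main(String[] args) {
        Range i = new Range(0, 10);                   // I
        Seg sPrime = new Seg(new Range(2, 10), 2);    // S', newer
        Seg s = new Seg(new Range(2, 15), 1);         // S, older and longer

        // On I, S is fully covered by S'.
        System.out.println(isOvershadowedOn(s, sPrime, i));                  // true
        // On [end of S', end of S) nothing covers S, so it is visible there.
        System.out.println(isOvershadowedOn(s, sPrime, new Range(10, 15)));  // false
        // On (-inf, +inf) S is not fully covered either.
        System.out.println(isOvershadowedOn(s, sPrime,
            new Range(Long.MIN_VALUE, Long.MAX_VALUE)));                     // false
    }
}
```

Under these assumptions, S is overshadowed on I but not on the interval (-inf, +inf), which is the distinction the javadoc draws between visibility on an interval and visibility in general.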


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-30 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340854094
 
 

 ##
 File path: 
server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java
 ##
 @@ -36,41 +37,63 @@
 public interface IndexerMetadataStorageCoordinator
 {
   /**
-   * Get all segments which may include any data in the interval and are flagged as used.
+   * Get all segments which may include any data in the interval and are marked as used.
    *
-   * @param dataSource The datasource to query
-   * @param interval   The interval for which all applicable and used datasources are requested. Start is inclusive, end is exclusive
-   *
-   * @return The DataSegments which include data in the requested interval. These segments may contain data outside the requested interval.
+   * The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in
+   * the collection only once.
    *
-   * @throws IOException
+   * @param dataSource The datasource to query
+   * @param interval   The interval for which all applicable and used datasources are requested. Start is inclusive,
+   *                   end is exclusive
+   * @param visibility Whether only visible or visible as well as overshadowed segments should be returned. The
+   *                   visibility is considered within the specified interval: that is, a segment which is globally
 
 Review comment:
   It's still unclear to me what "visible outside of an interval" means. Does 
this imply that a segment can be overshadowed only when we want to find the 
most recent one among segments in the same interval?





[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-29 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340227159
 
 

 ##
 File path: 
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java
 ##
 @@ -177,6 +175,32 @@ public void start()
       final String dataSource,
       final List<Interval> intervals
   )
+  {
+    Query<Map<String, Object>> sql = createUsedSegmentsSqlQueryForIntervals(handle, dataSource, intervals);
+
+    try (final ResultIterator<byte[]> dbSegments = sql.map(ByteArrayMapper.FIRST).iterator()) {
+      return VersionedIntervalTimeline.forSegments(
+          Iterators.transform(dbSegments, payload -> JacksonUtils.readValue(jsonMapper, payload, DataSegment.class))
+      );
+    }
+  }
+
+  private Collection<DataSegment> getAllUsedSegmentsForIntervalsWithHandle(
+      final Handle handle,
+      final String dataSource,
+      final List<Interval> intervals
+  )
+  {
+    return createUsedSegmentsSqlQueryForIntervals(handle, dataSource, intervals)
+        .map((index, r, ctx) -> JacksonUtils.readValue(jsonMapper, r.getBytes("payload"), DataSegment.class))
+        .list();
+  }
+
+  private Query<Map<String, Object>> createUsedSegmentsSqlQueryForIntervals(
 
 Review comment:
   Would you please add a javadoc for this method?





[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-29 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340239104
 
 

 ##
 File path: 
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java
 ##
 @@ -186,7 +210,7 @@ public void start()
 sb.append("SELECT payload FROM %s WHERE used = true AND dataSource = ? AND (");
 for (int i = 0; i < intervals.size(); i++) {
   sb.append(
-      StringUtils.format("(start <= ? AND %1$send%1$s >= ?)", connector.getQuoteString())
+      StringUtils.format("(start < ? AND %1$send%1$s > ?)", connector.getQuoteString())
 
 Review comment:
   Nice catch!
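
The change from `<=`/`>=` to `<`/`>` matters because the interval's start is inclusive and its end is exclusive. The following Java sketch illustrates the difference. It is not code from the pull request: `OverlapSketch` and its methods are hypothetical, instants are modelled as longs rather than the values stored in the metadata table, and it assumes the first placeholder is bound to the end of the queried interval and the second to its start.

```java
public class OverlapSketch {
    /**
     * Strict form, matching "(start < ? AND end > ?)": true only if the
     * half-open ranges [segStart, segEnd) and [qStart, qEnd) share an instant.
     */
    static boolean overlapsStrict(long segStart, long segEnd, long qStart, long qEnd) {
        return segStart < qEnd && segEnd > qStart;
    }

    /**
     * Non-strict form, matching "(start <= ? AND end >= ?)": also true
     * for ranges that merely touch at a boundary.
     */
    static boolean overlapsInclusive(long segStart, long segEnd, long qStart, long qEnd) {
        return segStart <= qEnd && segEnd >= qStart;
    }

    public static void main(String[] args) {
        // Segment covers [0, 10); the query asks for [10, 20). They abut but share no instant.
        System.out.println(overlapsStrict(0, 10, 10, 20));     // false
        System.out.println(overlapsInclusive(0, 10, 10, 20));  // true

        // Segment covers [5, 15); it genuinely overlaps [10, 20).
        System.out.println(overlapsStrict(5, 15, 10, 20));     // true
    }
}
```

Under these assumptions, the non-strict comparison would also match a segment that only abuts the requested interval, while the strict comparison matches only segments that share at least one instant with it.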





[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-29 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340199636
 
 

 ##
 File path: core/src/main/java/org/apache/druid/timeline/Partitions.java
 ##
 @@ -17,28 +17,16 @@
  * under the License.
  */
 
-package org.apache.druid.indexer.path;
-
-import org.apache.druid.timeline.DataSegment;
-import org.joda.time.Interval;
-
-import java.io.IOException;
-import java.util.List;
+package org.apache.druid.timeline;
 
 /**
+ * This enum is used as a parameter for several methods in {@link VersionedIntervalTimeline}, specifying whether only
+ * complete partitions should be considered, or incomplete partitions as well.
 
 Review comment:
   Should this explain what "complete partition" means? It could also link to 
`PartitionHolder.isComplete()`.





[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-29 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340233270
 
 

 ##
 File path: 
server/src/main/java/org/apache/druid/indexing/overlord/Segments.java
 ##
 @@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.overlord;
+
+/**
+ * This enum is used as a parameter for several methods in {@link IndexerMetadataStorageCoordinator}, specifying whether
+ * only visible segments, or visible as well as overshadowed segments should be included in results.
 
 Review comment:
   Since this enum is used only in `IndexerMetadataStorageCoordinator`, it would 
be more accurate to say that all segments in the result are published segments.





[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-29 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340234591
 
 

 ##
 File path: 
indexing-service/src/test/java/org/apache/druid/indexing/common/actions/SegmentListActionsTest.java
 ##
 @@ -94,21 +95,18 @@ private DataSegment createSegment(Interval interval, String version)
   }
 
   @Test
-  public void testSegmentListUsedAction()
+  public void testRetrieveUsedSegmentsAction()
 
 Review comment:
   Please change the name of this class as well.





[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-29 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340237967
 
 

 ##
 File path: 
server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java
 ##
 @@ -36,41 +37,63 @@
 public interface IndexerMetadataStorageCoordinator
 {
   /**
-   * Get all segments which may include any data in the interval and are flagged as used.
+   * Get all segments which may include any data in the interval and are marked as used.
    *
-   * @param dataSource The datasource to query
-   * @param interval   The interval for which all applicable and used datasources are requested. Start is inclusive, end is exclusive
-   *
-   * @return The DataSegments which include data in the requested interval. These segments may contain data outside the requested interval.
+   * The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in
+   * the collection only once.
    *
-   * @throws IOException
+   * @param dataSource The datasource to query
+   * @param interval   The interval for which all applicable and used datasources are requested. Start is inclusive,
+   *                   end is exclusive
+   * @param visibility Whether only visible or visible as well as overshadowed segments should be returned. The
+   *                   visibility is considered within the specified interval: that is, a segment which is globally
 
 Review comment:
   What does "globally visible" mean?





[GitHub] [incubator-druid] jihoonson commented on a change in pull request #8564: Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed

2019-10-29 Thread GitBox
jihoonson commented on a change in pull request #8564: Fix ambiguity about 
IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning 
only non-overshadowed or all used segments
URL: https://github.com/apache/incubator-druid/pull/8564#discussion_r340236753
 
 

 ##
 File path: 
indexing-service/src/test/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskTestBase.java
 ##
 @@ -0,0 +1,404 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.seekablestream;
+
+import com.fasterxml.jackson.core.type.TypeReference;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Predicates;
+import com.google.common.base.Throwables;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.io.Files;
+import com.google.common.util.concurrent.ListenableFuture;
+import com.google.common.util.concurrent.ListeningExecutorService;
+import org.apache.druid.data.input.impl.DimensionsSpec;
+import org.apache.druid.data.input.impl.FloatDimensionSchema;
+import org.apache.druid.data.input.impl.JSONParseSpec;
+import org.apache.druid.data.input.impl.LongDimensionSchema;
+import org.apache.druid.data.input.impl.StringDimensionSchema;
+import org.apache.druid.data.input.impl.StringInputRowParser;
+import org.apache.druid.data.input.impl.TimestampSpec;
+import org.apache.druid.indexer.TaskStatus;
+import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReportData;
+import org.apache.druid.indexing.common.LockGranularity;
+import org.apache.druid.indexing.common.TaskReport;
+import org.apache.druid.indexing.common.TaskToolbox;
+import org.apache.druid.indexing.common.TaskToolboxFactory;
+import org.apache.druid.indexing.common.TestUtils;
+import org.apache.druid.indexing.common.task.Task;
+import org.apache.druid.indexing.common.task.Tasks;
+import org.apache.druid.indexing.overlord.IndexerMetadataStorageCoordinator;
+import org.apache.druid.indexing.overlord.Segments;
+import org.apache.druid.indexing.overlord.TaskLockbox;
+import org.apache.druid.indexing.overlord.TaskStorage;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.Intervals;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Comparators;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.java.util.common.parsers.JSONPathSpec;
+import org.apache.druid.metadata.EntryExistsException;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.QueryPlus;
+import org.apache.druid.query.Result;
+import org.apache.druid.query.SegmentDescriptor;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.DoubleSumAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.timeseries.TimeseriesQuery;
+import org.apache.druid.query.timeseries.TimeseriesResultValue;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.IndexIO;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.column.DictionaryEncodedColumn;
+import org.apache.druid.segment.indexing.DataSchema;
+import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
+import org.apache.druid.segment.realtime.appenderator.AppenderatorImpl;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.utils.CompressionUtils;
+import org.assertj.core.api.Assertions;
+import org.easymock.EasyMockSupport;
+import org.joda.time.Interval;
+import org.junit.Assert;
+
+import java.io.File;
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;