JingsongLi commented on a change in pull request #13963:
URL: https://github.com/apache/flink/pull/13963#discussion_r519241002



##########
File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveSource.java
##########
@@ -46,29 +58,98 @@
 
        private static final long serialVersionUID = 1L;
 
+       private final JobConfWrapper jobConfWrapper;
+       private final List<String> partitionKeys;
+       private final ContinuousPartitionFetcher<Partition, ?> fetcher;
+       private final HiveTableSource.HiveContinuousPartitionFetcherContext<?> 
fetcherContext;
+       private final ObjectPath tablePath;
+
        HiveSource(
                        JobConf jobConf,
+                       ObjectPath tablePath,
                        CatalogTable catalogTable,
                        List<HiveTablePartition> partitions,
                        @Nullable Long limit,
                        String hiveVersion,
                        boolean useMapRedReader,
-                       boolean isStreamingSource,
+                       @Nullable ContinuousEnumerationSettings 
continuousEnumerationSettings,
+                       ContinuousPartitionFetcher<Partition, ?> fetcher,
+                       
HiveTableSource.HiveContinuousPartitionFetcherContext<?> fetcherContext,
                        RowType producedRowType) {
                super(
                                new org.apache.flink.core.fs.Path[1],
                                new 
HiveSourceFileEnumerator.Provider(partitions, new JobConfWrapper(jobConf)),
-                               DEFAULT_SPLIT_ASSIGNER,
+                               continuousEnumerationSettings == null ? 
DEFAULT_SPLIT_ASSIGNER : SimpleSplitAssigner::new,
                                createBulkFormat(new JobConf(jobConf), 
catalogTable, hiveVersion, producedRowType, useMapRedReader, limit),
-                               null);
-               Preconditions.checkArgument(!isStreamingSource, "HiveSource 
currently only supports bounded mode");
+                               continuousEnumerationSettings);
+               this.jobConfWrapper = new JobConfWrapper(jobConf);
+               this.tablePath = tablePath;
+               this.partitionKeys = catalogTable.getPartitionKeys();
+               this.fetcher = fetcher;
+               this.fetcherContext = fetcherContext;
        }
 
        @Override
        public SimpleVersionedSerializer<HiveSourceSplit> getSplitSerializer() {
                return HiveSourceSplitSerializer.INSTANCE;
        }
 
+       @Override
+       public 
SimpleVersionedSerializer<PendingSplitsCheckpoint<HiveSourceSplit>> 
getEnumeratorCheckpointSerializer() {
+               if (needContinuousSplitEnumerator()) {
+                       return new 
ContinuousHivePendingSplitsCheckpointSerializer(getSplitSerializer());
+               } else {
+                       return super.getEnumeratorCheckpointSerializer();
+               }
+       }
+
+       @Override
+       public SplitEnumerator<HiveSourceSplit, 
PendingSplitsCheckpoint<HiveSourceSplit>> createEnumerator(
+                       SplitEnumeratorContext<HiveSourceSplit> enumContext) {
+               if (needContinuousSplitEnumerator()) {
+                       return createContinuousSplitEnumerator(
+                                       enumContext, 
fetcherContext.getConsumeStartOffset(), Collections.emptyList(), 
Collections.emptyList());
+               } else {
+                       return super.createEnumerator(enumContext);
+               }
+       }
+
+       @Override
+       public SplitEnumerator<HiveSourceSplit, 
PendingSplitsCheckpoint<HiveSourceSplit>> restoreEnumerator(
+                       SplitEnumeratorContext<HiveSourceSplit> enumContext, 
PendingSplitsCheckpoint<HiveSourceSplit> checkpoint) {
+               if (needContinuousSplitEnumerator()) {
+                       Preconditions.checkState(checkpoint instanceof 
ContinuousHivePendingSplitsCheckpoint,
+                                       "Illegal type of splits checkpoint %s 
for streaming read partitioned table", checkpoint.getClass().getName());
+                       ContinuousHivePendingSplitsCheckpoint hiveCheckpoint = 
(ContinuousHivePendingSplitsCheckpoint) checkpoint;
+                       return createContinuousSplitEnumerator(
+                                       enumContext, 
hiveCheckpoint.getCurrentReadOffset(), hiveCheckpoint.getSeenPartitions(), 
hiveCheckpoint.getSplits());
+               } else {
+                       return super.restoreEnumerator(enumContext, checkpoint);
+               }
+       }
+
+       private boolean needContinuousSplitEnumerator() {
+               return getBoundedness() == Boundedness.CONTINUOUS_UNBOUNDED && 
!partitionKeys.isEmpty();
+       }
+
+       private SplitEnumerator<HiveSourceSplit, 
PendingSplitsCheckpoint<HiveSourceSplit>> createContinuousSplitEnumerator(
+                       SplitEnumeratorContext<HiveSourceSplit> enumContext,
+                       Comparable<?> currentReadOffset,
+                       Collection<List<String>> seenPartitions,
+                       Collection<HiveSourceSplit> splits) {
+               return new ContinuousHiveSplitEnumerator(

Review comment:
       Ditto




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Reply via email to