[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339178 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 06/Nov/19 06:08 Start Date: 06/Nov/19 06:08 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929380 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws DataRecordException, IOException { D record = decodeKafkaMessage(nextValidMessage); - this.currentPartitionDecodeRecordTime += System.nanoTime() - decodeStartTime; - this.currentPartitionRecordCount++; - this.currentPartitionTotalSize += nextValidMessage.getValueSizeInBytes(); - this.currentPartitionReadRecordTime += System.nanoTime() - readStartTime; + this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes()); Review comment: Don't understand this comment. You are probably referring to the onUndecodeableRecord() method. Let's sync up on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339178) Time Spent: 2h 20m (was: 2h 10m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929380 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws DataRecordException, IOException { D record = decodeKafkaMessage(nextValidMessage); - this.currentPartitionDecodeRecordTime += System.nanoTime() - decodeStartTime; - this.currentPartitionRecordCount++; - this.currentPartitionTotalSize += nextValidMessage.getValueSizeInBytes(); - this.currentPartitionReadRecordTime += System.nanoTime() - readStartTime; + this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes()); Review comment: Don't understand this comment. You are probably referring to the onUndecodeableRecord() method. Let's sync up on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929192 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java ## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.gobblin.source.extractor.extract.kafka; + +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.TimeUnit; + +import com.google.common.collect.Maps; +import com.google.common.collect.Sets; + +import lombok.Data; +import lombok.Getter; +import lombok.extern.slf4j.Slf4j; + +import org.apache.gobblin.configuration.WorkUnitState; +import org.apache.gobblin.metrics.MetricContext; +import org.apache.gobblin.metrics.event.EventSubmitter; + + +/** + * A class that tracks KafkaExtractor statistics such as record decode time, #processed records, #undecodeable records etc. + * + */ +@Slf4j +public class KafkaExtractorStatsTracker { + // Constants for event submission + public static final String TOPIC = "topic"; + public static final String PARTITION = "partition"; + + private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka"; + private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = "KafkaExtractorTopicMetadata"; + private static final String LOW_WATERMARK = "lowWatermark"; + private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark"; + private static final String EXPECTED_HIGH_WATERMARK = "expectedHighWatermark"; + private static final String ELAPSED_TIME = "elapsedTime"; + private static final String PROCESSED_RECORD_COUNT = "processedRecordCount"; + private static final String UNDECODABLE_MESSAGE_COUNT = "undecodableMessageCount"; + private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize"; + private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime"; + private static final String AVG_RECORD_SIZE = "avgRecordSize"; + private static final String READ_RECORD_TIME = "readRecordTime"; + private static final String DECODE_RECORD_TIME = "decodeRecordTime"; + private static final String FETCH_MESSAGE_BUFFER_TIME = "fetchMessageBufferTime"; + private static final String LAST_RECORD_HEADER_TIMESTAMP = "lastRecordHeaderTimestamp"; + + @Getter + private final Map statsMap; + private final Set errorPartitions; + private final WorkUnitState workUnitState; + + //A global count of number of undecodeable messages encountered by the KafkaExtractor across all Kafka + //TopicPartitions. + @Getter + private int undecodableMessageCount = 0; + private List partitions; + + public KafkaExtractorStatsTracker(WorkUnitState state, List partitions) { Review comment: Actually, streaming extractor can get KafkaPartition the same way as KafkaExtractor does i.e. via KafkaUtils in the constructor. Further, Tag generation for tracking events needs the KafkaPartition objects. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339177 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 06/Nov/19 06:07 Start Date: 06/Nov/19 06:07 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929192 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java ## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.gobblin.source.extractor.extract.kafka; + +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.TimeUnit; + +import com.google.common.collect.Maps; +import com.google.common.collect.Sets; + +import lombok.Data; +import lombok.Getter; +import lombok.extern.slf4j.Slf4j; + +import org.apache.gobblin.configuration.WorkUnitState; +import org.apache.gobblin.metrics.MetricContext; +import org.apache.gobblin.metrics.event.EventSubmitter; + + +/** + * A class that tracks KafkaExtractor statistics such as record decode time, #processed records, #undecodeable records etc. + * + */ +@Slf4j +public class KafkaExtractorStatsTracker { + // Constants for event submission + public static final String TOPIC = "topic"; + public static final String PARTITION = "partition"; + + private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka"; + private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = "KafkaExtractorTopicMetadata"; + private static final String LOW_WATERMARK = "lowWatermark"; + private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark"; + private static final String EXPECTED_HIGH_WATERMARK = "expectedHighWatermark"; + private static final String ELAPSED_TIME = "elapsedTime"; + private static final String PROCESSED_RECORD_COUNT = "processedRecordCount"; + private static final String UNDECODABLE_MESSAGE_COUNT = "undecodableMessageCount"; + private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize"; + private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime"; + private static final String AVG_RECORD_SIZE = "avgRecordSize"; + private static final String READ_RECORD_TIME = "readRecordTime"; + private static final String DECODE_RECORD_TIME = "decodeRecordTime"; + private static final String FETCH_MESSAGE_BUFFER_TIME = "fetchMessageBufferTime"; + private static final String LAST_RECORD_HEADER_TIMESTAMP = "lastRecordHeaderTimestamp"; + + @Getter + private final Map statsMap; + private final Set errorPartitions; + private final WorkUnitState workUnitState; + + //A global count of number of undecodeable messages encountered by the KafkaExtractor across all Kafka + //TopicPartitions. + @Getter + private int undecodableMessageCount = 0; + private List partitions; + + public KafkaExtractorStatsTracker(WorkUnitState state, List partitions) { Review comment: Actually, streaming extractor can get KafkaPartition the same way as KafkaExtractor does i.e. via KafkaUtils in the constructor. Further, Tag generation for tracking events needs the KafkaPartition objects. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339177) Time Spent: 2h 10m (was: 2h) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 >
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339174 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 06/Nov/19 06:03 Start Date: 06/Nov/19 06:03 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342928259 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java ## @@ -31,6 +31,7 @@ private final int id; private final String topicName; private KafkaLeader leader; + private int hashCode; Review comment: Hmm. I can, but is there a significant benefit to using auto-generated hashCode? The only change I am making here is to cache the hashCode to avoid re-computing it every time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339174) Time Spent: 2h (was: 1h 50m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342928259 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java ## @@ -31,6 +31,7 @@ private final int id; private final String topicName; private KafkaLeader leader; + private int hashCode; Review comment: Hmm. I can, but is there a significant benefit to using auto-generated hashCode? The only change I am making here is to cache the hashCode to avoid re-computing it every time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339171 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 06/Nov/19 05:57 Start Date: 06/Nov/19 05:57 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342927295 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java ## @@ -115,6 +114,7 @@ public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset"; public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime"; public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = "previousOffsetFetchEpochTime"; + public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions"; Review comment: This was a small change for a different PR, and accidentally got pulled in. Will take it out and submit a separate PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339171) Time Spent: 1h 50m (was: 1h 40m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342927295 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java ## @@ -115,6 +114,7 @@ public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset"; public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime"; public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = "previousOffsetFetchEpochTime"; + public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions"; Review comment: This was a small change for a different PR, and accidentally got pulled in. Will take it out and submit a separate PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339165 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 06/Nov/19 05:53 Start Date: 06/Nov/19 05:53 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342926449 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -392,112 +303,16 @@ public long getExpectedRecordCount() { @Override public void close() throws IOException { -if (currentPartitionIdx != INITIAL_PARTITION_IDX) { - updateStatisticsForCurrentPartition(); +if (!allPartitionsFinished()) { Review comment: Yes, the current implementation is confusing when end of partitions is reached. It calls updateStatisticsForCurrentPartition(), but essentially does nothing inside the method, since recordCount == 0. The change IMO is more readable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339165) Time Spent: 1h 40m (was: 1.5h) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342926449 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -392,112 +303,16 @@ public long getExpectedRecordCount() { @Override public void close() throws IOException { -if (currentPartitionIdx != INITIAL_PARTITION_IDX) { - updateStatisticsForCurrentPartition(); +if (!allPartitionsFinished()) { Review comment: Yes, the current implementation is confusing when end of partitions is reached. It calls updateStatisticsForCurrentPartition(), but essentially does nothing inside the method, since recordCount == 0. The change IMO is more readable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos
autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos URL: https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550095354 > Ahhh the binary files won't be a problem, when we glob over them we can exclude anything save the `.md`'s we wish to spellcheck. Moreso I am wondering about how we can compile a list of spellcheck exceptions so that aspell doesn't blow up on the project specific jargon like "Avro". > > There might not be a nice way to do that programatically, luckily because aspell spits out .bak files we could diff and then gain a quick understanding by eye of common terms that are being flagged. Ah I misunderstood. I am not familiar with aspell and quickly googled it and seems there's no support for blacklist type of thing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] codecov-io commented on issue #2797: [GOBBLIN-948] add hive data node and descriptor
codecov-io commented on issue #2797: [GOBBLIN-948] add hive data node and descriptor URL: https://github.com/apache/incubator-gobblin/pull/2797#issuecomment-550088163 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=h1) Report > Merging [#2797](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **increase** coverage by `<.01%`. > The diff coverage is `14.28%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2797 +/- ## + Coverage 45.32% 45.32% +<.01% - Complexity 8862 8863 +1 Files 1894 1896 +2 Lines 7091070931 +21 Branches 7799 7803 +4 + Hits 3214132152 +11 - Misses3580335811 +8 - Partials 2966 2968 +2 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...modules/flowgraph/datanodes/hive/HiveDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL2hpdmUvSGl2ZURhdGFOb2RlLmphdmE=) | `0% <0%> (ø)` | `0 <0> (?)` | | | [.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=) | `64.86% <100%> (+0.97%)` | `7 <0> (ø)` | :arrow_down: | | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `66.66% <66.66%> (ø)` | `1 <1> (?)` | | | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | | | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | | | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: | | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: | | ... and [2 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=footer). Last update [94a508b...beac3b2](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-948?focusedWorklogId=339094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339094 ] ASF GitHub Bot logged work on GOBBLIN-948: -- Author: ASF GitHub Bot Created on: 06/Nov/19 00:38 Start Date: 06/Nov/19 00:38 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2797: [GOBBLIN-948] add hive data node and descriptor URL: https://github.com/apache/incubator-gobblin/pull/2797#issuecomment-550088163 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=h1) Report > Merging [#2797](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **increase** coverage by `<.01%`. > The diff coverage is `14.28%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2797 +/- ## + Coverage 45.32% 45.32% +<.01% - Complexity 8862 8863 +1 Files 1894 1896 +2 Lines 7091070931 +21 Branches 7799 7803 +4 + Hits 3214132152 +11 - Misses3580335811 +8 - Partials 2966 2968 +2 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...modules/flowgraph/datanodes/hive/HiveDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL2hpdmUvSGl2ZURhdGFOb2RlLmphdmE=) | `0% <0%> (ø)` | `0 <0> (?)` | | | [.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=) | `64.86% <100%> (+0.97%)` | `7 <0> (ø)` | :arrow_down: | | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `66.66% <66.66%> (ø)` | `1 <1> (?)` | | | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | | | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | | | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: | | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: | | ... and [2 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=footer). Last update [94a508b...beac3b2](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-re
[jira] [Work logged] (GOBBLIN-940) Synchronization between workunit persistency and Helix job launching
[ https://issues.apache.org/jira/browse/GOBBLIN-940?focusedWorklogId=339078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339078 ] ASF GitHub Bot logged work on GOBBLIN-940: -- Author: ASF GitHub Bot Created on: 06/Nov/19 00:08 Start Date: 06/Nov/19 00:08 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching URL: https://github.com/apache/incubator-gobblin/pull/2789 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339078) Time Spent: 1h (was: 50m) > Synchronization between workunit persistency and Helix job launching > > > Key: GOBBLIN-940 > URL: https://issues.apache.org/jira/browse/GOBBLIN-940 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] asfgit closed pull request #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching
asfgit closed pull request #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching URL: https://github.com/apache/incubator-gobblin/pull/2789 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos
bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos URL: https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550080700 Ahhh the binary files won't be a problem, when we glob over them we can exclude anything save the `.md`'s we wish to spellcheck. Moreso I am wondering about how we can compile a list of spellcheck exceptions so that aspell doesn't blow up on the project specific jargon like "Avro". There might not be a nice way to do that programatically, luckily because aspell spits out .bak files we could diff and then gain a quick understanding by eye of common terms that are being flagged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos
autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos URL: https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550079172 > @autumnust I was looking to utilize [aspell](http://aspell.net/). > A good guide on basic usage can be found here: > https://alexwlchan.net/2016/09/please-use-aspell > > The only consideration that stands out to me would be how to easily compile a list of exceptions (Avro, ORC etc.) to place in the `~/.aspell.lang.pws` file. > > Thoughts? I am assuming all these binary files are named in a specific way (.avro for example). I would be interested to know how much Avro/ORC files sit in the repo since we really should just get rid of them by replacing with .json and deserialize in memory when necessary. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor
[ https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=339063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339063 ] ASF GitHub Bot logged work on GOBBLIN-897: -- Author: ASF GitHub Bot Created on: 05/Nov/19 23:58 Start Date: 05/Nov/19 23:58 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1) Report > Merging [#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `0.02%`. > The diff coverage is `0%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2755 +/- ## === - Coverage 45.32% 45.3% -0.03% + Complexity 88628861 -1 === Files 18941896 +2 Lines 70910 70951 +41 Branches 77997802 +3 === Hits 32141 32141 - Misses35803 35845 +42 + Partials 29662965 -1 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `32.71% <0%> (-2.81%)` | `11% <0%> (-1%)` | | | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | | | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: | | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=footer). Last update [94a508b...fe75404](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339063) Time Spent: 2.5h (was: 2h 20m) > I
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir
codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1) Report > Merging [#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `0.02%`. > The diff coverage is `0%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2755 +/- ## === - Coverage 45.32% 45.3% -0.03% + Complexity 88628861 -1 === Files 18941896 +2 Lines 70910 70951 +41 Branches 77997802 +3 === Hits 32141 32141 - Misses35803 35845 +42 + Partials 29662965 -1 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `32.71% <0%> (-2.81%)` | `11% <0%> (-1%)` | | | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | | | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: | | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=footer). Last update [94a508b...fe75404](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos
bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos URL: https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550076702 @autumnust I was looking to utilize [aspell](http://aspell.net/). A good guide on basic usage can be found here: https://alexwlchan.net/2016/09/please-use-aspell The only consideration that stands out to me would be how to easily compile a list of exceptions (Avro, ORC etc.) to place in the `~/.aspell.lang.pws` file. Thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor
[ https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=339056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339056 ] ASF GitHub Bot logged work on GOBBLIN-897: -- Author: ASF GitHub Bot Created on: 05/Nov/19 23:48 Start Date: 05/Nov/19 23:48 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1) Report > Merging [#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `41.17%`. > The diff coverage is `0%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2755 +/- ## - Coverage 45.32% 4.15% -41.18% + Complexity 8862 745 -8117 Files 18941896+2 Lines 70910 70951 +41 Branches 77997802+3 - Hits 321412945-29196 - Misses35803 67691+31888 + Partials 2966 315 -2651 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-5%)` | | | [...n/mapreduce/avro/AvroKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vQXZyb0tleUNvbXBhY3Rvck91dHB1dEZvcm1hdC5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | | | [...apache/gobblin/fork/CopyNotSupportedException.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZm9yay9Db3B5Tm90U3VwcG9ydGVkRXhjZXB0aW9uLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-1%)` | | | [.../gobblin/kafka/writer/KafkaWriterCommonConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL3dyaXRlci9LYWZrYVdyaXRlckNvbW1vbkNvbmZpZy5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-7%)` | | | [...ker/task/TaskLevelPolicyCheckerBuilderFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3F1YWxpdHljaGVja2VyL3Rhc2svVGFza0xldmVsUG9saWN5Q2hlY2tlckJ1aWxkZXJGYWN0b3J5LmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...bblin/data/management/copy/AllEqualComparator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvQWxsRXF1YWxDb21wYXJhdG9yLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...blin/co
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir
codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1) Report > Merging [#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `41.17%`. > The diff coverage is `0%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2755 +/- ## - Coverage 45.32% 4.15% -41.18% + Complexity 8862 745 -8117 Files 18941896+2 Lines 70910 70951 +41 Branches 77997802+3 - Hits 321412945-29196 - Misses35803 67691+31888 + Partials 2966 315 -2651 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-5%)` | | | [...n/mapreduce/avro/AvroKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vQXZyb0tleUNvbXBhY3Rvck91dHB1dEZvcm1hdC5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | | | [...apache/gobblin/fork/CopyNotSupportedException.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZm9yay9Db3B5Tm90U3VwcG9ydGVkRXhjZXB0aW9uLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-1%)` | | | [.../gobblin/kafka/writer/KafkaWriterCommonConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL3dyaXRlci9LYWZrYVdyaXRlckNvbW1vbkNvbmZpZy5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-7%)` | | | [...ker/task/TaskLevelPolicyCheckerBuilderFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3F1YWxpdHljaGVja2VyL3Rhc2svVGFza0xldmVsUG9saWN5Q2hlY2tlckJ1aWxkZXJGYWN0b3J5LmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...bblin/data/management/copy/AllEqualComparator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvQWxsRXF1YWxDb21wYXJhdG9yLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...blin/converter/string/ObjectToStringConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9zdHJpbmcvT2JqZWN0VG9TdHJpbmdDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | | | ... and [1092 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree-more) | | --
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2797: [GOBBLIN-948] add hive data node and descriptor
sv2000 commented on a change in pull request #2797: [GOBBLIN-948] add hive data node and descriptor URL: https://github.com/apache/incubator-gobblin/pull/2797#discussion_r342851353 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/SqlDatasetDescriptor.java ## @@ -52,6 +52,7 @@ private final Config rawConfig; public enum Platform { +HIVE("hive"), Review comment: Can we move the "hive" platform to the HiveDatasetDescriptor? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-948?focusedWorklogId=339051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339051 ] ASF GitHub Bot logged work on GOBBLIN-948: -- Author: ASF GitHub Bot Created on: 05/Nov/19 23:33 Start Date: 05/Nov/19 23:33 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2797: [GOBBLIN-948] add hive data node and descriptor URL: https://github.com/apache/incubator-gobblin/pull/2797#discussion_r342851353 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/SqlDatasetDescriptor.java ## @@ -52,6 +52,7 @@ private final Config rawConfig; public enum Platform { +HIVE("hive"), Review comment: Can we move the "hive" platform to the HiveDatasetDescriptor? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339051) Time Spent: 20m (was: 10m) > add hive data node and hive descriptor for gobblin service > -- > > Key: GOBBLIN-948 > URL: https://issues.apache.org/jira/browse/GOBBLIN-948 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Arjun Singh Bora >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-948?focusedWorklogId=339044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339044 ] ASF GitHub Bot logged work on GOBBLIN-948: -- Author: ASF GitHub Bot Created on: 05/Nov/19 23:09 Start Date: 05/Nov/19 23:09 Worklog Time Spent: 10m Work Description: arjun4084346 commented on pull request #2797: [GOBBLIN-948] add hive data node and descriptor URL: https://github.com/apache/incubator-gobblin/pull/2797 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! @sv2000 @jack-moseley please review ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-948 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): added hive data node and hive dataset descriptor ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: a unit test ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339044) Remaining Estimate: 0h Time Spent: 10m > add hive data node and hive descriptor for gobblin service > -- > > Key: GOBBLIN-948 > URL: https://issues.apache.org/jira/browse/GOBBLIN-948 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Arjun Singh Bora >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2797: [GOBBLIN-948] add hive data node and descriptor
arjun4084346 opened a new pull request #2797: [GOBBLIN-948] add hive data node and descriptor URL: https://github.com/apache/incubator-gobblin/pull/2797 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! @sv2000 @jack-moseley please review ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-948 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): added hive data node and hive dataset descriptor ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: a unit test ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service
Arjun Singh Bora created GOBBLIN-948: Summary: add hive data node and hive descriptor for gobblin service Key: GOBBLIN-948 URL: https://issues.apache.org/jira/browse/GOBBLIN-948 Project: Apache Gobblin Issue Type: New Feature Reporter: Arjun Singh Bora -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor
[ https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=339016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339016 ] ASF GitHub Bot logged work on GOBBLIN-897: -- Author: ASF GitHub Bot Created on: 05/Nov/19 21:51 Start Date: 05/Nov/19 21:51 Worklog Time Spent: 10m Work Description: Will-Lo commented on pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342816187 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java ## @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) { * @param headers*/ @Override public Future deleteSpec(URI deletedSpecURI, Properties headers) { -return new CompletedFuture<>(Boolean.TRUE, null); +String[] uriTokens = deletedSpecURI.getPath().split("/"); +String jobFileName = String.join("_", uriTokens) + ".job"; +File file = new File(jobFileName); +if (file.delete()) { + log.info("Deleted spec: {}", jobFileName); + return new CompletedFuture<>(Boolean.TRUE, null); +} +throw new RuntimeException(String.format("Failed to delete file with uri %s", deletedSpecURI)); Review comment: That's a good point actually. In the other `SpecProducers` they throw a runtime exception as well, would it affect anything? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339016) Time Spent: 2h 10m (was: 2h) > Implement Local FS Spec Executor > > > Key: GOBBLIN-897 > URL: https://issues.apache.org/jira/browse/GOBBLIN-897 > Project: Apache Gobblin > Issue Type: New Feature > Components: gobblin-core, gobblin-service >Affects Versions: 0.15.0 >Reporter: William Lo >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] Will-Lo commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir
Will-Lo commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342816187 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java ## @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) { * @param headers*/ @Override public Future deleteSpec(URI deletedSpecURI, Properties headers) { -return new CompletedFuture<>(Boolean.TRUE, null); +String[] uriTokens = deletedSpecURI.getPath().split("/"); +String jobFileName = String.join("_", uriTokens) + ".job"; +File file = new File(jobFileName); +if (file.delete()) { + log.info("Deleted spec: {}", jobFileName); + return new CompletedFuture<>(Boolean.TRUE, null); +} +throw new RuntimeException(String.format("Failed to delete file with uri %s", deletedSpecURI)); Review comment: That's a good point actually. In the other `SpecProducers` they throw a runtime exception as well, would it affect anything? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (GOBBLIN-947) Support "any" capability in IdentityFlowToJobSpecCompiler
Chen Guo created GOBBLIN-947: Summary: Support "any" capability in IdentityFlowToJobSpecCompiler Key: GOBBLIN-947 URL: https://issues.apache.org/jira/browse/GOBBLIN-947 Project: Apache Gobblin Issue Type: New Feature Reporter: Chen Guo Currently the IdentityFlowToJobSpecCompiler does a hard check on the capability string for both source and destination. "Any" is not supported in IdentityFlowToJobSpecCompiler as in MultiHopFlowCompiler. The behaviour should be made consistent across these two compilers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-944) Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor
[ https://issues.apache.org/jira/browse/GOBBLIN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Guo resolved GOBBLIN-944. -- Resolution: Won't Fix > Provide the flexibility of deriving platform field in a different way for > subclasses of BaseDatasetDescriptor > - > > Key: GOBBLIN-944 > URL: https://issues.apache.org/jira/browse/GOBBLIN-944 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Chen Guo >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Current BaseDatasetDescriptor always gets the platform from the configuration > object, but there could be use cases where the platform is derived in a > different way. We need to change the implementation of BaseDatasetDescriptor > constructor such that derived classes have the flexibility to change this > default behaviour. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-944) Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor
[ https://issues.apache.org/jira/browse/GOBBLIN-944?focusedWorklogId=339007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339007 ] ASF GitHub Bot logged work on GOBBLIN-944: -- Author: ASF GitHub Bot Created on: 05/Nov/19 21:30 Start Date: 05/Nov/19 21:30 Worklog Time Spent: 10m Work Description: enjoyear commented on pull request #2794: [GOBBLIN-944] Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor URL: https://github.com/apache/incubator-gobblin/pull/2794 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339007) Time Spent: 40m (was: 0.5h) > Provide the flexibility of deriving platform field in a different way for > subclasses of BaseDatasetDescriptor > - > > Key: GOBBLIN-944 > URL: https://issues.apache.org/jira/browse/GOBBLIN-944 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Chen Guo >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Current BaseDatasetDescriptor always gets the platform from the configuration > object, but there could be use cases where the platform is derived in a > different way. We need to change the implementation of BaseDatasetDescriptor > constructor such that derived classes have the flexibility to change this > default behaviour. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] enjoyear closed pull request #2794: [GOBBLIN-944] Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor
enjoyear closed pull request #2794: [GOBBLIN-944] Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor URL: https://github.com/apache/incubator-gobblin/pull/2794 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-933) JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions
[ https://issues.apache.org/jira/browse/GOBBLIN-933?focusedWorklogId=338997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338997 ] ASF GitHub Bot logged work on GOBBLIN-933: -- Author: ASF GitHub Bot Created on: 05/Nov/19 21:00 Start Date: 05/Nov/19 21:00 Worklog Time Spent: 10m Work Description: arjun4084346 commented on pull request #2792: [GOBBLIN-933] add support for array of unions in json schema URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342794498 ## File path: gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java ## @@ -210,4 +215,49 @@ public Schema schema() { return this.schema; } } + + public static class UnionConverter extends ComplexConverter { +private final Schema firstSchema; +private final Schema secondSchema; +private final JsonElementConverter firstConverter; +private final JsonElementConverter secondConverter; + +public UnionConverter(String fieldName, boolean nullable, String sourceType, Schema schemaNode, +WorkUnitState state, List ignoreFields) throws UnsupportedDateTypeException { + super(fieldName, nullable, sourceType); + List types = schemaNode.getTypes(); + if (types.size() != 2) { +throw new RuntimeException("Unions must be size 2."); Review comment: the existing code only supports two types. https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonRecordAvroSchemaToAvroConverter.java#L93 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338997) Time Spent: 2h 10m (was: 2h) > JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions > > > Key: GOBBLIN-933 > URL: https://issues.apache.org/jira/browse/GOBBLIN-933 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Ahmed Abdul Hamid >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Using {{JsonRecordAvroSchemaToAvroConverter}} to convert an array of a union > type fails. For instance, using it with the following Avro schema: > {code:java} > { > "name": "arrayField", > "type": { > "type": "array", > "items": ["string", "null"] > } > } {code} > yields the following error: > {code:java} > java.lang.StackOverflowError > at org.apache.gobblin.configuration.State.getProp(State.java) > at > org.apache.gobblin.configuration.WorkUnitState.getProp(WorkUnitState.java:333) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:106) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:160) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2792: [GOBBLIN-933] add support for array of unions in json schema
arjun4084346 commented on a change in pull request #2792: [GOBBLIN-933] add support for array of unions in json schema URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342794498 ## File path: gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java ## @@ -210,4 +215,49 @@ public Schema schema() { return this.schema; } } + + public static class UnionConverter extends ComplexConverter { +private final Schema firstSchema; +private final Schema secondSchema; +private final JsonElementConverter firstConverter; +private final JsonElementConverter secondConverter; + +public UnionConverter(String fieldName, boolean nullable, String sourceType, Schema schemaNode, +WorkUnitState state, List ignoreFields) throws UnsupportedDateTypeException { + super(fieldName, nullable, sourceType); + List types = schemaNode.getTypes(); + if (types.size() != 2) { +throw new RuntimeException("Unions must be size 2."); Review comment: the existing code only supports two types. https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonRecordAvroSchemaToAvroConverter.java#L93 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-946) Support HTTP source in Gobblin Service
[ https://issues.apache.org/jira/browse/GOBBLIN-946?focusedWorklogId=338995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338995 ] ASF GitHub Bot logged work on GOBBLIN-946: -- Author: ASF GitHub Bot Created on: 05/Nov/19 20:57 Start Date: 05/Nov/19 20:57 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service URL: https://github.com/apache/incubator-gobblin/pull/2796#issuecomment-550016552 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=h1) Report > Merging [#2796](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **increase** coverage by `0.01%`. > The diff coverage is `61.76%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2796 +/- ## + Coverage 45.32% 45.34% +0.01% - Complexity 8862 8869 +7 Files 1894 1896 +2 Lines 7091070943 +33 Branches 7799 7802 +3 + Hits 3214132169 +28 - Misses3580335806 +3 - Partials 2966 2968 +2 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../modules/flowgraph/FlowGraphConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvRmxvd0dyYXBoQ29uZmlndXJhdGlvbktleXMuamF2YQ==) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: | | [.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=) | `63.88% <100%> (ø)` | `7 <0> (ø)` | :arrow_down: | | [...vice/modules/flowgraph/datanodes/HttpDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL0h0dHBEYXRhTm9kZS5qYXZh) | `41.66% <41.66%> (ø)` | `1 <1> (?)` | | | [...service/modules/dataset/HttpDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0h0dHBEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `71.42% <71.42%> (ø)` | `5 <5> (?)` | | | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | | | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | | | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: | | [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `72.22% <0%> (+2.22%)` | `13% <0%> (ø)` | :arrow_down: | | ... and [3 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?
[GitHub] [incubator-gobblin] codecov-io commented on issue #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service
codecov-io commented on issue #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service URL: https://github.com/apache/incubator-gobblin/pull/2796#issuecomment-550016552 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=h1) Report > Merging [#2796](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **increase** coverage by `0.01%`. > The diff coverage is `61.76%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2796 +/- ## + Coverage 45.32% 45.34% +0.01% - Complexity 8862 8869 +7 Files 1894 1896 +2 Lines 7091070943 +33 Branches 7799 7802 +3 + Hits 3214132169 +28 - Misses3580335806 +3 - Partials 2966 2968 +2 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../modules/flowgraph/FlowGraphConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvRmxvd0dyYXBoQ29uZmlndXJhdGlvbktleXMuamF2YQ==) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: | | [.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=) | `63.88% <100%> (ø)` | `7 <0> (ø)` | :arrow_down: | | [...vice/modules/flowgraph/datanodes/HttpDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL0h0dHBEYXRhTm9kZS5qYXZh) | `41.66% <41.66%> (ø)` | `1 <1> (?)` | | | [...service/modules/dataset/HttpDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0h0dHBEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `71.42% <71.42%> (ø)` | `5 <5> (?)` | | | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | | | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | | | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: | | [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `72.22% <0%> (+2.22%)` | `13% <0%> (ø)` | :arrow_down: | | ... and [3 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=footer). Last update [94a508b...dbbc0e7](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=lastupdated). Read the [comment docs](https://
[jira] [Work logged] (GOBBLIN-933) JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions
[ https://issues.apache.org/jira/browse/GOBBLIN-933?focusedWorklogId=338993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338993 ] ASF GitHub Bot logged work on GOBBLIN-933: -- Author: ASF GitHub Bot Created on: 05/Nov/19 20:56 Start Date: 05/Nov/19 20:56 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2792: [GOBBLIN-933] add support for array of unions in json schema URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342770317 ## File path: gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java ## @@ -210,4 +215,49 @@ public Schema schema() { return this.schema; } } + + public static class UnionConverter extends ComplexConverter { +private final Schema firstSchema; +private final Schema secondSchema; +private final JsonElementConverter firstConverter; +private final JsonElementConverter secondConverter; + +public UnionConverter(String fieldName, boolean nullable, String sourceType, Schema schemaNode, +WorkUnitState state, List ignoreFields) throws UnsupportedDateTypeException { + super(fieldName, nullable, sourceType); + List types = schemaNode.getTypes(); + if (types.size() != 2) { +throw new RuntimeException("Unions must be size 2."); Review comment: This is not a valid assumption in the context of Avro schema. What's the reason to enforce `size == 2` for union here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338993) Time Spent: 2h (was: 1h 50m) > JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions > > > Key: GOBBLIN-933 > URL: https://issues.apache.org/jira/browse/GOBBLIN-933 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Ahmed Abdul Hamid >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Using {{JsonRecordAvroSchemaToAvroConverter}} to convert an array of a union > type fails. For instance, using it with the following Avro schema: > {code:java} > { > "name": "arrayField", > "type": { > "type": "array", > "items": ["string", "null"] > } > } {code} > yields the following error: > {code:java} > java.lang.StackOverflowError > at org.apache.gobblin.configuration.State.getProp(State.java) > at > org.apache.gobblin.configuration.WorkUnitState.getProp(WorkUnitState.java:333) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:106) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:160) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737) > at > org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2792: [GOBBLIN-933] add support for array of unions in json schema
autumnust commented on a change in pull request #2792: [GOBBLIN-933] add support for array of unions in json schema URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342770317 ## File path: gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java ## @@ -210,4 +215,49 @@ public Schema schema() { return this.schema; } } + + public static class UnionConverter extends ComplexConverter { +private final Schema firstSchema; +private final Schema secondSchema; +private final JsonElementConverter firstConverter; +private final JsonElementConverter secondConverter; + +public UnionConverter(String fieldName, boolean nullable, String sourceType, Schema schemaNode, +WorkUnitState state, List ignoreFields) throws UnsupportedDateTypeException { + super(fieldName, nullable, sourceType); + List types = schemaNode.getTypes(); + if (types.size() != 2) { +throw new RuntimeException("Unions must be size 2."); Review comment: This is not a valid assumption in the context of Avro schema. What's the reason to enforce `size == 2` for union here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338974 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:58 Start Date: 05/Nov/19 19:58 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342768883 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java ## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.gobblin.source.extractor.extract.kafka; + +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.TimeUnit; + +import com.google.common.collect.Maps; +import com.google.common.collect.Sets; + +import lombok.Data; +import lombok.Getter; +import lombok.extern.slf4j.Slf4j; + +import org.apache.gobblin.configuration.WorkUnitState; +import org.apache.gobblin.metrics.MetricContext; +import org.apache.gobblin.metrics.event.EventSubmitter; + + +/** + * A class that tracks KafkaExtractor statistics such as record decode time, #processed records, #undecodeable records etc. + * + */ +@Slf4j +public class KafkaExtractorStatsTracker { + // Constants for event submission + public static final String TOPIC = "topic"; + public static final String PARTITION = "partition"; + + private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka"; + private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = "KafkaExtractorTopicMetadata"; + private static final String LOW_WATERMARK = "lowWatermark"; + private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark"; + private static final String EXPECTED_HIGH_WATERMARK = "expectedHighWatermark"; + private static final String ELAPSED_TIME = "elapsedTime"; + private static final String PROCESSED_RECORD_COUNT = "processedRecordCount"; + private static final String UNDECODABLE_MESSAGE_COUNT = "undecodableMessageCount"; + private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize"; + private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime"; + private static final String AVG_RECORD_SIZE = "avgRecordSize"; + private static final String READ_RECORD_TIME = "readRecordTime"; + private static final String DECODE_RECORD_TIME = "decodeRecordTime"; + private static final String FETCH_MESSAGE_BUFFER_TIME = "fetchMessageBufferTime"; + private static final String LAST_RECORD_HEADER_TIMESTAMP = "lastRecordHeaderTimestamp"; + + @Getter + private final Map statsMap; + private final Set errorPartitions; + private final WorkUnitState workUnitState; + + //A global count of number of undecodeable messages encountered by the KafkaExtractor across all Kafka + //TopicPartitions. + @Getter + private int undecodableMessageCount = 0; + private List partitions; + + public KafkaExtractorStatsTracker(WorkUnitState state, List partitions) { Review comment: A little bit nit-pick here but for ease of usage in streaming-extractor as well: Streaming extractor doesn't contain a list of `KafkaPartition` objects, what it really needs is the partition id that kafka-client can be assigned and consume, so does here: We just need an identifier of a kafka-partition as the key of `statsMap`. It would be easier if we use `List` with the partitionId as the value here as the constructor arguments, so that you don't have to reassemble a `KafkaPartition` object in streaming-extractor This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id:
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338969 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:58 Start Date: 05/Nov/19 19:58 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342726471 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java ## @@ -31,6 +31,7 @@ private final int id; private final String topicName; private KafkaLeader leader; + private int hashCode; Review comment: Can you use auto-generated `hashCode` using IDE instead ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338969) Time Spent: 50m (was: 40m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338971 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:58 Start Date: 05/Nov/19 19:58 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342727497 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java ## @@ -115,6 +114,7 @@ public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset"; public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime"; public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = "previousOffsetFetchEpochTime"; + public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions"; Review comment: Where is this being used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338971) Time Spent: 1h 10m (was: 1h) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338972 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:58 Start Date: 05/Nov/19 19:58 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342734740 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws DataRecordException, IOException { D record = decodeKafkaMessage(nextValidMessage); - this.currentPartitionDecodeRecordTime += System.nanoTime() - decodeStartTime; - this.currentPartitionRecordCount++; - this.currentPartitionTotalSize += nextValidMessage.getValueSizeInBytes(); - this.currentPartitionReadRecordTime += System.nanoTime() - readStartTime; + this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes()); Review comment: Seems there are some difference in terms of the intention for these two catch blocks, the former one indicates that there are exceptions thrown when current partition is drained and need to move to the next one, while the latter indicates "undecodability". Shall we preserve original intention here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338972) Time Spent: 1h 20m (was: 1h 10m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338970 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:58 Start Date: 05/Nov/19 19:58 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342739486 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -392,112 +303,16 @@ public long getExpectedRecordCount() { @Override public void close() throws IOException { -if (currentPartitionIdx != INITIAL_PARTITION_IDX) { - updateStatisticsForCurrentPartition(); +if (!allPartitionsFinished()) { Review comment: condition for `allPartitionsFinished()` is stronger than `currentIdx != INITIAL_PARTITION_IDX` where the former requires ending of current partition as well. Is that expected ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338970) Time Spent: 1h (was: 50m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338973 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:58 Start Date: 05/Nov/19 19:58 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342765903 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/kafka/client/GobblinKafkaConsumerClient.java ## @@ -72,6 +73,15 @@ */ public long getLatestOffset(KafkaPartition partition) throws KafkaOffsetRetrievalFailureException; + /** + * Get the latest available offset for a partition + * + * @param partitions for which latest offset is retrieved + * + * @throws UnsupportedOperationException - If the underlying kafka-client does not support getting latest offset + */ + public Map getLatestOffsets(Collection partitions) throws KafkaOffsetRetrievalFailureException; Review comment: Make a default impl. in interface instead ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338973) Time Spent: 1.5h (was: 1h 20m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342739486 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -392,112 +303,16 @@ public long getExpectedRecordCount() { @Override public void close() throws IOException { -if (currentPartitionIdx != INITIAL_PARTITION_IDX) { - updateStatisticsForCurrentPartition(); +if (!allPartitionsFinished()) { Review comment: condition for `allPartitionsFinished()` is stronger than `currentIdx != INITIAL_PARTITION_IDX` where the former requires ending of current partition as well. Is that expected ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342768883 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java ## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.gobblin.source.extractor.extract.kafka; + +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.TimeUnit; + +import com.google.common.collect.Maps; +import com.google.common.collect.Sets; + +import lombok.Data; +import lombok.Getter; +import lombok.extern.slf4j.Slf4j; + +import org.apache.gobblin.configuration.WorkUnitState; +import org.apache.gobblin.metrics.MetricContext; +import org.apache.gobblin.metrics.event.EventSubmitter; + + +/** + * A class that tracks KafkaExtractor statistics such as record decode time, #processed records, #undecodeable records etc. + * + */ +@Slf4j +public class KafkaExtractorStatsTracker { + // Constants for event submission + public static final String TOPIC = "topic"; + public static final String PARTITION = "partition"; + + private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka"; + private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = "KafkaExtractorTopicMetadata"; + private static final String LOW_WATERMARK = "lowWatermark"; + private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark"; + private static final String EXPECTED_HIGH_WATERMARK = "expectedHighWatermark"; + private static final String ELAPSED_TIME = "elapsedTime"; + private static final String PROCESSED_RECORD_COUNT = "processedRecordCount"; + private static final String UNDECODABLE_MESSAGE_COUNT = "undecodableMessageCount"; + private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize"; + private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime"; + private static final String AVG_RECORD_SIZE = "avgRecordSize"; + private static final String READ_RECORD_TIME = "readRecordTime"; + private static final String DECODE_RECORD_TIME = "decodeRecordTime"; + private static final String FETCH_MESSAGE_BUFFER_TIME = "fetchMessageBufferTime"; + private static final String LAST_RECORD_HEADER_TIMESTAMP = "lastRecordHeaderTimestamp"; + + @Getter + private final Map statsMap; + private final Set errorPartitions; + private final WorkUnitState workUnitState; + + //A global count of number of undecodeable messages encountered by the KafkaExtractor across all Kafka + //TopicPartitions. + @Getter + private int undecodableMessageCount = 0; + private List partitions; + + public KafkaExtractorStatsTracker(WorkUnitState state, List partitions) { Review comment: A little bit nit-pick here but for ease of usage in streaming-extractor as well: Streaming extractor doesn't contain a list of `KafkaPartition` objects, what it really needs is the partition id that kafka-client can be assigned and consume, so does here: We just need an identifier of a kafka-partition as the key of `statsMap`. It would be easier if we use `List` with the partitionId as the value here as the constructor arguments, so that you don't have to reassemble a `KafkaPartition` object in streaming-extractor This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342734740 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java ## @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws DataRecordException, IOException { D record = decodeKafkaMessage(nextValidMessage); - this.currentPartitionDecodeRecordTime += System.nanoTime() - decodeStartTime; - this.currentPartitionRecordCount++; - this.currentPartitionTotalSize += nextValidMessage.getValueSizeInBytes(); - this.currentPartitionReadRecordTime += System.nanoTime() - readStartTime; + this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes()); Review comment: Seems there are some difference in terms of the intention for these two catch blocks, the former one indicates that there are exceptions thrown when current partition is drained and need to move to the next one, while the latter indicates "undecodability". Shall we preserve original intention here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342727497 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java ## @@ -115,6 +114,7 @@ public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset"; public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime"; public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = "previousOffsetFetchEpochTime"; + public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions"; Review comment: Where is this being used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342765903 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/kafka/client/GobblinKafkaConsumerClient.java ## @@ -72,6 +73,15 @@ */ public long getLatestOffset(KafkaPartition partition) throws KafkaOffsetRetrievalFailureException; + /** + * Get the latest available offset for a partition + * + * @param partitions for which latest offset is retrieved + * + * @throws UnsupportedOperationException - If the underlying kafka-client does not support getting latest offset + */ + public Map getLatestOffsets(Collection partitions) throws KafkaOffsetRetrievalFailureException; Review comment: Make a default impl. in interface instead ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342726471 ## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java ## @@ -31,6 +31,7 @@ private final int id; private final String topicName; private KafkaLeader leader; + private int hashCode; Review comment: Can you use auto-generated `hashCode` using IDE instead ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-946) Support HTTP source in Gobblin Service
[ https://issues.apache.org/jira/browse/GOBBLIN-946?focusedWorklogId=338963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338963 ] ASF GitHub Bot logged work on GOBBLIN-946: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:45 Start Date: 05/Nov/19 19:45 Worklog Time Spent: 10m Work Description: haojiliu commented on pull request #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service URL: https://github.com/apache/incubator-gobblin/pull/2796 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-946 ### Description - [ ] Here are some details about my PR, including screenshots (if applicable): Adding support for transmitting data from a http/https source. This is the first effort and minimum number of attributes were introduced. A HttpDataNode will specify two attributes: 1. domain - This is the domain component of a http url 2. authentication type - This is the authentication protocol supported for a given domain A HttpDatasetDescriptor will specify one attribute: 1. path - This is the full url of the data source's endpoint, e.g, https://a.b.c.com/api/dataset_abc ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Custom logics are covered by unit tests ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338963) Remaining Estimate: 35h 50m (was: 36h) Time Spent: 10m > Support HTTP source in Gobblin Service > -- > > Key: GOBBLIN-946 > URL: https://issues.apache.org/jira/browse/GOBBLIN-946 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Reporter: Haoji Liu >Assignee: Abhishek Tiwari >Priority: Major > Original Estimate: 36h > Time Spent: 10m > Remaining Estimate: 35h 50m > > Adding support for transmitting data from a http/https source. This will > require a new DataNode and DatasetDescriptor class. > > Minimal logic should be implemented, logics should reside in the actual > gobblin connectors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-946) Support HTTP source in Gobblin Service
Haoji Liu created GOBBLIN-946: - Summary: Support HTTP source in Gobblin Service Key: GOBBLIN-946 URL: https://issues.apache.org/jira/browse/GOBBLIN-946 Project: Apache Gobblin Issue Type: Improvement Components: gobblin-service Reporter: Haoji Liu Assignee: Abhishek Tiwari Adding support for transmitting data from a http/https source. This will require a new DataNode and DatasetDescriptor class. Minimal logic should be implemented, logics should reside in the actual gobblin connectors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] haojiliu opened a new pull request #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service
haojiliu opened a new pull request #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service URL: https://github.com/apache/incubator-gobblin/pull/2796 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-946 ### Description - [ ] Here are some details about my PR, including screenshots (if applicable): Adding support for transmitting data from a http/https source. This is the first effort and minimum number of attributes were introduced. A HttpDataNode will specify two attributes: 1. domain - This is the domain component of a http url 2. authentication type - This is the authentication protocol supported for a given domain A HttpDatasetDescriptor will specify one attribute: 1. path - This is the full url of the data source's endpoint, e.g, https://a.b.c.com/api/dataset_abc ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Custom logics are covered by unit tests ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338961 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 19:42 Start Date: 05/Nov/19 19:42 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1) Report > Merging [#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `0.84%`. > The diff coverage is `40.58%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2795 +/- ## - Coverage 45.32% 44.48% -0.85% + Complexity 8862 8714 -148 Files 1894 1895 +1 Lines 7091070939 +29 Branches 7799 7799 - Hits 3214131555 -586 - Misses3580336460 +657 + Partials 2966 2924 -42 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh) | `55.69% <0%> (-0.36%)` | `22 <0> (ø)` | | | [...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh) | `75.71% <0%> (-1.1%)` | `8 <0> (ø)` | | | [...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh) | `47.36% <47.36%> (ø)` | `19 <19> (?)` | | | [...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh) | `44.64% <85.71%> (+44.64%)` | `3 <2> (+3)` | :arrow_up: | | [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
codecov-io edited a comment on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1) Report > Merging [#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `0.84%`. > The diff coverage is `40.58%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2795 +/- ## - Coverage 45.32% 44.48% -0.85% + Complexity 8862 8714 -148 Files 1894 1895 +1 Lines 7091070939 +29 Branches 7799 7799 - Hits 3214131555 -586 - Misses3580336460 +657 + Partials 2966 2924 -42 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh) | `55.69% <0%> (-0.36%)` | `22 <0> (ø)` | | | [...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh) | `75.71% <0%> (-1.1%)` | `8 <0> (ø)` | | | [...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh) | `47.36% <47.36%> (ø)` | `19 <19> (?)` | | | [...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh) | `44.64% <85.71%> (+44.64%)` | `3 <2> (+3)` | :arrow_up: | | [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | | | [...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338916 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 18:30 Start Date: 05/Nov/19 18:30 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1) Report > Merging [#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `41.17%`. > The diff coverage is `0%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2795 +/- ## - Coverage 45.32% 4.15% -41.18% + Complexity 8862 745 -8117 Files 18941895+1 Lines 70910 70939 +29 Branches 77997799 - Hits 321412945-29196 - Misses35803 67679+31876 + Partials 2966 315 -2651 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh) | `0% <0%> (-56.06%)` | `0 <0> (-22)` | | | [...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh) | `0% <0%> (-76.82%)` | `0 <0> (-8)` | | | [...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsa
[GitHub] [incubator-gobblin] codecov-io commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
codecov-io commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1) Report > Merging [#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc) will **decrease** coverage by `41.17%`. > The diff coverage is `0%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2795 +/- ## - Coverage 45.32% 4.15% -41.18% + Complexity 8862 745 -8117 Files 18941895+1 Lines 70910 70939 +29 Branches 77997799 - Hits 321412945-29196 - Misses35803 67679+31876 + Partials 2966 315 -2651 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh) | `0% <0%> (-56.06%)` | `0 <0> (-22)` | | | [...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | | | [...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: | | [...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh) | `0% <0%> (-76.82%)` | `0 <0> (-8)` | | | [...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9k
[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor
[ https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=338914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338914 ] ASF GitHub Bot logged work on GOBBLIN-897: -- Author: ASF GitHub Bot Created on: 05/Nov/19 18:29 Start Date: 05/Nov/19 18:29 Worklog Time Spent: 10m Work Description: zxcware commented on pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724920 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java ## @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) { * @param headers*/ @Override public Future deleteSpec(URI deletedSpecURI, Properties headers) { -return new CompletedFuture<>(Boolean.TRUE, null); +String[] uriTokens = deletedSpecURI.getPath().split("/"); +String jobFileName = String.join("_", uriTokens) + ".job"; +File file = new File(jobFileName); +if (file.delete()) { + log.info("Deleted spec: {}", jobFileName); + return new CompletedFuture<>(Boolean.TRUE, null); +} +throw new RuntimeException(String.format("Failed to delete file with uri %s", deletedSpecURI)); Review comment: This shouldn't halt the service, return a `CompletedFuture` with an exception and expect the invoker to do proper error handling. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338914) Time Spent: 1h 50m (was: 1h 40m) > Implement Local FS Spec Executor > > > Key: GOBBLIN-897 > URL: https://issues.apache.org/jira/browse/GOBBLIN-897 > Project: Apache Gobblin > Issue Type: New Feature > Components: gobblin-core, gobblin-service >Affects Versions: 0.15.0 >Reporter: William Lo >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor
[ https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=338915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338915 ] ASF GitHub Bot logged work on GOBBLIN-897: -- Author: ASF GitHub Bot Created on: 05/Nov/19 18:29 Start Date: 05/Nov/19 18:29 Worklog Time Spent: 10m Work Description: zxcware commented on pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724245 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java ## @@ -71,9 +72,10 @@ public LocalFsSpecProducer(Config config) { private Future writeSpec(Spec spec, SpecExecutor.Verb verb) { if (spec instanceof JobSpec) { - String specString = spec.toString(); + URI specUri = spec.getUri(); // format the JobSpec to have file of _.job - String jobFileName = specString.replace('/', '_').substring(specString.lastIndexOf(':')+2, specString.length()-1) + ".job"; + String[] uriTokens = specUri.getPath().split("/"); Review comment: We can be more DRY(Don't Repeat Yourself) by making a small function named `getJobFileName(uri)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338915) Time Spent: 2h (was: 1h 50m) > Implement Local FS Spec Executor > > > Key: GOBBLIN-897 > URL: https://issues.apache.org/jira/browse/GOBBLIN-897 > Project: Apache Gobblin > Issue Type: New Feature > Components: gobblin-core, gobblin-service >Affects Versions: 0.15.0 >Reporter: William Lo >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir
zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724245 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java ## @@ -71,9 +72,10 @@ public LocalFsSpecProducer(Config config) { private Future writeSpec(Spec spec, SpecExecutor.Verb verb) { if (spec instanceof JobSpec) { - String specString = spec.toString(); + URI specUri = spec.getUri(); // format the JobSpec to have file of _.job - String jobFileName = specString.replace('/', '_').substring(specString.lastIndexOf(':')+2, specString.length()-1) + ".job"; + String[] uriTokens = specUri.getPath().split("/"); Review comment: We can be more DRY(Don't Repeat Yourself) by making a small function named `getJobFileName(uri)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir
zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724920 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java ## @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) { * @param headers*/ @Override public Future deleteSpec(URI deletedSpecURI, Properties headers) { -return new CompletedFuture<>(Boolean.TRUE, null); +String[] uriTokens = deletedSpecURI.getPath().split("/"); +String jobFileName = String.join("_", uriTokens) + ".job"; +File file = new File(jobFileName); +if (file.delete()) { + log.info("Deleted spec: {}", jobFileName); + return new CompletedFuture<>(Boolean.TRUE, null); +} +throw new RuntimeException(String.format("Failed to delete file with uri %s", deletedSpecURI)); Review comment: This shouldn't halt the service, return a `CompletedFuture` with an exception and expect the invoker to do proper error handling. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-940) Synchronization between workunit persistency and Helix job launching
[ https://issues.apache.org/jira/browse/GOBBLIN-940?focusedWorklogId=338905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338905 ] ASF GitHub Bot logged work on GOBBLIN-940: -- Author: ASF GitHub Bot Created on: 05/Nov/19 18:07 Start Date: 05/Nov/19 18:07 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching URL: https://github.com/apache/incubator-gobblin/pull/2789#issuecomment-548527186 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=h1) Report > Merging [#2789](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9ee4dcaf66257b6e2926cf1470b16b912cd343ff?src=pr&el=desc) will **increase** coverage by `40.22%`. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2789 +/- ## = + Coverage 4.15% 44.38% +40.22% - Complexity 746 8694 +7948 = Files 1894 1894 Lines 7087770918 +41 Branches 7793 7800+7 = + Hits 294631475+28529 + Misses6761736519-31098 - Partials314 2924 +2610 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...n/java/org/apache/gobblin/util/ParallelRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvUGFyYWxsZWxSdW5uZXIuamF2YQ==) | `63.96% <100%> (+63.96%)` | `15 <1> (+15)` | :arrow_up: | | [...pache/gobblin/cluster/GobblinHelixJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4Sm9iTGF1bmNoZXIuamF2YQ==) | `82% <100%> (+82%)` | `27 <0> (+27)` | :arrow_up: | | [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | | | [...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh) | `0% <0%> (-83.34%)` | `0% <0%> (-1%)` | | | [...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=) | `0% <0%> (-80.77%)` | `0% <0%> (-7%)` | | | [...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=) | `0% <0%> (-78.58%)` | `0% <0%> (-2%)` | | | [...e/HiveRegTaskStateCollectorServiceHandlerImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvSGl2ZVJlZ1Rhc2tTdGF0ZUNvbGxlY3RvclNlcnZpY2VIYW5kbGVySW1wbC5qYXZh) | `0% <0%> (-75%)` | `0% <0%> (-3%)` | | | [.../apache/gobblin/me
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching
codecov-io edited a comment on issue #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching URL: https://github.com/apache/incubator-gobblin/pull/2789#issuecomment-548527186 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=h1) Report > Merging [#2789](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9ee4dcaf66257b6e2926cf1470b16b912cd343ff?src=pr&el=desc) will **increase** coverage by `40.22%`. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2789 +/- ## = + Coverage 4.15% 44.38% +40.22% - Complexity 746 8694 +7948 = Files 1894 1894 Lines 7087770918 +41 Branches 7793 7800+7 = + Hits 294631475+28529 + Misses6761736519-31098 - Partials314 2924 +2610 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...n/java/org/apache/gobblin/util/ParallelRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvUGFyYWxsZWxSdW5uZXIuamF2YQ==) | `63.96% <100%> (+63.96%)` | `15 <1> (+15)` | :arrow_up: | | [...pache/gobblin/cluster/GobblinHelixJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4Sm9iTGF1bmNoZXIuamF2YQ==) | `82% <100%> (+82%)` | `27 <0> (+27)` | :arrow_up: | | [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | | | [...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | | | [...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh) | `0% <0%> (-83.34%)` | `0% <0%> (-1%)` | | | [...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=) | `0% <0%> (-80.77%)` | `0% <0%> (-7%)` | | | [...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=) | `0% <0%> (-78.58%)` | `0% <0%> (-2%)` | | | [...e/HiveRegTaskStateCollectorServiceHandlerImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvSGl2ZVJlZ1Rhc2tTdGF0ZUNvbGxlY3RvclNlcnZpY2VIYW5kbGVySW1wbC5qYXZh) | `0% <0%> (-75%)` | `0% <0%> (-3%)` | | | [.../apache/gobblin/metastore/ZkStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0YXN0b3JlL1prU3RhdGVTdG9yZUZhY3RvcnkuamF2YQ==) | `0% <0%> (-71.43%)` | `0% <0%> (-2%)` | | | ... and [1117 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree-more) | | --
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338900 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 17:52 Start Date: 05/Nov/19 17:52 Worklog Time Spent: 10m Work Description: sv2000 commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549938182 @autumnust @htran1 Please review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338900) Time Spent: 20m (was: 10m) > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
[ https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338899 ] ASF GitHub Bot logged work on GOBBLIN-945: -- Author: ASF GitHub Bot Created on: 05/Nov/19 17:51 Start Date: 05/Nov/19 17:51 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795 …de reuse across both batch and streaming execution modes Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-945 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): Current implementation of kafka extractor stats tracking is deeply integrated with the batch implementation of KafkaExtractor preventing it from being used in streaming Kafka extractor implementations. In addition to code reuse, the refactoring allows for writing unit tests for statistics tracker. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Added unit tests to KafkaExtractorStatsTrackerTest class. ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338899) Remaining Estimate: 0h Time Spent: 10m > Refactor Kafka extractor statistics tracking to allow code reuse across both > batch and streaming execution modes > > > Key: GOBBLIN-945 > URL: https://issues.apache.org/jira/browse/GOBBLIN-945 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-kafka >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Current implementation of kafka extractor stats tracking is deeply integrated > with the batch implementation of KafkaExtractor preventing it from being used > in streaming Kafka extractor implementations. In addition to code reuse, the > refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-gobblin] sv2000 commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
sv2000 commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549938182 @autumnust @htran1 Please review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] sv2000 opened a new pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…
sv2000 opened a new pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co… URL: https://github.com/apache/incubator-gobblin/pull/2795 …de reuse across both batch and streaming execution modes Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-945 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): Current implementation of kafka extractor stats tracking is deeply integrated with the batch implementation of KafkaExtractor preventing it from being used in streaming Kafka extractor implementations. In addition to code reuse, the refactoring allows for writing unit tests for statistics tracker. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Added unit tests to KafkaExtractorStatsTrackerTest class. ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes
Sudarshan Vasudevan created GOBBLIN-945: --- Summary: Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes Key: GOBBLIN-945 URL: https://issues.apache.org/jira/browse/GOBBLIN-945 Project: Apache Gobblin Issue Type: Improvement Components: gobblin-kafka Affects Versions: 0.15.0 Reporter: Sudarshan Vasudevan Assignee: Shirshanka Das Fix For: 0.15.0 Current implementation of kafka extractor stats tracking is deeply integrated with the batch implementation of KafkaExtractor preventing it from being used in streaming Kafka extractor implementations. In addition to code reuse, the refactoring allows for writing unit tests for statistics tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005)