[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339178
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 06/Nov/19 06:08
Start Date: 06/Nov/19 06:08
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929380
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws 
DataRecordException, IOException {
 
   D record = decodeKafkaMessage(nextValidMessage);
 
-  this.currentPartitionDecodeRecordTime += System.nanoTime() - 
decodeStartTime;
-  this.currentPartitionRecordCount++;
-  this.currentPartitionTotalSize += 
nextValidMessage.getValueSizeInBytes();
-  this.currentPartitionReadRecordTime += System.nanoTime() - 
readStartTime;
+  this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, 
readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes());
 
 Review comment:
   Don't understand this comment. You are probably referring to the 
onUndecodeableRecord() method. Let's sync up on this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339178)
Time Spent: 2h 20m  (was: 2h 10m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka 
extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929380
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws 
DataRecordException, IOException {
 
   D record = decodeKafkaMessage(nextValidMessage);
 
-  this.currentPartitionDecodeRecordTime += System.nanoTime() - 
decodeStartTime;
-  this.currentPartitionRecordCount++;
-  this.currentPartitionTotalSize += 
nextValidMessage.getValueSizeInBytes();
-  this.currentPartitionReadRecordTime += System.nanoTime() - 
readStartTime;
+  this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, 
readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes());
 
 Review comment:
   Don't understand this comment. You are probably referring to the 
onUndecodeableRecord() method. Let's sync up on this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka 
extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929192
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java
 ##
 @@ -0,0 +1,320 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.source.extractor.extract.kafka;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+
+import lombok.Data;
+import lombok.Getter;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.configuration.WorkUnitState;
+import org.apache.gobblin.metrics.MetricContext;
+import org.apache.gobblin.metrics.event.EventSubmitter;
+
+
+/**
+ * A class that tracks KafkaExtractor statistics such as record decode time, 
#processed records, #undecodeable records etc.
+ *
+ */
+@Slf4j
+public class KafkaExtractorStatsTracker {
+  // Constants for event submission
+  public static final String TOPIC = "topic";
+  public static final String PARTITION = "partition";
+
+  private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka";
+  private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = 
"KafkaExtractorTopicMetadata";
+  private static final String LOW_WATERMARK = "lowWatermark";
+  private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark";
+  private static final String EXPECTED_HIGH_WATERMARK = 
"expectedHighWatermark";
+  private static final String ELAPSED_TIME = "elapsedTime";
+  private static final String PROCESSED_RECORD_COUNT = "processedRecordCount";
+  private static final String UNDECODABLE_MESSAGE_COUNT = 
"undecodableMessageCount";
+  private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize";
+  private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime";
+  private static final String AVG_RECORD_SIZE = "avgRecordSize";
+  private static final String READ_RECORD_TIME = "readRecordTime";
+  private static final String DECODE_RECORD_TIME = "decodeRecordTime";
+  private static final String FETCH_MESSAGE_BUFFER_TIME = 
"fetchMessageBufferTime";
+  private static final String LAST_RECORD_HEADER_TIMESTAMP = 
"lastRecordHeaderTimestamp";
+
+  @Getter
+  private final Map statsMap;
+  private final Set errorPartitions;
+  private final WorkUnitState workUnitState;
+
+  //A global count of number of undecodeable messages encountered by the 
KafkaExtractor across all Kafka
+  //TopicPartitions.
+  @Getter
+  private int undecodableMessageCount = 0;
+  private List partitions;
+
+  public KafkaExtractorStatsTracker(WorkUnitState state, List 
partitions) {
 
 Review comment:
   Actually, streaming extractor can get KafkaPartition the same way as 
KafkaExtractor does i.e. via KafkaUtils in the constructor. Further, Tag 
generation for tracking events needs the KafkaPartition objects.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339177
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 06/Nov/19 06:07
Start Date: 06/Nov/19 06:07
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342929192
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java
 ##
 @@ -0,0 +1,320 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.source.extractor.extract.kafka;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+
+import lombok.Data;
+import lombok.Getter;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.configuration.WorkUnitState;
+import org.apache.gobblin.metrics.MetricContext;
+import org.apache.gobblin.metrics.event.EventSubmitter;
+
+
+/**
+ * A class that tracks KafkaExtractor statistics such as record decode time, 
#processed records, #undecodeable records etc.
+ *
+ */
+@Slf4j
+public class KafkaExtractorStatsTracker {
+  // Constants for event submission
+  public static final String TOPIC = "topic";
+  public static final String PARTITION = "partition";
+
+  private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka";
+  private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = 
"KafkaExtractorTopicMetadata";
+  private static final String LOW_WATERMARK = "lowWatermark";
+  private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark";
+  private static final String EXPECTED_HIGH_WATERMARK = 
"expectedHighWatermark";
+  private static final String ELAPSED_TIME = "elapsedTime";
+  private static final String PROCESSED_RECORD_COUNT = "processedRecordCount";
+  private static final String UNDECODABLE_MESSAGE_COUNT = 
"undecodableMessageCount";
+  private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize";
+  private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime";
+  private static final String AVG_RECORD_SIZE = "avgRecordSize";
+  private static final String READ_RECORD_TIME = "readRecordTime";
+  private static final String DECODE_RECORD_TIME = "decodeRecordTime";
+  private static final String FETCH_MESSAGE_BUFFER_TIME = 
"fetchMessageBufferTime";
+  private static final String LAST_RECORD_HEADER_TIMESTAMP = 
"lastRecordHeaderTimestamp";
+
+  @Getter
+  private final Map statsMap;
+  private final Set errorPartitions;
+  private final WorkUnitState workUnitState;
+
+  //A global count of number of undecodeable messages encountered by the 
KafkaExtractor across all Kafka
+  //TopicPartitions.
+  @Getter
+  private int undecodableMessageCount = 0;
+  private List partitions;
+
+  public KafkaExtractorStatsTracker(WorkUnitState state, List 
partitions) {
 
 Review comment:
   Actually, streaming extractor can get KafkaPartition the same way as 
KafkaExtractor does i.e. via KafkaUtils in the constructor. Further, Tag 
generation for tracking events needs the KafkaPartition objects.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339177)
Time Spent: 2h 10m  (was: 2h)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
>

[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339174
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 06/Nov/19 06:03
Start Date: 06/Nov/19 06:03
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342928259
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java
 ##
 @@ -31,6 +31,7 @@
   private final int id;
   private final String topicName;
   private KafkaLeader leader;
+  private int hashCode;
 
 Review comment:
   Hmm. I can, but is there a significant benefit to using auto-generated 
hashCode? The only change I am making here is to cache the hashCode to avoid 
re-computing it every time.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339174)
Time Spent: 2h  (was: 1h 50m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka 
extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342928259
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java
 ##
 @@ -31,6 +31,7 @@
   private final int id;
   private final String topicName;
   private KafkaLeader leader;
+  private int hashCode;
 
 Review comment:
   Hmm. I can, but is there a significant benefit to using auto-generated 
hashCode? The only change I am making here is to cache the hashCode to avoid 
re-computing it every time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339171
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 06/Nov/19 05:57
Start Date: 06/Nov/19 05:57
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342927295
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java
 ##
 @@ -115,6 +114,7 @@
   public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset";
   public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime";
   public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = 
"previousOffsetFetchEpochTime";
+  public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions";
 
 Review comment:
   This was a small change for a different PR, and accidentally got pulled in. 
Will take it out and submit a separate PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339171)
Time Spent: 1h 50m  (was: 1h 40m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka 
extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342927295
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java
 ##
 @@ -115,6 +114,7 @@
   public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset";
   public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime";
   public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = 
"previousOffsetFetchEpochTime";
+  public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions";
 
 Review comment:
   This was a small change for a different PR, and accidentally got pulled in. 
Will take it out and submit a separate PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=339165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339165
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 06/Nov/19 05:53
Start Date: 06/Nov/19 05:53
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342926449
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -392,112 +303,16 @@ public long getExpectedRecordCount() {
 
   @Override
   public void close() throws IOException {
-if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
-  updateStatisticsForCurrentPartition();
+if (!allPartitionsFinished()) {
 
 Review comment:
   Yes, the current implementation is confusing when end of partitions is 
reached. It calls updateStatisticsForCurrentPartition(), but essentially does 
nothing inside the method, since recordCount == 0. The change IMO is more 
readable.  
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339165)
Time Spent: 1h 40m  (was: 1.5h)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
sv2000 commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka 
extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342926449
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -392,112 +303,16 @@ public long getExpectedRecordCount() {
 
   @Override
   public void close() throws IOException {
-if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
-  updateStatisticsForCurrentPartition();
+if (!allPartitionsFinished()) {
 
 Review comment:
   Yes, the current implementation is confusing when end of partitions is 
reached. It calls updateStatisticsForCurrentPartition(), but essentially does 
nothing inside the method, since recordCount == 0. The change IMO is more 
readable.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos

2019-11-05 Thread GitBox
autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for 
typos
URL: 
https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550095354
 
 
   > Ahhh the binary files won't be a problem, when we glob over them we can 
exclude anything save the `.md`'s we wish to spellcheck. Moreso I am wondering 
about how we can compile a list of spellcheck exceptions so that aspell doesn't 
blow up on the project specific jargon like "Avro".
   > 
   > There might not be a nice way to do that programatically, luckily because 
aspell spits out .bak files we could diff and then gain a quick understanding 
by eye of common terms that are being flagged.
   
   Ah I misunderstood. I am not familiar with aspell and quickly googled it and 
seems there's no support for blacklist type of thing. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] codecov-io commented on issue #2797: [GOBBLIN-948] add hive data node and descriptor

2019-11-05 Thread GitBox
codecov-io commented on issue #2797: [GOBBLIN-948] add hive data node and 
descriptor
URL: 
https://github.com/apache/incubator-gobblin/pull/2797#issuecomment-550088163
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=h1)
 Report
   > Merging 
[#2797](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `14.28%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2797  +/-   ##
   
   + Coverage 45.32%   45.32%   +<.01% 
   - Complexity 8862 8863   +1 
   
 Files  1894 1896   +2 
 Lines 7091070931  +21 
 Branches   7799 7803   +4 
   
   + Hits  3214132152  +11 
   - Misses3580335811   +8 
   - Partials   2966 2968   +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...modules/flowgraph/datanodes/hive/HiveDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL2hpdmUvSGl2ZURhdGFOb2RlLmphdmE=)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=)
 | `64.86% <100%> (+0.97%)` | `7 <0> (ø)` | :arrow_down: |
   | 
[...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh)
 | `66.66% <66.66%> (ø)` | `1 <1> (?)` | |
   | 
[...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=)
 | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | 
[.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=)
 | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: |
   | 
[...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh)
 | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   | ... and [2 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=footer).
 Last update 
[94a508b...beac3b2](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-948?focusedWorklogId=339094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339094
 ]

ASF GitHub Bot logged work on GOBBLIN-948:
--

Author: ASF GitHub Bot
Created on: 06/Nov/19 00:38
Start Date: 06/Nov/19 00:38
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2797: [GOBBLIN-948] add 
hive data node and descriptor
URL: 
https://github.com/apache/incubator-gobblin/pull/2797#issuecomment-550088163
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=h1)
 Report
   > Merging 
[#2797](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `14.28%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2797  +/-   ##
   
   + Coverage 45.32%   45.32%   +<.01% 
   - Complexity 8862 8863   +1 
   
 Files  1894 1896   +2 
 Lines 7091070931  +21 
 Branches   7799 7803   +4 
   
   + Hits  3214132152  +11 
   - Misses3580335811   +8 
   - Partials   2966 2968   +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...modules/flowgraph/datanodes/hive/HiveDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL2hpdmUvSGl2ZURhdGFOb2RlLmphdmE=)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=)
 | `64.86% <100%> (+0.97%)` | `7 <0> (ø)` | :arrow_down: |
   | 
[...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh)
 | `66.66% <66.66%> (ø)` | `1 <1> (?)` | |
   | 
[...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=)
 | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | 
[.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=)
 | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: |
   | 
[...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh)
 | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   | ... and [2 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2797/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=footer).
 Last update 
[94a508b...beac3b2](https://codecov.io/gh/apache/incubator-gobblin/pull/2797?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-re

[jira] [Work logged] (GOBBLIN-940) Synchronization between workunit persistency and Helix job launching

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-940?focusedWorklogId=339078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339078
 ]

ASF GitHub Bot logged work on GOBBLIN-940:
--

Author: ASF GitHub Bot
Created on: 06/Nov/19 00:08
Start Date: 06/Nov/19 00:08
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2789: 
[GOBBLIN-940]Add synchronization on workunit persistency before Helix job 
launching
URL: https://github.com/apache/incubator-gobblin/pull/2789
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339078)
Time Spent: 1h  (was: 50m)

> Synchronization between workunit persistency and Helix job launching
> 
>
> Key: GOBBLIN-940
> URL: https://issues.apache.org/jira/browse/GOBBLIN-940
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] asfgit closed pull request #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching

2019-11-05 Thread GitBox
asfgit closed pull request #2789: [GOBBLIN-940]Add synchronization on workunit 
persistency before Helix job launching
URL: https://github.com/apache/incubator-gobblin/pull/2789
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos

2019-11-05 Thread GitBox
bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for 
typos
URL: 
https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550080700
 
 
   Ahhh the binary files won't be a problem, when we glob over them we can 
exclude anything save the `.md`'s we wish to spellcheck. Moreso I am wondering 
about how we can compile a list of spellcheck exceptions so that aspell doesn't 
blow up on the project specific jargon like "Avro".
   
   There might not be a nice way to do that programatically, luckily because 
aspell spits out .bak files we could diff and then gain a quick understanding 
by eye of common terms that are being flagged.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos

2019-11-05 Thread GitBox
autumnust commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for 
typos
URL: 
https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550079172
 
 
   > @autumnust I was looking to utilize [aspell](http://aspell.net/).
   > A good guide on basic usage can be found here:
   > https://alexwlchan.net/2016/09/please-use-aspell
   > 
   > The only consideration that stands out to me would be how to easily 
compile a list of exceptions (Avro, ORC etc.) to place in the 
`~/.aspell.lang.pws` file.
   > 
   > Thoughts?
   
   I am assuming all these binary files are named in a specific way (.avro for 
example). I would be interested to know how much Avro/ORC files sit in the repo 
since we really should just get rid of them by replacing with .json and 
deserialize in memory when necessary. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=339063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339063
 ]

ASF GitHub Bot logged work on GOBBLIN-897:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 23:58
Start Date: 05/Nov/19 23:58
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2755: [GOBBLIN-897] adds 
local FS spec executor to write jobs to a local dir
URL: 
https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1)
 Report
   > Merging 
[#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff @@
   ## master   #2755  +/-   ##
   ===
   - Coverage 45.32%   45.3%   -0.03% 
   + Complexity 88628861   -1 
   ===
 Files  18941896   +2 
 Lines 70910   70951  +41 
 Branches   77997802   +3 
   ===
 Hits  32141   32141  
   - Misses35803   35845  +42 
   + Partials   29662965   -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `32.71% <0%> (-2.81%)` | `11% <0%> (-1%)` | |
   | 
[...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=)
 | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: |
   | 
[...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh)
 | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=footer).
 Last update 
[94a508b...fe75404](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339063)
Time Spent: 2.5h  (was: 2h 20m)

> I

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir

2019-11-05 Thread GitBox
codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec 
executor to write jobs to a local dir
URL: 
https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1)
 Report
   > Merging 
[#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff @@
   ## master   #2755  +/-   ##
   ===
   - Coverage 45.32%   45.3%   -0.03% 
   + Complexity 88628861   -1 
   ===
 Files  18941896   +2 
 Lines 70910   70951  +41 
 Branches   77997802   +3 
   ===
 Hits  32141   32141  
   - Misses35803   35845  +42 
   + Partials   29662965   -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `32.71% <0%> (-2.81%)` | `11% <0%> (-1%)` | |
   | 
[...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=)
 | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `64.81% <0%> (+1.38%)` | `28% <0%> (+1%)` | :arrow_up: |
   | 
[...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh)
 | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=footer).
 Last update 
[94a508b...fe75404](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for typos

2019-11-05 Thread GitBox
bstaudacher commented on issue #2783: Update Hive-Avro-To-ORC-Converter.md for 
typos
URL: 
https://github.com/apache/incubator-gobblin/pull/2783#issuecomment-550076702
 
 
   @autumnust I was looking to utilize [aspell](http://aspell.net/).
   A good guide on basic usage can be found here:
   https://alexwlchan.net/2016/09/please-use-aspell
   
   The only consideration that stands out to me would be how to easily compile 
a list of exceptions (Avro, ORC etc.) to place in the `~/.aspell.lang.pws` file.
   
   Thoughts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=339056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339056
 ]

ASF GitHub Bot logged work on GOBBLIN-897:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 23:48
Start Date: 05/Nov/19 23:48
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2755: [GOBBLIN-897] adds 
local FS spec executor to write jobs to a local dir
URL: 
https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1)
 Report
   > Merging 
[#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `41.17%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2755   +/-   ##
   
   - Coverage 45.32%   4.15%   -41.18% 
   + Complexity 8862 745 -8117 
   
 Files  18941896+2 
 Lines 70910   70951   +41 
 Branches   77997802+3 
   
   - Hits  321412945-29196 
   - Misses35803   67691+31888 
   + Partials   2966 315 -2651
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-5%)` | |
   | 
[...n/mapreduce/avro/AvroKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vQXZyb0tleUNvbXBhY3Rvck91dHB1dEZvcm1hdC5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | 
[...apache/gobblin/fork/CopyNotSupportedException.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZm9yay9Db3B5Tm90U3VwcG9ydGVkRXhjZXB0aW9uLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-1%)` | |
   | 
[.../gobblin/kafka/writer/KafkaWriterCommonConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL3dyaXRlci9LYWZrYVdyaXRlckNvbW1vbkNvbmZpZy5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-7%)` | |
   | 
[...ker/task/TaskLevelPolicyCheckerBuilderFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3F1YWxpdHljaGVja2VyL3Rhc2svVGFza0xldmVsUG9saWN5Q2hlY2tlckJ1aWxkZXJGYWN0b3J5LmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...bblin/data/management/copy/AllEqualComparator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvQWxsRXF1YWxDb21wYXJhdG9yLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...blin/co

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir

2019-11-05 Thread GitBox
codecov-io edited a comment on issue #2755: [GOBBLIN-897] adds local FS spec 
executor to write jobs to a local dir
URL: 
https://github.com/apache/incubator-gobblin/pull/2755#issuecomment-539170866
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=h1)
 Report
   > Merging 
[#2755](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `41.17%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2755   +/-   ##
   
   - Coverage 45.32%   4.15%   -41.18% 
   + Complexity 8862 745 -8117 
   
 Files  18941896+2 
 Lines 70910   70951   +41 
 Branches   77997802+3 
   
   - Hits  321412945-29196 
   - Misses35803   67691+31888 
   + Partials   2966 315 -2651
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2755?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ime/spec\_executorInstance/LocalFsSpecExecutor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjRXhlY3V0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...ime/spec\_executorInstance/LocalFsSpecProducer.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvc3BlY19leGVjdXRvckluc3RhbmNlL0xvY2FsRnNTcGVjUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-5%)` | |
   | 
[...n/mapreduce/avro/AvroKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vQXZyb0tleUNvbXBhY3Rvck91dHB1dEZvcm1hdC5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | 
[...apache/gobblin/fork/CopyNotSupportedException.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZm9yay9Db3B5Tm90U3VwcG9ydGVkRXhjZXB0aW9uLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-1%)` | |
   | 
[.../gobblin/kafka/writer/KafkaWriterCommonConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL3dyaXRlci9LYWZrYVdyaXRlckNvbW1vbkNvbmZpZy5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-7%)` | |
   | 
[...ker/task/TaskLevelPolicyCheckerBuilderFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3F1YWxpdHljaGVja2VyL3Rhc2svVGFza0xldmVsUG9saWN5Q2hlY2tlckJ1aWxkZXJGYWN0b3J5LmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...bblin/data/management/copy/AllEqualComparator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvQWxsRXF1YWxDb21wYXJhdG9yLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...blin/converter/string/ObjectToStringConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9zdHJpbmcvT2JqZWN0VG9TdHJpbmdDb252ZXJ0ZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | ... and [1092 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2755/diff?src=pr&el=tree-more)
 | |
   
   --
 

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2797: [GOBBLIN-948] add hive data node and descriptor

2019-11-05 Thread GitBox
sv2000 commented on a change in pull request #2797: [GOBBLIN-948] add hive data 
node and descriptor
URL: https://github.com/apache/incubator-gobblin/pull/2797#discussion_r342851353
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/SqlDatasetDescriptor.java
 ##
 @@ -52,6 +52,7 @@
   private final Config rawConfig;
 
   public enum  Platform {
+HIVE("hive"),
 
 Review comment:
   Can we move the "hive" platform to the HiveDatasetDescriptor? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-948?focusedWorklogId=339051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339051
 ]

ASF GitHub Bot logged work on GOBBLIN-948:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 23:33
Start Date: 05/Nov/19 23:33
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2797: [GOBBLIN-948] 
add hive data node and descriptor
URL: https://github.com/apache/incubator-gobblin/pull/2797#discussion_r342851353
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/SqlDatasetDescriptor.java
 ##
 @@ -52,6 +52,7 @@
   private final Config rawConfig;
 
   public enum  Platform {
+HIVE("hive"),
 
 Review comment:
   Can we move the "hive" platform to the HiveDatasetDescriptor? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339051)
Time Spent: 20m  (was: 10m)

> add hive data node and hive descriptor for gobblin service
> --
>
> Key: GOBBLIN-948
> URL: https://issues.apache.org/jira/browse/GOBBLIN-948
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-948?focusedWorklogId=339044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339044
 ]

ASF GitHub Bot logged work on GOBBLIN-948:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 23:09
Start Date: 05/Nov/19 23:09
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2797: 
[GOBBLIN-948] add hive data node and descriptor
URL: https://github.com/apache/incubator-gobblin/pull/2797
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below! @sv2000  @jack-moseley  please review
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-948
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   added hive data node and hive dataset descriptor
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   a unit test
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339044)
Remaining Estimate: 0h
Time Spent: 10m

> add hive data node and hive descriptor for gobblin service
> --
>
> Key: GOBBLIN-948
> URL: https://issues.apache.org/jira/browse/GOBBLIN-948
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2797: [GOBBLIN-948] add hive data node and descriptor

2019-11-05 Thread GitBox
arjun4084346 opened a new pull request #2797: [GOBBLIN-948] add hive data node 
and descriptor
URL: https://github.com/apache/incubator-gobblin/pull/2797
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below! @sv2000  @jack-moseley  please review
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-948
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   added hive data node and hive dataset descriptor
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   a unit test
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (GOBBLIN-948) add hive data node and hive descriptor for gobblin service

2019-11-05 Thread Arjun Singh Bora (Jira)
Arjun Singh Bora created GOBBLIN-948:


 Summary: add hive data node and hive descriptor for gobblin service
 Key: GOBBLIN-948
 URL: https://issues.apache.org/jira/browse/GOBBLIN-948
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Arjun Singh Bora






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=339016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339016
 ]

ASF GitHub Bot logged work on GOBBLIN-897:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 21:51
Start Date: 05/Nov/19 21:51
Worklog Time Spent: 10m 
  Work Description: Will-Lo commented on pull request #2755: [GOBBLIN-897] 
adds local FS spec executor to write jobs to a local dir
URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342816187
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java
 ##
 @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) {
* @param headers*/
   @Override
   public Future deleteSpec(URI deletedSpecURI, Properties headers) {
-return new CompletedFuture<>(Boolean.TRUE, null);
+String[] uriTokens = deletedSpecURI.getPath().split("/");
+String jobFileName = String.join("_", uriTokens) + ".job";
+File file = new File(jobFileName);
+if (file.delete()) {
+  log.info("Deleted spec: {}", jobFileName);
+  return new CompletedFuture<>(Boolean.TRUE, null);
+}
+throw new RuntimeException(String.format("Failed to delete file with uri 
%s", deletedSpecURI));
 
 Review comment:
   That's a good point actually. In the other `SpecProducers` they throw a 
runtime exception as well, would it affect anything?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339016)
Time Spent: 2h 10m  (was: 2h)

> Implement Local FS Spec Executor
> 
>
> Key: GOBBLIN-897
> URL: https://issues.apache.org/jira/browse/GOBBLIN-897
> Project: Apache Gobblin
>  Issue Type: New Feature
>  Components: gobblin-core, gobblin-service
>Affects Versions: 0.15.0
>Reporter: William Lo
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] Will-Lo commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir

2019-11-05 Thread GitBox
Will-Lo commented on a change in pull request #2755: [GOBBLIN-897] adds local 
FS spec executor to write jobs to a local dir
URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342816187
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java
 ##
 @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) {
* @param headers*/
   @Override
   public Future deleteSpec(URI deletedSpecURI, Properties headers) {
-return new CompletedFuture<>(Boolean.TRUE, null);
+String[] uriTokens = deletedSpecURI.getPath().split("/");
+String jobFileName = String.join("_", uriTokens) + ".job";
+File file = new File(jobFileName);
+if (file.delete()) {
+  log.info("Deleted spec: {}", jobFileName);
+  return new CompletedFuture<>(Boolean.TRUE, null);
+}
+throw new RuntimeException(String.format("Failed to delete file with uri 
%s", deletedSpecURI));
 
 Review comment:
   That's a good point actually. In the other `SpecProducers` they throw a 
runtime exception as well, would it affect anything?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (GOBBLIN-947) Support "any" capability in IdentityFlowToJobSpecCompiler

2019-11-05 Thread Chen Guo (Jira)
Chen Guo created GOBBLIN-947:


 Summary: Support "any" capability in IdentityFlowToJobSpecCompiler
 Key: GOBBLIN-947
 URL: https://issues.apache.org/jira/browse/GOBBLIN-947
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Chen Guo


Currently the IdentityFlowToJobSpecCompiler does a hard check on the capability 
string for both source and destination. "Any" is not supported in 
IdentityFlowToJobSpecCompiler as in MultiHopFlowCompiler. The behaviour should 
be made consistent across these two compilers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GOBBLIN-944) Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor

2019-11-05 Thread Chen Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Guo resolved GOBBLIN-944.
--
Resolution: Won't Fix

> Provide the flexibility of deriving platform field in a different way for 
> subclasses of BaseDatasetDescriptor
> -
>
> Key: GOBBLIN-944
> URL: https://issues.apache.org/jira/browse/GOBBLIN-944
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Chen Guo
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current BaseDatasetDescriptor always gets the platform from the configuration 
> object, but there could be use cases where the platform is derived in a 
> different way. We need to change the implementation of BaseDatasetDescriptor 
> constructor such that derived classes have the flexibility to change this 
> default behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-944) Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-944?focusedWorklogId=339007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339007
 ]

ASF GitHub Bot logged work on GOBBLIN-944:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 21:30
Start Date: 05/Nov/19 21:30
Worklog Time Spent: 10m 
  Work Description: enjoyear commented on pull request #2794: [GOBBLIN-944] 
Provide the flexibility of deriving platform field in a different way for 
subclasses of BaseDatasetDescriptor
URL: https://github.com/apache/incubator-gobblin/pull/2794
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339007)
Time Spent: 40m  (was: 0.5h)

> Provide the flexibility of deriving platform field in a different way for 
> subclasses of BaseDatasetDescriptor
> -
>
> Key: GOBBLIN-944
> URL: https://issues.apache.org/jira/browse/GOBBLIN-944
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Chen Guo
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current BaseDatasetDescriptor always gets the platform from the configuration 
> object, but there could be use cases where the platform is derived in a 
> different way. We need to change the implementation of BaseDatasetDescriptor 
> constructor such that derived classes have the flexibility to change this 
> default behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] enjoyear closed pull request #2794: [GOBBLIN-944] Provide the flexibility of deriving platform field in a different way for subclasses of BaseDatasetDescriptor

2019-11-05 Thread GitBox
enjoyear closed pull request #2794: [GOBBLIN-944] Provide the flexibility of 
deriving platform field in a different way for subclasses of 
BaseDatasetDescriptor
URL: https://github.com/apache/incubator-gobblin/pull/2794
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-933) JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-933?focusedWorklogId=338997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338997
 ]

ASF GitHub Bot logged work on GOBBLIN-933:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 21:00
Start Date: 05/Nov/19 21:00
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2792: 
[GOBBLIN-933] add support for array of unions in json schema
URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342794498
 
 

 ##
 File path: 
gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java
 ##
 @@ -210,4 +215,49 @@ public Schema schema() {
   return this.schema;
 }
   }
+
+  public static class UnionConverter extends ComplexConverter {
+private final Schema firstSchema;
+private final Schema secondSchema;
+private final JsonElementConverter firstConverter;
+private final JsonElementConverter secondConverter;
+
+public UnionConverter(String fieldName, boolean nullable, String 
sourceType, Schema schemaNode,
+WorkUnitState state, List ignoreFields) throws 
UnsupportedDateTypeException {
+  super(fieldName, nullable, sourceType);
+  List types = schemaNode.getTypes();
+  if (types.size() != 2) {
+throw new RuntimeException("Unions must be size 2.");
 
 Review comment:
   the existing code only supports two types.
   
https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonRecordAvroSchemaToAvroConverter.java#L93
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338997)
Time Spent: 2h 10m  (was: 2h)

> JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions
> 
>
> Key: GOBBLIN-933
> URL: https://issues.apache.org/jira/browse/GOBBLIN-933
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Ahmed Abdul Hamid
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Using {{JsonRecordAvroSchemaToAvroConverter}} to convert an array of a union 
> type fails. For instance, using it with the following Avro schema:
> {code:java}
> {
>   "name": "arrayField",
>   "type": {
> "type": "array",
> "items": ["string", "null"]
>   }
> } {code}
> yields the following error:
> {code:java}
> java.lang.StackOverflowError
>   at org.apache.gobblin.configuration.State.getProp(State.java)
>   at 
> org.apache.gobblin.configuration.WorkUnitState.getProp(WorkUnitState.java:333)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:106)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:160)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729)
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2792: [GOBBLIN-933] add support for array of unions in json schema

2019-11-05 Thread GitBox
arjun4084346 commented on a change in pull request #2792: [GOBBLIN-933] add 
support for array of unions in json schema
URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342794498
 
 

 ##
 File path: 
gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java
 ##
 @@ -210,4 +215,49 @@ public Schema schema() {
   return this.schema;
 }
   }
+
+  public static class UnionConverter extends ComplexConverter {
+private final Schema firstSchema;
+private final Schema secondSchema;
+private final JsonElementConverter firstConverter;
+private final JsonElementConverter secondConverter;
+
+public UnionConverter(String fieldName, boolean nullable, String 
sourceType, Schema schemaNode,
+WorkUnitState state, List ignoreFields) throws 
UnsupportedDateTypeException {
+  super(fieldName, nullable, sourceType);
+  List types = schemaNode.getTypes();
+  if (types.size() != 2) {
+throw new RuntimeException("Unions must be size 2.");
 
 Review comment:
   the existing code only supports two types.
   
https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonRecordAvroSchemaToAvroConverter.java#L93


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-946) Support HTTP source in Gobblin Service

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-946?focusedWorklogId=338995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338995
 ]

ASF GitHub Bot logged work on GOBBLIN-946:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 20:57
Start Date: 05/Nov/19 20:57
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2796: [GOBBLIN-946] Add 
HttpDatasetDescriptor and HttpDataNode to Gobblin Service
URL: 
https://github.com/apache/incubator-gobblin/pull/2796#issuecomment-550016552
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=h1)
 Report
   > Merging 
[#2796](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `61.76%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2796  +/-   ##
   
   + Coverage 45.32%   45.34%   +0.01% 
   - Complexity 8862 8869   +7 
   
 Files  1894 1896   +2 
 Lines 7091070943  +33 
 Branches   7799 7802   +3 
   
   + Hits  3214132169  +28 
   - Misses3580335806   +3 
   - Partials   2966 2968   +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../modules/flowgraph/FlowGraphConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvRmxvd0dyYXBoQ29uZmlndXJhdGlvbktleXMuamF2YQ==)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=)
 | `63.88% <100%> (ø)` | `7 <0> (ø)` | :arrow_down: |
   | 
[...vice/modules/flowgraph/datanodes/HttpDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL0h0dHBEYXRhTm9kZS5qYXZh)
 | `41.66% <41.66%> (ø)` | `1 <1> (?)` | |
   | 
[...service/modules/dataset/HttpDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0h0dHBEYXRhc2V0RGVzY3JpcHRvci5qYXZh)
 | `71.42% <71.42%> (ø)` | `5 <5> (?)` | |
   | 
[...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=)
 | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | 
[.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=)
 | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
   | 
[...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh)
 | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   | 
[...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=)
 | `72.22% <0%> (+2.22%)` | `13% <0%> (ø)` | :arrow_down: |
   | ... and [3 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?

[GitHub] [incubator-gobblin] codecov-io commented on issue #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service

2019-11-05 Thread GitBox
codecov-io commented on issue #2796: [GOBBLIN-946] Add HttpDatasetDescriptor 
and HttpDataNode to Gobblin Service
URL: 
https://github.com/apache/incubator-gobblin/pull/2796#issuecomment-550016552
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=h1)
 Report
   > Merging 
[#2796](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `61.76%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2796  +/-   ##
   
   + Coverage 45.32%   45.34%   +0.01% 
   - Complexity 8862 8869   +7 
   
 Files  1894 1896   +2 
 Lines 7091070943  +33 
 Branches   7799 7802   +3 
   
   + Hits  3214132169  +28 
   - Misses3580335806   +3 
   - Partials   2966 2968   +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../modules/flowgraph/FlowGraphConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvRmxvd0dyYXBoQ29uZmlndXJhdGlvbktleXMuamF2YQ==)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[.../service/modules/dataset/SqlDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L1NxbERhdGFzZXREZXNjcmlwdG9yLmphdmE=)
 | `63.88% <100%> (ø)` | `7 <0> (ø)` | :arrow_down: |
   | 
[...vice/modules/flowgraph/datanodes/HttpDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL0h0dHBEYXRhTm9kZS5qYXZh)
 | `41.66% <41.66%> (ø)` | `1 <1> (?)` | |
   | 
[...service/modules/dataset/HttpDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0h0dHBEYXRhc2V0RGVzY3JpcHRvci5qYXZh)
 | `71.42% <71.42%> (ø)` | `5 <5> (?)` | |
   | 
[...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=)
 | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | 
[.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=)
 | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
   | 
[...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh)
 | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   | 
[...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=)
 | `72.22% <0%> (+2.22%)` | `13% <0%> (ø)` | :arrow_down: |
   | ... and [3 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2796/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=footer).
 Last update 
[94a508b...dbbc0e7](https://codecov.io/gh/apache/incubator-gobblin/pull/2796?src=pr&el=lastupdated).
 Read the [comment docs](https://

[jira] [Work logged] (GOBBLIN-933) JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-933?focusedWorklogId=338993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338993
 ]

ASF GitHub Bot logged work on GOBBLIN-933:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 20:56
Start Date: 05/Nov/19 20:56
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2792: 
[GOBBLIN-933] add support for array of unions in json schema
URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342770317
 
 

 ##
 File path: 
gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java
 ##
 @@ -210,4 +215,49 @@ public Schema schema() {
   return this.schema;
 }
   }
+
+  public static class UnionConverter extends ComplexConverter {
+private final Schema firstSchema;
+private final Schema secondSchema;
+private final JsonElementConverter firstConverter;
+private final JsonElementConverter secondConverter;
+
+public UnionConverter(String fieldName, boolean nullable, String 
sourceType, Schema schemaNode,
+WorkUnitState state, List ignoreFields) throws 
UnsupportedDateTypeException {
+  super(fieldName, nullable, sourceType);
+  List types = schemaNode.getTypes();
+  if (types.size() != 2) {
+throw new RuntimeException("Unions must be size 2.");
 
 Review comment:
   This is not a valid assumption in the context of Avro schema. What's the 
reason to enforce `size == 2` for union here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338993)
Time Spent: 2h  (was: 1h 50m)

> JsonRecordAvroSchemaToAvroConverter does not handle arrays of unions
> 
>
> Key: GOBBLIN-933
> URL: https://issues.apache.org/jira/browse/GOBBLIN-933
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Ahmed Abdul Hamid
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Using {{JsonRecordAvroSchemaToAvroConverter}} to convert an array of a union 
> type fails. For instance, using it with the following Avro schema:
> {code:java}
> {
>   "name": "arrayField",
>   "type": {
> "type": "array",
> "items": ["string", "null"]
>   }
> } {code}
> yields the following error:
> {code:java}
> java.lang.StackOverflowError
>   at org.apache.gobblin.configuration.State.getProp(State.java)
>   at 
> org.apache.gobblin.configuration.WorkUnitState.getProp(WorkUnitState.java:333)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:106)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory.getConvertor(JsonElementConversionFactory.java:160)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.getConverter(JsonElementConversionFactory.java:737)
>   at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$UnionConverter.(JsonElementConversionFactory.java:729)
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2792: [GOBBLIN-933] add support for array of unions in json schema

2019-11-05 Thread GitBox
autumnust commented on a change in pull request #2792: [GOBBLIN-933] add 
support for array of unions in json schema
URL: https://github.com/apache/incubator-gobblin/pull/2792#discussion_r342770317
 
 

 ##
 File path: 
gobblin-core/src/main/java/org/apache/gobblin/converter/avro/JsonElementConversionWithAvroSchemaFactory.java
 ##
 @@ -210,4 +215,49 @@ public Schema schema() {
   return this.schema;
 }
   }
+
+  public static class UnionConverter extends ComplexConverter {
+private final Schema firstSchema;
+private final Schema secondSchema;
+private final JsonElementConverter firstConverter;
+private final JsonElementConverter secondConverter;
+
+public UnionConverter(String fieldName, boolean nullable, String 
sourceType, Schema schemaNode,
+WorkUnitState state, List ignoreFields) throws 
UnsupportedDateTypeException {
+  super(fieldName, nullable, sourceType);
+  List types = schemaNode.getTypes();
+  if (types.size() != 2) {
+throw new RuntimeException("Unions must be size 2.");
 
 Review comment:
   This is not a valid assumption in the context of Avro schema. What's the 
reason to enforce `size == 2` for union here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338974
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:58
Start Date: 05/Nov/19 19:58
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342768883
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java
 ##
 @@ -0,0 +1,320 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.source.extractor.extract.kafka;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+
+import lombok.Data;
+import lombok.Getter;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.configuration.WorkUnitState;
+import org.apache.gobblin.metrics.MetricContext;
+import org.apache.gobblin.metrics.event.EventSubmitter;
+
+
+/**
+ * A class that tracks KafkaExtractor statistics such as record decode time, 
#processed records, #undecodeable records etc.
+ *
+ */
+@Slf4j
+public class KafkaExtractorStatsTracker {
+  // Constants for event submission
+  public static final String TOPIC = "topic";
+  public static final String PARTITION = "partition";
+
+  private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka";
+  private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = 
"KafkaExtractorTopicMetadata";
+  private static final String LOW_WATERMARK = "lowWatermark";
+  private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark";
+  private static final String EXPECTED_HIGH_WATERMARK = 
"expectedHighWatermark";
+  private static final String ELAPSED_TIME = "elapsedTime";
+  private static final String PROCESSED_RECORD_COUNT = "processedRecordCount";
+  private static final String UNDECODABLE_MESSAGE_COUNT = 
"undecodableMessageCount";
+  private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize";
+  private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime";
+  private static final String AVG_RECORD_SIZE = "avgRecordSize";
+  private static final String READ_RECORD_TIME = "readRecordTime";
+  private static final String DECODE_RECORD_TIME = "decodeRecordTime";
+  private static final String FETCH_MESSAGE_BUFFER_TIME = 
"fetchMessageBufferTime";
+  private static final String LAST_RECORD_HEADER_TIMESTAMP = 
"lastRecordHeaderTimestamp";
+
+  @Getter
+  private final Map statsMap;
+  private final Set errorPartitions;
+  private final WorkUnitState workUnitState;
+
+  //A global count of number of undecodeable messages encountered by the 
KafkaExtractor across all Kafka
+  //TopicPartitions.
+  @Getter
+  private int undecodableMessageCount = 0;
+  private List partitions;
+
+  public KafkaExtractorStatsTracker(WorkUnitState state, List 
partitions) {
 
 Review comment:
   A little bit nit-pick here but for ease of usage in streaming-extractor as 
well: Streaming extractor doesn't contain a list of `KafkaPartition` objects, 
what it really needs is the partition id that kafka-client can be assigned and 
consume, so does here: We just need an identifier of a kafka-partition as the 
key of `statsMap`.
   
   It would be easier if we use `List` with the partitionId as the 
value here as the constructor arguments, so that you don't have to reassemble a 
`KafkaPartition` object in streaming-extractor
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id:  

[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338969
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:58
Start Date: 05/Nov/19 19:58
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342726471
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java
 ##
 @@ -31,6 +31,7 @@
   private final int id;
   private final String topicName;
   private KafkaLeader leader;
+  private int hashCode;
 
 Review comment:
   Can you use auto-generated `hashCode` using IDE instead ? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338969)
Time Spent: 50m  (was: 40m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338971
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:58
Start Date: 05/Nov/19 19:58
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342727497
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java
 ##
 @@ -115,6 +114,7 @@
   public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset";
   public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime";
   public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = 
"previousOffsetFetchEpochTime";
+  public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions";
 
 Review comment:
   Where is this being used?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338971)
Time Spent: 1h 10m  (was: 1h)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338972
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:58
Start Date: 05/Nov/19 19:58
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342734740
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws 
DataRecordException, IOException {
 
   D record = decodeKafkaMessage(nextValidMessage);
 
-  this.currentPartitionDecodeRecordTime += System.nanoTime() - 
decodeStartTime;
-  this.currentPartitionRecordCount++;
-  this.currentPartitionTotalSize += 
nextValidMessage.getValueSizeInBytes();
-  this.currentPartitionReadRecordTime += System.nanoTime() - 
readStartTime;
+  this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, 
readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes());
 
 Review comment:
   Seems there are some difference in terms of the intention for these two 
catch blocks, the former one indicates that there are exceptions thrown when 
current partition is drained and need to move to the next one, while the latter 
indicates "undecodability".  Shall we preserve original intention here? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338972)
Time Spent: 1h 20m  (was: 1h 10m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338970
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:58
Start Date: 05/Nov/19 19:58
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342739486
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -392,112 +303,16 @@ public long getExpectedRecordCount() {
 
   @Override
   public void close() throws IOException {
-if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
-  updateStatisticsForCurrentPartition();
+if (!allPartitionsFinished()) {
 
 Review comment:
   condition for `allPartitionsFinished()` is stronger than `currentIdx != 
INITIAL_PARTITION_IDX` where the former requires ending of current partition as 
well. Is that expected ? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338970)
Time Spent: 1h  (was: 50m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338973
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:58
Start Date: 05/Nov/19 19:58
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342765903
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/kafka/client/GobblinKafkaConsumerClient.java
 ##
 @@ -72,6 +73,15 @@
*/
   public long getLatestOffset(KafkaPartition partition) throws 
KafkaOffsetRetrievalFailureException;
 
+  /**
+   * Get the latest available offset for a partition
+   *
+   * @param partitions for which latest offset is retrieved
+   *
+   * @throws UnsupportedOperationException - If the underlying kafka-client 
does not support getting latest offset
+   */
+  public Map getLatestOffsets(Collection 
partitions) throws KafkaOffsetRetrievalFailureException;
 
 Review comment:
   Make a default impl. in interface instead ? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338973)
Time Spent: 1.5h  (was: 1h 20m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor 
Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342739486
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -392,112 +303,16 @@ public long getExpectedRecordCount() {
 
   @Override
   public void close() throws IOException {
-if (currentPartitionIdx != INITIAL_PARTITION_IDX) {
-  updateStatisticsForCurrentPartition();
+if (!allPartitionsFinished()) {
 
 Review comment:
   condition for `allPartitionsFinished()` is stronger than `currentIdx != 
INITIAL_PARTITION_IDX` where the former requires ending of current partition as 
well. Is that expected ? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor 
Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342768883
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractorStatsTracker.java
 ##
 @@ -0,0 +1,320 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.source.extractor.extract.kafka;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+
+import lombok.Data;
+import lombok.Getter;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.configuration.WorkUnitState;
+import org.apache.gobblin.metrics.MetricContext;
+import org.apache.gobblin.metrics.event.EventSubmitter;
+
+
+/**
+ * A class that tracks KafkaExtractor statistics such as record decode time, 
#processed records, #undecodeable records etc.
+ *
+ */
+@Slf4j
+public class KafkaExtractorStatsTracker {
+  // Constants for event submission
+  public static final String TOPIC = "topic";
+  public static final String PARTITION = "partition";
+
+  private static final String GOBBLIN_KAFKA_NAMESPACE = "gobblin.kafka";
+  private static final String KAFKA_EXTRACTOR_TOPIC_METADATA_EVENT_NAME = 
"KafkaExtractorTopicMetadata";
+  private static final String LOW_WATERMARK = "lowWatermark";
+  private static final String ACTUAL_HIGH_WATERMARK = "actualHighWatermark";
+  private static final String EXPECTED_HIGH_WATERMARK = 
"expectedHighWatermark";
+  private static final String ELAPSED_TIME = "elapsedTime";
+  private static final String PROCESSED_RECORD_COUNT = "processedRecordCount";
+  private static final String UNDECODABLE_MESSAGE_COUNT = 
"undecodableMessageCount";
+  private static final String PARTITION_TOTAL_SIZE = "partitionTotalSize";
+  private static final String AVG_RECORD_PULL_TIME = "avgRecordPullTime";
+  private static final String AVG_RECORD_SIZE = "avgRecordSize";
+  private static final String READ_RECORD_TIME = "readRecordTime";
+  private static final String DECODE_RECORD_TIME = "decodeRecordTime";
+  private static final String FETCH_MESSAGE_BUFFER_TIME = 
"fetchMessageBufferTime";
+  private static final String LAST_RECORD_HEADER_TIMESTAMP = 
"lastRecordHeaderTimestamp";
+
+  @Getter
+  private final Map statsMap;
+  private final Set errorPartitions;
+  private final WorkUnitState workUnitState;
+
+  //A global count of number of undecodeable messages encountered by the 
KafkaExtractor across all Kafka
+  //TopicPartitions.
+  @Getter
+  private int undecodableMessageCount = 0;
+  private List partitions;
+
+  public KafkaExtractorStatsTracker(WorkUnitState state, List 
partitions) {
 
 Review comment:
   A little bit nit-pick here but for ease of usage in streaming-extractor as 
well: Streaming extractor doesn't contain a list of `KafkaPartition` objects, 
what it really needs is the partition id that kafka-client can be assigned and 
consume, so does here: We just need an identifier of a kafka-partition as the 
key of `statsMap`.
   
   It would be easier if we use `List` with the partitionId as the 
value here as the constructor arguments, so that you don't have to reassemble a 
`KafkaPartition` object in streaming-extractor


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor 
Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342734740
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java
 ##
 @@ -216,25 +176,18 @@ public D readRecordImpl(D reuse) throws 
DataRecordException, IOException {
 
   D record = decodeKafkaMessage(nextValidMessage);
 
-  this.currentPartitionDecodeRecordTime += System.nanoTime() - 
decodeStartTime;
-  this.currentPartitionRecordCount++;
-  this.currentPartitionTotalSize += 
nextValidMessage.getValueSizeInBytes();
-  this.currentPartitionReadRecordTime += System.nanoTime() - 
readStartTime;
+  this.statsTracker.onDecodeableRecord(this.currentPartitionIdx, 
readStartTime, decodeStartTime, nextValidMessage.getValueSizeInBytes());
 
 Review comment:
   Seems there are some difference in terms of the intention for these two 
catch blocks, the former one indicates that there are exceptions thrown when 
current partition is drained and need to move to the next one, while the latter 
indicates "undecodability".  Shall we preserve original intention here? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor 
Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342727497
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaSource.java
 ##
 @@ -115,6 +114,7 @@
   public static final String PREVIOUS_LATEST_OFFSET = "previousLatestOffset";
   public static final String OFFSET_FETCH_EPOCH_TIME = "offsetFetchEpochTime";
   public static final String PREVIOUS_OFFSET_FETCH_EPOCH_TIME = 
"previousOffsetFetchEpochTime";
+  public static final String NUM_TOPIC_PARTITIONS = "numTopicPartitions";
 
 Review comment:
   Where is this being used?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor 
Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342765903
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/kafka/client/GobblinKafkaConsumerClient.java
 ##
 @@ -72,6 +73,15 @@
*/
   public long getLatestOffset(KafkaPartition partition) throws 
KafkaOffsetRetrievalFailureException;
 
+  /**
+   * Get the latest available offset for a partition
+   *
+   * @param partitions for which latest offset is retrieved
+   *
+   * @throws UnsupportedOperationException - If the underlying kafka-client 
does not support getting latest offset
+   */
+  public Map getLatestOffsets(Collection 
partitions) throws KafkaOffsetRetrievalFailureException;
 
 Review comment:
   Make a default impl. in interface instead ? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
autumnust commented on a change in pull request #2795: GOBBLIN-945: Refactor 
Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795#discussion_r342726471
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaPartition.java
 ##
 @@ -31,6 +31,7 @@
   private final int id;
   private final String topicName;
   private KafkaLeader leader;
+  private int hashCode;
 
 Review comment:
   Can you use auto-generated `hashCode` using IDE instead ? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-946) Support HTTP source in Gobblin Service

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-946?focusedWorklogId=338963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338963
 ]

ASF GitHub Bot logged work on GOBBLIN-946:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:45
Start Date: 05/Nov/19 19:45
Worklog Time Spent: 10m 
  Work Description: haojiliu commented on pull request #2796: [GOBBLIN-946] 
Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service
URL: https://github.com/apache/incubator-gobblin/pull/2796
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-946
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if 
applicable):
   
   Adding support for transmitting data from a http/https source. This is the 
first effort and minimum number of attributes were introduced.
   
   A HttpDataNode will specify two attributes:
   
   1. domain - This is the domain component of a http url
   2. authentication type - This is the authentication protocol supported for a 
given domain
   
   A HttpDatasetDescriptor will specify one attribute:
   
   1. path - This is the full url of the data source's endpoint, e.g, 
https://a.b.c.com/api/dataset_abc
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Custom logics are covered by unit tests
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338963)
Remaining Estimate: 35h 50m  (was: 36h)
Time Spent: 10m

> Support HTTP source in Gobblin Service
> --
>
> Key: GOBBLIN-946
> URL: https://issues.apache.org/jira/browse/GOBBLIN-946
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Reporter: Haoji Liu
>Assignee: Abhishek Tiwari
>Priority: Major
>   Original Estimate: 36h
>  Time Spent: 10m
>  Remaining Estimate: 35h 50m
>
> Adding support for transmitting data from a http/https source. This will 
> require a new DataNode and DatasetDescriptor class.
>  
> Minimal logic should be implemented, logics should reside in the actual 
> gobblin connectors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GOBBLIN-946) Support HTTP source in Gobblin Service

2019-11-05 Thread Haoji Liu (Jira)
Haoji Liu created GOBBLIN-946:
-

 Summary: Support HTTP source in Gobblin Service
 Key: GOBBLIN-946
 URL: https://issues.apache.org/jira/browse/GOBBLIN-946
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-service
Reporter: Haoji Liu
Assignee: Abhishek Tiwari


Adding support for transmitting data from a http/https source. This will 
require a new DataNode and DatasetDescriptor class.

 

Minimal logic should be implemented, logics should reside in the actual gobblin 
connectors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] haojiliu opened a new pull request #2796: [GOBBLIN-946] Add HttpDatasetDescriptor and HttpDataNode to Gobblin Service

2019-11-05 Thread GitBox
haojiliu opened a new pull request #2796: [GOBBLIN-946] Add 
HttpDatasetDescriptor and HttpDataNode to Gobblin Service
URL: https://github.com/apache/incubator-gobblin/pull/2796
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-946
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if 
applicable):
   
   Adding support for transmitting data from a http/https source. This is the 
first effort and minimum number of attributes were introduced.
   
   A HttpDataNode will specify two attributes:
   
   1. domain - This is the domain component of a http url
   2. authentication type - This is the authentication protocol supported for a 
given domain
   
   A HttpDatasetDescriptor will specify one attribute:
   
   1. path - This is the full url of the data source's endpoint, e.g, 
https://a.b.c.com/api/dataset_abc
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Custom logics are covered by unit tests
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338961
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 19:42
Start Date: 05/Nov/19 19:42
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: 
https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1)
 Report
   > Merging 
[#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `0.84%`.
   > The diff coverage is `40.58%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2795  +/-   ##
   
   - Coverage 45.32%   44.48%   -0.85% 
   + Complexity 8862 8714 -148 
   
 Files  1894 1895   +1 
 Lines 7091070939  +29 
 Branches   7799 7799  
   
   - Hits  3214131555 -586 
   - Misses3580336460 +657 
   + Partials   2966 2924  -42
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh)
 | `55.69% <0%> (-0.36%)` | `22 <0> (ø)` | |
   | 
[...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh)
 | `75.71% <0%> (-1.1%)` | `8 <0> (ø)` | |
   | 
[...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh)
 | `47.36% <47.36%> (ø)` | `19 <19> (?)` | |
   | 
[...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh)
 | `44.64% <85.71%> (+44.64%)` | `3 <2> (+3)` | :arrow_up: |
   | 
[...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
codecov-io edited a comment on issue #2795: GOBBLIN-945: Refactor Kafka 
extractor statistics tracking to allow co…
URL: 
https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1)
 Report
   > Merging 
[#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `0.84%`.
   > The diff coverage is `40.58%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2795  +/-   ##
   
   - Coverage 45.32%   44.48%   -0.85% 
   + Complexity 8862 8714 -148 
   
 Files  1894 1895   +1 
 Lines 7091070939  +29 
 Branches   7799 7799  
   
   - Hits  3214131555 -586 
   - Misses3580336460 +657 
   + Partials   2966 2924  -42
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh)
 | `55.69% <0%> (-0.36%)` | `22 <0> (ø)` | |
   | 
[...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh)
 | `75.71% <0%> (-1.1%)` | `8 <0> (ø)` | |
   | 
[...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh)
 | `47.36% <47.36%> (ø)` | `19 <19> (?)` | |
   | 
[...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh)
 | `44.64% <85.71%> (+44.64%)` | `3 <2> (+3)` | :arrow_up: |
   | 
[...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | 
[...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0

[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338916
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 18:30
Start Date: 05/Nov/19 18:30
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: 
https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1)
 Report
   > Merging 
[#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `41.17%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2795   +/-   ##
   
   - Coverage 45.32%   4.15%   -41.18% 
   + Complexity 8862 745 -8117 
   
 Files  18941895+1 
 Lines 70910   70939   +29 
 Branches   77997799   
   
   - Hits  321412945-29196 
   - Misses35803   67679+31876 
   + Partials   2966 315 -2651
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <0%> (-56.06%)` | `0 <0> (-22)` | |
   | 
[...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <0%> (-76.82%)` | `0 <0> (-8)` | |
   | 
[...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsa

[GitHub] [incubator-gobblin] codecov-io commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
codecov-io commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor 
statistics tracking to allow co…
URL: 
https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549956451
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=h1)
 Report
   > Merging 
[#2795](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/94a508b38ec8bd879614f2d9bf0eeb96513ca7cf?src=pr&el=desc)
 will **decrease** coverage by `41.17%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2795   +/-   ##
   
   - Coverage 45.32%   4.15%   -41.18% 
   + Complexity 8862 745 -8117 
   
 Files  18941895+1 
 Lines 70910   70939   +29 
 Branches   77997799   
   
   - Hits  321412945-29196 
   - Misses35803   67679+31876 
   + Partials   2966 315 -2651
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2795?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...bblin/kafka/client/GobblinKafkaConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL2NsaWVudC9Hb2JibGluS2Fma2FDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ce/extractor/extract/kafka/KafkaAvroExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUF2cm9FeHRyYWN0b3IuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka08ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDhDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <0%> (-56.06%)` | `0 <0> (-22)` | |
   | 
[...source/extractor/extract/kafka/KafkaExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3Rvci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ctor/extract/kafka/KafkaExtractorStatsTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYUV4dHJhY3RvclN0YXRzVHJhY2tlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...source/extractor/extract/kafka/KafkaPartition.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVBhcnRpdGlvbi5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...he/gobblin/kafka/client/Kafka09ConsumerClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtMDkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4va2Fma2EvY2xpZW50L0thZmthMDlDb25zdW1lckNsaWVudC5qYXZh)
 | `0% <0%> (-76.82%)` | `0 <0> (-8)` | |
   | 
[...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2795/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9k

[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=338914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338914
 ]

ASF GitHub Bot logged work on GOBBLIN-897:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 18:29
Start Date: 05/Nov/19 18:29
Worklog Time Spent: 10m 
  Work Description: zxcware commented on pull request #2755: [GOBBLIN-897] 
adds local FS spec executor to write jobs to a local dir
URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724920
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java
 ##
 @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) {
* @param headers*/
   @Override
   public Future deleteSpec(URI deletedSpecURI, Properties headers) {
-return new CompletedFuture<>(Boolean.TRUE, null);
+String[] uriTokens = deletedSpecURI.getPath().split("/");
+String jobFileName = String.join("_", uriTokens) + ".job";
+File file = new File(jobFileName);
+if (file.delete()) {
+  log.info("Deleted spec: {}", jobFileName);
+  return new CompletedFuture<>(Boolean.TRUE, null);
+}
+throw new RuntimeException(String.format("Failed to delete file with uri 
%s", deletedSpecURI));
 
 Review comment:
   This shouldn't halt the service, return a `CompletedFuture` with an 
exception and expect the invoker to do proper error handling.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338914)
Time Spent: 1h 50m  (was: 1h 40m)

> Implement Local FS Spec Executor
> 
>
> Key: GOBBLIN-897
> URL: https://issues.apache.org/jira/browse/GOBBLIN-897
> Project: Apache Gobblin
>  Issue Type: New Feature
>  Components: gobblin-core, gobblin-service
>Affects Versions: 0.15.0
>Reporter: William Lo
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-897) Implement Local FS Spec Executor

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-897?focusedWorklogId=338915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338915
 ]

ASF GitHub Bot logged work on GOBBLIN-897:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 18:29
Start Date: 05/Nov/19 18:29
Worklog Time Spent: 10m 
  Work Description: zxcware commented on pull request #2755: [GOBBLIN-897] 
adds local FS spec executor to write jobs to a local dir
URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724245
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java
 ##
 @@ -71,9 +72,10 @@ public LocalFsSpecProducer(Config config) {
 
   private Future writeSpec(Spec spec, SpecExecutor.Verb verb) {
 if (spec instanceof JobSpec) {
-  String specString = spec.toString();
+  URI specUri = spec.getUri();
   // format the JobSpec to have file of _.job
-  String jobFileName = specString.replace('/', 
'_').substring(specString.lastIndexOf(':')+2, specString.length()-1) + ".job";
+  String[] uriTokens = specUri.getPath().split("/");
 
 Review comment:
   We can be more DRY(Don't Repeat Yourself) by making a small function named 
`getJobFileName(uri)`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338915)
Time Spent: 2h  (was: 1h 50m)

> Implement Local FS Spec Executor
> 
>
> Key: GOBBLIN-897
> URL: https://issues.apache.org/jira/browse/GOBBLIN-897
> Project: Apache Gobblin
>  Issue Type: New Feature
>  Components: gobblin-core, gobblin-service
>Affects Versions: 0.15.0
>Reporter: William Lo
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir

2019-11-05 Thread GitBox
zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local 
FS spec executor to write jobs to a local dir
URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724245
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java
 ##
 @@ -71,9 +72,10 @@ public LocalFsSpecProducer(Config config) {
 
   private Future writeSpec(Spec spec, SpecExecutor.Verb verb) {
 if (spec instanceof JobSpec) {
-  String specString = spec.toString();
+  URI specUri = spec.getUri();
   // format the JobSpec to have file of _.job
-  String jobFileName = specString.replace('/', 
'_').substring(specString.lastIndexOf(':')+2, specString.length()-1) + ".job";
+  String[] uriTokens = specUri.getPath().split("/");
 
 Review comment:
   We can be more DRY(Don't Repeat Yourself) by making a small function named 
`getJobFileName(uri)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local FS spec executor to write jobs to a local dir

2019-11-05 Thread GitBox
zxcware commented on a change in pull request #2755: [GOBBLIN-897] adds local 
FS spec executor to write jobs to a local dir
URL: https://github.com/apache/incubator-gobblin/pull/2755#discussion_r342724920
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_executorInstance/LocalFsSpecProducer.java
 ##
 @@ -94,7 +96,14 @@ public LocalFsSpecProducer(Config config) {
* @param headers*/
   @Override
   public Future deleteSpec(URI deletedSpecURI, Properties headers) {
-return new CompletedFuture<>(Boolean.TRUE, null);
+String[] uriTokens = deletedSpecURI.getPath().split("/");
+String jobFileName = String.join("_", uriTokens) + ".job";
+File file = new File(jobFileName);
+if (file.delete()) {
+  log.info("Deleted spec: {}", jobFileName);
+  return new CompletedFuture<>(Boolean.TRUE, null);
+}
+throw new RuntimeException(String.format("Failed to delete file with uri 
%s", deletedSpecURI));
 
 Review comment:
   This shouldn't halt the service, return a `CompletedFuture` with an 
exception and expect the invoker to do proper error handling.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-940) Synchronization between workunit persistency and Helix job launching

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-940?focusedWorklogId=338905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338905
 ]

ASF GitHub Bot logged work on GOBBLIN-940:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 18:07
Start Date: 05/Nov/19 18:07
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2789: [GOBBLIN-940]Add 
synchronization on workunit persistency before Helix job launching
URL: 
https://github.com/apache/incubator-gobblin/pull/2789#issuecomment-548527186
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=h1)
 Report
   > Merging 
[#2789](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/9ee4dcaf66257b6e2926cf1470b16b912cd343ff?src=pr&el=desc)
 will **increase** coverage by `40.22%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2789   +/-   ##
   =
   + Coverage  4.15%   44.38%   +40.22% 
   - Complexity  746 8694 +7948 
   =
 Files  1894 1894   
 Lines 7087770918   +41 
 Branches   7793 7800+7 
   =
   + Hits   294631475+28529 
   + Misses6761736519-31098 
   - Partials314 2924 +2610
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/gobblin/util/ParallelRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvUGFyYWxsZWxSdW5uZXIuamF2YQ==)
 | `63.96% <100%> (+63.96%)` | `15 <1> (+15)` | :arrow_up: |
   | 
[...pache/gobblin/cluster/GobblinHelixJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4Sm9iTGF1bmNoZXIuamF2YQ==)
 | `82% <100%> (+82%)` | `27 <0> (+27)` | :arrow_up: |
   | 
[...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | 
[...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh)
 | `0% <0%> (-83.34%)` | `0% <0%> (-1%)` | |
   | 
[...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=)
 | `0% <0%> (-80.77%)` | `0% <0%> (-7%)` | |
   | 
[...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=)
 | `0% <0%> (-78.58%)` | `0% <0%> (-2%)` | |
   | 
[...e/HiveRegTaskStateCollectorServiceHandlerImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvSGl2ZVJlZ1Rhc2tTdGF0ZUNvbGxlY3RvclNlcnZpY2VIYW5kbGVySW1wbC5qYXZh)
 | `0% <0%> (-75%)` | `0% <0%> (-3%)` | |
   | 
[.../apache/gobblin/me

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2789: [GOBBLIN-940]Add synchronization on workunit persistency before Helix job launching

2019-11-05 Thread GitBox
codecov-io edited a comment on issue #2789: [GOBBLIN-940]Add synchronization on 
workunit persistency before Helix job launching
URL: 
https://github.com/apache/incubator-gobblin/pull/2789#issuecomment-548527186
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=h1)
 Report
   > Merging 
[#2789](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/9ee4dcaf66257b6e2926cf1470b16b912cd343ff?src=pr&el=desc)
 will **increase** coverage by `40.22%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2789   +/-   ##
   =
   + Coverage  4.15%   44.38%   +40.22% 
   - Complexity  746 8694 +7948 
   =
 Files  1894 1894   
 Lines 7087770918   +41 
 Branches   7793 7800+7 
   =
   + Hits   294631475+28529 
   + Misses6761736519-31098 
   - Partials314 2924 +2610
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2789?src=pr&el=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/gobblin/util/ParallelRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvUGFyYWxsZWxSdW5uZXIuamF2YQ==)
 | `63.96% <100%> (+63.96%)` | `15 <1> (+15)` | :arrow_up: |
   | 
[...pache/gobblin/cluster/GobblinHelixJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4Sm9iTGF1bmNoZXIuamF2YQ==)
 | `82% <100%> (+82%)` | `27 <0> (+27)` | :arrow_up: |
   | 
[...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | 
[...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh)
 | `0% <0%> (-83.34%)` | `0% <0%> (-1%)` | |
   | 
[...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=)
 | `0% <0%> (-80.77%)` | `0% <0%> (-7%)` | |
   | 
[...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=)
 | `0% <0%> (-78.58%)` | `0% <0%> (-2%)` | |
   | 
[...e/HiveRegTaskStateCollectorServiceHandlerImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvSGl2ZVJlZ1Rhc2tTdGF0ZUNvbGxlY3RvclNlcnZpY2VIYW5kbGVySW1wbC5qYXZh)
 | `0% <0%> (-75%)` | `0% <0%> (-3%)` | |
   | 
[.../apache/gobblin/metastore/ZkStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0YXN0b3JlL1prU3RhdGVTdG9yZUZhY3RvcnkuamF2YQ==)
 | `0% <0%> (-71.43%)` | `0% <0%> (-2%)` | |
   | ... and [1117 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2789/diff?src=pr&el=tree-more)
 | |
   
   --
  

[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338900
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 17:52
Start Date: 05/Nov/19 17:52
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on issue #2795: GOBBLIN-945: Refactor 
Kafka extractor statistics tracking to allow co…
URL: 
https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549938182
 
 
   @autumnust @htran1 Please review.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338900)
Time Spent: 20m  (was: 10m)

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-945?focusedWorklogId=338899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338899
 ]

ASF GitHub Bot logged work on GOBBLIN-945:
--

Author: ASF GitHub Bot
Created on: 05/Nov/19 17:51
Start Date: 05/Nov/19 17:51
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2795: GOBBLIN-945: 
Refactor Kafka extractor statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795
 
 
   …de reuse across both batch and streaming execution modes
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-945
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   Current implementation of kafka extractor stats tracking is deeply 
integrated with the batch implementation of KafkaExtractor preventing it from 
being used in streaming Kafka extractor implementations. In addition to code 
reuse, the refactoring allows for writing unit tests for statistics tracker. 
   
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Added unit tests to KafkaExtractorStatsTrackerTest class.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338899)
Remaining Estimate: 0h
Time Spent: 10m

> Refactor Kafka extractor statistics tracking to allow code reuse across both 
> batch and streaming execution modes
> 
>
> Key: GOBBLIN-945
> URL: https://issues.apache.org/jira/browse/GOBBLIN-945
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current implementation of kafka extractor stats tracking is deeply integrated 
> with the batch implementation of KafkaExtractor preventing it from being used 
> in streaming Kafka extractor implementations. In addition to code reuse, the 
> refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
sv2000 commented on issue #2795: GOBBLIN-945: Refactor Kafka extractor 
statistics tracking to allow co…
URL: 
https://github.com/apache/incubator-gobblin/pull/2795#issuecomment-549938182
 
 
   @autumnust @htran1 Please review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 opened a new pull request #2795: GOBBLIN-945: Refactor Kafka extractor statistics tracking to allow co…

2019-11-05 Thread GitBox
sv2000 opened a new pull request #2795: GOBBLIN-945: Refactor Kafka extractor 
statistics tracking to allow co…
URL: https://github.com/apache/incubator-gobblin/pull/2795
 
 
   …de reuse across both batch and streaming execution modes
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-945
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   Current implementation of kafka extractor stats tracking is deeply 
integrated with the batch implementation of KafkaExtractor preventing it from 
being used in streaming Kafka extractor implementations. In addition to code 
reuse, the refactoring allows for writing unit tests for statistics tracker. 
   
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Added unit tests to KafkaExtractorStatsTrackerTest class.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (GOBBLIN-945) Refactor Kafka extractor statistics tracking to allow code reuse across both batch and streaming execution modes

2019-11-05 Thread Sudarshan Vasudevan (Jira)
Sudarshan Vasudevan created GOBBLIN-945:
---

 Summary: Refactor Kafka extractor statistics tracking to allow 
code reuse across both batch and streaming execution modes
 Key: GOBBLIN-945
 URL: https://issues.apache.org/jira/browse/GOBBLIN-945
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-kafka
Affects Versions: 0.15.0
Reporter: Sudarshan Vasudevan
Assignee: Shirshanka Das
 Fix For: 0.15.0


Current implementation of kafka extractor stats tracking is deeply integrated 
with the batch implementation of KafkaExtractor preventing it from being used 
in streaming Kafka extractor implementations. In addition to code reuse, the 
refactoring allows for writing unit tests for statistics tracker. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)