[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=305649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-305649 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 03/Sep/19 15:30
Start Date: 03/Sep/19 15:30
Worklog Time Spent: 10m
Work Description: asfgit commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 305649)
Time Spent: 4h (was: 3h 50m)

> Handle race condition between concurrent Gobblin tasks performing Hive registration
> ---
>
> Key: GOBBLIN-863
> URL: https://issues.apache.org/jira/browse/GOBBLIN-863
> Project: Apache Gobblin
> Issue Type: Task
> Components: hive-registration
> Reporter: Zihan Li
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 4h
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303347 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 29/Aug/19 04:07
Start Date: 29/Aug/19 04:07
Worklog Time Spent: 10m
Work Description: autumnust commented on issue #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#issuecomment-526012735

LGTM, thanks for fixing.

Issue Time Tracking
---
Worklog Id: (was: 303347)
Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303323 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 29/Aug/19 02:06
Start Date: 29/Aug/19 02:06
Worklog Time Spent: 10m
Work Description: codecov-io commented on issue #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#issuecomment-523697466

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=h1) Report
> Merging [#2719](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/bd4a09ab77a4447490b25e31bc98cafbda9f2847?src=pr&el=desc) will **increase** coverage by `0.87%`.
> The diff coverage is `3.37%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master    #2719      +/-   ##
+ Coverage     44.13%   45.01%   +0.87%
- Complexity     8576     8740     +164
  Files          1880     1884       +4
  Lines         70170    70266      +96
  Branches       7700     7709       +9
+ Hits          30973    31627     +654
+ Misses        36312    35709     -603
- Partials       2885     2930      +45
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/hive/metastore/HiveMetaStoreUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlVXRpbHMuamF2YQ==) | `31.83% <ø> (ø)` | `12 <0> (ø)` | :arrow_down: |
| [...lin/hive/metastore/HiveMetaStoreBasedRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlQmFzZWRSZWdpc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...ain/java/org/apache/gobblin/hive/HiveRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVSZWdpc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...ain/java/org/apache/gobblin/hive/HiveLockImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrSW1wbC5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [.../java/org/apache/gobblin/hive/HiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrRmFjdG9yeS5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...rc/main/java/org/apache/gobblin/hive/HiveLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrLmphdmE=) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...org/apache/gobblin/hive/AutoCloseableHiveLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0F1dG9DbG9zZWFibGVIaXZlTG9jay5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...blin/runtime/locks/DistributedHiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvRGlzdHJpYnV0ZWRIaXZlTG9ja0ZhY3RvcnkuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <100%> (+63.33%)` | `15 <2> (+15)` | :arrow_up: |
| [...gobblin/cluster/ClusterEventMetadataGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWl
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303275 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 28/Aug/19 22:37
Start Date: 28/Aug/19 22:37
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318821215

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
## @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+    super(properties);
+  }
+  public HiveLockImpl get(String name) {
+    return new HiveLockImpl(new ZookeeperBasedJobLock(properties, name)) {
+      @Override
+      public void lock() throws IOException {
+        try {
+          this.lock.lock();
+        } catch (Exception e) {
+          throw new IOException(e);
+        }
+      }
+
+      @Override
+      public void unlock() throws IOException {
+        try {
+          this.lock.unlock();
+        } catch (Exception e) {
+          throw new IOException(e);

Review comment: Sure, I'll address that.

Issue Time Tracking
---
Worklog Id: (was: 303275)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303267 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 28/Aug/19 22:09
Start Date: 28/Aug/19 22:09
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318813950

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
## @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+    super(properties);
+  }
+  public HiveLockImpl get(String name) {
+    return new HiveLockImpl(new ZookeeperBasedJobLock(properties, name)) {
+      @Override
+      public void lock() throws IOException {
+        try {
+          this.lock.lock();
+        } catch (Exception e) {
+          throw new IOException(e);
+        }
+      }
+
+      @Override
+      public void unlock() throws IOException {
+        try {
+          this.lock.unlock();
+        } catch (Exception e) {
+          throw new IOException(e);

Review comment: Fair enough. In that case, shall we specify `JobLockException` instead of `Exception`?

Issue Time Tracking
---
Worklog Id: (was: 303267)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303081 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 28/Aug/19 18:23
Start Date: 28/Aug/19 18:23
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318727218

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
## @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+    super(properties);
+  }
+  public HiveLockImpl get(String name) {
+    return new HiveLockImpl(new ZookeeperBasedJobLock(properties, name)) {
+      @Override
+      public void lock() throws IOException {
+        try {
+          this.lock.lock();
+        } catch (Exception e) {
+          throw new IOException(e);
+        }
+      }
+
+      @Override
+      public void unlock() throws IOException {
+        try {
+          this.lock.unlock();
+        } catch (Exception e) {
+          throw new IOException(e);

Review comment: Because it's a JobLockException, I want to catch it and rethrow it as an IOException, which can be caught later.

Issue Time Tracking
---
Worklog Id: (was: 303081)
Time Spent: 3h 10m (was: 3h)
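The exchange above is about translating a lock-specific checked exception into an `IOException`, and about catching the specific type rather than `Exception`. A minimal sketch of that pattern follows. The names `JobLockException`, `SimpleJobLock`, and `IoWrappingLock` here are hypothetical stand-ins declared locally for illustration; they are not the Gobblin classes, and the merged patch may look different.

```java
import java.io.IOException;

// Hypothetical stand-in for a lock-specific checked exception (not the Gobblin class).
class JobLockException extends Exception {
  JobLockException(String message) {
    super(message);
  }
}

// Hypothetical minimal lock interface, loosely modeled on the lock()/unlock() calls in the diff.
interface SimpleJobLock {
  void lock() throws JobLockException;

  void unlock() throws JobLockException;
}

// Adapter that exposes IOException to callers while delegating to the underlying lock.
class IoWrappingLock {
  private final SimpleJobLock delegate;

  IoWrappingLock(SimpleJobLock delegate) {
    this.delegate = delegate;
  }

  void lock() throws IOException {
    try {
      delegate.lock();
    } catch (JobLockException e) {
      // Only the expected checked exception is translated; the cause is preserved.
      throw new IOException(e);
    }
  }

  void unlock() throws IOException {
    try {
      delegate.unlock();
    } catch (JobLockException e) {
      throw new IOException(e);
    }
  }
}
```

Catching the narrow type means unchecked exceptions such as `IllegalStateException` pass through unchanged instead of being re-labelled as I/O failures, which keeps programming errors distinguishable from lock failures.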
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303078 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 28/Aug/19 18:20
Start Date: 28/Aug/19 18:20
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318725927

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
## @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+    super(properties);
+  }
+  public HiveLockImpl get(String name) {
+    return new HiveLockImpl(new ZookeeperBasedJobLock(properties, name)) {
+      @Override
+      public void lock() throws IOException {
+        try {
+          this.lock.lock();
+        } catch (Exception e) {
+          throw new IOException(e);
+        }
+      }
+
+      @Override
+      public void unlock() throws IOException {
+        try {
+          this.lock.unlock();
+        } catch (Exception e) {
+          throw new IOException(e);

Review comment: Why does this exception need to be caught and thrown again?

Issue Time Tracking
---
Worklog Id: (was: 303078)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303077 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 28/Aug/19 18:20
Start Date: 28/Aug/19 18:20
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318725463

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
## @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+    super(properties);
+  }
+  public HiveLockImpl get(String name) {
+    return new HiveLockImpl(new ZookeeperBasedJobLock(properties, name)) {
+      @Override
+      public void lock() throws IOException {
+        try {
+          this.lock.lock();
+        } catch (Exception e) {
+          throw new IOException(e);
+        }
+      }
+
+      @Override
+      public void unlock() throws IOException {
+        try {
+          this.lock.unlock();
+        } catch (Exception e) {

Review comment: Why does this exception need to be caught and thrown again? And why does the exception handling for the distributed lock need to differ from that of the local lock?

Issue Time Tracking
---
Worklog Id: (was: 303077)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=303076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-303076 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 28/Aug/19 18:19
Start Date: 28/Aug/19 18:19
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318725463

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
## @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+    super(properties);
+  }
+  public HiveLockImpl get(String name) {
+    return new HiveLockImpl(new ZookeeperBasedJobLock(properties, name)) {
+      @Override
+      public void lock() throws IOException {
+        try {
+          this.lock.lock();
+        } catch (Exception e) {
+          throw new IOException(e);
+        }
+      }
+
+      @Override
+      public void unlock() throws IOException {
+        try {
+          this.lock.unlock();
+        } catch (Exception e) {

Review comment: Why does this exception need to be caught and thrown again? And why does the exception handling for the distributed lock need to differ from that of the local lock?

Issue Time Tracking
---
Worklog Id: (was: 303076)
Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302432 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 27/Aug/19 22:04
Start Date: 27/Aug/19 22:04
Worklog Time Spent: 10m
Work Description: codecov-io commented on issue #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#issuecomment-523697466

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=h1) Report
> Merging [#2719](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/bd4a09ab77a4447490b25e31bc98cafbda9f2847?src=pr&el=desc) will **increase** coverage by `0.9%`.
> The diff coverage is `3.37%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree)

```diff
@@             Coverage Diff             @@
##             master    #2719     +/-   ##
===========================================
+ Coverage     44.13%   45.04%   +0.9%
- Complexity     8576     8745    +169
===========================================
  Files          1880     1884      +4
  Lines         70170    70231     +61
  Branches       7700     7702      +2
===========================================
+ Hits          30973    31635    +662
+ Misses        36312    35671    -641
- Partials       2885     2925     +40
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/hive/metastore/HiveMetaStoreUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlVXRpbHMuamF2YQ==) | `31.83% <ø> (ø)` | `12 <0> (ø)` | :arrow_down: |
| [...lin/hive/metastore/HiveMetaStoreBasedRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlQmFzZWRSZWdpc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...ain/java/org/apache/gobblin/hive/HiveRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVSZWdpc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...ain/java/org/apache/gobblin/hive/HiveLockImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrSW1wbC5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [.../java/org/apache/gobblin/hive/HiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrRmFjdG9yeS5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...rc/main/java/org/apache/gobblin/hive/HiveLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrLmphdmE=) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...org/apache/gobblin/hive/AutoCloseableHiveLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0F1dG9DbG9zZWFibGVIaXZlTG9jay5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...blin/runtime/locks/DistributedHiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvRGlzdHJpYnV0ZWRIaXZlTG9ja0ZhY3RvcnkuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <100%> (+63.33%)` | `15 <2> (+15)` | :arrow_up: |
| [...ache/gobblin/couchbase/writer/CouchbaseWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY291Y2hiYXNlL
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302372&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302372 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 27/Aug/19 20:37
Start Date: 27/Aug/19 20:37
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318285399

## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveRegister.java
## @@ -94,29 +94,32 @@ protected HiveRegister(State state) {
       @Override
       public Void call() throws Exception {
+        try {

Review comment: In a streaming job the TaskExecutor never closes, so it is hard to catch the exception when the executor finishes. I just want to print the error while the job is running, which I think will be more helpful for debugging.

Issue Time Tracking
---
Worklog Id: (was: 302372)
Time Spent: 2h 20m (was: 2h 10m)
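The reasoning in the comment above (a long-running executor whose futures are never inspected, so failures should be reported from inside `call()`) can be sketched as a small wrapper. `LoggingCallable` is a hypothetical name used only for this illustration, and rethrowing after logging is an assumption: the excerpt shows only the added `try {`, not what the patch does in its `catch` block.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: logs a failure at the point it happens, then rethrows it.
class LoggingCallable<T> implements Callable<T> {
  private static final Logger LOG = Logger.getLogger(LoggingCallable.class.getName());

  private final Callable<T> delegate;

  LoggingCallable(Callable<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public T call() throws Exception {
    try {
      return delegate.call();
    } catch (Exception e) {
      // The error becomes visible in the logs even if nobody ever calls Future.get().
      LOG.log(Level.SEVERE, "Task failed", e);
      // Assumption: the failure is still propagated so a caller that does inspect the Future sees it.
      throw e;
    }
  }
}
```

A task submitted as `executor.submit(new LoggingCallable<>(task))` would then report its failure as soon as it occurs, rather than only when the executor is shut down and its futures are drained.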
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302370 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 20:34 Start Date: 27/Aug/19 20:34 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318284064 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveLock.java ## @@ -37,30 +39,45 @@ * */ public class HiveLock { + private static String HIVE_LOCK_TYPE = HiveMetaStoreBasedRegister.HIVE_REGISTER_METRICS_PREFIX + "lock.type"; + private static String HIVE_LOCK_TYPE_DEFAULT = "org.apache.gobblin.hive.HiveLockFactory"; - private static final Joiner JOINER = Joiner.on(' ').skipNulls(); + private Properties properties; - private final Striped locks = Striped.lazyWeakLock(Integer.MAX_VALUE); + private static final Joiner JOINER = Joiner.on('/').skipNulls(); Review comment: I see, that makes sense in the ZK context. It might then be better to make it implementation-specific, but I don't have a strong opinion on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302370) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302335 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:26 Start Date: 27/Aug/19 19:26 Worklog Time Spent: 10m Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318256274 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveLockImpl.java ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.gobblin.hive; + +import java.io.IOException; + + +/** + * A wrapper lock to be used by hive. + * @param The class of the real lock + */ +public abstract class HiveLockImpl { + public T lock; Review comment: Will address This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302335) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302332 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:26 Start Date: 27/Aug/19 19:26 Worklog Time Spent: 10m Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318255786 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java ## @@ -197,6 +211,18 @@ private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, Stri try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) { existingTable = HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName)); } +//TODO: Determine whether we still use inline hive registration, +// if so, instead of fetching schema from schema registry, we need to enable schema version +if (this.schemaRegistry.isPresent()) { Review comment: This was the way we previously planned to solve the race condition. But I realized this is not the final schema that the writer uses, so I will implement schema version instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302332) Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302331 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:25 Start Date: 27/Aug/19 19:25 Worklog Time Spent: 10m Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318255786 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java ## @@ -197,6 +211,18 @@ private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, Stri try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) { existingTable = HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName)); } +//TODO: Determine whether we still use inline hive registration, +// if so, instead of fetching schema from schema registry, we need to enable schema version +if (this.schemaRegistry.isPresent()) { Review comment: This is the previous thinking. But I realize this is not the final schema. So I will implement schema version instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302331) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302329 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:24 Start Date: 27/Aug/19 19:24 Worklog Time Spent: 10m Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318255221 ## File path: gobblin-docs/user-guide/Configuration-Properties-Glossary.md ## @@ -98,7 +98,7 @@ These properties are common to both the Job Launcher and the Command Line. | `job.group` | A way to group logically similar jobs together. | No | None | | `job.description` | A description of what the jobs does. | No | None | | `job.lock.enabled` | If set to true job locks are enabled, if set to false they are disabled | No | True | -| `job.lock.type` | The fully qualified name of the JobLock class to run. The JobLock is responsible for ensuring that only a single instance of a job runs at a time. Allowed values: [gobblin.runtime.locks.FileBasedJobLock](#FileBasedJobLock-Properties), [gobblin.runtime.locks.ZookeeperBasedJobLock](#ZookeeperBasedJobLock-Properties) | No | `gobblin.runtime.locks.FileBasedJobLock` | +| `job.lock.type` | The fully qualified name of the JobLock class to run. The JobLock is responsible for ensuring that only a single instance of a job runs at a time. Allowed values: [gobblin.runtime.locks.FfiFileBasedJobLock](#FileBasedJobLock-Properties), [gobblin.runtime.locks.ZookeeperBasedJobLock](#ZookeeperBasedJobLock-Properties) | No | `gobblin.runtime.locks.FileBasedJobLock` | Review comment: Will address This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302329) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302328 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:24 Start Date: 27/Aug/19 19:24 Worklog Time Spent: 10m Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318255121 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveLock.java ## @@ -37,30 +39,45 @@ * */ public class HiveLock { + private static String HIVE_LOCK_TYPE = HiveMetaStoreBasedRegister.HIVE_REGISTER_METRICS_PREFIX + "lock.type"; + private static String HIVE_LOCK_TYPE_DEFAULT = "org.apache.gobblin.hive.HiveLockFactory"; - private static final Joiner JOINER = Joiner.on(' ').skipNulls(); + private Properties properties; - private final Striped locks = Striped.lazyWeakLock(Integer.MAX_VALUE); + private static final Joiner JOINER = Joiner.on('/').skipNulls(); Review comment: This is only for the ZooKeeper lock: the joined string is used to create a path, which is why I use "/". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302328) Time Spent: 1h 20m (was: 1h 10m)
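To illustrate the point about joining with "/" so the lock name doubles as a path, here is a small sketch. It is an assumption-laden stand-in: it mimics the behaviour of Guava's `Joiner.on('/').skipNulls()` using only the JDK, and the class `LockPaths` and method `lockPath` are hypothetical names, not Gobblin code.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper (not Gobblin code): builds a lock name from its parts.
class LockPaths {
  // Joins the non-null parts with '/', which is the same result that
  // Joiner.on('/').skipNulls() would give for these inputs.
  static String lockPath(String... parts) {
    return Arrays.stream(parts)
        .filter(Objects::nonNull)
        .collect(Collectors.joining("/"));
  }
}
```

For example, `LockPaths.lockPath("db", "table")` yields `db/table`, and a null partition component is simply skipped. ZooKeeper node paths are absolute, so a real caller would still need to prefix the result with a root such as `/` plus some base node; that root is left out here because the thread does not say what it is.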
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302322 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:19 Start Date: 27/Aug/19 19:19 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318241215 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveLockImpl.java ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.gobblin.hive; + +import java.io.IOException; + + +/** + * A wrapper lock to be used by hive. + * @param The class of the real lock + */ +public abstract class HiveLockImpl { + public T lock; Review comment: should this be protected instead ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302322) Time Spent: 1h (was: 50m)
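The visibility question raised in the review comment above can be sketched as follows. This is only an illustration under stated assumptions: `LockWrapper` and `LocalLockWrapper` are hypothetical names, and the `lock()`/`unlock()` methods are guesses at what such a wrapper would expose, since the quoted diff shows only the field declaration.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical generic wrapper (names and methods are assumptions). The
// wrapped lock is protected and final, so subclasses can use it but outside
// callers cannot replace or bypass it.
abstract class LockWrapper<T> {
  protected final T lock;

  protected LockWrapper(T lock) {
    this.lock = lock;
  }

  abstract void lock() throws IOException;

  abstract void unlock() throws IOException;
}

// One possible subclass backed by an in-process ReentrantLock.
class LocalLockWrapper extends LockWrapper<ReentrantLock> {
  LocalLockWrapper() {
    super(new ReentrantLock());
  }

  @Override
  void lock() {
    lock.lock();
  }

  @Override
  void unlock() {
    lock.unlock();
  }

  boolean isHeldByCurrentThread() {
    return lock.isHeldByCurrentThread();
  }
}
```

With a public field, any caller could acquire or swap the underlying lock directly; keeping it protected leaves the subclass in control of how it is acquired and released.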
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302323 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:19 Start Date: 27/Aug/19 19:19 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318252148 ## File path: gobblin-docs/user-guide/Configuration-Properties-Glossary.md ## @@ -98,7 +98,7 @@ These properties are common to both the Job Launcher and the Command Line. | `job.group` | A way to group logically similar jobs together. | No | None | | `job.description` | A description of what the jobs does. | No | None | | `job.lock.enabled` | If set to true job locks are enabled, if set to false they are disabled | No | True | -| `job.lock.type` | The fully qualified name of the JobLock class to run. The JobLock is responsible for ensuring that only a single instance of a job runs at a time. Allowed values: [gobblin.runtime.locks.FileBasedJobLock](#FileBasedJobLock-Properties), [gobblin.runtime.locks.ZookeeperBasedJobLock](#ZookeeperBasedJobLock-Properties) | No | `gobblin.runtime.locks.FileBasedJobLock` | +| `job.lock.type` | The fully qualified name of the JobLock class to run. The JobLock is responsible for ensuring that only a single instance of a job runs at a time. Allowed values: [gobblin.runtime.locks.FfiFileBasedJobLock](#FileBasedJobLock-Properties), [gobblin.runtime.locks.ZookeeperBasedJobLock](#ZookeeperBasedJobLock-Properties) | No | `gobblin.runtime.locks.FileBasedJobLock` | Review comment: typo in `FfiFileBasedJobLock`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302323) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302324 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:19 Start Date: 27/Aug/19 19:19 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318245127 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java ## @@ -197,6 +211,18 @@ private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, Stri try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) { existingTable = HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName)); } +//TODO: Determine whether we still use inline hive registration, +// if so, instead of fetching schema from schema registry, we need to enable schema version +if (this.schemaRegistry.isPresent()) { Review comment: Are we fetching the schema from the registry? Not sure if this is finalized... let's separate this block out into a public method (making schemaRegistry lazily initialized inside the method if possible) so that we can easily switch to other ways of getting the right schema. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302324) Time Spent: 1h 10m (was: 1h)
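The refactoring suggested in the comment above (pull the schema lookup into its own public method and create the registry lazily inside it) might look roughly like the sketch below. Everything here is hypothetical: `SchemaSource`, `SchemaResolver`, and `resolveLatestSchema` are illustrative names, and schemas are represented as plain strings rather than whatever type the real schema registry returns.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical abstraction over "some way of getting the right schema".
interface SchemaSource {
  Optional<String> latestSchema(String tableName);
}

// Hypothetical holder: the source is created on first use, and all lookups
// go through one public method, so switching to another lookup strategy only
// means passing a different factory.
class SchemaResolver {
  private final Supplier<SchemaSource> factory;
  private SchemaSource source; // created lazily, guarded by "this"

  SchemaResolver(Supplier<SchemaSource> factory) {
    this.factory = factory;
  }

  public synchronized Optional<String> resolveLatestSchema(String tableName) {
    if (source == null) {
      source = factory.get(); // first call pays the construction cost
    }
    return source.latestSchema(tableName);
  }
}
```

A caller would construct it with something like `new SchemaResolver(() -> name -> Optional.of("schema-for-" + name))`; the factory runs only when `resolveLatestSchema` is first invoked.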
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302326 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:19 Start Date: 27/Aug/19 19:19 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318242226 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveRegister.java ## @@ -94,29 +94,32 @@ protected HiveRegister(State state) { @Override public Void call() throws Exception { +try { Review comment: Which specific exception are you trying to catch here? The exception handling seems strange, given that the method already throws Exception and the catch block doesn't really do anything other than print a log line. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302326) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302325 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:19 Start Date: 27/Aug/19 19:19 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318253087 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveLock.java ## @@ -37,30 +39,45 @@ * */ public class HiveLock { + private static String HIVE_LOCK_TYPE = HiveMetaStoreBasedRegister.HIVE_REGISTER_METRICS_PREFIX + "lock.type"; + private static String HIVE_LOCK_TYPE_DEFAULT = "org.apache.gobblin.hive.HiveLockFactory"; - private static final Joiner JOINER = Joiner.on(' ').skipNulls(); + private Properties properties; - private final Striped locks = Striped.lazyWeakLock(Integer.MAX_VALUE); + private static final Joiner JOINER = Joiner.on('/').skipNulls(); Review comment: Using `.` seems more intuitive when it comes to the relation between db and table, but I don't have a strong opinion on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302325) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=302321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302321 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 19:19 Start Date: 27/Aug/19 19:19 Worklog Time Spent: 10m Work Description: autumnust commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318241484 ## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveRegister.java ## @@ -94,29 +94,32 @@ protected HiveRegister(State state) { @Override public Void call() throws Exception { +try { + if (spec instanceof HiveSpecWithPredicates && !evaluatePredicates((HiveSpecWithPredicates) spec)) { +log.info("Skipping " + spec + " since predicates return false"); +return null; + } -if (spec instanceof HiveSpecWithPredicates && !evaluatePredicates((HiveSpecWithPredicates) spec)) { - log.info("Skipping " + spec + " since predicates return false"); - return null; -} - -if (spec instanceof HiveSpecWithPreActivities) { - for (Activity activity : ((HiveSpecWithPreActivities) spec).getPreActivities()) { -activity.execute(HiveRegister.this); + if (spec instanceof HiveSpecWithPreActivities) { +for (Activity activity : ((HiveSpecWithPreActivities) spec).getPreActivities()) { + activity.execute(HiveRegister.this); +} } -} -registerPath(spec); + registerPath(spec); Review comment: remove this space This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 302321) Time Spent: 1h (was: 50m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=301638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301638 ] ASF GitHub Bot logged work on GOBBLIN-863: -- Author: ASF GitHub Bot Created on: 27/Aug/19 01:54 Start Date: 27/Aug/19 01:54 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2719: [GOBBLIN-863]Handle race condition issue for hive registration URL: https://github.com/apache/incubator-gobblin/pull/2719#issuecomment-523697466

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=h1) Report
> Merging [#2719](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/bd4a09ab77a4447490b25e31bc98cafbda9f2847?src=pr&el=desc) will **increase** coverage by `0.9%`.
> The diff coverage is `3.52%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree)

```diff
@@             Coverage Diff             @@
##             master    #2719     +/-   ##
===========================================
+ Coverage     44.13%   45.04%    +0.9%
- Complexity     8576     8743     +167
===========================================
  Files          1880     1884       +4
  Lines         70170    70227      +57
  Branches       7700     7702       +2
===========================================
+ Hits          30973    31636     +663
+ Misses        36312    35667     -645
- Partials       2885     2924      +39
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/hive/metastore/HiveMetaStoreUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlVXRpbHMuamF2YQ==) | `31.83% <ø> (ø)` | `12 <0> (ø)` | :arrow_down: |
| [...lin/hive/metastore/HiveMetaStoreBasedRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlQmFzZWRSZWdpc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...ain/java/org/apache/gobblin/hive/HiveLockImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrSW1wbC5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...rc/main/java/org/apache/gobblin/hive/HiveLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrLmphdmE=) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...org/apache/gobblin/hive/AutoCloseableHiveLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0F1dG9DbG9zZWFibGVIaXZlTG9jay5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...ain/java/org/apache/gobblin/hive/HiveRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVSZWdpc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [.../java/org/apache/gobblin/hive/HiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrRmFjdG9yeS5qYXZh) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...blin/runtime/locks/DistributedHiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvRGlzdHJpYnV0ZWRIaXZlTG9ja0ZhY3RvcnkuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <100%> (+63.33%)` | `15 <2> (+15)` | :arrow_up: |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=299934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299934 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 23/Aug/19 04:48
Start Date: 23/Aug/19 04:48
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r316971288

## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java ##
@@ -93,9 +98,11 @@
   public static final String CREATE_HIVE_DATABASE = HIVE_REGISTER_METRICS_PREFIX + "createDatabaseTimer";
   public static final String CREATE_HIVE_TABLE = HIVE_REGISTER_METRICS_PREFIX + "createTableTimer";
   public static final String GET_HIVE_TABLE = HIVE_REGISTER_METRICS_PREFIX + "getTableTimer";
+  public static final String GET_AND_SET_LATEST_SCHEMA = HIVE_REGISTER_METRICS_PREFIX + "getAndSetLatestSchemaTimer";
   public static final String DROP_TABLE = HIVE_REGISTER_METRICS_PREFIX + "dropTableTimer";
   public static final String PATH_REGISTER_TIMER = HIVE_REGISTER_METRICS_PREFIX + "pathRegisterTimer";
   public static final String SKIP_PARTITION_DIFF_COMPUTATION = HIVE_REGISTER_METRICS_PREFIX + "skip.partition.diff.computation";
+  public static final String FETCH_LATEST_SCHEMA = HIVE_REGISTER_METRICS_PREFIX + "fetch.latest.schema";

Review comment:
How about HIVE_REGISTER_METRICS_PREFIX + "fetchLatestSchemaFromSchemaRegistry"?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 299934)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=299935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299935 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 23/Aug/19 04:48
Start Date: 23/Aug/19 04:48
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r316973043

## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java ##
@@ -176,7 +189,7 @@ protected void registerPath(HiveSpec spec) throws IOException {
    * Or will create the table thru. RPC and return retVal from remote MetaStore.
    */
   private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, String dbName, IMetaStoreClient client,
-      Table table, HiveSpec spec) throws TException{
+      Table table, HiveSpec spec) throws TException, IOException{
     try (AutoCloseableLock lock = this.locks.getTableLock(dbName, tableName)) {

Review comment:
Isn't this lock a JVM-local lock? Won't we need Hive Metastore side locking to serialize updates?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 299935)
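To illustrate the concern in the review comment above, here is a minimal sketch of a per-table lock that lives in an in-memory map. The class `LocalTableLocks`, its nested `Held` handle, and the `isLocked` helper are hypothetical names chosen for this example; they are not Gobblin's lock classes and say nothing about how the PR implements its locks. Two instances stand in for two JVMs: each keeps its own map, so holding a table lock in one has no effect on the other.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only: a lock registry that exists inside one JVM.
class LocalTableLocks {

    // Handle returned to the caller; try-with-resources releases the lock.
    static final class Held implements AutoCloseable {
        private final ReentrantLock lock;

        private Held(ReentrantLock lock) {
            this.lock = lock;
        }

        @Override
        public void close() {
            lock.unlock();
        }
    }

    // One lock per "db.table" key, stored only in this process's memory.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Blocks until the calling thread owns the lock for the given table.
    Held getTableLock(String dbName, String tableName) {
        ReentrantLock lock = locks.computeIfAbsent(dbName + "." + tableName, k -> new ReentrantLock());
        lock.lock();
        return new Held(lock);
    }

    // True if some thread in THIS registry currently holds the table's lock.
    boolean isLocked(String dbName, String tableName) {
        ReentrantLock lock = locks.get(dbName + "." + tableName);
        return lock != null && lock.isLocked();
    }
}
```

Threads sharing one `LocalTableLocks` instance are serialized per table, but a second instance (or a second process) can enter the same critical section at the same time. Serializing across processes would need a lock whose state lives outside the JVM, which is an assumption about the general technique rather than a description of this PR.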
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=299936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299936 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 23/Aug/19 04:48
Start Date: 23/Aug/19 04:48
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r316971795

## File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java ##
@@ -197,6 +210,16 @@ private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, Stri
         try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) {
           existingTable = HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName));
         }
+        if (this.schemaRegistry.isPresent()) {
+          try (Timer.Context context = this.metricContext.timer(GET_AND_SET_LATEST_SCHEMA).time()) {
+            String latestSchema = this.schemaRegistry.get().getLatestSchema(topicName).toString();
+            spec.getTable().getSerDeProps().setProp(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), latestSchema);
+            table.getSd().setSerdeInfo(HiveMetaStoreUtils.getSerDeInfo(spec.getTable()));
+          } catch (SchemaRegistryException | IOException e) {
+            log.error(String.format("Error when fetch latest for topic %s", topicName), e);

Review comment:
Minor typo: "Error when fetching latest schema for topic...".

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 299936)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=299094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299094 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 22/Aug/19 00:21
Start Date: 22/Aug/19 00:21
Worklog Time Spent: 10m
Work Description: codecov-io commented on issue #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#issuecomment-523697466

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=h1) Report
> Merging [#2719](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/bd4a09ab77a4447490b25e31bc98cafbda9f2847?src=pr&el=desc) will **increase** coverage by `0.93%`.
> The diff coverage is `0%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##            master    #2719     +/-   ##
+ Coverage    44.13%   45.07%   +0.93%
- Complexity    8576     8744     +168
  Files         1880     1880
  Lines        70170    70184      +14
  Branches      7700     7702       +2
+ Hits         30973    31635     +662
+ Misses       36312    35625     -687
- Partials      2885     2924      +39
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2719?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/hive/metastore/HiveMetaStoreUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlVXRpbHMuamF2YQ==) | `31.83% <ø> (ø)` | `12 <0> (ø)` | :arrow_down: |
| [...lin/hive/metastore/HiveMetaStoreBasedRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlQmFzZWRSZWdpc3Rlci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `35.51% <0%> (-3.74%)` | `12% <0%> (-1%)` | |
| [...ache/gobblin/couchbase/writer/CouchbaseWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY291Y2hiYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvdWNoYmFzZS93cml0ZXIvQ291Y2hiYXNlV3JpdGVyLmphdmE=) | `66.27% <0%> (-2.33%)` | `11% <0%> (ø)` | |
| [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `65.72% <0%> (-0.47%)` | `29% <0%> (ø)` | |
| [...src/main/java/org/apache/gobblin/runtime/Task.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFzay5qYXZh) | `67.05% <0%> (+0.23%)` | `82% <0%> (+1%)` | :arrow_up: |
| [.../org/apache/gobblin/runtime/SafeDatasetCommit.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvU2FmZURhdGFzZXRDb21taXQuamF2YQ==) | `46.53% <0%> (+0.49%)` | `28% <0%> (ø)` | :arrow_down: |
| [...rg/apache/gobblin/runtime/AbstractJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvQWJzdHJhY3RKb2JMYXVuY2hlci5qYXZh) | `58.33% <0%> (+0.83%)` | `31% <0%> (+2%)` | :arrow_up: |
| [...ain/java/org/apache/gobblin/runtime/fork/Fork.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2719/diff?src=pr&el=tree#diff-Z
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=299080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299080 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 21/Aug/19 23:50
Start Date: 21/Aug/19 23:50
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on issue #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#issuecomment-523691691

@sv2000 Can you take a look at this change? Thanks

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 299080)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?focusedWorklogId=299076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299076 ]

ASF GitHub Bot logged work on GOBBLIN-863:
--
Author: ASF GitHub Bot
Created on: 21/Aug/19 23:47
Start Date: 21/Aug/19 23:47
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-863

### Description
- [ ] Here are some details about my PR, including screenshots (if applicable):
When updating the Hive table, lock the table, fetch the latest schema from the schema registry, and use that schema to update the table, so that the table always has the latest schema.

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
Tested with a streaming job to confirm that Hive registration works as expected. Fetching the latest schema may cost 0.05~0.15 seconds, but this should not happen frequently; it only happens when there is a new partition or a schema change.

### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 299076)
Remaining Estimate: 0h
Time Spent: 10m
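The approach in the PR description (take the table lock, fetch the latest schema, then update the table with it) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `LatestSchemaUpdate`, `SchemaSource`, and `update` are hypothetical names, the table is reduced to a map of SerDe properties, and the lock is a plain in-process `ReentrantLock`. The property key `avro.schema.literal` is assumed to be the name behind the `SCHEMA_LITERAL` constant quoted in the review diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: refresh the schema property while holding the table lock.
class LatestSchemaUpdate {

    // Stand-in for a schema registry client (assumed interface, not a real API).
    interface SchemaSource {
        String getLatestSchema(String topic) throws Exception;
    }

    // Assumed to match Hive's Avro SerDe schema-literal property name.
    static final String SCHEMA_LITERAL = "avro.schema.literal";

    private final ReentrantLock tableLock = new ReentrantLock();
    private final Optional<SchemaSource> registry;

    LatestSchemaUpdate(Optional<SchemaSource> registry) {
        this.registry = registry;
    }

    // Returns the properties to register, preferring the registry's latest schema.
    Map<String, String> update(String topic, Map<String, String> taskProps) {
        tableLock.lock();
        try {
            Map<String, String> result = new HashMap<>(taskProps);
            if (registry.isPresent()) {
                try {
                    result.put(SCHEMA_LITERAL, registry.get().getLatestSchema(topic));
                } catch (Exception e) {
                    // Registry unavailable: fall back to the schema the task supplied.
                }
            }
            return result;
        } finally {
            tableLock.unlock();
        }
    }
}
```

Fetching inside the locked section means the schema written is the one that was latest at the time the writer held the lock, rather than whichever schema a slower task happened to carry. Falling back to the task's schema on a registry failure is a choice made for this sketch.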