[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration

2019-08-28 Thread GitBox
ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race 
condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318821215
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
 ##
 @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method 
for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+super(properties);
+  }
+  public HiveLockImpl get(String name) {
+return new HiveLockImpl(new 
ZookeeperBasedJobLock(properties, name)) {
+  @Override
+  public void lock() throws IOException {
+try {
+  this.lock.lock();
+} catch (Exception e) {
+  throw new IOException(e);
+}
+  }
+
+  @Override
+  public void unlock() throws IOException {
+try {
+  this.lock.unlock();
+} catch (Exception e) {
+  throw new IOException(e);
 
 Review comment:
   Sure, I'll address that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration

2019-08-28 Thread GitBox
ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race 
condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318727218
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/locks/DistributedHiveLockFactory.java
 ##
 @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.locks;
+
+import java.io.IOException;
+import java.util.Properties;
+import org.apache.gobblin.hive.HiveLockFactory;
+import org.apache.gobblin.hive.HiveLockImpl;
+
+/**
+ * A lock factory that extends {@link HiveLockFactory} provide a get method 
for a distributed lock for a specific object
+ */
+public class DistributedHiveLockFactory extends HiveLockFactory {
+  public DistributedHiveLockFactory(Properties properties) {
+super(properties);
+  }
+  public HiveLockImpl get(String name) {
+return new HiveLockImpl(new 
ZookeeperBasedJobLock(properties, name)) {
+  @Override
+  public void lock() throws IOException {
+try {
+  this.lock.lock();
+} catch (Exception e) {
+  throw new IOException(e);
+}
+  }
+
+  @Override
+  public void unlock() throws IOException {
+try {
+  this.lock.unlock();
+} catch (Exception e) {
+  throw new IOException(e);
 
 Review comment:
   Because it's a JobLockException, I want to catch it and rethrow it as an 
IOException, which can be caught later.
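
The wrapping described in this comment can be sketched in isolation. This is a minimal illustration and not the code from the pull request: `SketchLockException`, `SketchLock`, and `IoLock` are hypothetical stand-ins for `JobLockException`, the underlying job lock, and the Hive lock wrapper. It only assumes that the underlying lock throws a checked exception that callers should see as an `IOException`.

```java
import java.io.IOException;

public class LockExceptionWrappingSketch {

  /** Hypothetical stand-in for a checked lock exception such as JobLockException. */
  static class SketchLockException extends Exception {
    SketchLockException(String message) {
      super(message);
    }
  }

  /** Hypothetical minimal lock whose methods throw the checked lock exception. */
  interface SketchLock {
    void lock() throws SketchLockException;

    void unlock() throws SketchLockException;
  }

  /** Wrapper that exposes only IOException and keeps the original exception as the cause. */
  static class IoLock {
    private final SketchLock delegate;

    IoLock(SketchLock delegate) {
      this.delegate = delegate;
    }

    void lock() throws IOException {
      try {
        delegate.lock();
      } catch (SketchLockException e) {
        // Catching the specific type avoids accidentally wrapping unrelated runtime errors.
        throw new IOException(e);
      }
    }

    void unlock() throws IOException {
      try {
        delegate.unlock();
      } catch (SketchLockException e) {
        throw new IOException(e);
      }
    }
  }

  /** Returns the message of the wrapped cause if lock() fails, or null if it succeeds. */
  static String tryLock(IoLock lock) {
    try {
      lock.lock();
      return null;
    } catch (IOException e) {
      return e.getCause().getMessage();
    }
  }
}
```

Because the original exception is passed as the cause, a caller that catches the `IOException` can still inspect it with `getCause()`.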




[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration

2019-08-27 Thread GitBox
ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race 
condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318285399
 
 

 ##
 File path: 
gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveRegister.java
 ##
 @@ -94,29 +94,32 @@ protected HiveRegister(State state) {
 
   @Override
   public Void call() throws Exception {
+try {
 
 Review comment:
   Because in a streaming job the TaskExecutor is never closed, it's hard to 
catch the exception when the executor finishes. So I just want to print the 
error while the job is running, which I think will be more helpful for debugging.
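
A small sketch of the idea in this comment: wrap the body of `call()` so a failure is reported as soon as it happens, rather than only when someone inspects the `Future`. This is an illustration under stated assumptions and not the `HiveRegister` code. The class name, the `loggingFailures` helper, and the in-memory `LOGGED` list (standing in for a real logger) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EagerLoggingCallableSketch {

  /** Stand-in for a logger so the sketch stays self-contained. */
  static final List<String> LOGGED = Collections.synchronizedList(new ArrayList<>());

  /** Wraps a task so that any exception is recorded immediately and then rethrown. */
  static <V> Callable<V> loggingFailures(Callable<V> task) {
    return () -> {
      try {
        return task.call();
      } catch (Exception e) {
        // Recorded on the worker thread, even if nobody ever calls Future.get().
        LOGGED.add("Registration failed: " + e.getMessage());
        throw e;
      }
    };
  }

  /** Runs the task on an executor and reports whether it failed. */
  static boolean runAndReportFailure(Callable<Void> task) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<Void> future = executor.submit(loggingFailures(task));
      future.get();
      return false;
    } catch (ExecutionException e) {
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }
}
```

In a long-running executor that is never shut down, the entry in `LOGGED` (or a real log line) appears at failure time; the rethrow keeps the failure visible to any caller that does check the `Future`.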




[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration

2019-08-27 Thread GitBox
ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race 
condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318256274
 
 

 ##
 File path: 
gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/HiveLockImpl.java
 ##
 @@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.hive;
+
+import java.io.IOException;
+
+
+/**
+ * A wrapper lock to be used by hive.
+ * @param <T> The class of the real lock
+ */
+public abstract class HiveLockImpl<T> {
+  public T lock;
 
 Review comment:
   Will address




[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration

2019-08-27 Thread GitBox
ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race 
condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318255786
 
 

 ##
 File path: 
gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java
 ##
 @@ -197,6 +211,18 @@ private boolean 
ensureHiveTableExistenceBeforeAlternation(String tableName, Stri
 try (Timer.Context context = 
this.metricContext.timer(GET_HIVE_TABLE).time()) {
   existingTable = 
HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName));
 }
+//TODO: Determine whether we still use inline hive registration,
+// if so, instead of fetching schema from schema registry, we need to 
enable schema version
+if (this.schemaRegistry.isPresent()) {
 
 Review comment:
   This is how we previously planned to solve the race condition. But I 
realized this is not the final schema that the writer uses, so I will 
implement schema version instead.




[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration

2019-08-27 Thread GitBox
ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race 
condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318255786
 
 

 ##
 File path: 
gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java
 ##
 @@ -197,6 +211,18 @@ private boolean 
ensureHiveTableExistenceBeforeAlternation(String tableName, Stri
 try (Timer.Context context = 
this.metricContext.timer(GET_HIVE_TABLE).time()) {
   existingTable = 
HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName));
 }
+//TODO: Determine whether we still use inline hive registration,
+// if so, instead of fetching schema from schema registry, we need to 
enable schema version
+if (this.schemaRegistry.isPresent()) {
 
 Review comment:
   This was the previous thinking. But I realized this is not the final 
schema, so I will implement schema version instead.




[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race condition issue for hive registration

2019-08-27 Thread GitBox
ZihanLi58 commented on a change in pull request #2719: [GOBBLIN-863]Handle race 
condition issue for hive registration
URL: https://github.com/apache/incubator-gobblin/pull/2719#discussion_r318255221
 
 

 ##
 File path: gobblin-docs/user-guide/Configuration-Properties-Glossary.md
 ##
 @@ -98,7 +98,7 @@ These properties are common to both the Job Launcher and the 
Command Line.
 | `job.group` | A way to group logically similar jobs together. | No | None |
 | `job.description` | A description of what the jobs does. | No | None |
 | `job.lock.enabled` | If set to true job locks are enabled, if set to false 
they are disabled | No | True |
-| `job.lock.type` | The fully qualified name of the JobLock class to run. The 
JobLock is responsible for ensuring that only a single instance of a job runs 
at a time.  Allowed values: 
[gobblin.runtime.locks.FileBasedJobLock](#FileBasedJobLock-Properties), 
[gobblin.runtime.locks.ZookeeperBasedJobLock](#ZookeeperBasedJobLock-Properties)
 | No | `gobblin.runtime.locks.FileBasedJobLock` |
+| `job.lock.type` | The fully qualified name of the JobLock class to run. The 
JobLock is responsible for ensuring that only a single instance of a job runs 
at a time.  Allowed values: 
[gobblin.runtime.locks.FfiFileBasedJobLock](#FileBasedJobLock-Properties), 
[gobblin.runtime.locks.ZookeeperBasedJobLock](#ZookeeperBasedJobLock-Properties)
 | No | `gobblin.runtime.locks.FileBasedJobLock` |
 
 Review comment:
   Will address

