[ 
https://issues.apache.org/jira/browse/GOBBLIN-1739?focusedWorklogId=824780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-824780
 ]

ASF GitHub Bot logged work on GOBBLIN-1739:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Nov/22 23:05
            Start Date: 09/Nov/22 23:05
    Worklog Time Spent: 10m 
      Work Description: phet commented on code in PR #3596:
URL: https://github.com/apache/gobblin/pull/3596#discussion_r1018477670


##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/IcebergDatasetDescriptor.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.dataset;
+
+import java.io.IOException;
+import java.util.List;
+
+import com.google.common.base.Splitter;
+import com.typesafe.config.Config;
+
+import lombok.EqualsAndHashCode;
+
+
+@EqualsAndHashCode(callSuper = true)
+public class IcebergDatasetDescriptor extends SqlDatasetDescriptor {

Review Comment:
   deserves javadoc
   
   (I'm not per se against it from a pragmatic standpoint, but maybe 
contextualize the decision to make this a sql dataset.)  looking again at the 
base class, mostly we adopt here to follow hive, but whereas that is actually a 
SQL DB, iceberg is not inherently.  also I notice that the defaults there, e.g.:
   ```
   this.databaseName = ConfigUtils.getString(config, 
DatasetDescriptorConfigKeys.DATABASE_KEY, ".*");
   ```
   are not currently valid for iceberg.  so question is: do we gain enough from 
the derivation or should we fallback to extending `BaseDatasetDescriptor`?



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/IcebergDatasetDescriptor.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.dataset;
+
+import java.io.IOException;
+import java.util.List;
+
+import com.google.common.base.Splitter;
+import com.typesafe.config.Config;
+
+import lombok.EqualsAndHashCode;
+
+
+@EqualsAndHashCode(callSuper = true)
+public class IcebergDatasetDescriptor extends SqlDatasetDescriptor {
+  public IcebergDatasetDescriptor(Config config)
+      throws IOException {
+    super(config);
+  }
+
+  @Override
+  protected boolean isPlatformValid() {
+    return "iceberg".equalsIgnoreCase(getPlatform());
+  }
+
+  @Override
+  protected boolean isPathContaining(DatasetDescriptor other) {
+    String otherPath = other.getPath();
+    if (otherPath == null) {
+      return false;
+    }
+
+    //Extract the dbName and tableName from otherPath
+    List<String> parts = Splitter.on(SEPARATION_CHAR).splitToList(otherPath);
+    if (parts.size() != 2) {

Review Comment:
   let's avoid *upper* bound on size.  AFAIU iceberg spec allows IDs to be more 
than two levels.



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/flowgraph/datanodes/iceberg/IcebergDataNode.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.flowgraph.datanodes.iceberg;
+
+import com.typesafe.config.Config;
+
+import lombok.EqualsAndHashCode;
+
+import org.apache.gobblin.annotation.Alpha;
+import org.apache.gobblin.data.management.copy.iceberg.IcebergHiveCatalog;
+import org.apache.gobblin.service.modules.dataset.IcebergDatasetDescriptor;
+import 
org.apache.gobblin.service.modules.flowgraph.datanodes.hive.HiveDataNode;
+
+@Alpha
+@EqualsAndHashCode(callSuper = true)
+public class IcebergDataNode extends HiveDataNode {

Review Comment:
   I'm suspicious of the derivation, because it's not clear that iceberg passes 
the "can-substitute-for" test with hive.  in practice violating this may make 
RTTI brittle and order-dependent (i.e. when using `instanceof`).  I suspect you 
did it for the metastore URI, but maybe that should instead motivate breaking 
the commonality into a shared base class for both this and the `HiveDataNode` 
(separately).
   
   that said, that same HMS URI betrays the fact that hive is just one possible 
catalog for iceberg tables.  thus, likely we'd want each datanode type to 
represent the combo of iceberg + the specific catalog type.  as the community 
one day moves from iceberg-riding-atop-HMS, that will capture the difference.  
therefore I suggest renaming to `IcebergOnHiveDataNode`. 



##########
gobblin-service/src/test/java/org/apache/gobblin/service/modules/flowgraph/datanodes/iceberg/IcebergDataNodeTest.java:
##########
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.flowgraph.datanodes.iceberg;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+
+import org.testng.Assert;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeMethod;
+import org.testng.annotations.Test;
+
+import com.typesafe.config.Config;
+import com.typesafe.config.ConfigFactory;
+import com.typesafe.config.ConfigValueFactory;
+
+import org.apache.gobblin.service.modules.flowgraph.DataNode;
+import org.apache.gobblin.service.modules.flowgraph.FlowGraphConfigurationKeys;
+
+
+public class IcebergDataNodeTest {
+
+  Config config = null;
+
+  @BeforeMethod
+  public void setUp() {
+    String expectedNodeId = "some-iceberg-node-id";
+    String expectedAdlFsUri = "hdfs://data.hdfs.core.windows.net";
+    String expectedHiveMetastoreUri = "thrift://hcat.company.com:7552";

Review Comment:
   nit: these don't seem to be expected, as much as what you actually use to 
initialize



##########
gobblin-service/src/test/java/org/apache/gobblin/service/modules/dataset/IcebergDatasetDescriptorTest.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.dataset;
+
+import java.io.IOException;
+
+import org.testng.Assert;
+import org.testng.annotations.Test;
+
+import com.typesafe.config.Config;
+import com.typesafe.config.ConfigFactory;
+import com.typesafe.config.ConfigValueFactory;
+
+import 
org.apache.gobblin.service.modules.flowgraph.DatasetDescriptorConfigKeys;
+
+
+public class IcebergDatasetDescriptorTest {
+  Config baseConfig = 
ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, 
ConfigValueFactory.fromAnyRef("iceberg"))
+      .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, 
ConfigValueFactory.fromAnyRef("testDb_Db1"))
+      .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, 
ConfigValueFactory.fromAnyRef("testTable_Table1"));
+
+  @Test
+  public void testIsPathContaining() throws IOException {
+    Config config1 = 
ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, 
ConfigValueFactory.fromAnyRef("iceberg"))
+        .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, 
ConfigValueFactory.fromAnyRef("testDb_Db1"))
+        .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, 
ConfigValueFactory.fromAnyRef("testTable_Table1"));
+
+    Config config2 = 
ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, 
ConfigValueFactory.fromAnyRef("iceberg"))
+        .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, 
ConfigValueFactory.fromAnyRef("testDb_Db2"))
+        .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, 
ConfigValueFactory.fromAnyRef("testTable_Table2"));

Review Comment:
   might be more meaningful to test same DB, but different table name



##########
gobblin-service/src/test/java/org/apache/gobblin/service/modules/flowgraph/datanodes/iceberg/IcebergDataNodeTest.java:
##########
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.flowgraph.datanodes.iceberg;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+
+import org.testng.Assert;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeMethod;
+import org.testng.annotations.Test;
+
+import com.typesafe.config.Config;
+import com.typesafe.config.ConfigFactory;
+import com.typesafe.config.ConfigValueFactory;
+
+import org.apache.gobblin.service.modules.flowgraph.DataNode;
+import org.apache.gobblin.service.modules.flowgraph.FlowGraphConfigurationKeys;
+
+
+public class IcebergDataNodeTest {
+
+  Config config = null;
+
+  @BeforeMethod
+  public void setUp() {
+    String expectedNodeId = "some-iceberg-node-id";
+    String expectedAdlFsUri = "hdfs://data.hdfs.core.windows.net";
+    String expectedHiveMetastoreUri = "thrift://hcat.company.com:7552";
+
+    config = ConfigFactory.empty()
+        .withValue(FlowGraphConfigurationKeys.DATA_NODE_ID_KEY, 
ConfigValueFactory.fromAnyRef(expectedNodeId))
+        .withValue(FlowGraphConfigurationKeys.DATA_NODE_PREFIX + "fs.uri", 
ConfigValueFactory.fromAnyRef(expectedAdlFsUri))
+        .withValue(FlowGraphConfigurationKeys.DATA_NODE_PREFIX + 
"hive.metastore.uri", ConfigValueFactory.fromAnyRef(expectedHiveMetastoreUri));
+  }
+
+  @AfterMethod
+  public void tearDown() {
+  }
+
+  @Test
+  public void testIcebergDataNodeWithValidMetastoreUri() throws 
DataNode.DataNodeCreationException, URISyntaxException {
+    IcebergDataNode icebergDataNode = new IcebergDataNode(config);
+    URI uri = new 
URI(config.getString(FlowGraphConfigurationKeys.DATA_NODE_PREFIX + 
"hive.metastore.uri"));
+    Assert.assertTrue(icebergDataNode.isMetastoreUriValid(uri));
+  }
+
+  @Test(expectedExceptions = DataNode.DataNodeCreationException.class)
+  public void testIcebergDataNodeWithInvalidMetastoreUri() throws 
DataNode.DataNodeCreationException, URISyntaxException {
+    String expectedHiveMetastoreUri = "thrift-1://hcat.company.com:7552";

Review Comment:
   maybe `badHiveMetastoreUri` or `bogusHM...`



##########
gobblin-service/src/test/java/org/apache/gobblin/service/modules/dataset/IcebergDatasetDescriptorTest.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.dataset;
+
+import java.io.IOException;
+
+import org.testng.Assert;
+import org.testng.annotations.Test;
+
+import com.typesafe.config.Config;
+import com.typesafe.config.ConfigFactory;
+import com.typesafe.config.ConfigValueFactory;
+
+import 
org.apache.gobblin.service.modules.flowgraph.DatasetDescriptorConfigKeys;
+
+
+public class IcebergDatasetDescriptorTest {
+  Config baseConfig = 
ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, 
ConfigValueFactory.fromAnyRef("iceberg"))
+      .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, 
ConfigValueFactory.fromAnyRef("testDb_Db1"))
+      .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, 
ConfigValueFactory.fromAnyRef("testTable_Table1"));

Review Comment:
   how about a helper method taking `(dbName, tableName)`?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 824780)
    Time Spent: 0.5h  (was: 20m)

> Define Datanodes and Dataset Descriptor for Iceberg
> ---------------------------------------------------
>
>                 Key: GOBBLIN-1739
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1739
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Meeth Gala
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to