[ 
https://issues.apache.org/jira/browse/GOBBLIN-2087?focusedWorklogId=923382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923382
 ]

ASF GitHub Bot logged work on GOBBLIN-2087:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jun/24 20:33
            Start Date: 13/Jun/24 20:33
    Worklog Time Spent: 10m 
      Work Description: AndyJiang99 commented on code in PR #3972:
URL: https://github.com/apache/gobblin/pull/3972#discussion_r1638837143


##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/predicates/DatasetHiveSchemaContainsNonOptionalUnion.java:
##########
@@ -42,16 +42,19 @@
 public class DatasetHiveSchemaContainsNonOptionalUnion<T extends Dataset> 
implements CheckedExceptionPredicate<T, IOException> {
   private final HiveRegister hiveRegister;
   private final Pattern pattern;
+  private final String optionalDbName;
 
   public static final String PREFIX = 
DatasetHiveSchemaContainsNonOptionalUnion.class.getName();
   /**
    * 1st match group is assumed to be the DB and the 2nd match group the Table 
for the pattern
    */
   public static final String PATTERN = PREFIX + ".db.table.pattern";
+  public static final String OPTIONAL_DB_NAME = PREFIX + ".db.optionalDbName";
 
   public DatasetHiveSchemaContainsNonOptionalUnion(Properties properties) {
     this.hiveRegister = getHiveRegister(new State(properties));
     this.pattern = Pattern.compile(properties.getProperty(PATTERN));
+    this.optionalDbName = properties.getProperty(OPTIONAL_DB_NAME, null);

Review Comment:
   Instead of fetching and handling the default with null, can we use the java 
class Optional instead? Similar to the Optional<HiveTable>  below in the 
getTable. It'll be better practice and would lead to fewer NPE as Optional 
class would force the consideration of the case when the value is not present





Issue Time Tracking
-------------------

    Worklog Id:     (was: 923382)
    Time Spent: 0.5h  (was: 20m)

> Enhance DatasetHiveSchemaContainsNonOptionalUnion to Support Optional 
> Database Name
> -----------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-2087
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2087
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: pawan teja
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> **Summary:**
> The current implementation of the `DatasetHiveSchemaContainsNonOptionalUnion` 
> class requires the database name to be extracted from the dataset URN using a 
> regex pattern. This approach limits flexibility and can lead to errors if the 
> URN format changes. To enhance the flexibility and usability of this class, 
> we need to add support for an optional database name.
> **Current Issue:**
> - The database name must be extracted from the dataset URN using a regex 
> pattern.
> - This dependency on the URN format limits flexibility and can lead to errors 
> if the format changes.
> - Users cannot specify a database name directly, which could be more 
> intuitive and flexible.
> **Proposed Solution:**
> - Introduce a new property `OPTIONAL_DB_NAME` in the 
> `DatasetHiveSchemaContainsNonOptionalUnion` class.
> - Update the constructor and methods to check for the optional database name 
> and use it if provided.
> - Add logging to indicate when the optional database name is used and when it 
> replaces the pattern-extracted database name.
> - Ensure backward compatibility by retaining the existing behavior when the 
> optional database name is not provided.
> **Acceptance Criteria:**
> - The `DatasetHiveSchemaContainsNonOptionalUnion` class should support an 
> optional database name.
> - If the optional database name is provided, it should replace the database 
> name extracted from the URN pattern.
> - The class should maintain its current functionality when the optional 
> database name is not provided.
> - Appropriate logging should be added to indicate the use of the optional 
> database name.
> - Tests should be added to verify the new functionality, including cases 
> where the optional database name is and is not provided.
> These enhancements will improve the flexibility and usability of the 
> `DatasetHiveSchemaContainsNonOptionalUnion` class, allowing for more dynamic 
> database configurations and reducing dependency on the dataset URN format.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to