[
https://issues.apache.org/jira/browse/GOBBLIN-2087?focusedWorklogId=926424&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-926424
]
ASF GitHub Bot logged work on GOBBLIN-2087:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 17/Jul/24 23:19
Start Date: 17/Jul/24 23:19
Worklog Time Spent: 10m
Work Description: pawanbtej commented on code in PR #3972:
URL: https://github.com/apache/gobblin/pull/3972#discussion_r1681878747
##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/predicates/DatasetHiveSchemaContainsNonOptionalUnion.java:
##########
@@ -42,16 +42,19 @@
public class DatasetHiveSchemaContainsNonOptionalUnion<T extends Dataset>
implements CheckedExceptionPredicate<T, IOException> {
private final HiveRegister hiveRegister;
private final Pattern pattern;
+ private final String optionalDbName;
public static final String PREFIX =
DatasetHiveSchemaContainsNonOptionalUnion.class.getName();
/**
* 1st match group is assumed to be the DB and the 2nd match group the Table
for the pattern
*/
public static final String PATTERN = PREFIX + ".db.table.pattern";
+ public static final String OPTIONAL_DB_NAME = PREFIX + ".db.optionalDbName";
public DatasetHiveSchemaContainsNonOptionalUnion(Properties properties) {
this.hiveRegister = getHiveRegister(new State(properties));
this.pattern = Pattern.compile(properties.getProperty(PATTERN));
+ this.optionalDbName = properties.getProperty(OPTIONAL_DB_NAME, null);
Review Comment:
Updated to Optional
Issue Time Tracking
-------------------
Worklog Id: (was: 926424)
Time Spent: 50m (was: 40m)
> Enhance DatasetHiveSchemaContainsNonOptionalUnion to Support Optional
> Database Name
> -----------------------------------------------------------------------------------
>
> Key: GOBBLIN-2087
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2087
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: pawan teja
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> **Summary:**
> The current implementation of the `DatasetHiveSchemaContainsNonOptionalUnion`
> class requires the database name to be extracted from the dataset URN using a
> regex pattern. This approach limits flexibility and can lead to errors if the
> URN format changes. To enhance the flexibility and usability of this class,
> we need to add support for an optional database name.
> **Current Issue:**
> - The database name must be extracted from the dataset URN using a regex
> pattern.
> - This dependency on the URN format limits flexibility and can lead to errors
> if the format changes.
> - Users cannot specify a database name directly, which could be more
> intuitive and flexible.
> **Proposed Solution:**
> - Introduce a new property `OPTIONAL_DB_NAME` in the
> `DatasetHiveSchemaContainsNonOptionalUnion` class.
> - Update the constructor and methods to check for the optional database name
> and use it if provided.
> - Add logging to indicate when the optional database name is used and when it
> replaces the pattern-extracted database name.
> - Ensure backward compatibility by retaining the existing behavior when the
> optional database name is not provided.
> **Acceptance Criteria:**
> - The `DatasetHiveSchemaContainsNonOptionalUnion` class should support an
> optional database name.
> - If the optional database name is provided, it should replace the database
> name extracted from the URN pattern.
> - The class should maintain its current functionality when the optional
> database name is not provided.
> - Appropriate logging should be added to indicate the use of the optional
> database name.
> - Tests should be added to verify the new functionality, including cases
> where the optional database name is and is not provided.
> These enhancements will improve the flexibility and usability of the
> `DatasetHiveSchemaContainsNonOptionalUnion` class, allowing for more dynamic
> database configurations and reducing dependency on the dataset URN format.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)