[
https://issues.apache.org/jira/browse/GOBBLIN-2087?focusedWorklogId=926603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-926603
]
ASF GitHub Bot logged work on GOBBLIN-2087:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 19/Jul/24 00:32
Start Date: 19/Jul/24 00:32
Worklog Time Spent: 10m
Work Description: pawanbtej commented on code in PR #3972:
URL: https://github.com/apache/gobblin/pull/3972#discussion_r1683630550
##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/predicates/DatasetHiveSchemaContainsNonOptionalUnion.java:
##########
@@ -77,7 +89,13 @@ private DbAndTable getDbAndTable(T dataset) {
"Expected pattern = %s", dataset.getUrn(), pattern.pattern()));
}
- return new DbAndTable(m.group(1), HiveMetaStoreUtils.getHiveTableName(m.group(2)));
Review Comment:
If the pattern does not match, we will not have the table name anyway, so it
is fine for the second check to be skipped and for the method to exit with
the pattern-mismatch error.
The second check covers only the database name, whereas the pattern check
covers both the database and the table name. Without a table name, there is
no point in having only optionalDbName present.
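To illustrate the ordering described above, here is a minimal, self-contained
sketch. It is not the code in the PR. The class name `DbAndTableSketch`, the
`resolve` method, the error message text, and the example URN pattern are all
hypothetical, and the table-name normalization done by
`HiveMetaStoreUtils.getHiveTableName` is left out so the example runs on its
own.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the predicate's private helper; names are illustrative only.
final class DbAndTableSketch {

  /** Simple holder, named after the DbAndTable type that appears in the diff. */
  static final class DbAndTable {
    public final String db;
    public final String table;

    DbAndTable(String db, String table) {
      this.db = db;
      this.table = table;
    }
  }

  /**
   * The pattern check runs first: without a match there is no table name,
   * so the method fails before the optional database name is consulted.
   */
  static DbAndTable resolve(String urn, Pattern pattern, Optional<String> optionalDbName) {
    Matcher m = pattern.matcher(urn);
    if (!m.matches() || m.groupCount() != 2) {
      // Assumed message wording; the real message in the PR may differ.
      throw new IllegalStateException(String.format(
          "Dataset urn [%s] does not match. Expected pattern = %s", urn, pattern.pattern()));
    }
    // Second check: the optional name, when present, replaces the extracted db name.
    String db = optionalDbName.orElse(m.group(1));
    return new DbAndTable(db, m.group(2));
  }

  public static void main(String[] args) {
    // Assumed URN layout for demonstration: /data/<db>/<table>
    Pattern pattern = Pattern.compile("/data/(\\w+)/(\\w+)");
    DbAndTable r = resolve("/data/tracking/PageViewEvent", pattern, Optional.of("override_db"));
    System.out.println(r.db + "." + r.table);
  }
}
```

With this ordering, a URN that fails the pattern raises the mismatch error
even when an optional database name is configured, which is the behavior
argued for in this comment.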
Issue Time Tracking
-------------------
Worklog Id: (was: 926603)
Time Spent: 1h 10m (was: 1h)
> Enhance DatasetHiveSchemaContainsNonOptionalUnion to Support Optional
> Database Name
> -----------------------------------------------------------------------------------
>
> Key: GOBBLIN-2087
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2087
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: pawan teja
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> **Summary:**
> The current implementation of the `DatasetHiveSchemaContainsNonOptionalUnion`
> class requires the database name to be extracted from the dataset URN using a
> regex pattern. This approach limits flexibility and can lead to errors if the
> URN format changes. To enhance the flexibility and usability of this class,
> we need to add support for an optional database name.
> **Current Issue:**
> - The database name must be extracted from the dataset URN using a regex
> pattern.
> - This dependency on the URN format limits flexibility and can lead to errors
> if the format changes.
> - Users cannot specify a database name directly, which could be more
> intuitive and flexible.
> **Proposed Solution:**
> - Introduce a new property `OPTIONAL_DB_NAME` in the
> `DatasetHiveSchemaContainsNonOptionalUnion` class.
> - Update the constructor and methods to check for the optional database name
> and use it if provided.
> - Add logging to indicate when the optional database name is used and when it
> replaces the pattern-extracted database name.
> - Ensure backward compatibility by retaining the existing behavior when the
> optional database name is not provided.
> **Acceptance Criteria:**
> - The `DatasetHiveSchemaContainsNonOptionalUnion` class should support an
> optional database name.
> - If the optional database name is provided, it should replace the database
> name extracted from the URN pattern.
> - The class should maintain its current functionality when the optional
> database name is not provided.
> - Appropriate logging should be added to indicate the use of the optional
> database name.
> - Tests should be added to verify the new functionality, including cases
> where the optional database name is and is not provided.
> These enhancements will improve the flexibility and usability of the
> `DatasetHiveSchemaContainsNonOptionalUnion` class, allowing for more dynamic
> database configurations and reducing dependency on the dataset URN format.