DanielCarter-stack commented on PR #10295:
URL: https://github.com/apache/seatunnel/pull/10295#issuecomment-3796500338
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10295", "part": 1,
"total": 1} -->
### Issue 1: The old `getRegionLocator(String)` method still exists and has
inconsistent behavior
**Location**: `HbaseClient.java:393-395`
```java
public RegionLocator getRegionLocator(String tableName) throws IOException {
return this.connection.getRegionLocator(TableName.valueOf(tableName));
}
```
**Related Context**:
- Caller: Has been changed by `HbaseSourceSplitEnumerator` to use the
two-parameter version
- Potential callers: Other code that may call this method (if any)
**Problem Description**:
The old `getRegionLocator(String tableName)` method is still retained, and
it will parse the passed `tableName` as `default:tableName` (HBase's default
behavior). This is inconsistent with the new `getRegionLocator(String
namespace, String tableName)` method.
Although in the current PR, `HbaseSourceSplitEnumerator` has been changed to
use the new two-parameter method, the existence of the old method may lead to:
1. Future maintainers misusing the old method, causing the same bug
2. If other code paths use this method, they will have the same problem
**Potential Risks**:
- Risk 1: Future code may incorrectly use the old method, reintroducing the
namespace bug
- Risk 2: API confusion, developers won't know which method to use
**Scope of Impact**:
- Direct impact: Public API of `HbaseClient`
- Indirect impact: Any code that calls `getRegionLocator(String)` (if any)
- Impact area: HBase Connector
**Severity**: MINOR
**Improvement Suggestion**:
```java
/**
* @deprecated Use {@link #getRegionLocator(String, String)} instead.
* This method does not properly handle custom namespaces.
*/
@Deprecated
public RegionLocator getRegionLocator(String tableName) throws IOException {
return this.connection.getRegionLocator(TableName.valueOf(tableName));
}
```
**Rationale**:
1. Marking as `@Deprecated` can warn developers not to use it
2. Keeping the method is for backward compatibility (if external code uses
it)
3. JavaDoc clearly indicates the alternative method
4. If it's certain there are no external callers, consider removing it in
the next major version
---
### Issue 2: Wrapping `IOException` as `RuntimeException` loses error context
**Location**: `HbaseSourceSplitEnumerator.java:311-313`
```java
} catch (IOException e) {
throw new RuntimeException(e);
}
```
**Related Context**:
- Parent interface: `SourceSplitEnumerator` - does not specify exception
types
- Other exception handling in the same class: uses `HbaseConnectorException`
- Callers: `HbaseSource.createEnumerator()` and `restoreEnumerator()`
**Problem Description**:
At the outermost layer of the `getTableSplits()` method, after catching
`IOException`, it is directly wrapped as `RuntimeException`. This is
inconsistent with the preceding code style (which used
`HbaseConnectorException`), and loses HBase-specific error context.
Although the code before modification was handled this way, since
`HbaseConnectorException` is used in other places, consistency should be
maintained here as well.
**Potential Risks**:
- Risk 1: Inconsistent error handling, increasing maintenance costs
- Risk 2: Upper-layer callers cannot distinguish between HBase errors and
other runtime errors
- Risk 3: Error logs may not be clear enough, making troubleshooting
difficult
**Scope of Impact**:
- Direct impact: `HbaseSourceSplitEnumerator.getTableSplits()`
- Indirect impact: Creation and recovery process of `HbaseSource`
- Impact area: HBase Source Connector
**Severity**: MINOR
**Improvement Suggestion**:
```java
} catch (IOException e) {
String errorMsg = String.format(
"Failed to enumerate splits for HBase table [%s]",
tableName.getNameAsString());
log.error(errorMsg, e);
throw new HbaseConnectorException(
HbaseConnectorErrorCode.TABLE_QUERY_EXCEPTION,
errorMsg,
e);
}
```
**Rationale**:
1. Consistent with preceding exception handling
2. Provides clearer error messages
3. Preserves the original exception as cause
4. Uses the correct error code
5. Facilitates specific error handling by upper-layer code
---
### Issue 3: Missing test coverage for extreme inputs
**Location**: `HbaseParametersTest.java` and
`HbaseSourceSplitEnumeratorTest.java`
**Related Context**:
- Test class: `HbaseParametersTest` - newly added unit tests
- Test class: `HbaseSourceSplitEnumeratorTest` - updated unit tests
- Configuration parsing logic: `HbaseParameters.buildWithSourceConfig()` -
lines 88-133
**Problem Description**:
Current tests cover normal scenarios and basic boundary conditions, but are
missing tests for some extreme inputs:
1. **When namespace is `null`**: Although the `getNamespace()` method
handles `null`, there is no test verification
2. **table parameter contains multiple colons**: For example
`ns:table:extra`, current parsing logic will take the part after the last colon
as table
3. **namespace or table contains spaces**: For example ` ns : table `, may
cause parsing errors
4. **Only colon without table**: For example `test:`, current logic will
take empty string as table
5. **Only table without colon but with spaces before and after**: For
example ` table `, current logic will not handle spaces
**Potential Risks**:
- Risk 1: Extreme user input may cause undefined behavior
- Risk 2: Bugs not covered by tests may appear in production environment
- Risk 3: Future code modifications may break handling of these extreme cases
**Scope of Impact**:
- Direct impact: Configuration parsing logic
- Indirect impact: Stability of the entire HBase Connector
- Impact area: HBase Connector
**Severity**: MINOR
**Improvement Suggestion**:
```java
// Added in HbaseParametersTest.java
@Test
public void testBuildWithSourceConfigWithNullNamespace() {
HbaseParameters parameters = HbaseParameters.builder()
.table("tbl")
.zookeeperQuorum("127.0.0.1:2181")
.build();
Assertions.assertEquals(HbaseParameters.DEFAULT_NAMESPACE,
parameters.getNamespace());
}
@Test
public void testBuildWithSourceConfigWithEmptyStringNamespace() {
Map<String, Object> configMap = new HashMap<>();
configMap.put(HbaseBaseOptions.ZOOKEEPER_QUORUM.key(), "127.0.0.1:2181");
configMap.put(HbaseBaseOptions.TABLE.key(), ":tbl"); // Only colon
ReadonlyConfig readonlyConfig = ReadonlyConfig.fromMap(configMap);
HbaseParameters parameters =
HbaseParameters.buildWithSourceConfig(readonlyConfig);
Assertions.assertEquals("", parameters.getNamespace()); // Empty string
Assertions.assertEquals("tbl", parameters.getTable());
}
@Test
public void testBuildWithSourceConfigWithMultipleColons() {
Map<String, Object> configMap = new HashMap<>();
configMap.put(HbaseBaseOptions.ZOOKEEPER_QUORUM.key(), "127.0.0.1:2181");
configMap.put(HbaseBaseOptions.TABLE.key(), "ns:tbl:extra"); //
Multiple colons
ReadonlyConfig readonlyConfig = ReadonlyConfig.fromMap(configMap);
HbaseParameters parameters =
HbaseParameters.buildWithSourceConfig(readonlyConfig);
// Current logic: take the part before the first colon as namespace, the
part after as table (including the colon)
Assertions.assertEquals("ns", parameters.getNamespace());
Assertions.assertEquals("tbl:extra", parameters.getTable());
}
@Test
public void testBuildWithSourceConfigWithSpaces() {
Map<String, Object> configMap = new HashMap<>();
configMap.put(HbaseBaseOptions.ZOOKEEPER_QUORUM.key(), "127.0.0.1:2181");
configMap.put(HbaseBaseOptions.TABLE.key(), " ns : tbl "); // With
spaces
ReadonlyConfig readonlyConfig = ReadonlyConfig.fromMap(configMap);
HbaseParameters parameters =
HbaseParameters.buildWithSourceConfig(readonlyConfig);
// Current logic: does not handle spaces
Assertions.assertEquals(" ns ", parameters.getNamespace());
Assertions.assertEquals(" tbl ", parameters.getTable());
}
```
**Rationale**:
1. Enhances test coverage
2. Clarifies current code behavior for extreme inputs
3. Provides baseline for future input validation improvements
4. These tests will not block the current PR's merge, but should be
supplemented in subsequent PRs
---
### Issue 4: Creating HbaseClient immediately in constructor may lead to
early failure
**Location**: `HbaseSourceSplitEnumerator.java:63-80`
**Related Context**:
- Call chain: `HbaseSource.createEnumerator()` → `new
HbaseSourceSplitEnumerator(...)`
- Resource creation: `HbaseClient.createInstance(hbaseParameters)` will
establish HBase connection
- Resource management: `Enumerator`'s `close()` method will close the
connection
**Problem Description**:
The modified code immediately creates `HbaseClient` instances in all
constructors, including scenarios where `getTableSplits()` may not be used
immediately. This means:
1. Even if just creating an Enumerator object (without enumerating splits),
an HBase connection will be established
2. If the HBase connection fails, an exception will be thrown at the
Enumerator creation stage rather than on first use
3. In some cases (e.g., recovering from checkpoint but not needing to
re-enumerate splits), this connection may be unnecessary
**Potential Risks**:
- Risk 1: Connection failure timing is advanced, which may affect error
recovery logic
- Risk 2: Unnecessary connections may be created in certain initialization
scenarios
- Risk 3: If connection creation is slow, it will block Enumerator creation
**Scope of Impact**:
- Direct impact: All constructors of `HbaseSourceSplitEnumerator`
- Indirect impact: Creation process of `HbaseSource`
- Impact area: HBase Source Connector
**Severity**: MINOR
**Improvement Suggestion**:
Consider lazy initialization of `HbaseClient`:
```java
private HbaseClient getHbaseClient() {
if (hbaseClient == null) {
this.hbaseClient = HbaseClient.createInstance(hbaseParameters);
}
return hbaseClient;
}
// Used in getTableSplits()
HbaseClient client = getHbaseClient();
try (RegionLocator regionLocator = client.getRegionLocator(...)) {
...
}
```
Or, if connection validation is indeed needed at construction time, this
should be clearly documented and error handling should be ensured correct.
**Rationale**:
1. Lazy initialization can avoid unnecessary connection creation
2. However, the current design (immediate initialization) is also
reasonable, as it can detect configuration errors early
3. This is more of a design trade-off than a clear bug
4. If maintaining current design, it's recommended to explain in JavaDoc
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]