LaughingVzr opened a new pull request, #5252:
URL: https://github.com/apache/hive/pull/5252
### What changes were proposed in this pull request?
Modify `LazyStruct#findIndexes` and `LazyStruct#parseMultiDelimit`, changing the
`fields.length` conditions (and the loop bound in `findIndexes`):
```java
public void parseMultiDelimit(byte[] rawRow, byte[] fieldDelimit) {
- if (fields.length > 1 && delimitIndexes[i - 1] != -1) {
+ if (delimitIndexes[i - 1] != -1) {
}
private int[] findIndexes(byte[] array, byte[] target) {
- if (fields.length <= 1) {
+ if (fields.length < 1) {
...
- for (int i = 1; i < indexes.length; i++) {
+ for (int i = 1; i <= indexes.length; i++) {
...
}
return indexes;
}
```
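To make the single-column case concrete, here is a minimal standalone sketch of scanning a row for a multi-byte delimiter. It is not Hive's `findIndexes`; the class `DelimiterScan` and the helper `findDelimiters` are hypothetical names used only for illustration. The point is that a row with one field can still contain a delimiter that has to be located:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DelimiterScan {
    // Hypothetical helper (not Hive's findIndexes): returns the start offsets of up to
    // maxCount non-overlapping occurrences of delim in row; unused slots stay -1.
    static int[] findDelimiters(byte[] row, byte[] delim, int maxCount) {
        int[] found = new int[maxCount];
        Arrays.fill(found, -1);
        if (delim.length == 0) {
            return found;
        }
        int count = 0;
        int pos = 0;
        while (count < maxCount && pos + delim.length <= row.length) {
            if (matchesAt(row, delim, pos)) {
                found[count++] = pos;
                pos += delim.length; // skip past the whole delimiter
            } else {
                pos++;
            }
        }
        return found;
    }

    // True when delim occurs in row starting at pos (caller guarantees it fits).
    static boolean matchesAt(byte[] row, byte[] delim, int pos) {
        for (int j = 0; j < delim.length; j++) {
            if (row[pos + j] != delim[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] row = "1|@|".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "|@|".getBytes(StandardCharsets.UTF_8);
        // One field, but the trailing delimiter still starts at offset 1.
        System.out.println(Arrays.toString(findDelimiters(row, delim, 1))); // prints [1]
    }
}
```

If the scan is skipped whenever there is only one field, that offset is never recorded, which is the situation the condition changes above are meant to avoid.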
I added a test for this fix:
```java
@Test
public void testParseMultiDelimit() throws Throwable {
try {
// single column named id
List<String> columns = new ArrayList<>();
columns.add("id");
// column type is string
List<TypeInfo> columnTypes = new ArrayList<>();
PrimitiveTypeInfo primitiveTypeInfo = new PrimitiveTypeInfo();
primitiveTypeInfo.setTypeName("string");
columnTypes.add(primitiveTypeInfo);
// separators + escapeChar => "|"
byte[] separators = new byte[]{124, 2, 3, 4, 5, 6, 7, 8};
// sequence =>"\N"
Text sequence = new Text();
sequence.set(new byte[]{92, 78});
// create a lazy struct inspector
ObjectInspector objectInspector =
LazyFactory.createLazyStructInspector(columns, columnTypes, separators,
sequence, false, false, (byte) '0');
LazyStruct lazyStruct = (LazyStruct)
LazyFactory.createLazyObject(objectInspector);
// original row data
String rowData = "1|@|";
// row field delimiter
String fieldDelimiter = "|@|";
// parse the row using the multi-character delimiter
lazyStruct.parseMultiDelimit(rowData.getBytes(StandardCharsets.UTF_8),
fieldDelimiter.getBytes(StandardCharsets.UTF_8));
// check the start position indexes of the first and second fields
// result before the fix: 0,1
// result after the fix: 0,2
Assert.assertArrayEquals(new int[]{0, 2},
lazyStruct.startPosition);
} catch (Throwable e) {
e.printStackTrace();
throw e;
}
}
```
### Why are the changes needed?
If a table has only one column and uses a multi-character delimiter, querying
that column returns incorrect data.
When this data is used in other operations (e.g. a cast via the UDFToLong
function), the result is NULL.
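As a rough illustration of the NULL symptom, assume the single field keeps its trailing delimiter bytes when the row is not split, which is one plausible reading of the incorrect data described above. The helper `toLongOrNull` below is hypothetical and uses the JDK's `Long.parseLong`, not Hive's `UDFToLong`; it only shows why a value such as `1|@|` cannot be converted to a number:

```java
public class CastSketch {
    // Hypothetical stand-in for a numeric cast: returns null when the text is not a valid long.
    static Long toLongOrNull(String text) {
        try {
            return Long.parseLong(text);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Field value with the delimiter still attached (assumed pre-fix behaviour).
        System.out.println(toLongOrNull("1|@|")); // prints null
        // Field value once the delimiter has been stripped.
        System.out.println(toLongOrNull("1"));    // prints 1
    }
}
```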
### Does this PR introduce _any_ user-facing change?
No
### Is the change a dependency upgrade?
No
### How was this patch tested?
test class:
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyStruct.java
test function:
org.apache.hadoop.hive.serde2.lazy.TestLazyStruct#testParseMultiDelimit
test command: mvn test -Dtest=TestLazyStruct --pl serde