officialasishkumar opened a new pull request, #4375:
URL: https://github.com/apache/texera/pull/4375
### What changes were proposed in this PR?
`ParallelCSVScanSourceOpDesc.getPhysicalOp` called `customDelimiter.get` without first checking whether the `Option` is defined. When `customDelimiter` is `None` (the field's default), this throws a `NoSuchElementException` before the fallback comma delimiter can be applied.
**Before:**
```scala
if (customDelimiter.get.isEmpty) { // throws NoSuchElementException when None
  customDelimiter = Option(",")
}
```
**After:**
```scala
if (customDelimiter.isEmpty || customDelimiter.get.isEmpty) {
  customDelimiter = Option(",")
}
```
This brings the parallel variant in line with `CSVScanSourceOpDesc`, which has always used the correct two-part guard.
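For reference, the same fallback can be written without calling `.get` at all. The sketch below is illustrative only and is not part of this PR: `DelimiterDefaults` and `resolveDelimiter` are hypothetical names, and it assumes the delimiter is an `Option[String]` as in the snippets above.

```scala
// Hypothetical helper, not part of the Texera codebase.
object DelimiterDefaults {
  val DefaultDelimiter: String = ","

  // Keep a non-empty custom delimiter; fall back to the default
  // for both None and Some("") without ever calling .get.
  def resolveDelimiter(customDelimiter: Option[String]): String =
    customDelimiter.filter(_.nonEmpty).getOrElse(DefaultDelimiter)
}
```

`filter(_.nonEmpty)` turns `Some("")` into `None`, so a single `getOrElse` covers both cases that the two-part guard checks.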
### Any related issues, documentation, discussions?
Closes #4374
### How was this PR tested?
Two new test cases were added to `CSVScanSourceOpDescSpec`:
1. `"use comma as the default delimiter when customDelimiter is not set for parallel CSV"`: verifies that `getPhysicalOp` does not throw when `customDelimiter` is `None` and that the default `,` is applied.
2. `"use comma as the default delimiter when customDelimiter is empty string for parallel CSV"`: same verification for `Some("")`.
The existing parallel-CSV schema-inference tests continue to pass unchanged.
### Was this PR authored or co-authored using generative AI tooling?
No.