aglinxinyuan opened a new issue, #5652:
URL: https://github.com/apache/texera/issues/5652

   ## Background
   
   Four modules in `common/workflow-operator` form a `FilterOpExec` inheritance 
hierarchy that lacks dedicated unit-spec coverage. The base `FilterOpExec` is 
abstract; the concrete subclasses parse a JSON descriptor at construction and 
call `setFilterFunc` with their per-class predicate.
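
A minimal sketch of the base-plus-subclass shape described above, written in plain Scala so it runs without Texera on the classpath. `SimpleTuple`, `MinValueDesc`, and `MinValueOpExec` are hypothetical stand-ins. The real `Tuple` type and `processTuple` signature differ, and the JSON-parsing step is replaced here by passing an already-built descriptor.

```scala
// Hypothetical, simplified stand-ins: the real Texera Tuple and executor
// signatures differ; only the FilterOpExec / setFilterFunc shape is mirrored.
final case class SimpleTuple(fields: Map[String, Any]) {
  def getField[T](name: String): T = fields(name).asInstanceOf[T]
}

abstract class FilterOpExec {
  // Predicate installed by subclasses; this sketch rejects everything until one is set.
  private var filterFunc: SimpleTuple => Boolean = _ => false

  def setFilterFunc(f: SimpleTuple => Boolean): Unit = filterFunc = f

  // Yield the tuple iff the predicate holds, otherwise an empty iterator.
  def processTuple(tuple: SimpleTuple): Iterator[SimpleTuple] =
    if (filterFunc(tuple)) Iterator.single(tuple) else Iterator.empty
}

// Hypothetical descriptor; the real subclasses deserialize theirs from JSON.
final case class MinValueDesc(attribute: String, min: Int)

// Concrete subclass: builds its predicate from the descriptor at construction.
class MinValueOpExec(desc: MinValueDesc) extends FilterOpExec {
  setFilterFunc(tuple => tuple.getField[Int](desc.attribute) >= desc.min)
}
```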
   
   | Source class | Package | Purpose |
   | --- | --- | --- |
   | `FilterOpExec` | `operator.filter` | Abstract base with a pluggable `filterFunc: Tuple => Boolean`; `processTuple` yields the tuple iff `filterFunc(tuple)` is true |
   | `RegexOpExec` | `operator.regex` | Compiles a `Pattern` from the descriptor; emits tuples whose attribute matches the pattern (`find` semantics); honors `caseInsensitive` |
   | `SubstringSearchOpExec` | `operator.substringSearch` | Emits tuples whose attribute contains the descriptor's substring; honors `isCaseSensitive` |
   | `RandomKSamplingOpExec` | `operator.randomksampling` | Emits each tuple with probability `desc.percentage / 100.0`; seed = `workerCount` (deterministic for the same worker count) |
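
The per-class predicates in the table can be approximated as below. This is a sketch under assumptions: the descriptor case classes and the `Predicates` object are hypothetical, the predicates take the attribute's string value directly instead of a tuple, and the sampling rule `rand.nextDouble() < percentage / 100.0` is one plausible reading of "emits each tuple with probability `desc.percentage / 100.0`", not necessarily the operator's exact code.

```scala
import java.util.regex.Pattern
import scala.util.Random

// Hypothetical descriptor shapes; the flag names follow the issue text,
// everything else is an assumption.
final case class RegexDesc(regex: String, caseInsensitive: Boolean)
final case class SubstringDesc(substring: String, isCaseSensitive: Boolean)
final case class SamplingDesc(percentage: Int)

object Predicates {
  // find semantics: the pattern may match anywhere inside the value.
  def regex(desc: RegexDesc): String => Boolean = {
    val flags = if (desc.caseInsensitive) Pattern.CASE_INSENSITIVE else 0
    val pattern = Pattern.compile(desc.regex, flags)
    (value: String) => pattern.matcher(value).find()
  }

  // Containment check; both sides are lowercased when case-insensitive.
  def substring(desc: SubstringDesc): String => Boolean = {
    val needle =
      if (desc.isCaseSensitive) desc.substring else desc.substring.toLowerCase
    (value: String) => {
      val haystack = if (desc.isCaseSensitive) value else value.toLowerCase
      haystack.contains(needle)
    }
  }

  // Accept each call with probability percentage / 100.0; a fixed seed
  // yields the same accept/reject sequence on every run.
  def sampling(desc: SamplingDesc, seed: Int): Any => Boolean = {
    val rand = new Random(seed)
    (_: Any) => rand.nextDouble() < desc.percentage / 100.0
  }
}
```

Because `nextDouble()` returns values in `[0.0, 1.0)`, this rule accepts every tuple at `percentage = 100` and none at `percentage = 0`, which lines up with the edge cases listed under "Behavior to pin".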
   
   `SpecializedFilterOpExec` already has its own spec; this issue covers the rest of the family.
   
   ## Behavior to pin
   
   | Surface | Contract |
   | --- | --- |
   | `FilterOpExec.processTuple` (matching predicate) | yields the single tuple |
   | `FilterOpExec.processTuple` (non-matching predicate) | yields an empty `Iterator` |
   | `FilterOpExec.setFilterFunc` | swapping the predicate changes the next `processTuple` result |
   | `RegexOpExec` (pattern matches) | yields the tuple via `Pattern.matcher.find` |
   | `RegexOpExec` (pattern does not match) | yields nothing |
   | `RegexOpExec` with `caseInsensitive = true` | matches case-insensitively |
   | `RegexOpExec` with `caseInsensitive = false` | matches case-sensitively |
   | `RegexOpExec` constructor with invalid descriptor JSON | propagates a Jackson parse exception |
   | `SubstringSearchOpExec` with `isCaseSensitive = true` | matches case-sensitively |
   | `SubstringSearchOpExec` with `isCaseSensitive = false` | matches case-insensitively (containment after lowercasing) |
   | `SubstringSearchOpExec` (empty substring) | matches every tuple (every string contains `""`) |
   | `RandomKSamplingOpExec` with `percentage = 100` | accepts every tuple |
   | `RandomKSamplingOpExec` with `percentage = 0` | rejects every tuple |
   | `RandomKSamplingOpExec` (intermediate percentage, deterministic seed) | produces the same emission count on every run over a large sample |
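
One way to pin the intermediate-percentage row without a hard-coded magic number is to compare two fresh instances built with the same seed, plus a loose sanity band around the expected rate. `SamplingFilter` and `SamplingContract` are hypothetical names; seeding by worker count follows the class table above, but the accept rule and the one-draw-per-tuple behavior are assumptions.

```scala
import scala.util.Random

// Hypothetical stand-in for RandomKSamplingOpExec's decision logic: one
// Random seeded with the worker count, one draw per tuple. The accept rule
// (nextDouble() < percentage / 100.0) is an assumption.
final class SamplingFilter(percentage: Int, workerCount: Int) {
  private val rand = new Random(workerCount)
  def accept(): Boolean = rand.nextDouble() < percentage / 100.0
}

object SamplingContract {
  // Number of tuples a fresh filter would emit out of n.
  def emittedCount(percentage: Int, workerCount: Int, n: Int): Int = {
    val filter = new SamplingFilter(percentage, workerCount)
    (1 to n).count(_ => filter.accept())
  }
}
```

A golden count is also viable: the `java.util.Random` Javadoc specifies its algorithm so that equal seeds produce identical sequences, and `scala.util.Random` wraps a `java.util.Random`. Comparing two runs is simply less brittle if the operator's internals change, for example if it starts drawing more than one random number per tuple.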
   
   ## Scope
   
   - New spec files (one per source class):
     - `FilterOpExecSpec.scala`
     - `RegexOpExecSpec.scala`
     - `SubstringSearchOpExecSpec.scala`
     - `RandomKSamplingOpExecSpec.scala`
   - No production-code changes.
   - `FilterOpExec` is exercised via a test-only concrete subclass (it is 
abstract).
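
Because `FilterOpExec` is abstract, `FilterOpExecSpec.scala` needs a concrete subclass. A minimal sketch, again with a simplified, hypothetical `SimpleTuple` and a base class that only mirrors the shape described in this issue: `TestFilterOpExec` installs a predicate at construction, and its hypothetical `swap` helper forwards to `setFilterFunc`, so the test works whether the real method is public or protected.

```scala
// Simplified, hypothetical stand-ins that mirror only the shape described in
// the issue; the real Tuple type and processTuple signature differ.
final case class SimpleTuple(value: Int)

abstract class FilterOpExec {
  private var filterFunc: SimpleTuple => Boolean = _ => false

  def setFilterFunc(f: SimpleTuple => Boolean): Unit = filterFunc = f

  // Yield the tuple iff the predicate holds, otherwise an empty iterator.
  def processTuple(tuple: SimpleTuple): Iterator[SimpleTuple] =
    if (filterFunc(tuple)) Iterator.single(tuple) else Iterator.empty
}

// Test-only concrete subclass: installs a predicate at construction and
// exposes a hypothetical `swap` helper that forwards to setFilterFunc.
class TestFilterOpExec(initial: SimpleTuple => Boolean) extends FilterOpExec {
  setFilterFunc(initial)
  def swap(f: SimpleTuple => Boolean): Unit = setFilterFunc(f)
}
```

In the spec, each `FilterOpExec` row of the contract table would become its own test case built on an instance like this.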


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
