[
https://issues.apache.org/jira/browse/CALCITE-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18091114#comment-18091114
]
Ruiqi Dong commented on CALCITE-7557:
-------------------------------------
Good question. I think the right behavior here is to match the existing
EnumerableDefaults/LINQ behavior: take(count <= 0) returns an empty sequence,
and skip(count <= 0) returns the original sequence.
The main reason is consistency with linq4j's model and with Calcite's existing
generic implementation:
1. linq4j's Enumerable is documented as analogous to LINQ's IEnumerable. LINQ
defines this behavior explicitly: Take with count <= 0 returns an empty
IEnumerable without enumerating the source, and Skip with count <= 0 yields all
source elements.
2. Calcite already has this behavior in the generic EnumerableDefaults path.
For negative counts, EnumerableDefaults.take returns empty, and
EnumerableDefaults.skip skips nothing and returns the original sequence. There
are also existing tests for negative Take in EnumerableDefaults and
QueryableDefaults.
3. ListEnumerable is selected by Linq4j.asEnumerable as a more efficient
implementation for List inputs. I read that as an optimization path, so it
should be semantically equivalent to the generic EnumerableDefaults behavior.
For completeness, Java Stream goes the other way: Stream.limit/skip reject
negative values with IllegalArgumentException. But linq4j is modeled after
LINQ, and Calcite's generic Enumerable implementation already follows the
LINQ-like behavior. So for this ticket, I think the narrowest and most
consistent fix is to make ListEnumerable match the existing generic Enumerable
behavior.
If the project prefers a fail-fast contract instead, that should probably be a
broader behavior change across EnumerableDefaults/QueryableDefaults as well,
rather than only the ListEnumerable optimized path.
If this direction sounds right to you, I can submit a PR that makes
ListEnumerable match the existing EnumerableDefaults behavior and adds
regression tests for negative take/skip counts.
> Linq4j.ListEnumerable.take(int) / skip(int) diverge from EnumerableDefaults
> on negative counts
> ----------------------------------------------------------------------------------------------
>
> Key: CALCITE-7557
> URL: https://issues.apache.org/jira/browse/CALCITE-7557
> Project: Calcite
> Issue Type: Bug
> Components: linq4j
> Affects Versions: 1.41.0
> Reporter: Ruiqi Dong
> Priority: Major
>
> *Summary*
> Linq4j.asEnumerable(list) exposes list-specialized take(int) and skip(int)
> implementations that are not behaviorally equivalent to the generic
> EnumerableDefaults versions for negative counts. For the same input
> enumerable and the same negative count:
> * EnumerableDefaults.take(source, -1) returns an empty enumerable, but
> ListEnumerable.take(-1) throws
> * EnumerableDefaults.skip(source, -1) returns the original sequence, but
> ListEnumerable.skip(-1) throws
> This is a clean semantic divergence between the generic and optimized paths
> of the same API.
>
> *Affected code*
> File: linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java
> {code:java}
> public static <TSource> Enumerable<TSource> skip(Enumerable<TSource> source,
> final int count) {
> return skipWhile(source, (v1, v2) -> {
> return v2 < count;
> });
> }
> public static <TSource> Enumerable<TSource> take(Enumerable<TSource> source,
> final int count) {
> return takeWhile(
> source, (v1, v2) -> {
> return v2 < count;
> });
> } {code}
> File: linq4j/src/main/java/org/apache/calcite/linq4j/Linq4j.java
> {code:java}
> @Override public Enumerable<T> skip(int count) {
> final List<T> list = toList();
> if (count >= list.size()) {
> return Linq4j.emptyEnumerable();
> }
> return new ListEnumerable<>(list.subList(count, list.size()));
> }
> @Override public Enumerable<T> take(int count) {
> final List<T> list = toList();
> if (count >= list.size()) {
> return this;
> }
> return new ListEnumerable<>(list.subList(0, count));
> }{code}
>
> *Reproducer*
> Add the following test to
> linq4j/src/test/java/org/apache/calcite/linq4j/test/Linq4jTest.java
> {code:java}
> @Test void testTakeListEnumerableNegativeSize() {
> final List<Integer> values = Arrays.asList(1, 2, 3);
> assertThat(EnumerableDefaults.take(Linq4j.asEnumerable(values),
> -1).toList(),
> is(empty()));
> assertThat(Linq4j.asEnumerable(values).take(-1).toList(), is(empty()));
> }
> @Test void testSkipListEnumerableNegativeSize() {
> final List<Integer> values = Arrays.asList(1, 2, 3);
> assertThat(EnumerableDefaults.skip(Linq4j.asEnumerable(values),
> -1).toList(),
> is(values));
> assertThat(Linq4j.asEnumerable(values).skip(-1).toList(), is(values));
> }{code}
> Run:
> {code:java}
> ./gradlew :linq4j:test \
> --tests
> org.apache.calcite.linq4j.test.Linq4jTest.testTakeListEnumerableNegativeSize \
> --tests
> org.apache.calcite.linq4j.test.Linq4jTest.testSkipListEnumerableNegativeSize
> {code}
> Observed behavior:
> The optimized list path throws in both methods instead of matching the
> generic enumerable semantics
> {code:java}
> java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
> at
> java.base/java.util.AbstractList.subListRangeCheck(AbstractList.java:509)
> at java.base/java.util.AbstractList.subList(AbstractList.java:497)
> at org.apache.calcite.linq4j.Linq4j$ListEnumerable.take(Linq4j.java:599)
> java.lang.IndexOutOfBoundsException: fromIndex = -1
> at
> java.base/java.util.AbstractList.subListRangeCheck(AbstractList.java:505)
> at java.base/java.util.AbstractList.subList(AbstractList.java:497)
> at org.apache.calcite.linq4j.Linq4j$ListEnumerable.skip(Linq4j.java:591)
> {code}
> Expected behavior:
> Negative counts should behave the same regardless of whether the source
> enumerable happens to be backed by a List.
> * Negative `take` should return an empty enumerable, matching
> EnumerableDefaults.take
> * Negative `skip` should return the original sequence, matching
> EnumerableDefaults.skip
>
> This is an optimization-path semantic regression. The same logical operation
> has two implementations in Calcite, but the list-specialized path is not
> behaviorally equivalent to the generic enumerable contract for negative
> counts.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)