asolimando commented on code in PR #3666:
URL: https://github.com/apache/calcite/pull/3666#discussion_r1501844006


##########
core/src/main/java/org/apache/calcite/util/ImmutableIntList.java:
##########
@@ -298,16 +324,22 @@ public ImmutableIntList appendAll(Iterable<Integer> list) 
{
     return ImmutableIntList.copyOf(Iterables.concat(this, list));
   }
 
+  /** Applies an operator to each element. */
+  public ImmutableIntList map(IntUnaryOperator operator) {
+    final int[] integers = new int[ints.length];
+    for (int i = 0; i < ints.length; i++) {
+      integers[i] = operator.applyAsInt(ints[i]);
+    }
+    return new ImmutableIntList(integers);
+  }
+
   /**
    * Increments {@code offset} to each element of the list and
    * returns a new int list.
    */
+  @Deprecated // to be removed before 2.0

Review Comment:
   I would revert this change and remove the `map` method, as explained in the 
other comment



##########
core/src/main/java/org/apache/calcite/util/ImmutableIntList.java:
##########
@@ -298,16 +324,22 @@ public ImmutableIntList appendAll(Iterable<Integer> list) 
{
     return ImmutableIntList.copyOf(Iterables.concat(this, list));
   }
 
+  /** Applies an operator to each element. */

Review Comment:
   There is no use for this method so far, you are just replacing `.incr()` 
without any evidence that this method will ever be used, IMO we can postpone 
creating a generic method to the moment where it will be needed, otherwise we 
are just carrying around the weight of the old deprecated method and creating 
new code without any immediate benefit.



##########
.github/workflows/main.yml:
##########
@@ -69,7 +69,7 @@ jobs:
       with:
         job-id: jdk${{ matrix.jdk }}
         remote-build-cache-proxy-enabled: false
-        arguments: --scan --no-parallel --no-daemon build javadoc
+        arguments: --scan --no-parallel --no-daemon build javadoc 
--exclude-task :arrow:build

Review Comment:
   You have mentioned that Arrow has problems with Windows, do you have any 
source to link so that people can easily check if this changes for future 
versions?
   
   It can be even a comment here if appropriate.



##########
core/src/main/java/org/apache/calcite/util/ImmutableIntList.java:
##########
@@ -331,6 +363,10 @@ private static class EmptyImmutableIntList extends 
ImmutableIntList {
     @Override public ListIterator<Integer> listIterator() {
       return Collections.<Integer>emptyList().listIterator();
     }
+

Review Comment:
   I'd revert this change too



##########
core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java:
##########


Review Comment:
   I would revert changes to this class as well



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.evaluator.SelectionVector;
+import org.apache.arrow.gandiva.evaluator.SelectionVectorInt16;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.ValueVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Enumerator that reads from a collection of Arrow value-vectors.
+ */
+class ArrowEnumerator implements Enumerator<Object> {
+  private final BufferAllocator allocator;
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private int rowIndex = -1;
+  private final List<ValueVector> valueVectors;
+  private final List<Integer> fields;
+  private int rowSize;
+  private @Nullable ArrowBuf buf;
+  private @Nullable SelectionVector selectionVector;
+  private int selectionVectorIndex = 0;
+
+  ArrowEnumerator(@Nullable Projector projector, @Nullable Filter filter,
+      ImmutableIntList fields, ArrowFileReader arrowFileReader) {
+    this.allocator = new RootAllocator(Long.MAX_VALUE);
+    this.projector = projector;

Review Comment:
   Nit: for what concerns fields coming from the constructor parameter, I'd 
initialize them in the same order, unless there is a good reason not to do so



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.evaluator.SelectionVector;
+import org.apache.arrow.gandiva.evaluator.SelectionVectorInt16;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.ValueVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Enumerator that reads from a collection of Arrow value-vectors.
+ */
+class ArrowEnumerator implements Enumerator<Object> {
+  private final BufferAllocator allocator;
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private int rowIndex = -1;
+  private final List<ValueVector> valueVectors;
+  private final List<Integer> fields;
+  private int rowSize;
+  private @Nullable ArrowBuf buf;
+  private @Nullable SelectionVector selectionVector;
+  private int selectionVectorIndex = 0;
+
+  ArrowEnumerator(@Nullable Projector projector, @Nullable Filter filter,
+      ImmutableIntList fields, ArrowFileReader arrowFileReader) {
+    this.allocator = new RootAllocator(Long.MAX_VALUE);
+    this.projector = projector;
+    this.filter = filter;
+    this.arrowFileReader = arrowFileReader;
+    this.fields = fields;
+    this.valueVectors = new ArrayList<>(fields.size());
+
+    // Set up fields so that first call to moveNext() will trigger a call to
+    // loadNextBatch().
+    if (projector != null) {
+      rowIndex = rowSize = 0;
+    } else {
+      selectionVector = null;
+      selectionVectorIndex = 0;
+    }
+  }
+
+  @Override public Object current() {
+    if (fields.size() == 1) {
+      return this.valueVectors.get(0).getObject(rowIndex);
+    }
+    Object[] current = new Object[valueVectors.size()];
+    for (int i = 0; i < valueVectors.size(); i++) {
+      ValueVector vector = this.valueVectors.get(i);
+      current[i] = vector.getObject(rowIndex);
+    }
+    return current;
+  }
+
+  @Override public boolean moveNext() {
+    if (projector != null) {
+      if (rowIndex >= rowSize - 1) {
+        final boolean hasNextBatch;
+        try {
+          hasNextBatch = arrowFileReader.loadNextBatch();
+        } catch (IOException e) {
+          throw Util.toUnchecked(e);
+        }
+        if (hasNextBatch) {
+          rowIndex = 0;
+          this.valueVectors.clear();
+          loadNextArrowBatch();
+          return true;
+        } else {
+          return false;
+        }

Review Comment:
   What about this?
   
   ```suggestion
           if (hasNextBatch) {
             rowIndex = 0;
             this.valueVectors.clear();
             loadNextArrowBatch();
           } 
           return hasNextBatch;
   ```



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.evaluator.SelectionVector;
+import org.apache.arrow.gandiva.evaluator.SelectionVectorInt16;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.ValueVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Enumerator that reads from a collection of Arrow value-vectors.
+ */
+class ArrowEnumerator implements Enumerator<Object> {
+  private final BufferAllocator allocator;
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private int rowIndex = -1;
+  private final List<ValueVector> valueVectors;
+  private final List<Integer> fields;
+  private int rowSize;
+  private @Nullable ArrowBuf buf;
+  private @Nullable SelectionVector selectionVector;
+  private int selectionVectorIndex = 0;
+
+  ArrowEnumerator(@Nullable Projector projector, @Nullable Filter filter,
+      ImmutableIntList fields, ArrowFileReader arrowFileReader) {
+    this.allocator = new RootAllocator(Long.MAX_VALUE);
+    this.projector = projector;
+    this.filter = filter;
+    this.arrowFileReader = arrowFileReader;
+    this.fields = fields;
+    this.valueVectors = new ArrayList<>(fields.size());
+
+    // Set up fields so that first call to moveNext() will trigger a call to

Review Comment:
   The comment is not clear to me, can you make it more explicit?



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.evaluator.SelectionVector;
+import org.apache.arrow.gandiva.evaluator.SelectionVectorInt16;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.ValueVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Enumerator that reads from a collection of Arrow value-vectors.
+ */
+class ArrowEnumerator implements Enumerator<Object> {
+  private final BufferAllocator allocator;
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private int rowIndex = -1;
+  private final List<ValueVector> valueVectors;
+  private final List<Integer> fields;
+  private int rowSize;
+  private @Nullable ArrowBuf buf;
+  private @Nullable SelectionVector selectionVector;
+  private int selectionVectorIndex = 0;
+
+  ArrowEnumerator(@Nullable Projector projector, @Nullable Filter filter,
+      ImmutableIntList fields, ArrowFileReader arrowFileReader) {
+    this.allocator = new RootAllocator(Long.MAX_VALUE);
+    this.projector = projector;
+    this.filter = filter;
+    this.arrowFileReader = arrowFileReader;
+    this.fields = fields;
+    this.valueVectors = new ArrayList<>(fields.size());
+
+    // Set up fields so that first call to moveNext() will trigger a call to
+    // loadNextBatch().
+    if (projector != null) {
+      rowIndex = rowSize = 0;

Review Comment:
   Nit:
   ```suggestion
         rowIndex = 0;
         rowSize = 0;
   ```
   
   IMO it's not worth sparing a line but making it less readable, in Java you 
rarely this kind of assignments.



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.evaluator.SelectionVector;
+import org.apache.arrow.gandiva.evaluator.SelectionVectorInt16;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.ValueVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Enumerator that reads from a collection of Arrow value-vectors.
+ */
+class ArrowEnumerator implements Enumerator<Object> {
+  private final BufferAllocator allocator;
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private int rowIndex = -1;
+  private final List<ValueVector> valueVectors;
+  private final List<Integer> fields;
+  private int rowSize;
+  private @Nullable ArrowBuf buf;
+  private @Nullable SelectionVector selectionVector;
+  private int selectionVectorIndex = 0;
+
+  ArrowEnumerator(@Nullable Projector projector, @Nullable Filter filter,
+      ImmutableIntList fields, ArrowFileReader arrowFileReader) {
+    this.allocator = new RootAllocator(Long.MAX_VALUE);
+    this.projector = projector;
+    this.filter = filter;
+    this.arrowFileReader = arrowFileReader;
+    this.fields = fields;
+    this.valueVectors = new ArrayList<>(fields.size());
+
+    // Set up fields so that first call to moveNext() will trigger a call to
+    // loadNextBatch().
+    if (projector != null) {
+      rowIndex = rowSize = 0;
+    } else {
+      selectionVector = null;
+      selectionVectorIndex = 0;
+    }
+  }
+
+  @Override public Object current() {
+    if (fields.size() == 1) {
+      return this.valueVectors.get(0).getObject(rowIndex);
+    }
+    Object[] current = new Object[valueVectors.size()];
+    for (int i = 0; i < valueVectors.size(); i++) {
+      ValueVector vector = this.valueVectors.get(i);
+      current[i] = vector.getObject(rowIndex);
+    }
+    return current;
+  }
+
+  @Override public boolean moveNext() {
+    if (projector != null) {
+      if (rowIndex >= rowSize - 1) {
+        final boolean hasNextBatch;
+        try {
+          hasNextBatch = arrowFileReader.loadNextBatch();
+        } catch (IOException e) {
+          throw Util.toUnchecked(e);
+        }
+        if (hasNextBatch) {
+          rowIndex = 0;
+          this.valueVectors.clear();
+          loadNextArrowBatch();
+          return true;
+        } else {
+          return false;
+        }
+      } else {
+        rowIndex++;
+        return true;
+      }
+    } else {
+      if (selectionVector == null
+          || selectionVectorIndex >= selectionVector.getRecordCount()) {
+        boolean hasNextBatch;
+        while (true) {
+          try {
+            hasNextBatch = arrowFileReader.loadNextBatch();
+          } catch (IOException e) {
+            throw Util.toUnchecked(e);
+          }
+          if (hasNextBatch) {
+            selectionVectorIndex = 0;
+            this.valueVectors.clear();
+            loadNextArrowBatch();
+            assert selectionVector != null;
+            if (selectionVectorIndex >= selectionVector.getRecordCount()) {
+              // the "filtered" batch is empty, but there may be more batches 
to fetch
+              continue;
+            }
+            rowIndex = selectionVector.getIndex(selectionVectorIndex++);
+            return true;
+          } else {
+            return false;

Review Comment:
   same as in the comment above



##########
core/src/main/java/org/apache/calcite/util/ImmutableIntList.java:
##########
@@ -192,7 +208,11 @@ public int[] toIntArray() {
     return ints.clone();
   }
 
-  /** Returns an List of {@code Integer}. */
+  /** Returns a List of {@code Integer}.

Review Comment:
   Since I suggested to revert changes stemming from this deprecation, I'd 
change this back too



##########
.github/workflows/main.yml:
##########
@@ -215,6 +215,7 @@ jobs:
           S3_BUILD_CACHE_ACCESS_KEY_ID: ${{ 
secrets.S3_BUILD_CACHE_ACCESS_KEY_ID }}
           S3_BUILD_CACHE_SECRET_KEY: ${{ secrets.S3_BUILD_CACHE_SECRET_KEY }}
           GRADLE_ENTERPRISE_ACCESS_KEY: ${{ secrets.GE_ACCESS_TOKEN }}
+          _JAVA_OPTIONS: ${{ env._JAVA_OPTIONS }} 
--add-opens=java.base/java.nio=ALL-UNNAMED

Review Comment:
   Can you add a comment to help future maintainers understand why this is 
needed?



##########
site/_docs/arrow_adapter.md:
##########
@@ -0,0 +1,89 @@
+---
+layout: docs
+title: Arrow adapter
+permalink: /docs/arrow_adapter.html
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+**Note**: Arrow Adapter is an experimental feature;
+changes in public API and usage are expected.
+
+## Overview
+
+Calcite's adapter for Apache Arrow is able to read and process data in Arrow
+format.
+
+It can read files in Arrow's
+[Feather format](https://wesmckinney.com/blog/feather-and-apache-arrow/)
+(which generally have a `.arrow` suffix) in the same way that the
+[File Adapter](file_adapter.html) can read `.csv` files.
+
+The File Adapter immediately translates data to
+[Enumerable format]({{ site.apiRoot 
}}/org/apache/calcite/adapter/enumerable/EnumerableConvention.html)
+for further processing, but Arrow is an extremely efficient format for
+processing in-memory data, so the Arrow Adapter has implementations of other
+relational operators for further processing.

Review Comment:
   I am not sure I understand what you mean with the sentence starting at `, 
but Arrow` line 40, can you expand a bit on it? 
   
   I will probably understand once I am done looking at the whole PR, but this 
doc should be high-level and shouldn't require looking in-depth at the code to 
have an idea of what the adapter does.



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerable.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.AbstractEnumerable;
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+/**
+ * Enumerable that reads from Arrow value-vectors.
+ */
+class ArrowEnumerable extends AbstractEnumerable<Object> {
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private final ImmutableIntList fields;
+
+  ArrowEnumerable(ArrowFileReader arrowFileReader,
+      @Nullable Projector projector, @Nullable Filter filter,

Review Comment:
   Nit: can we move `projector` in the first line and leave in the second just 
`filter` and `fields`?
   
   It seems a bit more in line with what is currently used in the codebase, 
even if it's not a check-style violation



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.evaluator.SelectionVector;
+import org.apache.arrow.gandiva.evaluator.SelectionVectorInt16;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.ValueVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Enumerator that reads from a collection of Arrow value-vectors.
+ */
+class ArrowEnumerator implements Enumerator<Object> {
+  private final BufferAllocator allocator;
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private int rowIndex = -1;
+  private final List<ValueVector> valueVectors;
+  private final List<Integer> fields;
+  private int rowSize;
+  private @Nullable ArrowBuf buf;
+  private @Nullable SelectionVector selectionVector;
+  private int selectionVectorIndex = 0;
+
+  ArrowEnumerator(@Nullable Projector projector, @Nullable Filter filter,
+      ImmutableIntList fields, ArrowFileReader arrowFileReader) {
+    this.allocator = new RootAllocator(Long.MAX_VALUE);
+    this.projector = projector;
+    this.filter = filter;
+    this.arrowFileReader = arrowFileReader;
+    this.fields = fields;
+    this.valueVectors = new ArrayList<>(fields.size());
+
+    // Set up fields so that first call to moveNext() will trigger a call to
+    // loadNextBatch().
+    if (projector != null) {
+      rowIndex = rowSize = 0;
+    } else {

Review Comment:
   The code implies that either we have a `projector`, or we must take care of 
the selector.
   
   From what I read below, the selector is needed when you have a filter, but 
from the rest of the code you seem to be able to have both, so the `else` part 
here is not needed?
   
   If, on the contrary, there is such a constraint (you can either have a 
filter or a project, but not both), we need to check for this in the 
constructor and in the rest of the code (e.g., `loadNextArrowBatch`, for which 
both can be non-null at the same time)



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########


Review Comment:
   @macroguo-ghy looking at 
[`ArrowTable`](https://github.com/apache/calcite/blob/5bb4d6c63b08fc27d02e6a417a010a84c6d0106d/arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowTable.java#L97)
 it seems that my understanding that a query can only have projection or filter 
but not both is confirmed (of course that's not from a logical standpoint as it 
wouldn't make sense, it's necessarily an implementation choice, as the tests 
clearly describe).
   
   Based on this consideration, I am proposing to replace the `ArrowEnumerator` 
class with an abstract class and two sub-classes, one for filter, one for 
projection, you can find the associated code here if you want to take a look: 
https://github.com/asolimando/calcite/tree/arrow-fix



##########
arrow/src/test/java/org/apache/calcite/adapter/arrow/ArrowAdapterTest.java:
##########
@@ -0,0 +1,503 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeSystem;
+import org.apache.calcite.schema.Table;
+import org.apache.calcite.test.CalciteAssert;
+import org.apache.calcite.util.Sources;
+
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.gandiva.expression.ExpressionTree;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import com.google.common.collect.ImmutableMap;
+
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URL;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.junit.jupiter.api.Assumptions.assumeTrue;
+
+/**
+ * Tests for the Apache Arrow adapter.
+ */
+class ArrowAdapterTest {
+  private static Map<String, String> arrow;
+  private static File arrowDataDirectory;
+  private static boolean hasGandivaSupport = detectGandivaSupport();
+
+  ArrowAdapterTest() {
+    assumeTrue(hasGandivaSupport, "gandiva not supported on this platform, 
skipping tests");
+  }
+
+  /**
+   * Gandiva (used to implement arrow filtering / projection) does not 
currently distribute
+   * a binary that is compatible with M1 macs on maven central.
+   * see <a 
href="https://issues.apache.org/jira/browse/ARROW-16608";>ARROW-16608</a>.
+   *
+   * @return true if we believe that gandiva is supported on this platform and 
we can run the tests
+   */
+  private static boolean detectGandivaSupport() {
+    try {
+      Schema emptySchema = new Schema(new ArrayList<>(), null);
+      List<ExpressionTree> expressions = new ArrayList<>();
+      Projector.make(emptySchema, expressions);
+    } catch (GandivaException e) {
+      // this is ok -- we'll always hit this because of an empty expression
+      // the fact that we got here, is indicative that the JNI library was 
loaded properly
+      return true;
+    } catch (UnsatisfiedLinkError e) {
+      return false;
+    }
+    return true;
+  }
+
+  @BeforeAll
+  static void initializeArrowState(@TempDir Path sharedTempDir) throws 
IOException, SQLException {
+    URL modelUrl =
+        
Objects.requireNonNull(ArrowAdapterTest.class.getResource("/arrow-model.json"), 
"url");
+    Path sourceModelFilePath = Sources.of(modelUrl).file().toPath();
+    Path modelFileTarget = sharedTempDir.resolve("arrow-model.json");
+    Files.copy(sourceModelFilePath, modelFileTarget);
+
+    Path arrowFilesDirectory = sharedTempDir.resolve("arrow");
+    Files.createDirectory(arrowFilesDirectory);
+    arrowDataDirectory = arrowFilesDirectory.toFile();
+
+    File dataLocationFile = 
arrowFilesDirectory.resolve("arrowdata.arrow").toFile();
+    ArrowData arrowDataGenerator = new ArrowData();
+    arrowDataGenerator.writeArrowData(dataLocationFile);
+    arrowDataGenerator.writeScottEmpData(arrowFilesDirectory);
+
+    arrow = ImmutableMap.of("model", 
modelFileTarget.toAbsolutePath().toString());
+  }
+
+  /** Test to read an Arrow file and check its field names. */
+  @Test void testArrowSchema() {
+    ArrowSchema arrowSchema = new ArrowSchema(arrowDataDirectory);
+    Map<String, Table> tableMap = arrowSchema.getTableMap();
+    RelDataTypeFactory typeFactory =
+        new JavaTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
+    RelDataType relDataType = 
tableMap.get("ARROWDATA").getRowType(typeFactory);
+
+    assertThat(relDataType.getFieldNames().get(0), is("intField"));
+    assertThat(relDataType.getFieldNames().get(1), is("stringField"));
+    assertThat(relDataType.getFieldNames().get(2), is("floatField"));
+  }
+
+  @Test void testArrowProjectAllFields() {
+    String sql = "select * from arrowdata\n";
+    String plan = "PLAN=ArrowToEnumerableConverter\n"
+        + "  ArrowTableScan(table=[[ARROW, ARROWDATA]], fields=[[0, 1, 2, 
3]])\n\n";
+    String result = "intField=0; stringField=0; floatField=0.0; longField=0\n"
+        + "intField=1; stringField=1; floatField=1.0; longField=1\n"
+        + "intField=2; stringField=2; floatField=2.0; longField=2\n"
+        + "intField=3; stringField=3; floatField=3.0; longField=3\n"
+        + "intField=4; stringField=4; floatField=4.0; longField=4\n"
+        + "intField=5; stringField=5; floatField=5.0; longField=5\n";
+    CalciteAssert.that()
+        .with(arrow)
+        .query(sql)
+        .limit(6)
+        .returns(result)
+        .explainContains(plan);
+  }
+
+  @Test void testArrowProjectSingleField() {
+    String sql = "select \"intField\" from arrowdata\n";
+    String result = "intField=0\nintField=1\nintField=2\n"
+        + "intField=3\nintField=4\nintField=5\n";
+    String plan = "PLAN=ArrowToEnumerableConverter\n"
+        + "  ArrowProject(intField=[$0])\n"
+        + "    ArrowTableScan(table=[[ARROW, ARROWDATA]], fields=[[0, 1, 2, 
3]])\n\n";
+    CalciteAssert.that()
+        .with(arrow)
+        .query(sql)
+        .limit(6)
+        .returns(result)
+        .explainContains(plan);
+  }
+
+  @Test void testArrowProjectTwoFields() {
+    String sql = "select \"intField\", \"stringField\" from arrowdata\n";
+    String result = "intField=0; stringField=0\n"
+        + "intField=1; stringField=1\n"
+        + "intField=2; stringField=2\n"
+        + "intField=3; stringField=3\n"
+        + "intField=4; stringField=4\n"
+        + "intField=5; stringField=5\n";
+    String plan = "PLAN=ArrowToEnumerableConverter\n"
+        + "  ArrowProject(intField=[$0], stringField=[$1])\n"
+        + "    ArrowTableScan(table=[[ARROW, ARROWDATA]], fields=[[0, 1, 2, 
3]])\n\n";
+    CalciteAssert.that()
+        .with(arrow)
+        .query(sql)
+        .limit(6)
+        .returns(result)
+        .explainContains(plan);
+  }
+
+  @Test void testArrowProjectFieldsWithIntegerFilter() {
+    String sql = "select \"intField\", \"stringField\"\n"
+        + "from arrowdata\n"
+        + "where \"intField\" < 4";
+    String result = "intField=0; stringField=0\n"
+        + "intField=1; stringField=1\n"
+        + "intField=2; stringField=2\n"
+        + "intField=3; stringField=3\n";
+    String plan = "PLAN=ArrowToEnumerableConverter\n"
+        + "  ArrowProject(intField=[$0], stringField=[$1])\n"
+        + "    ArrowFilter(condition=[<($0, 4)])\n"
+        + "      ArrowTableScan(table=[[ARROW, ARROWDATA]], fields=[[0, 1, 2, 
3]])\n\n";
+    CalciteAssert.that()
+        .with(arrow)
+        .query(sql)
+        .limit(4)
+        .returns(result)
+        .explainContains(plan);
+  }
+
+  @Test void testArrowProjectFieldsWithMultipleFilters() {
+    String sql = "select \"intField\", \"stringField\"\n"
+        + "from arrowdata\n"
+        + "where \"intField\"=12 and \"stringField\"='12'";
+    String plan = "PLAN=ArrowToEnumerableConverter\n"
+        + "  ArrowProject(intField=[$0], stringField=[$1])\n"
+        + "    ArrowFilter(condition=[AND(=($0, 12), =($1, '12'))])\n"
+        + "      ArrowTableScan(table=[[ARROW, ARROWDATA]], fields=[[0, 1, 2, 
3]])\n\n";
+    String result = "intField=12; stringField=12\n";
+    CalciteAssert.that()
+        .with(arrow)
+        .query(sql)
+        .limit(3)
+        .returns(result)
+        .explainContains(plan);
+  }
+
+  @Test void testArrowProjectFieldsWithFloatFilter() {
+    String sql = "select * from arrowdata\n"
+        + " where \"floatField\"=15.0";
+    String plan = "PLAN=ArrowToEnumerableConverter\n"
+        + "  ArrowFilter(condition=[=(CAST($2):DOUBLE, 15.0)])\n"
+        + "    ArrowTableScan(table=[[ARROW, ARROWDATA]], fields=[[0, 1, 2, 
3]])\n\n";
+    String result = "intField=15; stringField=15; floatField=15.0; 
longField=15\n";
+    CalciteAssert.that()
+        .with(arrow)
+        .query(sql)
+        .returns(result)
+        .explainContains(plan);
+  }
+
+  @Test void testArrowProjectFieldsWithFilterOnLaterBatch() {
+    String sql = "select \"intField\"\n"
+        + "from arrowdata\n"
+        + "where \"intField\"=25";
+    String plan = "PLAN=ArrowToEnumerableConverter\n"
+        + "  ArrowProject(intField=[$0])\n"
+        + "    ArrowFilter(condition=[=($0, 25)])\n"
+        + "      ArrowTableScan(table=[[ARROW, ARROWDATA]], fields=[[0, 1, 2, 
3]])\n\n";
+    String result = "intField=25\n";
+    CalciteAssert.that()
+        .with(arrow)
+        .query(sql)
+        .returns(result)
+        .explainContains(plan);
+  }
+
+  // TODO: test a table whose field names contain spaces,

Review Comment:
   From a cursory look, it seems that we should add at least all the tests 
associated with these TODOs, and possibly others.



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerator.java:
##########
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.gandiva.evaluator.SelectionVector;
+import org.apache.arrow.gandiva.evaluator.SelectionVectorInt16;
+import org.apache.arrow.gandiva.exceptions.GandivaException;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.ValueVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Enumerator that reads from a collection of Arrow value-vectors.
+ */
+class ArrowEnumerator implements Enumerator<Object> {
+  private final BufferAllocator allocator;
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private int rowIndex = -1;
+  private final List<ValueVector> valueVectors;
+  private final List<Integer> fields;
+  private int rowSize;
+  private @Nullable ArrowBuf buf;
+  private @Nullable SelectionVector selectionVector;
+  private int selectionVectorIndex = 0;

Review Comment:
   The field is initialized both here and in the constructor, in both cases 
with the default value, probably we need neither of the two?



##########
arrow/src/main/java/org/apache/calcite/adapter/arrow/ArrowEnumerable.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.adapter.arrow;
+
+import org.apache.calcite.linq4j.AbstractEnumerable;
+import org.apache.calcite.linq4j.Enumerator;
+import org.apache.calcite.util.ImmutableIntList;
+import org.apache.calcite.util.Util;
+
+import org.apache.arrow.gandiva.evaluator.Filter;
+import org.apache.arrow.gandiva.evaluator.Projector;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+/**
+ * Enumerable that reads from Arrow value-vectors.
+ */
+class ArrowEnumerable extends AbstractEnumerable<Object> {
+  private final ArrowFileReader arrowFileReader;
+  private final @Nullable Projector projector;
+  private final @Nullable Filter filter;
+  private final ImmutableIntList fields;
+
+  ArrowEnumerable(ArrowFileReader arrowFileReader,
+      @Nullable Projector projector, @Nullable Filter filter,

Review Comment:
   Looking at `ArrowEnumerator`, what's exactly what I meant.
   
   And btw, why not having the same order of parameters, since they have the 
same params in the constructor?
   
   Order-wise, I prefer `ArrowFileReader arrowFileReader, @Nullable Projector 
projector, @Nullable Filter filter, ImmutableIntList fields` as in 
`ArrowEnumerable`, but formatted as in `ArrowEnumerator`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@calcite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to