[ 
https://issues.apache.org/jira/browse/DRILL-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976550#comment-16976550
 ] 

ASF GitHub Bot commented on DRILL-7445:
---------------------------------------

ihuzenko commented on pull request #1899: DRILL-7445: Create batch copier based 
on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347342178
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##########
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * <p>
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * <p>
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, then
+ * the upstream operator must have ensured all batches covered
+ * by the SV4 have the same schema.)
+ * <p>
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * <h4>Protocol</h4>
+ * Overall lifecycle:
+ * <ol>
+ * <li>Create an instance of the
+ *     {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *      ResultSetCopierImpl} class, passing the input batch
+ *      accessor to the constructor.</li>
+ * <li>Loop to process each output batch as shown below. That is, continually
+ *     process calls to the {@link BatchIterator#next()} method.</li>
+ * <li>Call {@link #close()}.</li>
+ * </ol>
+ * <p>
+ *
+ * To build each output batch:
+ *
+ * <pre><code>
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (! copier.isFull() {
+ *     copier.freeInput();
+ *     IterOutcome innerResult = inner.next();
+ *     if (innerResult == DONE) { break; }
+ *     copier.startInput();
+ *     copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ *     outputContainer = copier.harvest();
+ *     return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA ? OK;
+ *   } else { return DONE; }
+ * }
+ * </code></pre>
+ * <p>
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * <p>
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around lack of such support
+ * in its input (perhaps by tracking a state.)
+ * <p>
+ * An input batch is processed by copying the rows. Copying can be done
+ * row-by row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch make sense when the input batch carries as
+ * selection vector that identifies which rows to copy, in which
+ * order.
+ * <p>
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
+
+  /**
+   * Copy a row at the given position. For those cases in
+   * which random copying is needed, but a selection vector
+   * is not available. Note that this version is slow because
+   * of the need to reset indexes for every row. Better to
+   * use a selection vector, then copy sequentially.
+   *
+   * @param posn the input row position. If a selection vector
+   * is attached, then this is the selection vector position
+   */
+  void copyRecord(int posn);
+
+  /**
+   * Copy all (remaining) input rows to the output.
+   * If insufficient space exists in the output, does a partial
+   * copy, and {@link #isCopyPending()} will return true.
+   */
+  void copyAll();
 
 Review comment:
   ```suggestion
     void copyRows();
   ```
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Create batch copier based on result set framework
> -------------------------------------------------
>
>                 Key: DRILL-7445
>                 URL: https://issues.apache.org/jira/browse/DRILL-7445
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.17.0
>
>
> The result set framework now provides both a reader and writer. Provide a 
> copier that copies batches using this framework. Such a copier can:
> * Copy selected records
> * Copy all records, such as for an SV2 or SV4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to