arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410711887
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/lifecycle/ScanLifecycle.java
##########
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.lifecycle;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.scan.RowBatchReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaConfigBuilder;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+/**
+ * Basic scan framework for a set of "managed" readers and which uses the
+ * scan schema tracker to evolve the scan output schema.
+ * Readers are created and managed via a reader
+ * factory class unique to each type of scan. The reader factory also provides
+ * the scan-specific schema negotiator to be passed to the reader.
+ *
+ * <h4>Lifecycle</h4>
+ *
+ * The options provided in the {@link ScanLifecycleBuilder} are
+ * sufficient to drive the entire scan operator functionality.
+ * Schema resolution and projection is done generically and is the same for all
+ * data sources. Only the
+ * reader (created via the factory class) differs from one type of file to
+ * another.
+ * <p>
+ * The framework achieves the work described below by composing a
+ * set of detailed classes, each of which performs some specific task. This
+ * structure leaves the reader to simply infer schema and read data.
+ *
+ * <h4>Reader Integration</h4>
+ *
+ * The details of how a file is structured, how a schema is inferred, how
+ * data is decoded: all that is encapsulated in the reader. The only real
+ * Interaction between the reader and the framework is:
+ * <ul>
+ * <li>The reader factory creates a reader and the corresponding schema
+ * negotiator.</li>
+ * <li>The reader "negotiates" a schema with the framework. The framework
+ * knows the projection list from the query plan, knows something about
+ * data types (whether a column should be scalar, a map or an array), and
+ * knows about the schema already defined by prior readers. The reader knows
+ * what schema it can produce (if "early schema.") The schema negotiator
+ * class handles this task.</li>
+ * <li>The reader reads data from the file and populates value vectors a
+ * batch at a time. The framework creates the result set loader to use for
+ * this work. The schema negotiator returns that loader to the reader, which
+ * uses it during read.<p>
+ * A reader may be "late schema", true "schema on read." In this case, the
+ * reader simply tells the result set loader to create a new column reader
+ * on the fly. The framework will work out if that new column is to be
+ * projected and will return either a real column writer (projected column)
+ * or a dummy column writer (unprojected column.)</li>
+ * <li>The reader then reads batches of data until all data is read. The
+ * result set loader signals when a batch is full; the reader should not
+ * worry about this detail itself.</li>
+ * <li>The reader then releases its resources.</li>
+ * </ul>
+ * <p>
+ * See {@link ScanSchemaTracker} for details about how the scan schema
+ * evolves over the scan lifecycle.
+ *
+ * <h4>Livecycle</h4>

Review comment:
```suggestion
 * <h4>Life cycle</h4>
```
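
For anyone reading the quoted Javadoc without the rest of the PR at hand, the reader integration steps it lists (create the reader, obtain column writers from the loader, fill batches until the loader reports full, release resources) can be sketched as a toy model. Everything below is hypothetical and written only for illustration: `ScanLifecycleSketch`, `Loader`, `Reader`, `ColumnWriter`, and `scan` are stand-ins, not Drill's `ScanLifecycle`, `ResultSetLoader`, or `ReaderFactory`, whose real signatures differ. Schema negotiation is reduced to a projection set, and rows are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ScanLifecycleSketch {

  /** Hypothetical column writer; a stand-in, not Drill's writer interface. */
  interface ColumnWriter { void write(String value); }

  /** Hypothetical stand-in for the result set loader. */
  static class Loader {
    static final int BATCH_LIMIT = 2;          // rows per batch, tiny for illustration
    private final Set<String> projection;
    private List<String> row = new ArrayList<>();
    private List<String> batch = new ArrayList<>();

    Loader(Set<String> projection) { this.projection = projection; }

    /** Late-schema path: the reader asks for a column on the fly. */
    ColumnWriter addColumn(String name) {
      if (projection.contains(name)) {
        return value -> row.add(name + "=" + value);  // real writer (projected)
      }
      return value -> { };                             // dummy writer (unprojected)
    }

    void saveRow() {
      batch.add(String.join(",", row));
      row = new ArrayList<>();
    }

    /** The loader, not the reader, decides when a batch is full. */
    boolean isFull() { return batch.size() >= BATCH_LIMIT; }

    List<String> harvest() {
      List<String> out = batch;
      batch = new ArrayList<>();
      return out;
    }
  }

  /** Hypothetical reader over in-memory records with columns a and b. */
  static class Reader {
    private final String[][] records;
    private int pos;
    private ColumnWriter a;
    private ColumnWriter b;
    boolean closed;

    Reader(String[][] records) { this.records = records; }

    void open(Loader loader) {
      a = loader.addColumn("a");
      b = loader.addColumn("b");
    }

    /** Loads rows until the loader is full; returns false if no rows were read. */
    boolean next(Loader loader) {
      int start = pos;
      while (pos < records.length && !loader.isFull()) {
        a.write(records[pos][0]);
        b.write(records[pos][1]);
        loader.saveRow();
        pos++;
      }
      return pos > start;
    }

    void close() { closed = true; }
  }

  /** Drives each reader through open, read-until-done, close. */
  static List<List<String>> scan(Set<String> projection, List<Reader> readers) {
    List<List<String>> batches = new ArrayList<>();
    for (Reader reader : readers) {
      Loader loader = new Loader(projection);  // the framework creates the loader
      reader.open(loader);
      try {
        while (reader.next(loader)) {
          batches.add(loader.harvest());
        }
      } finally {
        reader.close();                        // the reader releases its resources
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    Reader reader = new Reader(new String[][] {{"1", "x"}, {"2", "y"}, {"3", "z"}});
    Set<String> projection = new HashSet<>(Arrays.asList("a"));
    System.out.println(scan(projection, Arrays.asList(reader)));
  }
}
```

With only column `a` projected, writes to `b` go to the dummy writer and never reach a batch, which is the projected versus unprojected distinction the Javadoc describes.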
