[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686544#comment-16686544
 ] 

ASF GitHub Bot commented on DRILL-6791:
---------------------------------------

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454002
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/SmoothingProjection.java
 ##########
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Resolve a table schema against the prior schema. This works only if the
+ * types match and if all columns in the table schema already appear in the
+ * prior schema.
+ * <p>
+ * Consider this an experimental mechanism. The hope was that, with clever
+ * techniques, we could "smooth over" some of the issues that cause schema
+ * change events in Drill. As it turned out, however, creating this mechanism
+ * revealed that it is not possible, even in theory, to handle most schema
+ * changes because of the time dimension:
+ * <ul>
+ * <li>An event in a later batch may provide information that would have
+ * caused us to make a different decision in an earlier batch. For example,
+ * we are asked for column `foo`, did not see such a column in the first
+ * batch, block or file, guessed some type, and later saw that the column
+ * was of a different type. We can't "time travel" to tell our earlier
+ * selves, nor, when we make the initial type decision, can we jump to
+ * the future to see what type we'll discover.</li>
+ * <li>Readers in this fragment may see column `foo` but readers in
+ * another fragment read files/blocks that don't have that column. The
+ * two readers cannot communicate to agree on a type.</li>
+ * </ul>
+ * <p>
+ * What this mechanism can do is make decisions based on history: when a
+ * column appears, we can adjust its type a bit to try to avoid an
+ * unnecessary change. For example, if a prior file in this scan saw
+ * `foo` as nullable Varchar, but the present file has the column as
+ * required Varchar, we can use the more general nullable form. But,
+ * again, the "can't predict the future" problem bites us: we can handle a
+ * nullable-to-required column change, but not vice versa.
+ * <p>
+ * What this mechanism will tell the careful reader is that the only
+ * general solution to the schema-change problem is to know the full
+ * schema up front: for the planner to be told the schema and to
+ * communicate that schema to all readers so that all readers agree
+ * on the final schema.
+ * <p>
+ * When that is done, the techniques shown here can be used to adjust
+ * any per-file variation of schema to match the up-front schema.
+ */
+
+public class SmoothingProjection extends SchemaLevelProjection {
+
+  protected final List<MaterializedField> rewrittenFields = new ArrayList<>();
+
+  public SmoothingProjection(ScanLevelProjection scanProj,
+      TupleMetadata tableSchema,
+      ResolvedTuple priorSchema,
+      ResolvedTuple outputTuple,
+      List<SchemaProjectionResolver> resolvers) throws IncompatibleSchemaException {
+
+    super(resolvers);
+
+    for (ResolvedColumn priorCol : priorSchema.columns()) {
+      switch (priorCol.nodeType()) {
+      case ResolvedTableColumn.ID:
 
 Review comment:
   Please indent the `case` labels one level inside the `switch`.
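
For readers following the class Javadoc in the diff, the history-based adjustment it describes (reuse the prior nullable column when the current file reports the same column as required, but not the reverse) might be sketched as below. This is only an illustration under stated assumptions: `SmoothingSketch`, `Column`, `Mode` and `smooth` are hypothetical names invented here, types are modeled as plain strings, names are compared case-insensitively, and none of this is the code in the PR.

```java
import java.util.Optional;

final class SmoothingSketch {

  /** Hypothetical stand-in for a column's data mode; only two modes are modeled. */
  enum Mode { REQUIRED, OPTIONAL }

  /** Hypothetical, simplified column description (not Drill's MaterializedField). */
  static final class Column {
    final String name;
    final String type;
    final Mode mode;

    Column(String name, String type, Mode mode) {
      this.name = name;
      this.type = type;
      this.mode = mode;
    }
  }

  /**
   * Returns the prior column if the table column can be written into it,
   * or an empty Optional if a schema change cannot be avoided.
   */
  static Optional<Column> smooth(Column prior, Column table) {
    // Smoothing only applies to the same column with the same type.
    if (!prior.name.equalsIgnoreCase(table.name) || !prior.type.equals(table.type)) {
      return Optional.empty();
    }
    // Same mode, or prior is nullable: the more general prior column can be reused.
    if (prior.mode == table.mode || prior.mode == Mode.OPTIONAL) {
      return Optional.of(prior);
    }
    // Prior was required but the table column is nullable: cannot be smoothed.
    return Optional.empty();
  }

  public static void main(String[] args) {
    Column nullableFoo = new Column("foo", "VARCHAR", Mode.OPTIONAL);
    Column requiredFoo = new Column("foo", "VARCHAR", Mode.REQUIRED);
    System.out.println(smooth(nullableFoo, requiredFoo).isPresent());
    System.out.println(smooth(requiredFoo, nullableFoo).isPresent());
  }
}
```

The asymmetry in `smooth` mirrors the Javadoc's point: history lets us keep a wider type we have already committed to, but it cannot widen a type that earlier batches already emitted as required.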

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> -------------------------------------------
>
>                 Key: DRILL-6791
>                 URL: https://issues.apache.org/jira/browse/DRILL-6791
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.15.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one
> covers the "schema projection" mechanism, which:
> * Handles none (SELECT COUNT(*)), some (SELECT a, b, x) and all (SELECT *)
> projection.
> * Handles null columns (when projecting a column "x" that does not exist in
> the base table).
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handles schema persistence: the need to reuse the same vectors across
> different scanners.
> * Provides a framework for consuming externally-supplied metadata.
> * Since we don't yet have a way to provide "real" metadata, obtains metadata
> hints from previous batches and from the projection list (a.b implies that
> "a" is a map, c[0] implies that "c" is an array, etc.).
> * Handles merging the set of data source columns and null columns to create
> the final output batch.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
