[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-12-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714208#comment-16714208
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

vdiravka commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-445621306
 
 
   Most likely the Travis build fails as usual due to timeout limit 50mins. I 
have created PR to fix that.
   
   `TestPStoreProviders.verifyZkStore()` test is marked as `SlowTest`, so it is 
excluded from Travis build.
   I have checked it on your branch and it passed for me, but with error 
message:
   > mvn -Dtest=TestPStoreProviders#verifyZkStore -pl exec/java-exec test
   > ...
   > [Time-limited test-EventThread] ERROR org.apache.curator.ConnectionState - 
Authentication failed
   
   The same message is on current master. So it is not related to your changes 
(but a good point to discuss why it appears).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-12-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714166#comment-16714166
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-445595147
 
 
   Builds seem flaky. My build for this change fails with:
   ```
   TestPStoreProviders.verifyZkStore:67 ยป NoSuchElement
   ```
   But, this change does not come anywhere near ZK or persistent storage.
   
   The Travis build fails with:
   
   ```
   ...
   [[1;34m INFO[m] artifact joda-time:joda-time: checking for updates from 
sonatype-apache
   The job exceeded the maximum time limit for jobs, and has been terminated.
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-12-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714115#comment-16714115
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-445574835
 
 
   Squashed commits and rebased on latest master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-12-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714000#comment-16714000
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-445546658
 
 
   +1, please squash the commits.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-12-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713582#comment-16713582
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-445440405
 
 
   @arina-ielchiieva, addressed your code review comments. Also, found and 
fixed a "bits" vector initialization bug. Please take another look to see if 
there is anything else we need to address. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696460#comment-16696460
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-441170180
 
 
   @arina-ielchiieva, made the requested revisions, but an seeing a flaky 
result in one of the unit tests. May be due to the null bits vector having 
garbage values. Will work on that fix then push the changes once the issue is 
fixed. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695107#comment-16695107
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-440772501
 
 
   @paul-rogers would you have a chance to address code review comments before 
release cut off date?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686540#comment-16686540
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233458212
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestSchemaSmoothing.java
 ##
 @@ -0,0 +1,681 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.protocol.SchemaTracker;
+import org.apache.drill.exec.physical.impl.scan.ScanTestUtils;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import 
org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple.ResolvedRow;
+import org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection;
+import org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator.ReaderSchemaOrchestrator;
+import org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.physical.impl.scan.project.SmoothingProjection;
+import 
org.apache.drill.exec.physical.impl.scan.project.WildcardSchemaProjection;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.impl.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+/**
+ * Tests schema smoothing at the schema projection level.
+ * This level handles reusing prior types when filling null
+ * values. But, because no actual vectors are involved, it
+ * does not handle the schema chosen for a table ahead of
+ * time, only the schema as it is merged with prior schema to
+ * detect missing columns.
+ * 
+ * Focuses on the SmoothingProjection class itself.
+ * 
+ * Note that, at present, schema smoothing does not work for entire
+ * maps. That is, if file 1 has, say {a: {b: 10, c: "foo"}}
+ * and file 2 has, say, {a: null}, then schema smoothing does
+ * not currently know how to recreate the map. The same is true of
+ * lists and unions. Handling such cases is complex and is probably
+ * better handled via a system that allows the user to specify their
+ * intent by providing a schema to apply to the two files.
+ */
+
+public class TestSchemaSmoothing extends SubOperatorTest {
+
+  /**
+   * Low-level test of the smoothing projection, including the exceptions
+   * it throws when things are not going its way.
+   */
+
+  @Test
+  public void testSmoothingProjection() {
+final ScanLevelProjection scanProj = new ScanLevelProjection(
+RowSetTestUtils.projectAll(),
+ScanTestUtils.parsers());
+
+// Table 1: (a: nullable bigint, b)
+
+final TupleMetadata schema1 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.addNullable("b", MinorType.VARCHAR)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+ResolvedRow priorSchema;
+{
+  final NullColumnBuilder builder = new NullColumnBuilder(null, false);
+  final ResolvedRow rootTuple = new ResolvedRow(builder);
+  new WildcardSchemaProjection(
+  scanProj, schema1, rootTuple,
+  ScanTestUtils.resolvers());
+  priorSchema = rootTuple;
+}
+
+// Table 2: (a: 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686537#comment-16686537
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454765
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+public class ScanTestUtils {
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
+   */
+
+  public static List parsers(ScanProjectionParser... 
parsers) {
+return ImmutableList.copyOf(parsers);
 
 Review comment:
   Consider using java built-in utils rather than guava: here and below.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686536#comment-16686536
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233453077
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedTuple.java
 ##
 @@ -0,0 +1,427 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+
+import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
+/**
+ * Drill rows are made up of a tree of tuples, with the row being the root
+ * tuple. Each tuple contains columns, some of which may be maps. This
+ * class represents each row or map in the output projection.
+ * 
+ * Output columns within the tuple can be projected from the data source,
+ * might be null (requested columns that don't match a data source column)
+ * or might be a constant (such as an implicit column.) This class
+ * orchestrates assembling an output tuple from a collection of these
+ * three column types. (Though implicit columns appear only in the root
+ * tuple.)
+ *
+ * Null Handling
+ *
+ * The project list might reference a "missing" map if the project list
+ * includes, say, SELECT a.b.c but `a` does not exist
+ * in the data source. In this case, the column a is implied to be a map,
+ * so the projection mechanism will create a null map for `a`
+ * and `b`, and will create a null column for `c`.
+ * 
+ * To accomplish this recursive null processing, each tuple is associated
+ * with a null builder. (The null builder can be null if projection is
+ * implicit with a wildcard; in such a case no null columns can occur.
+ * But, even here, with schema persistence, a SELECT * query
+ * may need null columns if a second file does not contain a column
+ * that appeared in a first file.)
+ * 
+ * The null builder is bound to each tuple to allow vector persistence
+ * via the result vector cache. If we must create a null column
+ * `x` in two different readers, then the rules of Drill
+ * require that the same vector be used for both (or else a schema
+ * change is signaled.) The vector cache works by name (and type).
+ * Since maps may contain columns with the same names as other maps,
+ * the vector cache must be associated with each tuple. And, by extension,
+ * the null builder must also be associated with each tuple.
+ *
+ * Lifecycle
+ *
+ * The lifecycle of a resolved tuple is:
+ * 
+ * The projection mechanism creates the output tuple, and its columns,
+ * by comparing the project list against the table schema. The result is
+ * a set of table, null, or constant columns.
+ * Once per schema change, the resolved tuple creates the output
+ * tuple by linking to vectors in their original locations. As it turns out,
+ * we can simply share the vectors; we don't need to transfer the buffers.
+ * To prepare for the transfer, the tuple asks the null column builder
+ * (if present) to build the required null columns.
+ * Once the output tuple is built, it can be used for any number of
+ * batches without further work. (The same vectors appear in the various inputs
+ * and the output, eliminating the need for any transfers.)
+ * Once per batch, the client must set the row count. This is needed for 
the
+ * output container, and for any "null" maps 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686539#comment-16686539
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233455585
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestNullColumnLoader.java
 ##
 @@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertSame;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnLoader;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedNullColumn;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.physical.rowSet.impl.NullResultVectorCacheImpl;
+import org.apache.drill.exec.physical.rowSet.impl.ResultVectorCacheImpl;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.Test;
+
+/**
+ * Test the mechanism that handles all-null columns during projection.
+ * An all-null column is one projected in the query, but which does
+ * not actually exist in the underlying data source (or input
+ * operator.)
+ * 
+ * In anticipation of having type information, this mechanism
+ * can create the classic nullable Int null column, or one of
+ * any other type and mode.
+ */
+
+public class TestNullColumnLoader extends SubOperatorTest {
+
+  private ResolvedNullColumn makeNullCol(String name, MajorType nullType) {
+
+// For this test, we don't need the projection, so just
+// set it to null.
+
+return new ResolvedNullColumn(name, nullType, null, 0);
+  }
+
+  private ResolvedNullColumn makeNullCol(String name) {
+return makeNullCol(name, null);
+  }
+
+  /**
+   * Test the simplest case: default null type, nothing in the vector
+   * cache. Specify no column type, the special NULL type, or a
+   * predefined type. Output types should be set accordingly.
+   */
+
+  @Test
+  public void testBasics() {
+
+final List defns = new ArrayList<>();
+defns.add(makeNullCol("unspecified", null));
+defns.add(makeNullCol("nullType", Types.optional(MinorType.NULL)));
+defns.add(makeNullCol("specifiedOpt", Types.optional(MinorType.VARCHAR)));
+defns.add(makeNullCol("specifiedReq", Types.required(MinorType.VARCHAR)));
+defns.add(makeNullCol("specifiedArray", 
Types.repeated(MinorType.VARCHAR)));
+
+final ResultVectorCache cache = new 
NullResultVectorCacheImpl(fixture.allocator());
+final NullColumnLoader staticLoader = new NullColumnLoader(cache, defns, 
null, false);
+
+// Create a batch
+
+final VectorContainer output = staticLoader.load(2);
+
+// Verify values and types
+
+final BatchSchema expectedSchema = new SchemaBuilder()
+.add("unspecified", NullColumnLoader.DEFAULT_NULL_TYPE)
+.add("nullType", NullColumnLoader.DEFAULT_NULL_TYPE)
+.addNullable("specifiedOpt", MinorType.VARCHAR)
+.addNullable("specifiedReq", MinorType.VARCHAR)
+.addArray("specifiedArray", MinorType.VARCHAR)
+.build();
+final SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
+.addRow(null, null, null, null, new String[] {})
+.addRow(null, null, null, null, new String[] {})
+.build();
+
+new 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686543#comment-16686543
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233452637
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedTuple.java
 ##
 @@ -0,0 +1,427 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+
+import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
+/**
+ * Drill rows are made up of a tree of tuples, with the row being the root
+ * tuple. Each tuple contains columns, some of which may be maps. This
+ * class represents each row or map in the output projection.
+ * 
+ * Output columns within the tuple can be projected from the data source,
+ * might be null (requested columns that don't match a data source column)
+ * or might be a constant (such as an implicit column.) This class
+ * orchestrates assembling an output tuple from a collection of these
+ * three column types. (Though implicit columns appear only in the root
+ * tuple.)
+ *
+ * Null Handling
+ *
+ * The project list might reference a "missing" map if the project list
+ * includes, say, SELECT a.b.c but `a` does not exist
+ * in the data source. In this case, the column a is implied to be a map,
+ * so the projection mechanism will create a null map for `a`
+ * and `b`, and will create a null column for `c`.
+ * 
+ * To accomplish this recursive null processing, each tuple is associated
+ * with a null builder. (The null builder can be null if projection is
+ * implicit with a wildcard; in such a case no null columns can occur.
+ * But, even here, with schema persistence, a SELECT * query
+ * may need null columns if a second file does not contain a column
+ * that appeared in a first file.)
+ * 
+ * The null builder is bound to each tuple to allow vector persistence
+ * via the result vector cache. If we must create a null column
+ * `x` in two different readers, then the rules of Drill
+ * require that the same vector be used for both (or else a schema
+ * change is signaled.) The vector cache works by name (and type).
+ * Since maps may contain columns with the same names as other maps,
+ * the vector cache must be associated with each tuple. And, by extension,
+ * the null builder must also be associated with each tuple.
+ *
+ * Lifecycle
+ *
+ * The lifecycle of a resolved tuple is:
+ * 
+ * The projection mechanism creates the output tuple, and its columns,
+ * by comparing the project list against the table schema. The result is
+ * a set of table, null, or constant columns.
+ * Once per schema change, the resolved tuple creates the output
+ * tuple by linking to vectors in their original locations. As it turns out,
+ * we can simply share the vectors; we don't need to transfer the buffers.
+ * To prepare for the transfer, the tuple asks the null column builder
+ * (if present) to build the required null columns.
+ * Once the output tuple is built, it can be used for any number of
+ * batches without further work. (The same vectors appear in the various inputs
+ * and the output, eliminating the need for any transfers.)
+ * Once per batch, the client must set the row count. This is needed for 
the
+ * output container, and for any "null" maps 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686542#comment-16686542
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233451408
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedColumn.java
 ##
 @@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * A resolved column has a name, and a specification for how to project
+ * data from a source vector to a vector in the final output container.
+ * Describes the projection of a single column from
+ * an input to an output batch.
+ * 
+ * Although the table schema mechanism uses the newer "metadata"
+ * mechanism, resolved columns revert back to the original
+ * {@link MajorType} and {@link MaterializedField} mechanism used
+ * by the rest of Drill. Doing so loses a bit of additional
+ * information, but at present there is no way to export that information
+ * along with a serialized record batch; each operator must rediscover
+ * it after deserialization.
+ */
+
+public abstract class ResolvedColumn implements ColumnProjection {
+
+  public final VectorSource source;
+  public final int sourceIndex;
+
+  public ResolvedColumn(VectorSource source, int sourceIndex) {
+this.source = source;
+this.sourceIndex = sourceIndex;
+  }
+
+  public VectorSource source() { return source; }
+
+  public int sourceIndex() { return sourceIndex; }
+
+  /**
+   * Return the type of this column. Used primarily by the schema smoothing
+   * mechanism.
+   *
+   * @return
 
 Review comment:
   Move description here to avoid warning.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686535#comment-16686535
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454585
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+public class ScanTestUtils {
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
 
 Review comment:
   add description


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686544#comment-16686544
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454002
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/SmoothingProjection.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Resolve a table schema against the prior schema. This works only if the
+ * types match and if all columns in the table schema already appear in the
+ * prior schema.
+ * 
+ * Consider this an experimental mechanism. The hope was that, with clever
+ * techniques, we could "smooth over" some of the issues that cause schema
+ * change events in Drill. As it turned out, however, creating this mechanism
+ * revealed that it is not possible, even in theory, to handle most schema
+ * changes because of the time dimension:
+ * 
+ * An even in a later batch may provide information that would have
+ * caused us to make a different decision in an earlier batch. For example,
+ * we are asked for column `foo`, did not see such a column in the first
+ * batch, block or file, guessed some type, and later saw that the column
+ * was of a different type. We can't "time travel" to tell our earlier
+ * selves, nor, when we make the initial type decision, can we jump to
+ * the future to see what type we'll discover.
+ * Readers in this fragment may see column `foo` but readers in
+ * another fragment read files/blocks that don't have that column. The
+ * two readers cannot communicate to agree on a type.
+ * 
+ * 
+ * What this mechanism can do is make decisions based on history: when a
+ * column appears, we can adjust its type a bit to try to avoid an
+ * unnecessary change. For example, if a prior file in this scan saw
+ * `foo` as nullable Varchar, but the present file has the column as
+ * requied Varchar, we can use the more general nullable form. But,
+ * again, the "can't predict the future" bites us: we can handle a
+ * nullable-to-required column change, but not visa-versa.
+ * 
+ * What this mechanism will tell the careful reader is that the only
+ * general solution to the schema-change problem is to now the full
+ * schema up front: for the planner to be told the schema and to
+ * communicate that schema to all readers so that all readers agree
+ * on the final schema.
+ * 
+ * When that is done, the techniques shown here can be used to adjust
+ * any per-file variation of schema to match the up-front schema.
+ */
+
+public class SmoothingProjection extends SchemaLevelProjection {
+
+  protected final List rewrittenFields = new ArrayList<>();
+
+  public SmoothingProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple priorSchema,
+  ResolvedTuple outputTuple,
+  List resolvers) throws 
IncompatibleSchemaException {
+
+super(resolvers);
+
+for (ResolvedColumn priorCol : priorSchema.columns()) {
+  switch (priorCol.nodeType()) {
+  case ResolvedTableColumn.ID:
 
 Review comment:
   indent


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686541#comment-16686541
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233458346
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestSchemaSmoothing.java
 ##
 @@ -0,0 +1,681 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.protocol.SchemaTracker;
+import org.apache.drill.exec.physical.impl.scan.ScanTestUtils;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import 
org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple.ResolvedRow;
+import org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection;
+import org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator.ReaderSchemaOrchestrator;
+import org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.physical.impl.scan.project.SmoothingProjection;
+import 
org.apache.drill.exec.physical.impl.scan.project.WildcardSchemaProjection;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.impl.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+/**
+ * Tests schema smoothing at the schema projection level.
+ * This level handles reusing prior types when filling null
+ * values. But, because no actual vectors are involved, it
+ * does not handle the schema chosen for a table ahead of
+ * time, only the schema as it is merged with prior schema to
+ * detect missing columns.
+ * 
+ * Focuses on the SmoothingProjection class itself.
+ * 
+ * Note that, at present, schema smoothing does not work for entire
+ * maps. That is, if file 1 has, say {a: {b: 10, c: "foo"}}
+ * and file 2 has, say, {a: null}, then schema smoothing does
+ * not currently know how to recreate the map. The same is true of
+ * lists and unions. Handling such cases is complex and is probably
+ * better handled via a system that allows the user to specify their
+ * intent by providing a schema to apply to the two files.
+ */
+
+public class TestSchemaSmoothing extends SubOperatorTest {
+
+  /**
+   * Low-level test of the smoothing projection, including the exceptions
+   * it throws when things are not going its way.
+   */
+
+  @Test
+  public void testSmoothingProjection() {
+final ScanLevelProjection scanProj = new ScanLevelProjection(
+RowSetTestUtils.projectAll(),
+ScanTestUtils.parsers());
+
+// Table 1: (a: nullable bigint, b)
+
+final TupleMetadata schema1 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.addNullable("b", MinorType.VARCHAR)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+ResolvedRow priorSchema;
+{
+  final NullColumnBuilder builder = new NullColumnBuilder(null, false);
+  final ResolvedRow rootTuple = new ResolvedRow(builder);
+  new WildcardSchemaProjection(
+  scanProj, schema1, rootTuple,
+  ScanTestUtils.resolvers());
+  priorSchema = rootTuple;
+}
+
+// Table 2: (a: 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686534#comment-16686534
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233450250
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/MetadataManager.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+
+/**
+ * Queries can contain a wildcard (*), table columns, or special
+ * system-defined columns (the file metadata columns AKA implicit
+ * columns, the `columns` column of CSV, etc.).
+ * 
+ * This class provides a generalized way of handling such extended
+ * columns. That is, this handles metadata for columns defined by
+ * the scan or file; columns defined by the table (the actual
+ * data metadata) is handled elsewhere.
+ * 
+ * Objects of this interface are driven by the projection processing
+ * framework which provides a vector cache from which to obtain
+ * materialized columns. The implementation must provide a projection
+ * parser to pick out the columns which this object handles.
+ * 
+ * A better name might be ImplicitMetadataManager to signify that
 
 Review comment:
   Agree, let's rename to avoid the confusion in future.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686538#comment-16686538
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454141
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/SmoothingProjection.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Resolve a table schema against the prior schema. This works only if the
+ * types match and if all columns in the table schema already appear in the
+ * prior schema.
+ * 
+ * Consider this an experimental mechanism. The hope was that, with clever
+ * techniques, we could "smooth over" some of the issues that cause schema
+ * change events in Drill. As it turned out, however, creating this mechanism
+ * revealed that it is not possible, even in theory, to handle most schema
+ * changes because of the time dimension:
+ * 
+ * An even in a later batch may provide information that would have
+ * caused us to make a different decision in an earlier batch. For example,
+ * we are asked for column `foo`, did not see such a column in the first
+ * batch, block or file, guessed some type, and later saw that the column
+ * was of a different type. We can't "time travel" to tell our earlier
+ * selves, nor, when we make the initial type decision, can we jump to
+ * the future to see what type we'll discover.
+ * Readers in this fragment may see column `foo` but readers in
+ * another fragment read files/blocks that don't have that column. The
+ * two readers cannot communicate to agree on a type.
+ * 
+ * 
+ * What this mechanism can do is make decisions based on history: when a
+ * column appears, we can adjust its type a bit to try to avoid an
+ * unnecessary change. For example, if a prior file in this scan saw
+ * `foo` as nullable Varchar, but the present file has the column as
+ * requied Varchar, we can use the more general nullable form. But,
+ * again, the "can't predict the future" bites us: we can handle a
+ * nullable-to-required column change, but not visa-versa.
+ * 
+ * What this mechanism will tell the careful reader is that the only
+ * general solution to the schema-change problem is to now the full
+ * schema up front: for the planner to be told the schema and to
+ * communicate that schema to all readers so that all readers agree
+ * on the final schema.
+ * 
+ * When that is done, the techniques shown here can be used to adjust
+ * any per-file variation of schema to match the up-front schema.
+ */
+
+public class SmoothingProjection extends SchemaLevelProjection {
+
+  protected final List rewrittenFields = new ArrayList<>();
+
+  public SmoothingProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple priorSchema,
+  ResolvedTuple outputTuple,
+  List resolvers) throws 
IncompatibleSchemaException {
+
+super(resolvers);
+
+for (ResolvedColumn priorCol : priorSchema.columns()) {
+  switch (priorCol.nodeType()) {
+  case ResolvedTableColumn.ID:
+  case ResolvedNullColumn.ID:
+// TODO: To fix this, the null column loader must declare
 
 Review comment:
   Please explain this todo


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686533#comment-16686533
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233449683
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ExplicitSchemaProjection.java
 ##
 @@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import 
org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Perform a schema projection for the case of an explicit list of
+ * projected columns. Example: SELECT a, b, c.
+ * 
+ * An explicit projection starts with the requested set of columns,
+ * then looks in the table schema to find matches. That is, it is
+ * driven by the query itself.
+ * 
+ * An explicit projection may include columns that do not exist in
+ * the source schema. In this case, we fill in null columns for
+ * unmatched projections.
+ */
+
+public class ExplicitSchemaProjection extends SchemaLevelProjection {
+
+  public ExplicitSchemaProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple rootTuple,
+  List resolvers) {
+super(resolvers);
+resolveRootTuple(scanProj, rootTuple, tableSchema);
+  }
+
+  private void resolveRootTuple(ScanLevelProjection scanProj,
+  ResolvedTuple rootTuple,
+  TupleMetadata tableSchema) {
+for (ColumnProjection col : scanProj.columns()) {
+  if (col.nodeType() == UnresolvedColumn.UNRESOLVED) {
+resolveColumn(rootTuple, ((UnresolvedColumn) col).element(), 
tableSchema);
+  } else {
+resolveSpecial(rootTuple, col, tableSchema);
+  }
+}
+  }
+
+  private void resolveColumn(ResolvedTuple outputTuple,
+  RequestedColumn inputCol, TupleMetadata tableSchema) {
+int tableColIndex = tableSchema.index(inputCol.name());
+if (tableColIndex == -1) {
+  resolveNullColumn(outputTuple, inputCol);
+} else {
+  resolveTableColumn(outputTuple, inputCol,
+  tableSchema.metadata(tableColIndex),
+  tableColIndex);
+}
+  }
+
+  private void resolveTableColumn(ResolvedTuple outputTuple,
+  RequestedColumn requestedCol,
+  ColumnMetadata column, int sourceIndex) {
+
+// Is the requested column implied to be a map?
+// A requested column is a map if the user requests x.y and we
+// are resolving column x. The presence of y as a member implies
+// that x is a map.
+
+if (requestedCol.isTuple()) {
+  resolveMap(outputTuple, requestedCol, column, sourceIndex);
+}
+
+// Is the requested column implied to be an array?
+// This occurs when the projection list contains at least one
+// array index reference such as x[10].
+
+else if (requestedCol.isArray()) {
+  resolveArray(outputTuple, requestedCol, column, sourceIndex);
+}
+
+// A plain old column. Might be an array or a map, but if
+// so, the request list just mentions it by name without implying
+// the column type. That is, the project list just contains x
+// by itself.
+
+else {
+  projectTableColumn(outputTuple, requestedCol, column, sourceIndex);
+}
+  }
+
+  private void resolveMap(ResolvedTuple outputTuple,
+  RequestedColumn requestedCol, ColumnMetadata column,
+  int sourceIndex) {
+
+// If the actual column isn't a map, then the request is invalid.
+
+if (! column.isMap()) {
+  throw UserException
+.validationError()
+.message("Project list implies a map 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683279#comment-16683279
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on issue #1501: DRILL-6791: Scan projection framework
URL: https://github.com/apache/drill/pull/1501#issuecomment-437770579
 
 
   @arina-ielchiieva, anything I should take care of on this one?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674736#comment-16674736
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r230644967
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import com.google.common.collect.Lists;
+
+public class ScanTestUtils {
+
+  // Default file metadata column names; primarily for testing.
+
+  public static final String FILE_NAME_COL = "filename";
+  public static final String FULLY_QUALIFIED_NAME_COL = "fqn";
+  public static final String FILE_PATH_COL = "filepath";
+  public static final String SUFFIX_COL = "suffix";
+  public static final String PARTITION_COL = "dir";
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
+   */
+
+  public static List parsers(ScanProjectionParser... 
parsers) {
+return Lists.newArrayList(parsers);
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674732#comment-16674732
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r230645237
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestScanBatchWriters.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ 
**/
+package org.apache.drill.exec.physical.impl.scan.project;
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674735#comment-16674735
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r230641861
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/NoOpMetadataManager.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+
+/**
+ * Do-nothing implementation of the metadata manager. Allows the
+ * metadata manager to be optional without needing an if-statement
+ * on every access.
+ */
+
+public class NoOpMetadataManager implements MetadataManager {
+
+  @Override
+  public void bind(ResultVectorCache vectorCache) { }
+
+  @Override
+  public ScanProjectionParser projectionParser() { return null; }
+
+  @Override
+  public SchemaProjectionResolver resolver() {
+throw new UnsupportedOperationException();
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674734#comment-16674734
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r230644180
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import com.google.common.collect.Lists;
+
+public class ScanTestUtils {
+
+  // Default file metadata column names; primarily for testing.
+
+  public static final String FILE_NAME_COL = "filename";
 
 Review comment:
   Tried, but it seems that the class no longer exists. As I recall, the names 
came from config settings or session options.
   
   Actually, we don't even need these fields for this PR, so I removed them. 
Will add them in a later PR that adds the file metadata mechanism (the 
successor to the implicit column explorer.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-11-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674733#comment-16674733
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r230642040
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ScanLevelProjection.java
 ##
 @@ -0,0 +1,349 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import 
org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+
+/**
+ * Parses and analyzes the projection list passed to the scanner. The
+ * projection list is per scan, independent of any tables that the
+ * scanner might scan. The projection list is then used as input to the
+ * per-table projection planning.
+ * 
+ * Accepts the inputs needed to
+ * plan a projection, builds the mappings, and constructs the projection
+ * mapping object.
+ * 
+ * Builds the per-scan projection plan given a set of projected columns.
+ * Determines the output schema, which columns to project from the data
+ * source, which are metadata, and so on.
+ * 
+ * An annoying aspect of SQL is that the projection list (the list of
+ * columns to appear in the output) is specified after the SELECT keyword.
+ * In Relational theory, projection is about columns, selection is about
+ * rows...
+ * 
+ * Mappings can be based on three primary use cases:
+ * 
+ * SELECT *: Project all data source columns, whatever they happen
+ * to be. Create columns using names from the data source. The data source
+ * also determines the order of columns within the row.
+ * SELECT columns: Similar to SELECT * in that it projects all 
columns
+ * from the data source, in data source order. But, rather than creating
+ * individual output columns for each data source column, creates a single
+ * column which is an array of Varchars which holds the (text form) of
+ * each column as an array element.
+ * SELECT a, b, c, ...: Project a specific set of columns, 
identified by
+ * case-insensitive name. The output row uses the names from the SELECT list,
+ * but types from the data source. Columns appear in the row in the order
+ * specified by the SELECT.
+ * SELECT ...: SELECT nothing, occurs in SELECT COUNT(*)
+ * type queries. The provided projection list contains no (table) columns, 
though
+ * it may contain metadata columns.
+ * 
+ * Names in the SELECT list can reference any of five distinct types of output
+ * columns:
+ * 
+ * Wildcard ("*") column: indicates the place in the projection list to 
insert
+ * the table columns once found in the table projection plan.
+ * Data source columns: columns from the underlying table. The table
+ * projection planner will determine if the column exists, or must be filled
+ * in with a null column.
+ * The generic data source columns array: columns, or optionally
+ * specific members of the columns array such as 
columns[1].
+ * Implicit columns: fqn, filename, filepath
+ * and suffix. These reference
+ * parts of the name of the file being scanned.
+ * Partition columns: dir0, dir1, ...: These reference
+ * parts of the path name of the file.
+ * 
+ *
+ * @see {@link ImplicitColumnExplorer}, the class from which this class
+ * evolved
+ */
+
+public class ScanLevelProjection {
+
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ScanLevelProjection.class);
+
+  /**
+   * Interface for add-on parsers, avoids the need to create
+   * a single, tightly-coupled parser for all types of columns.
+   * The main parser handles wildcards and assumes the rest of
+   * the columns are table columns. The add-on parser can tag
+   * columns as special, such as to hold metadata.
+   */
+
+  public 

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669949#comment-16669949
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r226964830
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ScanLevelProjection.java
 ##
 @@ -0,0 +1,349 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import 
org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+
+/**
+ * Parses and analyzes the projection list passed to the scanner. The
+ * projection list is per scan, independent of any tables that the
+ * scanner might scan. The projection list is then used as input to the
+ * per-table projection planning.
+ * 
+ * Accepts the inputs needed to
+ * plan a projection, builds the mappings, and constructs the projection
+ * mapping object.
+ * 
+ * Builds the per-scan projection plan given a set of projected columns.
+ * Determines the output schema, which columns to project from the data
+ * source, which are metadata, and so on.
+ * 
+ * An annoying aspect of SQL is that the projection list (the list of
+ * columns to appear in the output) is specified after the SELECT keyword.
+ * In Relational theory, projection is about columns, selection is about
+ * rows...
+ * 
+ * Mappings can be based on three primary use cases:
+ * 
+ * SELECT *: Project all data source columns, whatever they happen
+ * to be. Create columns using names from the data source. The data source
+ * also determines the order of columns within the row.
+ * SELECT columns: Similar to SELECT * in that it projects all 
columns
+ * from the data source, in data source order. But, rather than creating
+ * individual output columns for each data source column, creates a single
+ * column which is an array of Varchars which holds the (text form) of
+ * each column as an array element.
+ * SELECT a, b, c, ...: Project a specific set of columns, 
identified by
+ * case-insensitive name. The output row uses the names from the SELECT list,
+ * but types from the data source. Columns appear in the row in the order
+ * specified by the SELECT.
+ * SELECT ...: SELECT nothing, occurs in SELECT COUNT(*)
+ * type queries. The provided projection list contains no (table) columns, 
though
+ * it may contain metadata columns.
+ * 
+ * Names in the SELECT list can reference any of five distinct types of output
+ * columns:
+ * 
+ * Wildcard ("*") column: indicates the place in the projection list to 
insert
+ * the table columns once found in the table projection plan.
+ * Data source columns: columns from the underlying table. The table
+ * projection planner will determine if the column exists, or must be filled
+ * in with a null column.
+ * The generic data source columns array: columns, or optionally
+ * specific members of the columns array such as 
columns[1].
+ * Implicit columns: fqn, filename, filepath
+ * and suffix. These reference
+ * parts of the name of the file being scanned.
+ * Partition columns: dir0, dir1, ...: These reference
+ * parts of the path name of the file.
+ * 
+ *
+ * @see {@link ImplicitColumnExplorer}, the class from which this class
+ * evolved
+ */
+
+public class ScanLevelProjection {
+
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ScanLevelProjection.class);
+
+  /**
+   * Interface for add-on parsers, avoids the need to create
+   * a single, tightly-coupled parser for all types of columns.
+   * The main parser handles wildcards and assumes the rest of
+   * the columns are table columns. The add-on parser can tag
+   * columns as special, such as to hold metadata.
+   */
+
+  

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669943#comment-16669943
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r225643112
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/NoOpMetadataManager.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+
+/**
+ * Do-nothing implementation of the metadata manager. Allows the
+ * metadata manager to be optional without needing an if-statement
+ * on every access.
+ */
+
+public class NoOpMetadataManager implements MetadataManager {
+
+  @Override
+  public void bind(ResultVectorCache vectorCache) { }
+
+  @Override
+  public ScanProjectionParser projectionParser() { return null; }
+
+  @Override
+  public SchemaProjectionResolver resolver() {
+throw new UnsupportedOperationException();
 
 Review comment:
   Please provide error message.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669944#comment-16669944
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r226276316
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestScanBatchWriters.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ 
**/
+package org.apache.drill.exec.physical.impl.scan.project;
 
 Review comment:
   Please remove.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669947#comment-16669947
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r226967308
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import com.google.common.collect.Lists;
+
+public class ScanTestUtils {
+
+  // Default file metadata column names; primarily for testing.
+
+  public static final String FILE_NAME_COL = "filename";
 
 Review comment:
   Can be taken from `ImplicitColumnExplorer`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669945#comment-16669945
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r226964830
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ScanLevelProjection.java
 ##
 @@ -0,0 +1,349 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import 
org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+
+/**
+ * Parses and analyzes the projection list passed to the scanner. The
+ * projection list is per scan, independent of any tables that the
+ * scanner might scan. The projection list is then used as input to the
+ * per-table projection planning.
+ * 
+ * Accepts the inputs needed to
+ * plan a projection, builds the mappings, and constructs the projection
+ * mapping object.
+ * 
+ * Builds the per-scan projection plan given a set of projected columns.
+ * Determines the output schema, which columns to project from the data
+ * source, which are metadata, and so on.
+ * 
+ * An annoying aspect of SQL is that the projection list (the list of
+ * columns to appear in the output) is specified after the SELECT keyword.
+ * In Relational theory, projection is about columns, selection is about
+ * rows...
+ * 
+ * Mappings can be based on three primary use cases:
+ * 
+ * SELECT *: Project all data source columns, whatever they happen
+ * to be. Create columns using names from the data source. The data source
+ * also determines the order of columns within the row.
+ * SELECT columns: Similar to SELECT * in that it projects all 
columns
+ * from the data source, in data source order. But, rather than creating
+ * individual output columns for each data source column, creates a single
+ * column which is an array of Varchars which holds the (text form) of
+ * each column as an array element.
+ * SELECT a, b, c, ...: Project a specific set of columns, 
identified by
+ * case-insensitive name. The output row uses the names from the SELECT list,
+ * but types from the data source. Columns appear in the row in the order
+ * specified by the SELECT.
+ * SELECT ...: SELECT nothing, occurs in SELECT COUNT(*)
+ * type queries. The provided projection list contains no (table) columns, 
though
+ * it may contain metadata columns.
+ * 
+ * Names in the SELECT list can reference any of five distinct types of output
+ * columns:
+ * 
+ * Wildcard ("*") column: indicates the place in the projection list to 
insert
+ * the table columns once found in the table projection plan.
+ * Data source columns: columns from the underlying table. The table
+ * projection planner will determine if the column exists, or must be filled
+ * in with a null column.
+ * The generic data source columns array: columns, or optionally
+ * specific members of the columns array such as 
columns[1].
+ * Implicit columns: fqn, filename, filepath
+ * and suffix. These reference
+ * parts of the name of the file being scanned.
+ * Partition columns: dir0, dir1, ...: These reference
+ * parts of the path name of the file.
+ * 
+ *
+ * @see {@link ImplicitColumnExplorer}, the class from which this class
+ * evolved
+ */
+
+public class ScanLevelProjection {
+
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ScanLevelProjection.class);
+
+  /**
+   * Interface for add-on parsers, avoids the need to create
+   * a single, tightly-coupled parser for all types of columns.
+   * The main parser handles wildcards and assumes the rest of
+   * the columns are table columns. The add-on parser can tag
+   * columns as special, such as to hold metadata.
+   */
+
+  

[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669946#comment-16669946
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r226967308
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import com.google.common.collect.Lists;
+
+public class ScanTestUtils {
+
+  // Default file metadata column names; primarily for testing.
+
+  public static final String FILE_NAME_COL = "filename";
 
 Review comment:
   Can betaken from `ImplicitColumnExplorer`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662722#comment-16662722
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

vdiravka commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r227928299
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import com.google.common.collect.Lists;
+
+public class ScanTestUtils {
+
+  // Default file metadata column names; primarily for testing.
+
+  public static final String FILE_NAME_COL = "filename";
+  public static final String FULLY_QUALIFIED_NAME_COL = "fqn";
+  public static final String FILE_PATH_COL = "filepath";
+  public static final String SUFFIX_COL = "suffix";
+  public static final String PARTITION_COL = "dir";
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
+   */
+
+  public static List parsers(ScanProjectionParser... 
parsers) {
+return Lists.newArrayList(parsers);
 
 Review comment:
   Currently shaded 23 version of Guava is used for Drill internal purpose, see 
more in [DRILL-6422](https://issues.apache.org/jira/browse/DRILL-6422).
   Usage of 19 version of not shaded Guava causes `BUILD FAILURE` by the 
`checkstyle-validation`
   -> `import org.apache.drill.shaded.guava.com.google.common.collect.Lists`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647546#comment-16647546
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

vdiravka commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r224684940
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import com.google.common.collect.Lists;
+
+public class ScanTestUtils {
+
+  // Default file metadata column names; primarily for testing.
+
+  public static final String FILE_NAME_COL = "filename";
+  public static final String FULLY_QUALIFIED_NAME_COL = "fqn";
+  public static final String FILE_PATH_COL = "filepath";
+  public static final String SUFFIX_COL = "suffix";
+  public static final String PARTITION_COL = "dir";
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
+   */
+
+  public static List parsers(ScanProjectionParser... 
parsers) {
+return Lists.newArrayList(parsers);
 
 Review comment:
   Use `new ArrayList<>()` to make Travis build happy :)  
   The note in Guava `ListsnewArrayList()` javadoc:
   > Note for Java 7 and later: this method is now unnecessary and
   >  should be treated as deprecated. Instead, use the {@code ArrayList}
   > {@linkplain ArrayList#ArrayList() constructor} directly, taking advantage
   > of the new http://goo.gl/iz2Wi;>"diamond" syntax.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6791) Merge scan projection framework into master

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647508#comment-16647508
 ] 

ASF GitHub Bot commented on DRILL-6791:
---

paul-rogers opened a new pull request #1501: DRILL-6791: Scan projection 
framework
URL: https://github.com/apache/drill/pull/1501
 
 
   The "schema projection" mechanism:
   
   * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
projection.
   * Handles null columns (for projection a column "x" that does not exist in 
the base table.)
   * Handles constant columns as used for file metadata (AKA "implicit" 
columns).
   * Handle schema persistence: the need to reuse the same vectors across 
different scanners
   * Provides a framework for consuming externally-supplied metadata
   * Since we don't yet have a way to provide "real" metadata, obtains metadata 
hints from previous batches and from the projection list (a.b implies that "a" 
is a map, c[0] implies that "c" is an array, etc.)
   * Handles merging the set of data source columns and null columns to create 
the final output batch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge scan projection framework into master
> ---
>
> Key: DRILL-6791
> URL: https://issues.apache.org/jira/browse/DRILL-6791
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Merge the next set of "result set loader" code into master via a PR. This one 
> covers the "schema projection" mechanism which:
> * Handles none (SELECT COUNT\(*)), some (SELECT a, b, x) and all (SELECT *) 
> projection.
> * Handles null columns (for projection a column "x" that does not exist in 
> the base table.)
> * Handles constant columns as used for file metadata (AKA "implicit" columns).
> * Handle schema persistence: the need to reuse the same vectors across 
> different scanners
> * Provides a framework for consuming externally-supplied metadata
> * Since we don't yet have a way to provide "real" metadata, obtains metadata 
> hints from previous batches and from the projection list (a.b implies that 
> "a" is a map, c[0] implies that "c" is an array, etc.)
> * Handles merging the set of data source columns and null columns to create 
> the final output batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)