[GitHub] [drill] cgivre commented on issue #1892: DRILL-7437: Storage Plugin for Generic HTTP REST API

2020-04-11 Thread GitBox
cgivre commented on issue #1892: DRILL-7437: Storage Plugin for Generic HTTP 
REST API
URL: https://github.com/apache/drill/pull/1892#issuecomment-612551648
 
 
   @paul-rogers 
   I rebased this branch on master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (DRILL-6672) Drill table functions cannot handle "setFoo" accessors

2020-04-11 Thread Paul Rogers (Jira)


 [ https://issues.apache.org/jira/browse/DRILL-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-6672.

Resolution: Not A Problem

Storage and format plugins must be immutable because their entire values are 
used as keys in internal maps (the plugin registry and the format plugin 
tables). So, no config should have a "setFoo()" method.
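
To illustrate the point about map keys, here is a minimal sketch, not Drill code. `MutableKeySketch`, `MutableConfig` and the `registry` map are hypothetical names made up for this example; the only assumption carried over from the comment above is that a config's full value acts as the key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeySketch {

  /** Hypothetical config with a setter; equality is based on its full value. */
  public static class MutableConfig {
    private String regex;

    public MutableConfig(String regex) { this.regex = regex; }

    public void setRegex(String regex) { this.regex = regex; }

    @Override
    public boolean equals(Object o) {
      return o instanceof MutableConfig
          && Objects.equals(regex, ((MutableConfig) o).regex);
    }

    @Override
    public int hashCode() { return Objects.hashCode(regex); }
  }

  /** Uses a config as a map key, mutates it, then tries to find the entry again. */
  public static boolean foundAfterMutation() {
    Map<MutableConfig, String> registry = new HashMap<>();
    MutableConfig config = new MutableConfig("a");
    registry.put(config, "plugin instance");

    // The setter changes hashCode() and equals() while the object is in use as a key.
    config.setRegex("b");

    // Neither the mutated object nor a copy of the original value finds the entry.
    return registry.containsKey(config)
        || registry.containsKey(new MutableConfig("a"));
  }

  public static void main(String[] args) {
    System.out.println("found after mutation: " + foundAfterMutation());
  }
}
```

The entry is still in the map, but it can no longer be reached by key, which is the kind of breakage an immutable config rules out.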

> Drill table functions cannot handle "setFoo" accessors
> ------------------------------------------------------
>
>                 Key: DRILL-6672
>                 URL: https://issues.apache.org/jira/browse/DRILL-6672
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Consider an example format plugin, such as the regex one used in the Drill 
> book. (GitHub reference needed.) We can define the plugin using getters and 
> setters like this:
> {code}
> public class RegexFormatConfig implements FormatPluginConfig {
>   private String regex;
>   private String fields;
>   private String extension;
>   public void setRegex(String regex) { this.regex = regex; }
>   public void setFields(String fields) { this.fields = fields; }
>   public void setExtension(String extension) { this.extension = extension; }
> }
> {code}
> We can then create a plugin configuration using the Drill Web console, the 
> {{bootstrap-storage-plugins.json}} and so on. All work fine.
> Suppose we try to define a configuration using a Drill table function:
> {code}
>   final String sql = "SELECT * FROM table(cp.`regex/simple.log2`\n" +
>       "(type => 'regex',\n" +
>       " extension => 'log2',\n" +
>       " regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',\n" +
>       " fields => 'a, b, c, d'))";
> {code}
> We get this error:
> {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> can not set value (\d\d\d\d)-(\d\d)-(\d\d) .* to parameter regex: class 
> java.lang.String
> table regex/simple.log2
> parameter regex
> {noformat}
> The reason is that the code that handles table functions only knows how to 
> set public fields; it does not know about the Java Bean getter/setter 
> conventions used by Jackson:
> {code}
> package org.apache.drill.exec.store.dfs;
> ...
> final class FormatPluginOptionsDescriptor {
>   ...
>   FormatPluginConfig createConfigForTable(TableInstance t) {
>     ...
>         Field field = pluginConfigClass.getField(paramDef.name);
>         ...
>         }
>         field.set(config, param);
>       } catch (IllegalAccessException | NoSuchFieldException | SecurityException e) {
>         throw UserException.parseError(e)
>             .message("can not set value %s to parameter %s: %s", param, paramDef.name, paramDef.type)
>         ...
> {code}
> The only workaround is to make all fields public:
> {code}
> public class RegexFormatConfig implements FormatPluginConfig {
>   public String regex;
>   public String fields;
>   public String extension;
> }
> {code}
> Since public fields are not good practice, please modify the table function 
> mechanism to follow Jackson conventions and allow Java Bean style setters. 
> (Or better, fix DRILL-6673 to allow immutable format objects via the use of a 
> constructor.)
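
The difference the issue describes between public-field lookup and Java Bean setter lookup can be sketched with plain reflection. This is only an illustration under stated assumptions, not Drill's implementation and not a proposed patch (the issue was resolved as Not A Problem). `SetterLookupSketch`, `BeanConfig`, `setViaPublicField` and `setViaSetter` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class SetterLookupSketch {

  /** Hypothetical bean-style config: a private field plus a setter. */
  public static class BeanConfig {
    private String regex;

    public void setRegex(String regex) { this.regex = regex; }

    public String getRegex() { return regex; }
  }

  /** Public-field approach: Class.getField() only sees public fields. */
  public static boolean setViaPublicField(Object config, String name, Object value) {
    try {
      Field field = config.getClass().getField(name);
      field.set(config, value);
      return true;
    } catch (NoSuchFieldException | IllegalAccessException e) {
      // A private field is reported as NoSuchFieldException.
      return false;
    }
  }

  /** Bean approach: look for a public one-argument "setName" method instead. */
  public static boolean setViaSetter(Object config, String name, Object value) {
    String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
    try {
      Method method = config.getClass().getMethod(setter, value.getClass());
      method.invoke(config, value);
      return true;
    } catch (ReflectiveOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    BeanConfig config = new BeanConfig();
    System.out.println("public field: " + setViaPublicField(config, "regex", "(\\d+)"));
    System.out.println("setter:       " + setViaSetter(config, "regex", "(\\d+)"));
    System.out.println("regex is now: " + config.getRegex());
  }
}
```

With a private field, the field lookup fails in the same way as the `NoSuchFieldException` branch quoted above, while the setter lookup succeeds.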



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [drill] asfgit closed pull request #2050: DRILL-7694: Register drill.queries.* counter metrics on Drillbit startup

2020-04-11 Thread GitBox
asfgit closed pull request #2050: DRILL-7694: Register drill.queries.* counter 
metrics on Drillbit startup
URL: https://github.com/apache/drill/pull/2050
 
 
   




[GitHub] [drill] asfgit closed pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

2020-04-11 Thread GitBox
asfgit closed pull request #2045: DRILL-7683: Add "message parsing" to new JSON 
loader
URL: https://github.com/apache/drill/pull/2045
 
 
   




[GitHub] [drill] asfgit closed pull request #2044: DRILL-7678: Update Yauaa Dependency

2020-04-11 Thread GitBox
asfgit closed pull request #2044: DRILL-7678: Update Yauaa Dependency
URL: https://github.com/apache/drill/pull/2044
 
 
   




[GitHub] [drill] asfgit closed pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps

2020-04-11 Thread GitBox
asfgit closed pull request #2040: DRILL-7668: Allow Time Bucket Function to 
Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040
 
 
   




[GitHub] [drill] asfgit closed pull request #2047: DRILL-7675: Work around for partitions sender memory use

2020-04-11 Thread GitBox
asfgit closed pull request #2047: DRILL-7675: Work around for partitions sender 
memory use
URL: https://github.com/apache/drill/pull/2047
 
 
   




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407075167
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/ScanSchemaTracker.java
 ##
 @@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.schema;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Computes scan output schema from a variety of sources.
+ * 
+ * The scan operator output schema can be defined or dynamic.
+ *
+ * Defined Schema
+ *
+ * The planner computes a defined schema from metadata, as in a typical
+ * query engine. A defined schema defines the output schema directly:
+ * the defined schema is the output schema. Drill's planner does not
+ * yet support a defined schema, but work is in progress to get there for
+ * some cases.
+ * 
+ * With a defined schema, the reader is given a fully-defined schema and
+ * its job is to produce vectors that match the given schema. (The details
+ * are handled by the {@link ResultSetLoader}.)
+ * 
+ * At present, since the planner does not actually provide a defined schema,
+ * we support it in this class, and verify that the defined schema, if provided,
+ * exactly matches the names in the project list in the same order.
+ *
+ * Dynamic Schema
+ *
+ * A dynamic schema is one defined at run time: the traditional Drill approach.
+ * A dynamic schema starts with a projection list: a list of column names
+ * without types.
+ * This class converts the project list into a dynamic reader schema which is
+ * a schema in which each column has the type {@code LATE}, which basically means
+ * "a type to be named later" by the reader.
+ *
+ * Hybrid Schema
+ *
+ * Some readers support a provided schema, which is a concept similar to,
+ * but distinct from, a defined schema. The provided schema provides hints
+ * about a schema. At present, it
+ * is an extra; not used or understood by the planner. Thus, the projection
+ * list is independent of the provided schema: the lists may be disjoint.
+ * 
+ * With a provided schema, the project list defines the output schema. If the
+ * provided schema provides projected columns, then the provided schema for those
+ * columns flow to the output schema, just as for a defined schema. Similarly, the
+ * reader is given a defined schema for those columns.
+ * 
+ * Where a provided schema differs is that the project list can include columns
+ * not in the provided schema, such columns act like the dynamic case: the reader
+ * defines the column type.
+ *
+ * Projection Types
+ *
+ * Drill will pass in a project list which is one of three kinds::
+ * 
+ * {@code SELECT *}: Project all data source columns, whatever they happen
+ * to be. Create columns using names from the data source. The data source
+ * also determines the order of columns within the row.
+ * {@code SELECT a, b, c, ...}: Project a specific set of columns, identified by
+ * case-insensitive name. The output row uses the names from the SELECT list,
+ * but types from the data source. Columns appear in the row in the order
+ * specified by the {@code SELECT}.
+ * {@code SELECT ...}: Project nothing, occurs in {@code SELECT COUNT(*)}
+ * type queries. The provided projection list contains no (table) columns, though
+ * it may contain metadata columns.
+ * 
+ * Names in the project list can reference any of five distinct types of output
+ * columns:
+ * 
+ * Wildcard ("*") column: indicates the place in the projection list to insert
+ * the table columns once found in the table projection plan.
+ * Data source columns: columns from the underlying table. The table
+ * projection planner will determine if the column exists, or must be filled
+ * in with a null column.
+ * The generic data source columns array: {@code columns}, 

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407075244
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/ScanSchemaTracker.java
 ##
 @@ -0,0 +1,466 @@
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407075338
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/ScanSchemaTracker.java
 ##
 @@ -0,0 +1,466 @@
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407074984
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/ScanSchemaTracker.java
 ##
 @@ -0,0 +1,466 @@
+ * Projection Types
+ *
+ * Drill will pass in a project list which is one of three kinds::
 
 Review comment:
   ```suggestion
* Drill will pass in a project list which is one of three kinds:
   ```
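
The three kinds of project list named in the Javadoc under review (wildcard, explicit columns, nothing) can be sketched as a small classifier. This is a simplified illustration, not the code in the pull request: `ProjectionKindSketch`, `Kind` and `classify` are hypothetical names, and the sketch assumes that "project nothing" shows up as an empty list, ignoring the metadata columns the Javadoc says may still be present.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ProjectionKindSketch {

  /** Hypothetical labels for the three kinds of project list. */
  public enum Kind { ALL, SOME, NONE }

  /** Classifies a project list given as plain column names. */
  public static Kind classify(List<String> projectList) {
    if (projectList.isEmpty()) {
      return Kind.NONE;   // e.g. SELECT COUNT(*): no table columns requested
    }
    if (projectList.contains("*")) {
      return Kind.ALL;    // SELECT *: the data source decides names and order
    }
    return Kind.SOME;     // SELECT a, b, c: names and order come from the query
  }

  public static void main(String[] args) {
    System.out.println(classify(Arrays.asList("*")));
    System.out.println(classify(Arrays.asList("a", "b", "c")));
    System.out.println(classify(Collections.<String>emptyList()));
  }
}
```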




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407073746
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/ScanSchemaResolver.java
 ##
 @@ -0,0 +1,367 @@
+package org.apache.drill.exec.physical.impl.scan.v3.schema;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.DynamicSchemaFilter.DynamicTupleFilter;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.MutableTupleMetadata.ColumnHandle;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker.ProjectionType;
+import org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter;
+import org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter.ProjResult;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.DynamicColumn;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Resolves a schema against the existing scan schema.
+ * Expands columns by comparing the existing scan schema with
+ * a "revised" (provided or reader) schema, adjusting the scan schema
+ * accordingly. Maps are expanded recursively. Other columns
+ * must match types (concrete columns) or the type must match projection
+ * (for dynamic columns.)
+ * 
+ * Resolves a provided schema against the projection list.
+ * The provided schema can be strict (converts a wildcard into
+ * an explicit projection) or lenient (the reader can add
+ * additional columns to a wildcard.)
+ * Resolves an early reader schema against the projection list
+ * and optional provided schema.
+ * Resolves a reader output schema against a dynamic (projection
+ * list), concreted (provided or prior reader) schema) or combination.
+ * 
+ * 
+ * In practice, the logic is simpler: given a schema (dynamic, concrete
+ * or combination), further resolve the schema using the input schema
+ * provided. Resolve dynamic columns, verify consistency of concrete
+ * columns.
+ * 
+ * Projected columns start as dynamic (no type). Columns
+ * are resolved to a known type as a schema identifies that type.
+ * Subsequent schemas are obligated to use that same type to avoid
+ * an inconsistent schema change downstream.
+ * 
+ * Expands columns by comparing the existing scan schema with
+ * a "revised" (provided or reader) schema, adjusting the scan schema
+ * accordingly. Maps are expanded recursively. Other columns
+ * must match types (concrete columns) or the type must match projection
+ * (for dynamic columns.)
+ * 
+ * A "resolved" projection list is a list of concrete columns: table
+ * columns, nulls, file metadata or partition metadata. An unresolved list
+ * has either table column names, but no match, or a wildcard column.
+ * 
+ * The idea is that the projection list moves through stages of resolution
+ * depending on which information is available. An "early schema" table
+ * provides schema information up front, and so allows fully resolving
+ * the projection list on table open. A "late schema" table allows only a
+ * partially resolved projection list, with the remainder of resolution
+ * happening on the first (or perhaps every) batch.
+ */
+public class ScanSchemaResolver {
+  private static final Logger logger = LoggerFactory.getLogger(ScanSchemaResolver.class);
+
+  public enum Mode {
+    STRICT_PROVIDED_SCHEMA,
+    LENIENT_PROVIDED_SCHEMA,
+    EARLY_READER_SCHEMA,
+    READER_SCHEMA,
+    MISSING_COLS
+  }
+
+  private final MutableTupleMetadata schema;
+  private final Mode mode;
+  private final boolean isProjectAll;
+  private final boolean allowMapAdditions;
+  private final String source;
+  private final CustomErrorContext errorContext;
+
+  public ScanSchemaResolver(MutableTupleMetadata schema, Mode mode,
+      boolean allowMapAdditions,
+      CustomErrorContext errorContext) {
+    this.schema = schema;

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407074717
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DynamicColumn.java
 ##
 @@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import java.util.Objects;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * A dynamic column has a name but not a type. The column may be
+ * a map, array or scalar: we don't yet know. A dynamic column is
+ * the equivalent of an item in a name-only project list. This type
+ * can also represent a wildcard. A dynamic column is not a concrete
+ * data description: it must be resolved to an actual type before
+ * it can be used to create vectors, readers, writers, etc. The
+ * dynamic column allows the tuple metadata to be used to represent
+ * all phases of a schema lifecycle, including Drill's "dynamic"
+ * schema before a reader resolves the column to some actual
+ * type.
+ */
+public class DynamicColumn extends AbstractColumnMetadata {
 
 Review comment:
   We need to ensure that a dynamic column can be parsed; this is needed during 
schema serialization / deserialization.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407073237
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/MutableTupleMetadata.java
 ##
 @@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.schema;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.map.CaseInsensitiveMap;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker.ProjectionType;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+/**
+ * A mutable form of a tuple schema. Allows insertions (at the wildcard position),
+ * and replacing columns (as the schema becomes resolved). Tracks implicit columns
+ * (those not filled in by the reader).
+ * 
+ * Does not implement the {@code TupleMetadata} interface because that interface
+ * has far more functionality than is needed here, and assumes that column order
+ * remains fixed (and hence columns can be addressed by position) which is not
+ * true for this class.
+ * 
+ * This class represents the top-level tuple (the row.) Maps are also dynamic,
+ * but provide a subset of resolution options:
+ * map fields cannot be implicit. They can, however, be defined,
+ * provided, discovered or missing. Map columns can start unresolved
+ * if the map comes from projection. A map itself can be resolved,
+ * but its members may be unresolved. New map members may only be added at the
+ * end (there is no equivalent of a wildcard position.)
+ */
+public class MutableTupleMetadata {
+
+  /**
+   * Holder for a column to allow inserting and replacing columns within
+   * the top-level project list. Changes to the column within the holder
+   * must go through the tuple itself so we can track schema versions.
+   * 
+   * Tracks the resolution status of each individual column as
+   * described for {@link ScanSchemaTracker}. Models a column throughout the
+   * projection lifecycle. Columns evolve from unresolved to resolved at
+   * different times. Columns are either implicit (defined by the framework)
+   * or normal (defined by the reader). Columns can be defined by the
+   * planner (via a defined schema), partially defined (via a provided
+   * schema), or discovered by the reader. Regardless of the path
+   * to definition, by the time the first batch is delivered downstream,
+   * each column has an output schema which describes the data.
+   */
+  public static class ColumnHandle {
+private ColumnMetadata col;
+private boolean isImplicit;
+
+public ColumnHandle(ColumnMetadata col) {
+  this.col = col;
+  this.isImplicit = SchemaUtils.isImplicit(col);
+}
+
+public String name() {
+  return col.name();
+}
+
+private void replace(ColumnMetadata col) {
+  this.col = col;
+}
+
+private void resolve(ColumnMetadata col) {
+  SchemaUtils.mergeColProperties(this.col, col);
+  this.col = col;
+}
+
+private void resolveImplicit(ColumnMetadata col) {
+  SchemaUtils.mergeColProperties(this.col, col);
+  this.col = col;
+  markImplicit();
+}
+
+public void markImplicit() {
+  Preconditions.checkState(SchemaUtils.isImplicit(col));
+  isImplicit = true;
+}
+
+public ColumnMetadata column() { return col; }
+public boolean isImplicit() { return isImplicit; }
+
+@Override
+public String toString() {
+  return col.toString();
+}
+  }
+
+  protected final List columns = new ArrayList<>();
+  protected final Map nameIndex =
+  CaseInsensitiveMap.newHashMap();
+  private ProjectionType projType;
+  private int insertPoint = -1;
+  private int version;
+
+  public void setProjectionType(ScanSchemaTracker.ProjectionType type) {
+this.projType = type;
+  }
+
+  public void setInsertPoint(int insertPoint) {

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407074119
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/v3/file/FileScanUtils.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.file;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class FileScanUtils {
+
+  // Default file metadata column names; primarily for testing.
+
+  public static final String FILE_NAME_COL = "filename";
+  public static final String FULLY_QUALIFIED_NAME_COL = "fqn";
+  public static final String FILE_PATH_COL = "filepath";
+  public static final String SUFFIX_COL = "suffix";
+  public static final String PARTITION_COL = "dir";
+
+  public static String partitionColName(int partition) {
+return PARTITION_COL + partition;
+  }
+
+  public static List expandMetadata(int dirCount) {
+List selected = Lists.newArrayList(
 
 Review comment:
   Consider using Arrays.asList here.
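
A minimal sketch of what this suggestion changes, assuming the list starts from a fixed set of column names. `Arrays.asList` returns a fixed-size list, so it only fits if nothing is appended afterwards. The class and method names below are hypothetical, and a plain `ArrayList` stands in for Guava's `Lists.newArrayList` to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the PR.
final class ColumnListSketch {

  // Fixed-size list: fine when no more columns are added later.
  static List<String> fixedColumns() {
    return Arrays.asList("filename", "fqn", "filepath", "suffix");
  }

  // Growable copy: needed if partition columns are appended afterwards.
  static List<String> withPartitions(int dirCount) {
    List<String> selected = new ArrayList<>(fixedColumns());
    for (int i = 0; i < dirCount; i++) {
      selected.add("dir" + i);
    }
    return selected;
  }
}
```

Calling `add` on the list returned by `fixedColumns()` throws `UnsupportedOperationException`, which is the main thing to check before swapping the two calls.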




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407074521
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java
 ##
 @@ -295,6 +285,8 @@ public String toString() {
 .toString();
   }
 
+  protected void appendContents(StringBuilder buf) { }
+
   @JsonProperty("type")
   @Override
   public String typeString() {
 
 Review comment:
   Better to make this abstract, since every column that implements it must 
ensure the parser can parse the string back to a type.
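
A rough sketch of the idea, under stated assumptions: declaring the method abstract forces each column class to supply a type string, and a round trip through the parser checks that the string is understood. All class names and the "INT" and "DYNAMIC" strings are hypothetical; this is not Drill's actual hierarchy or parser.

```java
// Hypothetical classes for illustration; not Drill's actual hierarchy.
abstract class ColumnSketch {
  private final String name;

  ColumnSketch(String name) { this.name = name; }

  String name() { return name; }

  // Abstract: each subclass must supply a string the parser understands.
  abstract String typeString();

  // Toy parser: maps a type string back to a column of the matching class.
  static ColumnSketch parse(String name, String typeString) {
    switch (typeString) {
      case "INT":
        return new IntColumnSketch(name);
      case "DYNAMIC":
        return new DynamicColumnSketch(name);
      default:
        throw new IllegalArgumentException("Unknown type: " + typeString);
    }
  }
}

final class IntColumnSketch extends ColumnSketch {
  IntColumnSketch(String name) { super(name); }
  @Override String typeString() { return "INT"; }
}

final class DynamicColumnSketch extends ColumnSketch {
  DynamicColumnSketch(String name) { super(name); }
  @Override String typeString() { return "DYNAMIC"; }
}
```

With a concrete default in the base class, a new subclass could silently inherit a string the parser does not recognize; the abstract method turns that into a compile error.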




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407074862
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleSchema.java
 ##
 @@ -161,6 +161,17 @@ public boolean isEquivalent(TupleMetadata other) {
 return true;
   }
 
+  @Override
+  public boolean equals(Object o) {
 
 Review comment:
   Do you override hashCode() as well?
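
For context, a minimal sketch of why the two methods travel together: hash-based collections look up by hashCode() first, so two equal schemas must hash alike. `TupleSketch` is a hypothetical stand-in that models a schema as an ordered list of column names; the real TupleSchema compares richer column metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tuple schema: an ordered list of column names.
final class TupleSketch {
  private final List<String> columns;

  TupleSketch(List<String> columns) {
    this.columns = new ArrayList<>(columns);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) { return true; }
    if (!(o instanceof TupleSketch)) { return false; }
    return columns.equals(((TupleSketch) o).columns);
  }

  // Derived from the same field as equals(), so equal schemas hash alike.
  @Override
  public int hashCode() {
    return columns.hashCode();
  }
}
```

Without the hashCode() override, two equal instances would usually fall into different buckets and a `HashSet` or `HashMap` lookup would miss.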




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407074770
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DynamicColumn.java
 ##
 @@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import java.util.Objects;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * A dynamic column has a name but not a type. The column may be
+ * a map, array or scalar: we don't yet know. A dynamic column is
+ * the equivalent of an item in a name-only project list. This type
+ * can also represent a wildcard. A dynamic column is not a concrete
+ * data description: it must be resolved to an actual type before
+ * it can be used to create vectors, readers, writers, etc. The
+ * dynamic column allows the tuple metadata to be used to represent
+ * all phases of a schema lifecycle, including Drill's "dynamic"
+ * schema before a reader resolves the column to some actual
+ * type.
+ */
+public class DynamicColumn extends AbstractColumnMetadata {
+
+  // Same as SchemaPath.DYNAMIC_STAR, but SchemaPath is not visible here.
+  public static final String WILDCARD = "**";
+  public static final DynamicColumn WILDCARD_COLUMN = new DynamicColumn(WILDCARD);
+
+  public DynamicColumn(String name) {
+super(name, MinorType.LATE, DataMode.REQUIRED);
+  }
+
+  @Override
+  public StructureType structureType() { return StructureType.DYNAMIC; }
+
+  @Override
+  public boolean isDynamic() { return true; }
+
+  @Override
+  public MaterializedField schema() {
+return MaterializedField.create(name, majorType());
+  }
+
+  @Override
+  public MaterializedField emptySchema() {
+return schema();
+  }
+
+  @Override
+  public ColumnMetadata cloneEmpty() {
+return copy();
+  }
+
+  @Override
+  public ColumnMetadata copy() {
+return new DynamicColumn(name);
+  }
+
+  @Override
+  public boolean equals(Object o) {
 
 Review comment:
   You need to override hashCode() as well.
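
One possible shape for a name-only column, as a sketch. It assumes that equality is based on the column name alone and that names compare case-insensitively; neither is confirmed by the quoted diff, and `NameOnlyColumn` is a hypothetical class. The point is that hashCode() must fold case the same way equals() does.

```java
import java.util.Locale;

// Hypothetical name-only column; assumes names compare case-insensitively.
final class NameOnlyColumn {
  private final String name;
  // Case-folded form of the name, shared by equals() and hashCode().
  private final String key;

  NameOnlyColumn(String name) {
    this.name = name;
    this.key = name.toLowerCase(Locale.ROOT);
  }

  String name() { return name; }

  @Override
  public boolean equals(Object o) {
    if (this == o) { return true; }
    if (o == null || getClass() != o.getClass()) { return false; }
    return key.equals(((NameOnlyColumn) o).key);
  }

  // Uses the same folded key as equals(), so "A" and "a" hash alike.
  @Override
  public int hashCode() {
    return key.hashCode();
  }
}
```

Hashing the raw name while comparing case-insensitively would break the contract: "A" and "a" would be equal yet hash differently.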




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407073123
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/DynamicSchemaFilter.java
 ##
 @@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.schema;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.MutableTupleMetadata.ColumnHandle;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker.ProjectionType;
+import org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.DynamicColumn;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Projection filter based on the scan schema which typically starts as fully
+ * dynamic, then becomes more concrete as the scan progresses. Enforces that
+ * projected columns must be consistent with either projection, or the existing
+ * concrete schema for that column.
+ */
+public abstract class DynamicSchemaFilter implements ProjectionFilter {
+
+  public enum NewColumnsMode { NONE, ALL, CHILD_ONLY }
 
 Review comment:
   Could you please describe what these modes mean?




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF v2 scan schema resolution

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2051: DRILL-7696: EVF 
v2 scan schema resolution
URL: https://github.com/apache/drill/pull/2051#discussion_r407073811
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/schema/ScanSchemaResolver.java
 ##
 @@ -0,0 +1,367 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.schema;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.DynamicSchemaFilter.DynamicTupleFilter;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.MutableTupleMetadata.ColumnHandle;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker.ProjectionType;
+import org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter;
+import org.apache.drill.exec.physical.resultSet.impl.ProjectionFilter.ProjResult;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.DynamicColumn;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Resolves a schema against the existing scan schema.
+ * Expands columns by comparing the existing scan schema with
+ * a "revised" (provided or reader) schema, adjusting the scan schema
+ * accordingly. Maps are expanded recursively. Other columns
+ * must match types (concrete columns) or the type must match projection
+ * (for dynamic columns.)
+ * 
+ * Resolves a provided schema against the projection list.
+ * The provided schema can be strict (converts a wildcard into
+ * an explicit projection) or lenient (the reader can add
+ * additional columns to a wildcard.)
+ * Resolves an early reader schema against the projection list
+ * and optional provided schema.
+ * Resolves a reader output schema against a dynamic (projection
+ * list) schema, a concrete (provided or prior reader) schema, or a
+ * combination of the two.
+ * 
+ * 
+ * In practice, the logic is simpler: given a schema (dynamic, concrete
+ * or combination), further resolve the schema using the input schema
+ * provided. Resolve dynamic columns, verify consistency of concrete
+ * columns.
+ * 
+ * Projected columns start as dynamic (no type). Columns
+ * are resolved to a known type as a schema identifies that type.
+ * Subsequent schemas are obligated to use that same type to avoid
+ * an inconsistent schema change downstream.
+ * 
+ * Expands columns by comparing the existing scan schema with
+ * a "revised" (provided or reader) schema, adjusting the scan schema
+ * accordingly. Maps are expanded recursively. Other columns
+ * must match types (concrete columns) or the type must match projection
+ * (for dynamic columns.)
+ * 
+ * A "resolved" projection list is a list of concrete columns: table
+ * columns, nulls, file metadata or partition metadata. An unresolved list
+ * has either table column names, but no match, or a wildcard column.
+ * 
+ * The idea is that the projection list moves through stages of resolution
+ * depending on which information is available. An "early schema" table
+ * provides schema information up front, and so allows fully resolving
+ * the projection list on table open. A "late schema" table allows only a
+ * partially resolved projection list, with the remainder of resolution
+ * happening on the first (or perhaps every) batch.
+ */
+public class ScanSchemaResolver {
+  private static final Logger logger = LoggerFactory.getLogger(ScanSchemaResolver.class);
+
+  public enum Mode {
+STRICT_PROVIDED_SCHEMA,
+LENIENT_PROVIDED_SCHEMA,
+EARLY_READER_SCHEMA,
+READER_SCHEMA,
+MISSING_COLS
+  }
+
+  private final MutableTupleMetadata schema;
+  private final Mode mode;
+  private final boolean isProjectAll;
+  private final boolean allowMapAdditions;
+  private final String source;
+  private final CustomErrorContext errorContext;
+
+  public ScanSchemaResolver(MutableTupleMetadata schema, Mode mode,
+  boolean allowMapAdditions,
+  CustomErrorContext errorContext) {
+this.schema = schema;
+

[GitHub] [drill] arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

2020-04-11 Thread GitBox
arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to 
new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-612423451
 
 
   The changes look OK, but I don't think we should do this, since the PR has 
already been reviewed.
   I will send a follow-up email with a proposal for how we can speed up the 
merge process.
   
   +1, LGTM.




[GitHub] [drill] arina-ielchiieva commented on issue #2052: DRILL-7603 and DRILL-7604: Add schema, options to REST query

2020-04-11 Thread GitBox
arina-ielchiieva commented on issue #2052: DRILL-7603 and DRILL-7604: Add 
schema, options to REST query
URL: https://github.com/apache/drill/pull/2052#issuecomment-612420174
 
 
   @paul-rogers the changes look good, but I have two questions:
   1. The screenshot in the PR title does not show how options will be 
described in the Web UI. It looks like this part of the implementation is 
missing. I think that, since we allow passing options through the REST API, we 
should be able to set them on the query page.
   2. When the default schema was added, the case in which the user was able 
to return to the query from Profiles was missed (see 
https://issues.apache.org/jira/browse/DRILL-7655). Should this be covered in 
your changes as well?




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2052: DRILL-7603 and DRILL-7604: Add schema, options to REST query

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2052: DRILL-7603 and 
DRILL-7604: Add schema, options to REST query
URL: https://github.com/apache/drill/pull/2052#discussion_r407062664
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionValue.java
 ##
 @@ -146,6 +164,40 @@ public static OptionValue create(AccessibleScopes type, String name, Object val,
 throw new IllegalArgumentException(String.format("Unsupported type %s", val.getClass()));
   }
 
+  private static OptionValue fromString(AccessibleScopes type, String name,
+  String val, OptionScope scope, Kind kind) {
+val = val.trim();
 
 Review comment:
   We don't expect the value to be null, correct?
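
If a null value is possible, one option is an explicit guard before the trim() call. This is a minimal sketch, assuming a fail-fast policy is wanted; `OptionTextSketch` and `normalize` are hypothetical names, and the real method goes on to build an OptionValue rather than return a string.

```java
import java.util.Objects;

// Hypothetical helper; the real method builds an OptionValue instead.
final class OptionTextSketch {

  // Fails fast with a descriptive message instead of a bare NPE from trim().
  static String normalize(String name, String val) {
    Objects.requireNonNull(val, "Value for option '" + name + "' must not be null");
    return val.trim();
  }
}
```

The exception type stays NullPointerException, but the message now names the offending option.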




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2052: DRILL-7603 and DRILL-7604: Add schema, options to REST query

2020-04-11 Thread GitBox
arina-ielchiieva commented on a change in pull request #2052: DRILL-7603 and 
DRILL-7604: Add schema, options to REST query
URL: https://github.com/apache/drill/pull/2052#discussion_r407062641
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionValue.java
 ##
 @@ -146,6 +164,40 @@ public static OptionValue create(AccessibleScopes type, String name, Object val,
 throw new IllegalArgumentException(String.format("Unsupported type %s", val.getClass()));
   }
 
+  private static OptionValue fromString(AccessibleScopes type, String name,
+  String val, OptionScope scope, Kind kind) {
+val = val.trim();
+try {
+  switch (kind) {
+  case BOOLEAN: {
 
 Review comment:
   Please format the case labels inside the switch.
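
As an illustration of the layout being asked for, here is a sketch with each case label indented one level inside the switch and its body one level further. The class is hypothetical; only BOOLEAN appears in the quoted diff, so the other kinds and the parsing calls are assumptions.

```java
// Hypothetical parser for illustration; not the code under review.
final class OptionParseSketch {

  // Only BOOLEAN is visible in the quoted diff; the rest are assumed.
  enum Kind { BOOLEAN, LONG, DOUBLE, STRING }

  static Object parse(Kind kind, String val) {
    val = val.trim();
    switch (kind) {
      case BOOLEAN:
        return Boolean.parseBoolean(val);
      case LONG:
        return Long.parseLong(val);
      case DOUBLE:
        return Double.parseDouble(val);
      case STRING:
        return val;
      default:
        throw new IllegalArgumentException("Unsupported kind " + kind);
    }
  }
}
```

Because every branch returns or throws, the braces around each case body in the original are not needed in this shape.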




[GitHub] [drill] arina-ielchiieva commented on issue #2050: DRILL-7694: Register drill.queries.* counter metrics on Drillbit startup

2020-04-11 Thread GitBox
arina-ielchiieva commented on issue #2050: DRILL-7694: Register drill.queries.* 
counter metrics on Drillbit startup
URL: https://github.com/apache/drill/pull/2050#issuecomment-612414695
 
 
   +1, LGTM.

