[jira] [Created] (DRILL-6854) Profiles page should provide more insights on parquet statistics for complex reader

2018-11-14 Thread Nitin Sharma (JIRA)
Nitin Sharma created DRILL-6854:
---

 Summary: Profiles page should provide more insights on parquet 
statistics for complex reader
 Key: DRILL-6854
 URL: https://issues.apache.org/jira/browse/DRILL-6854
 Project: Apache Drill
  Issue Type: Bug
Reporter: Nitin Sharma


The Profiles page should provide more insight into Parquet statistics for the 
complex reader.

For example, the operator metrics for the plain reader are informative, but for 
the complex reader the operator metrics are always empty. 





[jira] [Created] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch

2018-11-14 Thread Nitin Sharma (JIRA)
Nitin Sharma created DRILL-6853:
---

 Summary: Parquet Complex Reader for nested schema should have 
configurable memory or max records to fetch
 Key: DRILL-6853
 URL: https://issues.apache.org/jira/browse/DRILL-6853
 Project: Apache Drill
  Issue Type: Bug
Reporter: Nitin Sharma


While fetching a nested schema, the Parquet complex reader should have a 
configurable memory limit or a configurable maximum number of records to fetch, 
rather than defaulting to 4000 records.

While scanning TBs of data with wide columns, the current default could easily 
cause OOM issues. 
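
As a rough illustration only (all names below are assumptions, not existing 
Drill options or classes), the batch size could be derived from both a record 
limit and a memory budget:

{code}
// Hypothetical sketch: cap the records fetched per batch by both a configured
// record limit and a memory budget, instead of a fixed 4000-record default.
public class ComplexReaderBatchSizer {

  public static int computeBatchSize(long memoryBudgetBytes,
                                     int estimatedRowWidthBytes,
                                     int configuredMaxRecords) {
    // Records that fit in the memory budget at the estimated row width.
    long byBudget = memoryBudgetBytes / Math.max(1, estimatedRowWidthBytes);
    // Never exceed the configured record limit; always fetch at least one row.
    return (int) Math.max(1, Math.min(configuredMaxRecords, byBudget));
  }
}
{code}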





[GitHub] kkhatua commented on issue #1539: DRILL-6847: Add Query Metadata to RESTful Interface

2018-11-14 Thread GitBox
kkhatua commented on issue #1539: DRILL-6847: Add Query Metadata to RESTful 
Interface
URL: https://github.com/apache/drill/pull/1539#issuecomment-438860831
 
 
   @cgivre I didn't build and try this out, but I'm curious how we manage 
`select * from` queries, especially with schema changes, like fields being 
added or dropped between rows.
   Also, I agree with @arina-ielchiieva's point about not repeating the field 
names (unless a schema change requires it). But I'm not sure how badly we would 
break backward compatibility (perhaps carry a `version` in the REST response?). 
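   Just to sketch what a versioned payload could look like (none of these names 
reflect Drill's actual REST model, they are illustrative only):

```java
import java.util.List;

// Illustrative shape for a versioned query result: column names are declared
// once per schema (re-listed only when a schema change occurs), rows carry
// values in column order, and `version` signals the payload format to clients.
public class QueryResult {
  public int version;              // bump when the payload shape changes
  public List<String> columns;     // field names, sent once, not per row
  public List<List<Object>> rows;  // row values in column order
}
```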




Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-14 Thread Vitalii Diravka
You have described a design similar to the one the Drill Metastore is going to
provide.
I'm talking about an API with put and get for statistics, and about different
storages for metadata, aka Metastore plugins.
Since the project is in progress, it is possible to define the API for storing
and obtaining stats, and also how they can be stored in metadata cache files.
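
To make the shape concrete, here is a rough sketch; every name below is an
assumption for illustration, not an actual Drill or Metastore interface:

import java.util.Map;
import java.util.Optional;

// Illustrative only: a pluggable statistics store with "put" and "get",
// so Drill computes and consumes stats while storage stays external.
public interface StatisticsStorePlugin {
  // Store stats gathered for a table (e.g. after a stats-collection query).
  void putStats(String tableId, Map<String, Object> stats);

  // Return previously stored stats for use at planning time, if any.
  Optional<Map<String, Object>> getStats(String tableId);
}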


Kind regards
Vitalii


On Sat, Nov 10, 2018 at 1:03 PM Paul Rogers 
wrote:

> Hi Gautam,
>
> You touched on the key issue: storage. You mention that the Drill stats
> implementation learned from Oracle. Very wise: Oracle is the clear expert
> in this space.
>
> There is a very important difference, however, between Drill and Oracle.
> Oracle is a complete database including both query engine and storage.
> Drill is a query engine only. This is the issue at the heart of our
> discussion.
>
> Oracle has a tabular storage engine for relational data. Oracle uses that
> storage engine for metadata and stats. This ensures that metadata and stats
> benefit from concurrency control, transactions, crash recovery (i.e. roll
> forward/roll back), backup, and so on.
>
> Drill's equivalents are. . . (crickets.)
>
> Drill is a query engine that sits atop the storage engine of your choice.
> That is what sets Drill apart from Impala and Hive which are tightly
> coupled to HDFS, HMS, Ranger/Sentry, etc. (Spark takes a similar position
> to Drill: Spark runs on anything and has no storage, other than shuffle
> files.)
>
> As a query engine, Drill should compute stats, as you suggested. But, when
> it comes to STORING stats, Drill has nothing to say, nor should it.
>
> We currently use a broken implementation for Parquet metadata. We write
> files into the data directory (destroying directory update timestamps),
> across multiple files, with no concurrency control, no versioning, no crash
> recovery, no nothing. Run a query concurrently with Parquet metadata
> collection: things get corrupted. Run two Parquet metadata updates, things
> get really corrupted. Why? Storage is hard to get right when doing
> concurrent access and update.
>
> This is not a foundation on which to build! Oracle would not survive a day
> if it corrupted system tables when two or more users did operations at the
> same time.
>
> OK, Drill has a problem. The first step is to acknowledge it. The next is
> to look for solutions.
>
> Either Drill adds a storage engine, or it stays agnostic, leaves storage
> to an external system, and makes stats storage a plugin. Drill already
> accesses data via a plugin. This is why Drill can read HDFS, S3, Alluxio,
> Kafka, JDBC, and on and on. This is a valuable, differentiating feature. It
> is, in fact, why Drill has a place in a world dominated by Hive, Spark and
> Impala.
>
> For stats, this means that Drill does the query engine part (gather stats
> on the one hand, and consume stats for planning on the other.) But, it
> means that Drill DOES NOT attempt to store the stats. Drill relies on an
> external system for that role.
>
> Here is where the stats discussion aligns with the metadata (table schema)
> discussion. There are many ways to store metadata (including stats). In a
> RDBMS, in HMS, in files (done with MVCC or other concurrency control), in a
> key/value store and so on. All of these are more robust than the broken
> Parquet metadata file implementation.
>
> So, if stats are to be stored by an external storage system, that means
> that Drill's focus should be on APIs: how to obtain the stats from Drill to
> store them, and how to return them to Drill when requested when planning a
> query. This is exactly the same model we take with data (Drill gives data
> to HDFS to store, asks HDFS for the location of the data during planning.)
>
> This is the reason I suggested gathering stats as a query: you need to add no
> new API: just issue a query using the existing Drill client. As you point
> out, perhaps Drill is in a better position to decide what stats should be
> gathered. Point taken. So, instead of using a query, define a stats API
> with both "put" and "get" interfaces.
>
> Then, of course, you can certainly create a POC implementation of the
> storage engine based on the broken Parquet metadata file format. Since it
> is just a reference implementation, the fragility of the solution can be
> forgiven.
>
> This is a very complex topic, and touches on Drill's place in the open
> source query engine world. Thanks much for having the patience to discuss
> the issues here on the dev list.
>
> What do other people think about the storage question? Is the plugin
> approach the right one? Is there some other alternative the project should
> consider? Should Drill build its own?
>
> Thanks,
> - Paul
>
>
>
> On Friday, November 9, 2018, 3:11:11 PM PST, Gautam Parai <
> gpa...@mapr.com> wrote:
>
>  Hi Paul,
>
> ...


[GitHub] cgivre closed pull request #1541: Update 041-logfile-plugin.md

2018-11-14 Thread GitBox
cgivre closed pull request #1541: Update 041-logfile-plugin.md
URL: https://github.com/apache/drill/pull/1541
 
 
   




[GitHub] arina-ielchiieva commented on issue #1541: Update 041-logfile-plugin.md

2018-11-14 Thread GitBox
arina-ielchiieva commented on issue #1541: Update 041-logfile-plugin.md
URL: https://github.com/apache/drill/pull/1541#issuecomment-438770446
 
 
   As far as I remember, when you delete the remote branch the PR will close 
automatically. If not, please close it manually and create a new one from the 
proper branch.
   Updating the doc is fine, just don't create branches in the Drill project (as 
a committer you do have write permission to the repo and should be careful with 
it); create branches in your own fork and then open the PR.




[GitHub] cgivre commented on issue #1541: Update 041-logfile-plugin.md

2018-11-14 Thread GitBox
cgivre commented on issue #1541: Update 041-logfile-plugin.md
URL: https://github.com/apache/drill/pull/1541#issuecomment-438769201
 
 
   Hi Arina, 
   These files don't seem to come across in forks.  Should I just close the PR? 
 This is the second or third time this has happened to someone, so I'd like to 
make sure that the docs are correct.




[jira] [Created] (DRILL-6852) Adapt current Parquet Metadata cache implementation to use Drill Metastore API

2018-11-14 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-6852:
--

 Summary: Adapt current Parquet Metadata cache implementation to 
use Drill Metastore API
 Key: DRILL-6852
 URL: https://issues.apache.org/jira/browse/DRILL-6852
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi


According to the design document for DRILL-6552, the existing metadata cache 
API should be adapted to use the generalized Metastore API, and the Parquet 
metadata cache will become one implementation of that API.

The aim of this Jira is to refactor the Parquet metadata cache implementation 
and adapt it to use the Drill Metastore API.
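
For illustration only, the generalized shape might look like the following; the 
interface and method names are assumptions, not the actual Metastore API:

{code}
// Hypothetical sketch: a generalized metastore interface, with the current
// Parquet metadata cache becoming one implementation of it.
// TableMetadata is a placeholder type for this sketch.
public interface Metastore {
  // Fetch table-level metadata previously collected for the given table.
  TableMetadata getTableMetadata(String storagePlugin, String tableName);

  // Persist (or refresh) table-level metadata.
  void putTableMetadata(String storagePlugin, String tableName, TableMetadata metadata);
}
{code}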





[GitHub] arina-ielchiieva commented on issue #1541: Update 041-logfile-plugin.md

2018-11-14 Thread GitBox
arina-ielchiieva commented on issue #1541: Update 041-logfile-plugin.md
URL: https://github.com/apache/drill/pull/1541#issuecomment-438752291
 
 
   @cgivre you have created the `cgivre-logfile-doc-update-1` branch in the 
Drill project, not in your fork. Please remove the branch.




Re: [Agenda] Drill developer meetup 2018

2018-11-14 Thread Hanumanth Maduri


Hello Drillers,

Here is the webex link for remote attendees.
Remote attendees can join at 
https://mapr.webex.com/mapr/j.php?MTID=ma05d8b5406acdb6292d5b81c79240a38

Thanks


> On Nov 2, 2018, at 11:25 AM, Abhishek Girish  wrote:
> 
> Charles, I'm sure we'll have a link for remote folks to join - will share
> it closer to the day.
> 
>> On Thu, Nov 1, 2018 at 1:58 PM hanu mapr  wrote:
>> 
>> Hello All,
>> 
>> There was a typo in the year in the mail. It should be 2018 instead of 2019.
>> Thanks Aman for correcting it.
>> 
>> Regards,
>> -Hanu
>> 
>>> On Thu, Nov 1, 2018 at 6:30 AM Charles Givre  wrote:
>>> 
>>> Hi Hanumath,
>>> This looks great!!  Will you be streaming the event for those of us not
>> in
>>> the Bay Area?
>>> Thx,
>>> — C
>>> 
 On Nov 1, 2018, at 00:10, Hanumath Rao Maduri 
>>> wrote:
 
 Drill Developers,
 
 
 I am quite excited to announce the details of the Drill developers day
 2018. I have consolidated the topics from our earlier discussions and
 prioritized them according to the votes.
 
 
 MapR has offered to host it on Nov 14th in the Training room downstairs.
 
 
 Here is the exact location
 
 
 Training Room at
 
 4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.
 
 
 Please find the agenda for the meetup.
 
 
 
 *Lunch starts at 12:00PM.*
 
 
 *[12:25 - 12:40] Welcome *
 
  - Recap on last year's activities
  - Preview of this year's focus
 
 *[12:40 - 1:00] Storage plugins*
 
 
 
  - Adding new storage plugins for the following:
  - Netflix Iceberg, Kudu (some code already exists), Cassandra,
 Elasticsearch, Carbondata, ORC/XML file formats, Spark
 RDD/DataFrames/Datasets, Graph databases & more
  - Improving documentation related to Storage plugins
 
 
 *[1:00 - 1:45] Schema discovery & Evolution*
 
 
 
  - Creation, management of schema
  - Handling schema changes in certain common cases
  - Handling NULL values elegantly
  - Schema learning (similar to MSGpack plugin)
  - Query hints
 
 *[1:45 - 2:30] Metadata Management*
 
 
 
  - Defining an abstraction layer for various types of metadata: views,
  schema, statistics, security
  - Underlying storage for metadata: what are the options and their
  trade-offs?
  - Hive metastore
  - Parquet metadata cache (parquet specific for row group metadata)
  - Ease of using the parquet files generated by other engines (like
>>> spark)
 
 
 *[2:30 - 2:45] Break*
 
 
 *[2:45 - 4:00] Resource management*
 
 
 
  - Resource limits per query
  - Optimal memory assignment for blocking operators based on stats
  - Enhancing the blocking and exchange operators to live within memory
  limits
  - Aligning with admission control/queueing (YARN concepts)
  - Query scheduling based on queues using tagging and costing
  - Drill on kubernetes
 
 
 *[4:00 - 4:20] Apache Arrow*
 
  - Benefits of integrating Apache Drill with Apache Arrow
  - Possible trade-offs & implementation hurdles
 
 *[4:20 - 4:40] **Performance Improvements*
 
  - Efficient handling of Broadcast/Semi/Anti Semi join
  - Drill Statistics handling
  - Optimizing complex Parquet reader
 
 Thanks,
 -Hanu
>>> 
>>> 
>> 


[GitHub] vvysotskyi commented on issue #1524: DRILL-6830: Remove Hook.REL_BUILDER_SIMPLIFY handler after use

2018-11-14 Thread GitBox
vvysotskyi commented on issue #1524: DRILL-6830: Remove 
Hook.REL_BUILDER_SIMPLIFY handler after use
URL: https://github.com/apache/drill/pull/1524#issuecomment-438724649
 
 
   @lushuifeng, you are right that the problem with 
`TestCaseNullableTypes#testCaseNullableTypesVarchar` is in Calcite. The initial 
goal of adding `Hook.REL_BUILDER_SIMPLIFY.add(Hook.propertyJ(false));` was to 
avoid failures connected with the `treat empty strings as null` feature, but it 
looks like that was fixed in another place.
   
   The interesting thing is that the problem behind the 
`TestCaseNullableTypes#testCaseNullableTypesVarchar` failure was fixed after 
version 1.17, so when Drill is rebased onto Calcite 1.18, 
`Hook.REL_BUILDER_SIMPLIFY.add(Hook.propertyJ(false));` may be removed.
   
   @ihuzenko, since you are working on the Calcite rebase, could you please also 
take care of it?




[jira] [Created] (DRILL-6851) Create Drill Operator for Kubernetes

2018-11-14 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-6851:
--

 Summary: Create Drill Operator for Kubernetes
 Key: DRILL-6851
 URL: https://issues.apache.org/jira/browse/DRILL-6851
 Project: Apache Drill
  Issue Type: Task
Reporter: Abhishek Girish
Assignee: Abhishek Girish


This task is to track creating an initial version of the Drill Operator for 
Kubernetes. I'll shortly update the JIRA with background, details on the 
Operator, and what's planned for the first version. 





[GitHub] cgivre opened a new pull request #1541: Update 041-logfile-plugin.md

2018-11-14 Thread GitBox
cgivre opened a new pull request #1541: Update 041-logfile-plugin.md
URL: https://github.com/apache/drill/pull/1541
 
 
   The documentation for the logfile plugin reflected a previous version. This 
is the correct documentation.




[jira] [Created] (DRILL-6850) JDBC integration tests

2018-11-14 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6850:
--

 Summary: JDBC integration tests
 Key: DRILL-6850
 URL: https://issues.apache.org/jira/browse/DRILL-6850
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JDBC
Affects Versions: 1.14.0
Reporter: Vitalii Diravka
 Fix For: 1.15.0


The following command will run the Drill integration tests for RDBMS (Derby and 
MySQL):
_mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc_
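
If only one suite is needed, the standard failsafe `it.test` property should 
also work:
_mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc -Dit.test=TestJdbcPluginWithDerbyIT_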

Currently some of the drill/exec/store/jdbc TestJdbcPluginWithDerbyIT and 
TestJdbcPluginWithMySQLIT tests fail:
{code}
Results :

Failed tests: 
  TestJdbcPluginWithDerbyIT.showTablesDefaultSchema:117 expected:<1> but was:<0>
Tests in error: 
  TestJdbcPluginWithDerbyIT.describe » UserRemote VALIDATION ERROR: Unknown 
tabl...
  
TestJdbcPluginWithDerbyIT.pushdownDoubleJoinAndFilter:111->PlanTestBase.testPlanMatchingPatterns:84->PlanTestBase.testPlanMatchingPatterns:89->PlanTestBase.getPlanInString:369->BaseTestQuery.testSqlWithResults:322->BaseTestQuery.testRunAndReturn:341
 » Rpc
  TestJdbcPluginWithDerbyIT.testCrossSourceMultiFragmentJoin » UserRemote 
VALIDA...
  TestJdbcPluginWithDerbyIT.validateResult:71 »  at position 0 column 
'`NUMERIC_...
  TestJdbcPluginWithMySQLIT.validateResult:108 »  at position 0 column 
'`numeric...

Tests run: 14, Failures: 1, Errors: 5, Skipped: 0
{code} 

Most likely these are old regressions.





[GitHub] arina-ielchiieva commented on issue #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
arina-ielchiieva commented on issue #1537: DRILL-6744: Support varchar and 
decimal push down
URL: https://github.com/apache/drill/pull/1537#issuecomment-438675122
 
 
   @vvysotskyi addressed the code review comments:
   1. Used VersionUtil from the Hadoop lib.
   2. Made ParquetReaderConfig immutable.
   3. Added trace logging instead of removing the default in the switch.
   4. Made other changes as requested.
   
   Thanks for the code review.
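
   For item 3, the change looks roughly like this (the class, logger, and type 
names are illustrative assumptions, not the actual code):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch: keep the switch's default branch, but log at trace
// level instead of removing it.
public class PushDownTypeSupport {
  private static final Logger logger = LoggerFactory.getLogger(PushDownTypeSupport.class);

  public static boolean supportsPushDown(String minorType) {
    switch (minorType) {
      case "VARCHAR":
      case "VARDECIMAL":
        return true;
      default:
        // Trace logging preserves diagnosability without failing the query.
        logger.trace("Filter push down is not supported for type {}", minorType);
        return false;
    }
  }
}
```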




[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454765
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+public class ScanTestUtils {
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
+   */
+
+  public static List<ScanProjectionParser> parsers(ScanProjectionParser... parsers) {
+    return ImmutableList.copyOf(parsers);
 
 Review comment:
   Consider using Java built-in utils rather than Guava, here and below.
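   A minimal sketch of the suggested swap, staying Java 8-compatible (assuming 
that is the target):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Guava-free variant of the helper: JDK collections replace ImmutableList.
public static List<ScanProjectionParser> parsers(ScanProjectionParser... parsers) {
  return Collections.unmodifiableList(Arrays.asList(parsers));
}
```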




[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454585
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/ScanTestUtils.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+public class ScanTestUtils {
+
+  /**
+   * Type-safe way to define a list of parsers.
+   * @param parsers
+   * @return
 
 Review comment:
   add description
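   For example (wording assumed):

```java
/**
 * Type-safe way to define a list of parsers.
 *
 * @param parsers the scan projection parsers to include
 * @return an immutable list of the given parsers
 */
```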




[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233450250
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/MetadataManager.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection.ScanProjectionParser;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaLevelProjection.SchemaProjectionResolver;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+
+/**
+ * Queries can contain a wildcard (*), table columns, or special
+ * system-defined columns (the file metadata columns AKA implicit
+ * columns, the `columns` column of CSV, etc.).
+ * 
+ * This class provides a generalized way of handling such extended
+ * columns. That is, this handles metadata for columns defined by
+ * the scan or file; columns defined by the table (the actual
+ * data metadata) are handled elsewhere.
+ * 
+ * Objects of this interface are driven by the projection processing
+ * framework which provides a vector cache from which to obtain
+ * materialized columns. The implementation must provide a projection
+ * parser to pick out the columns which this object handles.
+ * 
+ * A better name might be ImplicitMetadataManager to signify that
 
 Review comment:
   Agree, let's rename it to avoid confusion in the future.




[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233455585
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestNullColumnLoader.java
 ##
 @@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertSame;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnLoader;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedNullColumn;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.physical.rowSet.impl.NullResultVectorCacheImpl;
+import org.apache.drill.exec.physical.rowSet.impl.ResultVectorCacheImpl;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.Test;
+
+/**
+ * Test the mechanism that handles all-null columns during projection.
+ * An all-null column is one projected in the query, but which does
+ * not actually exist in the underlying data source (or input
+ * operator.)
+ * 
+ * In anticipation of having type information, this mechanism
+ * can create the classic nullable Int null column, or one of
+ * any other type and mode.
+ */
+
+public class TestNullColumnLoader extends SubOperatorTest {
+
+  private ResolvedNullColumn makeNullCol(String name, MajorType nullType) {
+
+// For this test, we don't need the projection, so just
+// set it to null.
+
+return new ResolvedNullColumn(name, nullType, null, 0);
+  }
+
+  private ResolvedNullColumn makeNullCol(String name) {
+return makeNullCol(name, null);
+  }
+
+  /**
+   * Test the simplest case: default null type, nothing in the vector
+   * cache. Specify no column type, the special NULL type, or a
+   * predefined type. Output types should be set accordingly.
+   */
+
+  @Test
+  public void testBasics() {
+
+final List<ResolvedNullColumn> defns = new ArrayList<>();
+defns.add(makeNullCol("unspecified", null));
+defns.add(makeNullCol("nullType", Types.optional(MinorType.NULL)));
+defns.add(makeNullCol("specifiedOpt", Types.optional(MinorType.VARCHAR)));
+defns.add(makeNullCol("specifiedReq", Types.required(MinorType.VARCHAR)));
+defns.add(makeNullCol("specifiedArray", 
Types.repeated(MinorType.VARCHAR)));
+
+final ResultVectorCache cache = new 
NullResultVectorCacheImpl(fixture.allocator());
+final NullColumnLoader staticLoader = new NullColumnLoader(cache, defns, 
null, false);
+
+// Create a batch
+
+final VectorContainer output = staticLoader.load(2);
+
+// Verify values and types
+
+final BatchSchema expectedSchema = new SchemaBuilder()
+.add("unspecified", NullColumnLoader.DEFAULT_NULL_TYPE)
+.add("nullType", NullColumnLoader.DEFAULT_NULL_TYPE)
+.addNullable("specifiedOpt", MinorType.VARCHAR)
+.addNullable("specifiedReq", MinorType.VARCHAR)
+.addArray("specifiedArray", MinorType.VARCHAR)
+.build();
+final SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
+.addRow(null, null, null, null, new String[] {})
+.addRow(null, null, null, null, new String[] {})
+.build();
+
+new RowSetComparison(expected)
+.verifyAndClearAll(fixture.wrap(output));
+staticLoader.close();
+  }
+
+  /**
+   * Test the ability to use a type other than nullable INT for null
+   * columns. This occurs, for example, in the CSV reader where no
+  

[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233451408
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedColumn.java
 ##
 @@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * A resolved column has a name, and a specification for how to project
+ * data from a source vector to a vector in the final output container.
+ * Describes the projection of a single column from
+ * an input to an output batch.
+ * 
+ * Although the table schema mechanism uses the newer "metadata"
+ * mechanism, resolved columns revert back to the original
+ * {@link MajorType} and {@link MaterializedField} mechanism used
+ * by the rest of Drill. Doing so loses a bit of additional
+ * information, but at present there is no way to export that information
+ * along with a serialized record batch; each operator must rediscover
+ * it after deserialization.
+ */
+
+public abstract class ResolvedColumn implements ColumnProjection {
+
+  public final VectorSource source;
+  public final int sourceIndex;
+
+  public ResolvedColumn(VectorSource source, int sourceIndex) {
+this.source = source;
+this.sourceIndex = sourceIndex;
+  }
+
+  public VectorSource source() { return source; }
+
+  public int sourceIndex() { return sourceIndex; }
+
+  /**
+   * Return the type of this column. Used primarily by the schema smoothing
+   * mechanism.
+   *
+   * @return
 
 Review comment:
   Move description here to avoid warning.
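   Roughly (wording assumed):

```java
/**
 * @return the type of this column, used primarily by the schema
 *         smoothing mechanism
 */
```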




[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233452637
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedTuple.java
 ##
 @@ -0,0 +1,427 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+
+import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
+/**
+ * Drill rows are made up of a tree of tuples, with the row being the root
+ * tuple. Each tuple contains columns, some of which may be maps. This
+ * class represents each row or map in the output projection.
+ * 
+ * Output columns within the tuple can be projected from the data source,
+ * might be null (requested columns that don't match a data source column)
+ * or might be a constant (such as an implicit column.) This class
+ * orchestrates assembling an output tuple from a collection of these
+ * three column types. (Though implicit columns appear only in the root
+ * tuple.)
+ *
+ * Null Handling
+ *
+ * The project list might reference a "missing" map if the project list
+ * includes, say, SELECT a.b.c but `a` does not exist
+ * in the data source. In this case, the column a is implied to be a map,
+ * so the projection mechanism will create a null map for `a`
+ * and `b`, and will create a null column for `c`.
+ * 
+ * To accomplish this recursive null processing, each tuple is associated
+ * with a null builder. (The null builder can be null if projection is
+ * implicit with a wildcard; in such a case no null columns can occur.
+ * But, even here, with schema persistence, a SELECT * query
+ * may need null columns if a second file does not contain a column
+ * that appeared in a first file.)
+ * 
+ * The null builder is bound to each tuple to allow vector persistence
+ * via the result vector cache. If we must create a null column
+ * `x` in two different readers, then the rules of Drill
+ * require that the same vector be used for both (or else a schema
+ * change is signaled.) The vector cache works by name (and type).
+ * Since maps may contain columns with the same names as other maps,
+ * the vector cache must be associated with each tuple. And, by extension,
+ * the null builder must also be associated with each tuple.
+ *
+ * Lifecycle
+ *
+ * The lifecycle of a resolved tuple is:
+ * 
+ * The projection mechanism creates the output tuple, and its columns,
+ * by comparing the project list against the table schema. The result is
+ * a set of table, null, or constant columns.
+ * Once per schema change, the resolved tuple creates the output
+ * tuple by linking to vectors in their original locations. As it turns out,
+ * we can simply share the vectors; we don't need to transfer the buffers.
+ * To prepare for the transfer, the tuple asks the null column builder
+ * (if present) to build the required null columns.
+ * Once the output tuple is built, it can be used for any number of
+ * batches without further work. (The same vectors appear in the various inputs
+ * and the output, eliminating the need for any transfers.)
+ * Once per batch, the client must set the row count. This is needed for 
the
+ * output container, and for any "null" maps that the project may have 
created.
+ * 
+ *
+ * Projection Mapping
+ *
+ * Each column is mapped into the output tuple (vector container or map) in
+ * the order that the columns are defined here. (That order follows the project
+ * list f

[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233458212
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestSchemaSmoothing.java
 ##
 @@ -0,0 +1,681 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.protocol.SchemaTracker;
+import org.apache.drill.exec.physical.impl.scan.ScanTestUtils;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import 
org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple.ResolvedRow;
+import org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection;
+import org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator.ReaderSchemaOrchestrator;
+import org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.physical.impl.scan.project.SmoothingProjection;
+import 
org.apache.drill.exec.physical.impl.scan.project.WildcardSchemaProjection;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.impl.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+/**
+ * Tests schema smoothing at the schema projection level.
+ * This level handles reusing prior types when filling null
+ * values. But, because no actual vectors are involved, it
+ * does not handle the schema chosen for a table ahead of
+ * time, only the schema as it is merged with prior schema to
+ * detect missing columns.
+ * 
+ * Focuses on the SmoothingProjection class itself.
+ * 
+ * Note that, at present, schema smoothing does not work for entire
+ * maps. That is, if file 1 has, say {a: {b: 10, c: "foo"}}
+ * and file 2 has, say, {a: null}, then schema smoothing does
+ * not currently know how to recreate the map. The same is true of
+ * lists and unions. Handling such cases is complex and is probably
+ * better handled via a system that allows the user to specify their
+ * intent by providing a schema to apply to the two files.
+ */
+
+public class TestSchemaSmoothing extends SubOperatorTest {
+
+  /**
+   * Low-level test of the smoothing projection, including the exceptions
+   * it throws when things are not going its way.
+   */
+
+  @Test
+  public void testSmoothingProjection() {
+final ScanLevelProjection scanProj = new ScanLevelProjection(
+RowSetTestUtils.projectAll(),
+ScanTestUtils.parsers());
+
+// Table 1: (a: nullable bigint, b)
+
+final TupleMetadata schema1 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.addNullable("b", MinorType.VARCHAR)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+ResolvedRow priorSchema;
+{
+  final NullColumnBuilder builder = new NullColumnBuilder(null, false);
+  final ResolvedRow rootTuple = new ResolvedRow(builder);
+  new WildcardSchemaProjection(
+  scanProj, schema1, rootTuple,
+  ScanTestUtils.resolvers());
+  priorSchema = rootTuple;
+}
+
+// Table 2: (a: nullable bigint, c), column omitted, original schema 
preserved
+
+final TupleMetadata schema2 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+try {
+   

[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233453077
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ResolvedTuple.java
 ##
 @@ -0,0 +1,427 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultVectorCache;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+
+import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
+/**
+ * Drill rows are made up of a tree of tuples, with the row being the root
+ * tuple. Each tuple contains columns, some of which may be maps. This
+ * class represents each row or map in the output projection.
+ * 
+ * Output columns within the tuple can be projected from the data source,
+ * might be null (requested columns that don't match a data source column)
+ * or might be a constant (such as an implicit column.) This class
+ * orchestrates assembling an output tuple from a collection of these
+ * three column types. (Though implicit columns appear only in the root
+ * tuple.)
+ *
+ * Null Handling
+ *
+ * The project list might reference a "missing" map if the project list
+ * includes, say, SELECT a.b.c but `a` does not exist
+ * in the data source. In this case, the column a is implied to be a map,
+ * so the projection mechanism will create a null map for `a`
+ * and `b`, and will create a null column for `c`.
+ * 
+ * To accomplish this recursive null processing, each tuple is associated
+ * with a null builder. (The null builder can be null if projection is
+ * implicit with a wildcard; in such a case no null columns can occur.
+ * But, even here, with schema persistence, a SELECT * query
+ * may need null columns if a second file does not contain a column
+ * that appeared in a first file.)
+ * 
+ * The null builder is bound to each tuple to allow vector persistence
+ * via the result vector cache. If we must create a null column
+ * `x` in two different readers, then the rules of Drill
+ * require that the same vector be used for both (or else a schema
+ * change is signaled.) The vector cache works by name (and type).
+ * Since maps may contain columns with the same names as other maps,
+ * the vector cache must be associated with each tuple. And, by extension,
+ * the null builder must also be associated with each tuple.
+ *
+ * Lifecycle
+ *
+ * The lifecycle of a resolved tuple is:
+ * 
+ * The projection mechanism creates the output tuple, and its columns,
+ * by comparing the project list against the table schema. The result is
+ * a set of table, null, or constant columns.
+ * Once per schema change, the resolved tuple creates the output
+ * tuple by linking to vectors in their original locations. As it turns out,
+ * we can simply share the vectors; we don't need to transfer the buffers.
+ * To prepare for the transfer, the tuple asks the null column builder
+ * (if present) to build the required null columns.
+ * Once the output tuple is built, it can be used for any number of
+ * batches without further work. (The same vectors appear in the various inputs
+ * and the output, eliminating the need for any transfers.)
+ * Once per batch, the client must set the row count. This is needed for 
the
+ * output container, and for any "null" maps that the project may have 
created.
+ * 
+ *
+ * Projection Mapping
+ *
+ * Each column is mapped into the output tuple (vector container or map) in
+ * the order that the columns are defined here. (That order follows the project
+ * list f

[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454002
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/SmoothingProjection.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Resolve a table schema against the prior schema. This works only if the
+ * types match and if all columns in the table schema already appear in the
+ * prior schema.
+ * 
+ * Consider this an experimental mechanism. The hope was that, with clever
+ * techniques, we could "smooth over" some of the issues that cause schema
+ * change events in Drill. As it turned out, however, creating this mechanism
+ * revealed that it is not possible, even in theory, to handle most schema
+ * changes because of the time dimension:
+ * 
+ * An event in a later batch may provide information that would have
+ * caused us to make a different decision in an earlier batch. For example,
+ * we are asked for column `foo`, did not see such a column in the first
+ * batch, block or file, guessed some type, and later saw that the column
+ * was of a different type. We can't "time travel" to tell our earlier
+ * selves, nor, when we make the initial type decision, can we jump to
+ * the future to see what type we'll discover.
+ * Readers in this fragment may see column `foo` but readers in
+ * another fragment read files/blocks that don't have that column. The
+ * two readers cannot communicate to agree on a type.
+ * 
+ * 
+ * What this mechanism can do is make decisions based on history: when a
+ * column appears, we can adjust its type a bit to try to avoid an
+ * unnecessary change. For example, if a prior file in this scan saw
+ * `foo` as nullable Varchar, but the present file has the column as
+ * required Varchar, we can use the more general nullable form. But,
+ * again, the "can't predict the future" bites us: we can handle a
+ * nullable-to-required column change, but not vice versa.
+ * 
+ * What this mechanism will tell the careful reader is that the only
+ * general solution to the schema-change problem is to know the full
+ * schema up front: for the planner to be told the schema and to
+ * communicate that schema to all readers so that all readers agree
+ * on the final schema.
+ * 
+ * When that is done, the techniques shown here can be used to adjust
+ * any per-file variation of schema to match the up-front schema.
+ */
+
+public class SmoothingProjection extends SchemaLevelProjection {
+
+  protected final List rewrittenFields = new ArrayList<>();
+
+  public SmoothingProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple priorSchema,
+  ResolvedTuple outputTuple,
+  List resolvers) throws 
IncompatibleSchemaException {
+
+super(resolvers);
+
+for (ResolvedColumn priorCol : priorSchema.columns()) {
+  switch (priorCol.nodeType()) {
+  case ResolvedTableColumn.ID:
 
 Review comment:
   indent




[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233449683
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/ExplicitSchemaProjection.java
 ##
 @@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import 
org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Perform a schema projection for the case of an explicit list of
+ * projected columns. Example: SELECT a, b, c.
+ * 
+ * An explicit projection starts with the requested set of columns,
+ * then looks in the table schema to find matches. That is, it is
+ * driven by the query itself.
+ * 
+ * An explicit projection may include columns that do not exist in
+ * the source schema. In this case, we fill in null columns for
+ * unmatched projections.
+ */
+
+public class ExplicitSchemaProjection extends SchemaLevelProjection {
+
+  public ExplicitSchemaProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple rootTuple,
+      List<SchemaProjectionResolver> resolvers) {
+super(resolvers);
+resolveRootTuple(scanProj, rootTuple, tableSchema);
+  }
+
+  private void resolveRootTuple(ScanLevelProjection scanProj,
+  ResolvedTuple rootTuple,
+  TupleMetadata tableSchema) {
+for (ColumnProjection col : scanProj.columns()) {
+  if (col.nodeType() == UnresolvedColumn.UNRESOLVED) {
+resolveColumn(rootTuple, ((UnresolvedColumn) col).element(), 
tableSchema);
+  } else {
+resolveSpecial(rootTuple, col, tableSchema);
+  }
+}
+  }
+
+  private void resolveColumn(ResolvedTuple outputTuple,
+  RequestedColumn inputCol, TupleMetadata tableSchema) {
+int tableColIndex = tableSchema.index(inputCol.name());
+if (tableColIndex == -1) {
+  resolveNullColumn(outputTuple, inputCol);
+} else {
+  resolveTableColumn(outputTuple, inputCol,
+  tableSchema.metadata(tableColIndex),
+  tableColIndex);
+}
+  }
+
+  private void resolveTableColumn(ResolvedTuple outputTuple,
+  RequestedColumn requestedCol,
+  ColumnMetadata column, int sourceIndex) {
+
+// Is the requested column implied to be a map?
+// A requested column is a map if the user requests x.y and we
+// are resolving column x. The presence of y as a member implies
+// that x is a map.
+
+if (requestedCol.isTuple()) {
+  resolveMap(outputTuple, requestedCol, column, sourceIndex);
+}
+
+// Is the requested column implied to be an array?
+// This occurs when the projection list contains at least one
+// array index reference such as x[10].
+
+else if (requestedCol.isArray()) {
+  resolveArray(outputTuple, requestedCol, column, sourceIndex);
+}
+
+// A plain old column. Might be an array or a map, but if
+// so, the request list just mentions it by name without implying
+// the column type. That is, the project list just contains x
+// by itself.
+
+else {
+  projectTableColumn(outputTuple, requestedCol, column, sourceIndex);
+}
+  }
+
+  private void resolveMap(ResolvedTuple outputTuple,
+  RequestedColumn requestedCol, ColumnMetadata column,
+  int sourceIndex) {
+
+// If the actual column isn't a map, then the request is invalid.
+
+if (! column.isMap()) {
+  throw UserException
+.validationError()
+.message("Project list implies a map column, but actual column is not 
a map")
+.addContext("Projected column", requestedCol.fullName())
+.addContext("Actual type", column.type().name())
+.build(logger);
+}
+
+// The requested column is implied to b

[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233458346
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/project/TestSchemaSmoothing.java
 ##
 @@ -0,0 +1,681 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.protocol.SchemaTracker;
+import org.apache.drill.exec.physical.impl.scan.ScanTestUtils;
+import org.apache.drill.exec.physical.impl.scan.project.NullColumnBuilder;
+import org.apache.drill.exec.physical.impl.scan.project.ResolvedColumn;
+import 
org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple.ResolvedRow;
+import org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection;
+import org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator;
+import 
org.apache.drill.exec.physical.impl.scan.project.ScanSchemaOrchestrator.ReaderSchemaOrchestrator;
+import org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.physical.impl.scan.project.SmoothingProjection;
+import 
org.apache.drill.exec.physical.impl.scan.project.WildcardSchemaProjection;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.impl.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+/**
+ * Tests schema smoothing at the schema projection level.
+ * This level handles reusing prior types when filling null
+ * values. But, because no actual vectors are involved, it
+ * does not handle the schema chosen for a table ahead of
+ * time, only the schema as it is merged with prior schema to
+ * detect missing columns.
+ * 
+ * Focuses on the SmoothingProjection class itself.
+ * 
+ * Note that, at present, schema smoothing does not work for entire
+ * maps. That is, if file 1 has, say {a: {b: 10, c: "foo"}}
+ * and file 2 has, say, {a: null}, then schema smoothing does
+ * not currently know how to recreate the map. The same is true of
+ * lists and unions. Handling such cases is complex and is probably
+ * better handled via a system that allows the user to specify their
+ * intent by providing a schema to apply to the two files.
+ */
+
+public class TestSchemaSmoothing extends SubOperatorTest {
+
+  /**
+   * Low-level test of the smoothing projection, including the exceptions
+   * it throws when things are not going its way.
+   */
+
+  @Test
+  public void testSmoothingProjection() {
+final ScanLevelProjection scanProj = new ScanLevelProjection(
+RowSetTestUtils.projectAll(),
+ScanTestUtils.parsers());
+
+// Table 1: (a: nullable bigint, b)
+
+final TupleMetadata schema1 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.addNullable("b", MinorType.VARCHAR)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+ResolvedRow priorSchema;
+{
+  final NullColumnBuilder builder = new NullColumnBuilder(null, false);
+  final ResolvedRow rootTuple = new ResolvedRow(builder);
+  new WildcardSchemaProjection(
+  scanProj, schema1, rootTuple,
+  ScanTestUtils.resolvers());
+  priorSchema = rootTuple;
+}
+
+// Table 2: (a: nullable bigint, c), column omitted, original schema 
preserved
+
+final TupleMetadata schema2 = new SchemaBuilder()
+.addNullable("a", MinorType.BIGINT)
+.add("c", MinorType.FLOAT8)
+.buildSchema();
+try {
+   

[GitHub] arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan projection framework

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1501: DRILL-6791: Scan 
projection framework
URL: https://github.com/apache/drill/pull/1501#discussion_r233454141
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/SmoothingProjection.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import 
org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother.IncompatibleSchemaException;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Resolve a table schema against the prior schema. This works only if the
+ * types match and if all columns in the table schema already appear in the
+ * prior schema.
+ * 
+ * Consider this an experimental mechanism. The hope was that, with clever
+ * techniques, we could "smooth over" some of the issues that cause schema
+ * change events in Drill. As it turned out, however, creating this mechanism
+ * revealed that it is not possible, even in theory, to handle most schema
+ * changes because of the time dimension:
+ * 
+ * An event in a later batch may provide information that would have
+ * caused us to make a different decision in an earlier batch. For example,
+ * we are asked for column `foo`, did not see such a column in the first
+ * batch, block or file, guessed some type, and later saw that the column
+ * was of a different type. We can't "time travel" to tell our earlier
+ * selves, nor, when we make the initial type decision, can we jump to
+ * the future to see what type we'll discover.
+ * Readers in this fragment may see column `foo` but readers in
+ * another fragment read files/blocks that don't have that column. The
+ * two readers cannot communicate to agree on a type.
+ * 
+ * 
+ * What this mechanism can do is make decisions based on history: when a
+ * column appears, we can adjust its type a bit to try to avoid an
+ * unnecessary change. For example, if a prior file in this scan saw
+ * `foo` as nullable Varchar, but the present file has the column as
+ * required Varchar, we can use the more general nullable form. But,
+ * again, the "can't predict the future" problem bites us: we can handle a
+ * nullable-to-required column change, but not vice versa.
+ * 
+ * What this mechanism will tell the careful reader is that the only
+ * general solution to the schema-change problem is to know the full
+ * schema up front: for the planner to be told the schema and to
+ * communicate that schema to all readers so that all readers agree
+ * on the final schema.
+ * 
+ * When that is done, the techniques shown here can be used to adjust
+ * any per-file variation of schema to match the up-front schema.
+ */
+
+public class SmoothingProjection extends SchemaLevelProjection {
+
+  protected final List<MaterializedField> rewrittenFields = new ArrayList<>();
+
+  public SmoothingProjection(ScanLevelProjection scanProj,
+  TupleMetadata tableSchema,
+  ResolvedTuple priorSchema,
+  ResolvedTuple outputTuple,
+      List<SchemaProjectionResolver> resolvers) throws IncompatibleSchemaException {
+
+super(resolvers);
+
+for (ResolvedColumn priorCol : priorSchema.columns()) {
+  switch (priorCol.nodeType()) {
+  case ResolvedTableColumn.ID:
+  case ResolvedNullColumn.ID:
+// TODO: To fix this, the null column loader must declare
 
 Review comment:
   Please explain this todo
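
   For readers following along, the widening rule described in the Javadoc
   above can be sketched roughly as below. This is a minimal illustration,
   not the code under review: the helper name `smoothMode` is invented, the
   exception constructor is simplified, and `Types.overrideMode` /
   `MaterializedField.create` are assumed from Drill's common type utilities.

   ```
   import org.apache.drill.common.types.TypeProtos.DataMode;
   import org.apache.drill.common.types.Types;
   import org.apache.drill.exec.record.MaterializedField;

   // Hypothetical sketch: a required incoming column can be widened to
   // match a nullable prior column, but a nullable incoming column can
   // never be narrowed back to required.
   static MaterializedField smoothMode(MaterializedField priorCol,
       MaterializedField tableCol) throws IncompatibleSchemaException {
     DataMode priorMode = priorCol.getType().getMode();
     DataMode tableMode = tableCol.getType().getMode();
     if (priorMode == tableMode) {
       return tableCol;               // Modes agree; nothing to smooth
     }
     if (priorMode == DataMode.OPTIONAL && tableMode == DataMode.REQUIRED) {
       // Widen required -> nullable so the output schema stays stable
       return MaterializedField.create(tableCol.getName(),
           Types.overrideMode(tableCol.getType(), DataMode.OPTIONAL));
     }
     // Nullable -> required (or any other mode change) cannot be smoothed
     throw new IncompatibleSchemaException();
   }
   ```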


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add Query Metadata to RESTful Interface

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add 
Query Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233457684
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   1. Duplicating the column name does not make sense.
   2. You may choose not to output precision and scale if they are absent, 
depending on which object you plan to deserialize this information into.
   3. Look at major type, for example.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cgivre commented on a change in pull request #1539: DRILL-6847: Add Query Metadata to RESTful Interface

2018-11-14 Thread GitBox
cgivre commented on a change in pull request #1539: DRILL-6847: Add Query 
Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233456389
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   How would you recommend designing that?  I was trying to keep this PR 
relatively simple and backwards compatible, but one option might be to make 
the metadata a little duplicative, so something like:
   
   ```
   "metadata": [{
  "name": "price",
  "type": "FLOAT4"
  "precision":
  "scale"
  },{
"name": "customer",
"type": "VARCHAR"
 ...
   ]
   ```
   Do you know off hand where the precision/scale or any other attributes of 
the columns can be accessed?
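
   For what it's worth, a minimal sketch of where that information lives
   (`loader` and `i` are from the surrounding `WebUserConnection` code above;
   the `has*` accessors are the usual protobuf-generated ones on `MajorType`):

   ```
   import org.apache.drill.common.types.TypeProtos.MajorType;
   import org.apache.drill.exec.record.MaterializedField;

   // MajorType is a protobuf message, so the optional precision/scale
   // fields can be tested for presence before being read.
   MaterializedField col = loader.getSchema().getColumn(i);
   MajorType type = col.getType();
   String name = col.getName();                  // e.g. "price"
   String typeName = type.getMinorType().name(); // e.g. "FLOAT4"
   Integer precision = type.hasPrecision() ? type.getPrecision() : null;
   Integer scale = type.hasScale() ? type.getScale() : null;
   ```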


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add Query Metadata to RESTful Interface

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add 
Query Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233423505
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   I see, but even though it is not your use case, I think we should consider 
shipping not only the type name as a String but also information about 
precision and scale for those who might need it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] cgivre commented on a change in pull request #1539: DRILL-6847: Add Query Metadata to RESTful Interface

2018-11-14 Thread GitBox
cgivre commented on a change in pull request #1539: DRILL-6847: Add Query 
Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233405972
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   Hi @arina-ielchiieva ,
   The use case I had in mind was integrating Drill with SQLPad and Apache 
Superset.  In these instances basically, the UI needed to know if a field was 
numeric, temporal of any sort, or text so that it could render visualizations 
properly.  
   
   I'm sure there are other use cases out there, but I know that for me at 
least, this was a major blocker in getting Drill to work with the various BI 
tools.  The JDBC interface provided this information, but the RESTful interface 
did not, so I had to resort to hackery. 
   
   So to answer your question, it might be useful for other use cases to 
provide precision and scale, but for the one I had in mind, that would not be 
helpful. 
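
   A consumer-side mapping of the kind described might look like the sketch
   below (illustrative only; the strings are Drill `MinorType` names as they
   would appear in the proposed `metadata` array):

   ```
   // Collapse Drill minor type names into the broad categories a BI
   // front end needs in order to pick a sensible visualization.
   static String renderCategory(String minorTypeName) {
     switch (minorTypeName) {
       case "TINYINT": case "SMALLINT": case "INT": case "BIGINT":
       case "FLOAT4": case "FLOAT8": case "VARDECIMAL":
         return "numeric";
       case "DATE": case "TIME": case "TIMESTAMP":
         return "temporal";
       default:
         return "text";
     }
   }
   ```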


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233132202
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -86,30 +87,26 @@ public ParquetMetaStatCollector(ParquetTableMetadataBase 
parquetTableMetadata,
   columnMetadataMap.put(schemaPath, columnMetadata);
 }
 
-for (final SchemaPath field : fields) {
-  final PrimitiveType.PrimitiveTypeName primitiveType;
-  final OriginalType originalType;
-
-  final ColumnMetadata columnMetadata = 
columnMetadataMap.get(field.getUnIndexed());
-
+for (SchemaPath field : fields) {
+  ColumnMetadata columnMetadata = 
columnMetadataMap.get(field.getUnIndexed());
   if (columnMetadata != null) {
-final Object min = columnMetadata.getMinValue();
-final Object max = columnMetadata.getMaxValue();
-final long numNulls = columnMetadata.getNulls() == null ? -1 : 
columnMetadata.getNulls();
-
-primitiveType = 
this.parquetTableMetadata.getPrimitiveType(columnMetadata.getName());
-originalType = 
this.parquetTableMetadata.getOriginalType(columnMetadata.getName());
-int precision = 0;
-int scale = 0;
+ColumnStatisticsBuilder statisticsBuilder = 
ColumnStatisticsBuilder.builder()
+  .setMin(columnMetadata.getMinValue())
+  .setMax(columnMetadata.getMaxValue())
+  .setNumNulls(columnMetadata.getNulls() == null ? -1 : 
columnMetadata.getNulls())
 
 Review comment:
   Please replace -1 with `GroupScan.NO_COLUMN_STATS`.
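
   i.e., roughly (a sketch of the substitution, with the constant taken from
   `org.apache.drill.exec.physical.base.GroupScan`):

   ```
   import org.apache.drill.exec.physical.base.GroupScan;

   // Use the named constant instead of the magic number -1
   long numNulls = columnMetadata.getNulls() == null
       ? GroupScan.NO_COLUMN_STATS
       : columnMetadata.getNulls();
   ```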


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233397050
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetReaderConfig.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.SystemOptionManager;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.ParquetReadOptions;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+
+public class TestParquetReaderConfig {
+
+  @Test
+  public void testDefaultsDeserialization() throws Exception {
+ObjectMapper mapper = new ObjectMapper();
+ParquetReaderConfig readerConfig = ParquetReaderConfig.builder().build(); 
// all defaults
+String value = mapper.writeValueAsString(readerConfig);
+assertEquals("{}", value);
+
+readerConfig = mapper.readValue(value, ParquetReaderConfig.class);
+assertTrue(readerConfig.autoCorrectCorruptedDates); // check that default 
value is restored
+
+readerConfig.autoCorrectCorruptedDates = false; // change the default
+readerConfig.enableStringsSignedMinMax = false; // update to the default
+
+value = mapper.writeValueAsString(readerConfig);
+assertEquals("{\"autoCorrectCorruptedDates\":false}", value);
+  }
+
+  @Test
+  public void testAddConfigToConf() {
+Configuration conf = new Configuration();
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, true);
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, true);
+conf.setBoolean(ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, true);
+
+ParquetReaderConfig readerConfig = 
ParquetReaderConfig.builder().withConf(conf).build();
+Configuration newConf = readerConfig.addCountersToConf(new 
Configuration());
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, 
"true");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, 
"true");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, 
"true");
+
+conf = new Configuration();
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, false);
+conf.setBoolean(ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, false);
+conf.setBoolean(ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, false);
+
+readerConfig = ParquetReaderConfig.builder().withConf(conf).build();
+newConf = readerConfig.addCountersToConf(new Configuration());
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_READ_COUNTER, 
"false");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_BYTES_TOTAL_COUNTER, 
"false");
+checkConfigValue(newConf, ParquetReaderConfig.ENABLE_TIME_READ_COUNTER, 
"false");
+  }
+
+  @Test
+  public void testReadOptions() {
+ParquetReaderConfig readerConfig = new ParquetReaderConfig();
 
 Review comment:
   Looks like we contravene the recommendation from its Javadoc


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233394516
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -132,62 +129,163 @@ public 
ParquetMetaStatCollector(ParquetTableMetadataBase parquetTableMetadata,
   }
 
   /**
-   * Builds column statistics using given primitiveType, originalType, scale,
-   * precision, numNull, min and max values.
+   * Helper class that creates parquet {@link ColumnStatistics} based on given
+   * min and max values, type, number of nulls, precision and scale.
*
-   * @param min min value for statistics
-   * @param max max value for statistics
-   * @param numNullsnum_nulls for statistics
-   * @param primitiveType   type that determines statistics class
-   * @param originalTypetype that determines statistics class
-   * @param scale   scale value (used for DECIMAL type)
-   * @param precision   precision value (used for DECIMAL type)
-   * @return column statistics
*/
-  private ColumnStatistics getStat(Object min, Object max, long numNulls,
-   PrimitiveType.PrimitiveTypeName 
primitiveType, OriginalType originalType,
-   int scale, int precision) {
-Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
-Statistics convertedStat = stat;
-
-TypeProtos.MajorType type = ParquetReaderUtility.getType(primitiveType, 
originalType, scale, precision);
-stat.setNumNulls(numNulls);
-
-if (min != null && max != null ) {
-  switch (type.getMinorType()) {
-  case INT :
-  case TIME:
-((IntStatistics) stat).setMinMax(Integer.parseInt(min.toString()), 
Integer.parseInt(max.toString()));
-break;
-  case BIGINT:
-  case TIMESTAMP:
-((LongStatistics) stat).setMinMax(Long.parseLong(min.toString()), 
Long.parseLong(max.toString()));
-break;
-  case FLOAT4:
-((FloatStatistics) stat).setMinMax(Float.parseFloat(min.toString()), 
Float.parseFloat(max.toString()));
-break;
-  case FLOAT8:
-((DoubleStatistics) 
stat).setMinMax(Double.parseDouble(min.toString()), 
Double.parseDouble(max.toString()));
-break;
-  case DATE:
-convertedStat = new LongStatistics();
-convertedStat.setNumNulls(stat.getNumNulls());
-final long minMS = 
convertToDrillDateValue(Integer.parseInt(min.toString()));
-final long maxMS = 
convertToDrillDateValue(Integer.parseInt(max.toString()));
-((LongStatistics) convertedStat ).setMinMax(minMS, maxMS);
-break;
-  case BIT:
-((BooleanStatistics) 
stat).setMinMax(Boolean.parseBoolean(min.toString()), 
Boolean.parseBoolean(max.toString()));
-break;
-  default:
-  }
+  private static class ColumnStatisticsBuilder {
+
+private Object min;
+private Object max;
+private long numNulls;
+private PrimitiveType.PrimitiveTypeName primitiveType;
+private OriginalType originalType;
+private int scale;
+private int precision;
+
+static ColumnStatisticsBuilder builder() {
+  return new ColumnStatisticsBuilder();
 }
 
-return new ColumnStatistics(convertedStat, type);
-  }
+ColumnStatisticsBuilder setMin(Object min) {
+  this.min = min;
+  return this;
+}
+
+ColumnStatisticsBuilder setMax(Object max) {
+  this.max = max;
+  return this;
+}
+
+ColumnStatisticsBuilder setNumNulls(long numNulls) {
+  this.numNulls = numNulls;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrimitiveType(PrimitiveType.PrimitiveTypeName 
primitiveType) {
+  this.primitiveType = primitiveType;
+  return this;
+}
+
+ColumnStatisticsBuilder setOriginalType(OriginalType originalType) {
+  this.originalType = originalType;
+  return this;
+}
 
-  private static long convertToDrillDateValue(int dateValue) {
+ColumnStatisticsBuilder setScale(int scale) {
+  this.scale = scale;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrecision(int precision) {
+  this.precision = precision;
+  return this;
+}
+
+
+/**
+ * Builds column statistics using given primitive and original types,
+ * scale, precision, number of nulls, min and max values.
+ * Min and max values for binary statistics are set only if allowed.
+ *
+ * @return column statistics
+ */
+ColumnStatistics build() {
+  Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
+  Statistics convertedStat = stat;
+
+  TypeProtos.MajorType type = ParquetReaderUtility.getType(primitiveType, 
originalType, scale, precision);
+  stat.setNumNulls(numNulls);
+
+  if (min != null && max != null) {
+switch (type.getMinorType()) {

[GitHub] vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233097319
 
 

 ##
 File path: common/src/main/java/org/apache/drill/common/VersionUtil.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.common;
+
+import org.apache.maven.artifact.versioning.DefaultArtifactVersion;
+
+/**
+ * Utility class for project version.
+ */
+public class VersionUtil {
 
 Review comment:
   Looks like `hadoop-common` has a class which does similar things: 
`org.apache.hadoop.util.VersionUtil`. Can we replace this class with the 
hadoop one?
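
   For illustration, a quick sketch of the hadoop-common equivalent (assuming
   the standard two-argument `compareVersions` signature):

   ```
   import org.apache.hadoop.util.VersionUtil;

   // compareVersions returns <0, 0 or >0, Comparable-style
   int cmp = VersionUtil.compareVersions("1.10.0", "1.8.1");
   if (cmp > 0) {
     System.out.println("1.10.0 is newer than 1.8.1");
   }
   ```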


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233127110
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderConfig.java
 ##
 @@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.ParquetReadOptions;
+
+import java.util.Objects;
+
+import static 
org.apache.parquet.format.converter.ParquetMetadataConverter.NO_FILTER;
+
+/**
+ * Stores consolidated parquet reading configuration. Can obtain config values 
from various sources:
+ * Assignment priority of configuration values is the following:
+ * parquet format config
+ * Hadoop configuration
+ * session options
+ *
+ * During serialization, default values are omitted to keep the serialized 
object smaller.
+ * Should be initialized using {@link Builder}, constructor is made public 
only for ser / de purposes.
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ParquetReaderConfig {
+
+  public static final String ENABLE_BYTES_READ_COUNTER = 
"parquet.benchmark.bytes.read";
+  public static final String ENABLE_BYTES_TOTAL_COUNTER = 
"parquet.benchmark.bytes.total";
+  public static final String ENABLE_TIME_READ_COUNTER = 
"parquet.benchmark.time.read";
+
+  // keep variables public for ser / de to avoid creating getters and 
constructor with params for all variables
+  // add defaults to keep deserialized object smaller
+  public boolean enableBytesReadCounter = false;
+  public boolean enableBytesTotalCounter = false;
+  public boolean enableTimeReadCounter = false;
+  public boolean autoCorrectCorruptedDates = true;
+  public boolean enableStringsSignedMinMax = false;
+
+  public static ParquetReaderConfig.Builder builder() {
+return new ParquetReaderConfig.Builder();
+  }
+
+  public static ParquetReaderConfig getDefaultInstance() {
+return new ParquetReaderConfig();
+  }
+
+  // default constructor should be used only for ser / de and testing
+  public ParquetReaderConfig() { }
+
+  public boolean autoCorrectCorruptedDates() {
+return autoCorrectCorruptedDates;
+  }
+
+  public boolean enableStringsSignedMinMax() {
+return enableStringsSignedMinMax;
+  }
+
+  public ParquetReadOptions toReadOptions() {
+return ParquetReadOptions.builder()
+  .withMetadataFilter(NO_FILTER)
 
 Review comment:
   `NO_FILTER` is set in `ParquetReadOptions.Builder` by default, so it may be 
removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233390608
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderConfig.java
 ##
 @@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.ParquetReadOptions;
+
+import java.util.Objects;
+
+import static 
org.apache.parquet.format.converter.ParquetMetadataConverter.NO_FILTER;
+
+/**
+ * Stores consolidated parquet reading configuration. Can obtain config values 
from various sources:
+ * Assignment priority of configuration values is the following:
+ * parquet format config
+ * Hadoop configuration
+ * session options
+ *
+ * During serialization, default values are omitted to keep the serialized 
object smaller.
+ * Should be initialized using {@link Builder}, constructor is made public 
only for ser / de purposes.
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ParquetReaderConfig {
 
 Review comment:
   Is it possible to make this class immutable, so that in the 
`getDefaultInstance()` method we can return the same object instead of 
instantiating a new one?
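
   In sketch form (assuming the class can indeed be made immutable; sharing
   a mutable instance would not be safe):

   ```
   // A single shared default instance instead of a new object per call
   private static final ParquetReaderConfig DEFAULT_INSTANCE =
       new ParquetReaderConfig();

   public static ParquetReaderConfig getDefaultInstance() {
     return DEFAULT_INSTANCE;
   }
   ```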


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support varchar and decimal push down

2018-11-14 Thread GitBox
vvysotskyi commented on a change in pull request #1537: DRILL-6744: Support 
varchar and decimal push down
URL: https://github.com/apache/drill/pull/1537#discussion_r233135431
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -132,62 +129,163 @@ public 
ParquetMetaStatCollector(ParquetTableMetadataBase parquetTableMetadata,
   }
 
   /**
-   * Builds column statistics using given primitiveType, originalType, scale,
-   * precision, numNull, min and max values.
+   * Helper class that creates parquet {@link ColumnStatistics} based on given
+   * min and max values, type, number of nulls, precision and scale.
*
-   * @param min min value for statistics
-   * @param max max value for statistics
-   * @param numNullsnum_nulls for statistics
-   * @param primitiveType   type that determines statistics class
-   * @param originalTypetype that determines statistics class
-   * @param scale   scale value (used for DECIMAL type)
-   * @param precision   precision value (used for DECIMAL type)
-   * @return column statistics
*/
-  private ColumnStatistics getStat(Object min, Object max, long numNulls,
-   PrimitiveType.PrimitiveTypeName 
primitiveType, OriginalType originalType,
-   int scale, int precision) {
-Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
-Statistics convertedStat = stat;
-
-TypeProtos.MajorType type = ParquetReaderUtility.getType(primitiveType, 
originalType, scale, precision);
-stat.setNumNulls(numNulls);
-
-if (min != null && max != null ) {
-  switch (type.getMinorType()) {
-  case INT :
-  case TIME:
-((IntStatistics) stat).setMinMax(Integer.parseInt(min.toString()), 
Integer.parseInt(max.toString()));
-break;
-  case BIGINT:
-  case TIMESTAMP:
-((LongStatistics) stat).setMinMax(Long.parseLong(min.toString()), 
Long.parseLong(max.toString()));
-break;
-  case FLOAT4:
-((FloatStatistics) stat).setMinMax(Float.parseFloat(min.toString()), 
Float.parseFloat(max.toString()));
-break;
-  case FLOAT8:
-((DoubleStatistics) 
stat).setMinMax(Double.parseDouble(min.toString()), 
Double.parseDouble(max.toString()));
-break;
-  case DATE:
-convertedStat = new LongStatistics();
-convertedStat.setNumNulls(stat.getNumNulls());
-final long minMS = 
convertToDrillDateValue(Integer.parseInt(min.toString()));
-final long maxMS = 
convertToDrillDateValue(Integer.parseInt(max.toString()));
-((LongStatistics) convertedStat ).setMinMax(minMS, maxMS);
-break;
-  case BIT:
-((BooleanStatistics) 
stat).setMinMax(Boolean.parseBoolean(min.toString()), 
Boolean.parseBoolean(max.toString()));
-break;
-  default:
-  }
+  private static class ColumnStatisticsBuilder {
+
+private Object min;
+private Object max;
+private long numNulls;
+private PrimitiveType.PrimitiveTypeName primitiveType;
+private OriginalType originalType;
+private int scale;
+private int precision;
+
+static ColumnStatisticsBuilder builder() {
+  return new ColumnStatisticsBuilder();
 }
 
-return new ColumnStatistics(convertedStat, type);
-  }
+ColumnStatisticsBuilder setMin(Object min) {
+  this.min = min;
+  return this;
+}
+
+ColumnStatisticsBuilder setMax(Object max) {
+  this.max = max;
+  return this;
+}
+
+ColumnStatisticsBuilder setNumNulls(long numNulls) {
+  this.numNulls = numNulls;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrimitiveType(PrimitiveType.PrimitiveTypeName 
primitiveType) {
+  this.primitiveType = primitiveType;
+  return this;
+}
+
+ColumnStatisticsBuilder setOriginalType(OriginalType originalType) {
+  this.originalType = originalType;
+  return this;
+}
 
-  private static long convertToDrillDateValue(int dateValue) {
+ColumnStatisticsBuilder setScale(int scale) {
+  this.scale = scale;
+  return this;
+}
+
+ColumnStatisticsBuilder setPrecision(int precision) {
+  this.precision = precision;
+  return this;
+}
+
+
+/**
+ * Builds column statistics using given primitive and original types,
+ * scale, precision, number of nulls, min and max values.
+ * Min and max values for binary statistics are set only if allowed.
+ *
+ * @return column statistics
+ */
+ColumnStatistics build() {
+  Statistics stat = Statistics.getStatsBasedOnType(primitiveType);
+  Statistics convertedStat = stat;
+
+  TypeProtos.MajorType type = ParquetReaderUtility.getType(primitiveType, 
originalType, scale, precision);
+  stat.setNumNulls(numNulls);
+
+  if (min != null && max != null) {
+switch (type.getMinorType()) {


[GitHub] arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add Query Metadata to RESTful Interface

2018-11-14 Thread GitBox
arina-ielchiieva commented on a change in pull request #1539: DRILL-6847: Add 
Query Metadata to RESTful Interface
URL: https://github.com/apache/drill/pull/1539#discussion_r233348011
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -106,7 +110,10 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 // TODO:  Clean:  DRILL-2933:  That load(...) no longer throws
 // SchemaChangeException, so check/clean catch clause below.
 for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
-  columns.add(loader.getSchema().getColumn(i).getName());
+
+  MaterializedField col = loader.getSchema().getColumn(i);
+  columns.add(col.getName());
+  metadata.add(col.getType().getMinorType().name());
 
 Review comment:
   Some types can have precision and scale; is this information needed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services