[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/405


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-22 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-200026538
  
Thank you for the reviews.

All regression tests passed; I am running unit tests right now.

Note that, the `planner.enable_limit0_optimization` option is disabled by 
default. To summarize (and document) the limitations:

If, during validation, the planner is able to resolve that the types of the 
columns (i.e. types are non late binding), the shorter execution path is taken. 
Some types are excluded:
+ DECIMAL type is not fully supported in general.
+ VARBINARY is not fully tested.
+ MAP, ARRAY are currently not exposed to the planner.
+ TINYINT, SMALLINT are defined in the Drill type system but have been 
turned off for now.
+ SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST 
are Calcite types currently not supported by Drill, nor defined in the Drill 
type list.

Three scenarios when the planner can do type resolution during validation:
+ Queries on Hive tables
+ Queries with explicit casts on table columns, example: `SELECT CAST(col1 
AS BIGINT), ABS(CAST(col2 AS INTEGER)) FROM table;`
+ Queries on views with casts on table columns

In the latter two cases, the schema of the query with LIMIT 0 clause has 
relaxed nullability compared to the query without the LIMIT 0 clause. Example:
Say the schema definition of the Parquet file (`numbers.parquet`) is:
```
message Numbers {
  required int col1;
  optional int col2;
 }
```

Since the view definition does not specify nullability of columns, and 
schema of a parquet file is not yet leveraged by Drill's planner:
```
CREATE VIEW dfs.tmp.mynumbers AS SELECT CAST(col1 AS INTEGER) as col1, 
CAST(col2 AS INTEGER) AS col2 FROM dfs.tmp.`numbers.parquet`;
```
(1) For query with LIMIT 0 clause, since the file/ metadata is not read, 
Drill assumes the nullability of both columns is 
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
```

(2) For query without LIMIT 0 clause, since the file is read, Drill knows 
the nullability of `col1` is 
[`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls),
 and `col2` is 
[`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-22 Thread jinfengni
Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-20859
  
LGTM.

+1 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-22 Thread StevenMPhillips
Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-199964093
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-22 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-199962117
  
I have addressed @jinfengni 's and @hsuanyi 's comments 
[here](https://github.com/sudheeshkatkam/drill/commit/e4cfdfa9b0562d52ac07f6d80860a82fa8baba40)
 [I force pushed to this branch and somehow their comments are not referenced 
in this PR any longer.]


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-22 Thread StevenMPhillips
Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-199910572
  
Sudheesh, could you respond to the comments made by Jin Feng? If you have 
already discussed it with him in person, could you post a summary here.

I am in favor of merging this PR, but just the last commit. I think the 
second commit should be separate, and should be done as an inserted operator 
(similar to IteratorValidator), rather than modifying the constructor for 
Screen.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-22 Thread hsuanyi
Github user hsuanyi commented on the pull request:

https://github.com/apache/drill/pull/405#issuecomment-199650953
  
Except for my comments (which are for future improvements and can be filed 
as follow-up jira issues)
LGTM +1 (non-binding)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-21 Thread StevenMPhillips
Github user StevenMPhillips commented on a diff in the pull request:

https://github.com/apache/drill/pull/405#discussion_r56918321
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLimit0.java
 ---
@@ -0,0 +1,677 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.limit;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.PlanTestBase;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.joda.time.DateTime;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+
+import java.util.List;
+
+public class TestLimit0 extends BaseTestQuery {
+
+  private static final String viewName = "limitZeroEmployeeView";
+
+  private static String wrapLimit0(final String query) {
+return "SELECT * FROM (" + query + ") LZT LIMIT 0";
+  }
+
+  @BeforeClass
+  public static void createView() throws Exception {
+test("USE dfs_test.tmp");
+test(String.format("CREATE OR REPLACE VIEW %s AS SELECT " +
+"CAST(employee_id AS INT) AS employee_id, " +
+"CAST(full_name AS VARCHAR(25)) AS full_name, " +
+"CAST(position_id AS INTEGER) AS position_id, " +
+"CAST(department_id AS BIGINT) AS department_id," +
+"CAST(birth_date AS DATE) AS birth_date, " +
+"CAST(hire_date AS TIMESTAMP) AS hire_date, " +
+"CAST(salary AS DOUBLE) AS salary, " +
+"CAST(salary AS FLOAT) AS fsalary, " +
+"CAST((CASE WHEN marital_status = 'S' THEN true ELSE false END) AS 
BOOLEAN) AS single, " +
+"CAST(education_level AS VARCHAR(60)) AS education_level," +
+"CAST(gender AS CHAR) AS gender " +
+"FROM cp.`employee.json` " +
+"ORDER BY employee_id " +
+"LIMIT 1;", viewName));
+// { "employee_id":1,"full_name":"Sheri 
Nowmer","first_name":"Sheri","last_name":"Nowmer","position_id":1,
+// 
"position_title":"President","store_id":0,"department_id":1,"birth_date":"1961-08-26",
+// "hire_date":"1994-12-01 
00:00:00.0","end_date":null,"salary":8.,"supervisor_id":0,
+// "education_level":"Graduate 
Degree","marital_status":"S","gender":"F","management_role":"Senior Management" 
}
+  }
+
+  @AfterClass
+  public static void tearDownView() throws Exception {
+test("DROP VIEW " + viewName + ";");
+  }
+
+  //  SIMPLE QUERIES 
+
+  @Test
+  public void infoSchema() throws Exception {
+testBuilder()
+.sqlQuery(String.format("DESCRIBE %s", viewName))
+.unOrdered()
+.baselineColumns("COLUMN_NAME", "DATA_TYPE", "IS_NULLABLE")
+.baselineValues("employee_id", "INTEGER", "YES")
+.baselineValues("full_name", "CHARACTER VARYING", "YES")
+.baselineValues("position_id", "INTEGER", "YES")
+.baselineValues("department_id", "BIGINT", "YES")
+.baselineValues("birth_date", "DATE", "YES")
+.baselineValues("hire_date", "TIMESTAMP", "YES")
+.baselineValues("salary", "DOUBLE", "YES")
+.baselineValues("fsalary", "FLOAT", "YES")
+.baselineValues("single", "BOOLEAN", "NO")
+.baselineValues("education_level", "CHARACTER VARYING", "YES")
+.baselineValues("gender", "CHARACTER", "YES")
+.go();
+  }
+
+  @Test
+  @Ignore("DateTime timezone error needs to be fixed.")
+  public void simpleSelect() throws Exception {
+testBuilder()
+.sqlQuery(String.format("SELECT * FROM %s", viewName))
+.ordered()
+.baselineColumns("employee_id", "full_name", "position_id", 

[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-20 Thread hsuanyi
Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/405#discussion_r56783075
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DirectScanPrule.java
 ---
@@ -0,0 +1,40 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.physical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.drill.exec.planner.logical.DrillDirectScanRel;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+
--- End diff --

java doc?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...

2016-03-03 Thread sudheeshkatkam
GitHub user sudheeshkatkam opened a pull request:

https://github.com/apache/drill/pull/405

DRILL-3623: For limit 0 queries, use a shorter path when result column 
types are known

Moving from #193 to here.

+ There is a pull request open for first commit (DRILL-4372: #397).
+ Second commit has a "nice to have" check: ensuring planning and execution 
types patch.
+ My changes are in the third commit (e4cfdfa). Please review this.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-3623-pr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #405


commit c553365e39947ba6c95d645cc971cf4d696ee758
Author: Sudheesh Katkam 
Date:   2015-12-22T04:38:59Z

DRILL-4372: Expose the functions return type to Drill

- Drill-Calite version update:
This commit needs to have Calcite's patch (CALCITE-1062) to plugin 
customized SqlOperator.

- FunctionTemplate
Add FunctionArgumentNumber annotation. This annotation element tells if the 
number of argument(s) is fixed or arbitrary (e.g., String concatenation 
function).

Due to this modification, there are some minor changes in DrillFuncHolder, 
DrillFunctionRegistry and FunctionAttributes.

- Checker
Add a new Checker (which Calcite uses to validate the legitimacy of the 
number of argument(s) for a function) to allow functions with arbitrary 
arguments to pass Caclite's validation

- Type conversion between Drill and Calcite
DrillConstExector is given a static method getDrillTypeFromCalcite() to 
convert Calcite types to Drill's.

- Extract function's return type inference
Unlike other functions, Extract function's return type can be determined 
solely based on the first argument. A logic is added in to allow this inference 
to happen

- DrillCalcite wrapper:
From the aspects of return type inference and argument type checks, 
Calcite's mechanism is very different from Drill's. In addition, currently, 
there is no straightforward way for Drill to plug-in customized mechanisms to 
Calcite. Thus, wrappers are provided to serve the objective.

Except for the mechanisms of type inference and argument type checks, these 
wrappers just forward any method calls to the wrapped SqlOpertor, SqlFuncion or 
SqlAggFunction to respond.

A interface DrillCalciteSqlWrapper is also added for the callers of the 
three wrappers to get the wrapped objects easier.

Due to these wrappers, UnsupportedOperatorsVisitor is modified in a minor 
manner.

- Calcite's SqlOpertor, SqlFuncion or SqlAggFunction are wrapped in 
DrillOperatorTable
Instead of returning Caclite's native SqlOpertor, SqlFuncion or 
SqlAggFunction, return the wrapped ones to ensure customized behaviors can be 
adopted.

- Type inference mechanism
This mechanism is used across all SqlOpertor, SqlFuncion or SqlAggFunction. 
Thus, it is factored out as its own method in TypeInferenceUtils

- Upgrade Drill-Calcite

Bump version number to 1.4.0-drill-test-r16

- Implement two argument version of lpad, rpad

- Implement one argument version of ltrim, rtrim, btrim

commit c3f0649e3ebb45d54e747f099d6699150bfa9869
Author: Hsuan-Yi Chu 
Date:   2016-02-03T05:17:50Z

DRILL-4372: Part 2: Optionally ensure planning and execution types match

commit e4cfdfa9b0562d52ac07f6d80860a82fa8baba40
Author: Sudheesh Katkam 
Date:   2016-03-03T21:25:39Z

DRILL-3623: For limit 0 queries, use a shorter path when result column 
types are known




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---