Re: Precedence of List and Map
As I understand, the current algorithm is as follows: to use any function in Drill we must find the best match for it via the cast rules and the precedence map. So to implement the count function for complex data types we just need Map and List to be present in the precedenceMap and the cast rules.

Kind regards
Vitalii

2016-06-02 18:10 GMT+00:00 Jacques Nadeau:

> Why do we need any precedence information for implementing new specific
> type functions?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Jun 2, 2016 at 9:34 AM, Vitalii Diravka wrote:
>
> > Thanks for the reply.
> >
> > It is necessary for implementing the count function on complex data types.
> > That's why I'm interested only in "precedenceMap" now.
> > I'm going to add simple cast rules for Map and List:
> >
> >     rules.put(MinorType.MAP, Sets.newHashSet(MinorType.MAP));
> >     rules.put(MinorType.LIST, Sets.newHashSet(MinorType.LIST));
> >
> > since a cast from any other type isn't supported now.
> >
> > I agree with placing Map and List at the end of "precedenceMap", just
> > before the Union type. Does it matter whether Map or List comes first at
> > that place?
> >
> > Kind regards
> > Vitalii
> >
> > 2016-06-01 17:57 GMT+00:00 Aman Sinha:
> >
> > > What are the implicit casting rules for promoting a data type to a List
> > > or Map? It seems to me the reverse mapping is more useful: casting a
> > > List or Map to a VARCHAR is possible, so for instance I can do a join
> > > between a Map containing {x: 1, y: 2} and a VARCHAR containing the exact
> > > same string. To handle this you would add the mapping to
> > > ResolverTypePrecedence.secondaryImplicitCastRules.
> > >
> > > If there is a valid promotion to List or Map in the precedenceMap, since
> > > these are complex types I would think they belong at the end, just before
> > > the UNION type (since Union is the superset).
> > >
> > > On Wed, Jun 1, 2016 at 9:24 AM, Vitalii Diravka <vitalii.dira...@gmail.com> wrote:
> > >
> > > > Hi all!
> > > >
> > > > I need to add the List and Map data types into "precedenceMap" in the
> > > > "ResolverTypePrecedence" class, and I am interested in the precedence
> > > > value of these data types. What are your thoughts about it?
> > > >
> > > > You can see the full current precedence map below.
> > > >
> > > > precedenceMap = new HashMap();
> > > > precedenceMap.put(MinorType.NULL, i += 2);            // NULL is legal to implicitly be promoted to any other type
> > > > precedenceMap.put(MinorType.FIXEDBINARY, i += 2);     // fixed-length is promoted to var length
> > > > precedenceMap.put(MinorType.VARBINARY, i += 2);
> > > > precedenceMap.put(MinorType.FIXEDCHAR, i += 2);
> > > > precedenceMap.put(MinorType.VARCHAR, i += 2);
> > > > precedenceMap.put(MinorType.FIXED16CHAR, i += 2);
> > > > precedenceMap.put(MinorType.VAR16CHAR, i += 2);
> > > > precedenceMap.put(MinorType.BIT, i += 2);
> > > > precedenceMap.put(MinorType.TINYINT, i += 2);         // a type with few bytes is promoted to a type with more bytes ==> no data loss
> > > > precedenceMap.put(MinorType.UINT1, i += 2);           // signed is legal to implicitly be promoted to unsigned
> > > > precedenceMap.put(MinorType.SMALLINT, i += 2);
> > > > precedenceMap.put(MinorType.UINT2, i += 2);
> > > > precedenceMap.put(MinorType.INT, i += 2);
> > > > precedenceMap.put(MinorType.UINT4, i += 2);
> > > > precedenceMap.put(MinorType.BIGINT, i += 2);
> > > > precedenceMap.put(MinorType.UINT8, i += 2);
> > > > precedenceMap.put(MinorType.MONEY, i += 2);
> > > > precedenceMap.put(MinorType.FLOAT4, i += 2);
> > > > precedenceMap.put(MinorType.DECIMAL9, i += 2);
> > > > precedenceMap.put(MinorType.DECIMAL18, i += 2);
> > > > precedenceMap.put(MinorType.DECIMAL28DENSE, i += 2);
> > > > precedenceMap.put(MinorType.DECIMAL28SPARSE, i += 2);
> > > > precedenceMap.put(MinorType.DECIMAL38DENSE, i += 2);
> > > > precedenceMap.put(MinorType.DECIMAL38SPARSE, i += 2);
> > > > precedenceMap.put(MinorType.FLOAT8, i += 2);
> > > > precedenceMap.put(MinorType.DATE, i += 2);
> > > > precedenceMap.put(MinorType.TIMESTAMP, i += 2);
> > > > precedenceMap.put(MinorType.TIMETZ, i += 2);
> > > > precedenceMap.put(MinorType.TIMESTAMPTZ, i += 2);
> > > > precedenceMap.put(MinorType.TIME, i += 2);
> > > > precedenceMap.put(MinorType.INTERVALDAY, i += 2);
> > > > precedenceMap.put(MinorType.INTERVALYEAR, i += 2);
> > > > precedenceMap.put(MinorType.INTERVAL, i += 2);
> > > > precedenceMap.put(MinorType.UNION, i += 2);
> > > >
> > > > Kind regards
> > > > Vitalii
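The placement discussed in the thread above (Map and List at the end, just before UNION) can be sketched standalone. Everything below is illustrative only: this `MinorType` enum is a trimmed stand-in for Drill's real type enum, and `buildPrecedence` is a hypothetical helper, not Drill's actual code.

```java
import java.util.EnumMap;
import java.util.Map;

// Standalone sketch of the proposed ordering. MinorType here is a stand-in
// for Drill's real enum, trimmed to a few representative members.
public class PrecedenceSketch {
    enum MinorType { NULL, VARCHAR, INT, FLOAT8, MAP, LIST, UNION }

    static Map<MinorType, Integer> buildPrecedence() {
        Map<MinorType, Integer> precedenceMap = new EnumMap<>(MinorType.class);
        int i = 0;
        precedenceMap.put(MinorType.NULL, i += 2);    // NULL promotes to anything
        precedenceMap.put(MinorType.VARCHAR, i += 2);
        precedenceMap.put(MinorType.INT, i += 2);
        precedenceMap.put(MinorType.FLOAT8, i += 2);
        // Proposed: complex types go at the end, just before UNION,
        // since UNION is the superset of all types.
        precedenceMap.put(MinorType.MAP, i += 2);
        precedenceMap.put(MinorType.LIST, i += 2);
        precedenceMap.put(MinorType.UNION, i += 2);
        return precedenceMap;
    }

    public static void main(String[] args) {
        Map<MinorType, Integer> p = buildPrecedence();
        // Both complex types rank below UNION, so either can still be
        // implicitly promoted to UNION.
        System.out.println(p.get(MinorType.MAP) < p.get(MinorType.UNION));
        System.out.println(p.get(MinorType.LIST) < p.get(MinorType.UNION));
    }
}
```

Under this scheme the relative order of MAP and LIST does not matter for promotion to UNION, which matches the question raised in the thread: what matters is only that both sit below UNION.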
[jira] [Created] (DRILL-4705) Simple TSV file with quotes causes read error
Alexey Minakov created DRILL-4705:
-------------------------------------

            Summary: Simple TSV file with quotes causes read error
                Key: DRILL-4705
                URL: https://issues.apache.org/jira/browse/DRILL-4705
            Project: Apache Drill
         Issue Type: Bug
   Affects Versions: 1.6.0
        Environment: Mac OS X 10.11.3
           Reporter: Alexey Minakov
        Attachments: test.tsv

A simple TSV file with quotes causes an error:

Error: DATA_READ ERROR: Error processing input: Cannot use newline character within quoted string, line=3, char=98. Content parsed: [ ]
Failure while reading file file:/users/alexeyminakov/drill/apache-drill-1.6.0/sample-data/test.tsv. Happened at or shortly before byte position 98.
Fragment 0:0
[Error Id: 7b664e41-89cf-49c8-8843-68a713a1fc24 on 192.168.6.199:31010] (state=,code=0)

Full console output: http://pastebin.com/qA7nDumz

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
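The error above comes from the text reader refusing a newline inside a quoted field. As a hedged illustration (this is a toy splitter, not Drill's actual TSV parser), the convention the attached file presumably relies on is that a newline only terminates a record when it falls outside double quotes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy record splitter: newlines inside double quotes are treated as data,
// newlines outside quotes end the current record.
public class QuotedTsv {
    static List<String> records(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') inQuotes = !inQuotes;      // toggle quoted state
            if (c == '\n' && !inQuotes) {            // record boundary
                out.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);                       // part of the record
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        // One field spans two physical lines inside quotes: still 2 records.
        String tsv = "a\t\"multi\nline\"\nb\tplain";
        System.out.println(records(tsv).size());
    }
}
```

A parser that instead treats every newline as a record boundary, or that forbids newlines in quoted strings outright, fails on such input with exactly the kind of error reported here.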
[jira] [Created] (DRILL-4706) Fragment planning causes Drillbits to read remote chunks when local copies are available
Kunal Khatua created DRILL-4706:
-------------------------------------

            Summary: Fragment planning causes Drillbits to read remote chunks when local copies are available
                Key: DRILL-4706
                URL: https://issues.apache.org/jira/browse/DRILL-4706
            Project: Apache Drill
         Issue Type: Bug
         Components: Query Planning & Optimization
   Affects Versions: 1.6.0
        Environment: CentOS, RHEL
           Reporter: Kunal Khatua

When a table (datasize=70GB) of 160 parquet files (each having a single rowgroup and fitting within one chunk) is available on a 10-node setup with replication=3, a pure data-scan query causes about 2% of the data to be read remotely.

Even with the creation of the metadata cache, the planner selects a sub-optimal plan for executing the SCAN fragments, such that some of the data is served from a remote server.
[jira] [Created] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
Jinfeng Ni created DRILL-4707:
-------------------------------------

            Summary: Conflicting column names under case-insensitive policy lead to either memory leak or incorrect result
                Key: DRILL-4707
                URL: https://issues.apache.org/jira/browse/DRILL-4707
            Project: Apache Drill
         Issue Type: Bug
           Reporter: Jinfeng Ni
           Priority: Critical

On the latest master branch:

{code}
select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
| version         | commit_id                                 | commit_message                                                                  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
| 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: Add a split function that allows to separate string by a delimiter  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
{code}

If a query has two column names that conflict under the case-insensitive policy, Drill will either hit a memory leak or return an incorrect result.

Q1:
{code}
select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (131072)
Allocator(op:0:0:1:Project) 100/131072/2490368/100 (res/actual/peak/limit)
Fragment 0:0
{code}

Q2 returns only one column in the result:
{code}
select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
+------+
| XYZ  |
+------+
| 0    |
| 1    |
| 1    |
| 1    |
| 4    |
| 0    |
| 3    |
{code}

The cause of the problem seems to be that the Project operator treats the two incoming columns as identical (since Drill adopts case-insensitive column names in execution). The planner should make sure that the conflicting columns are resolved, since execution is name-based.
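A minimal sketch of the suspected failure mode (illustrative only, not Drill code): when output names are resolved case-insensitively, `XYZ` and `xyz` collapse into one slot, which mirrors the single-column result seen in Q2.

```java
import java.util.TreeMap;

// Illustrates how a case-insensitive name lookup collapses two columns
// that differ only in case into a single entry.
public class CaseInsensitiveColumns {
    static TreeMap<String, Integer> project(String[] names) {
        TreeMap<String, Integer> columns = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i < names.length; i++) {
            columns.put(names[i], i); // the second put silently overwrites the first
        }
        return columns;
    }

    public static void main(String[] args) {
        // Two projected columns, but only one survives the name map.
        System.out.println(project(new String[]{"XYZ", "xyz"}).size());
    }
}
```

This is why the suggested fix belongs in the planner: by the time execution resolves columns by name, the distinction between `XYZ` and `xyz` is already gone.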
[jira] [Created] (DRILL-4708) connection closed unexpectedly
Chun Chang created DRILL-4708:
-------------------------------------

            Summary: connection closed unexpectedly
                Key: DRILL-4708
                URL: https://issues.apache.org/jira/browse/DRILL-4708
            Project: Apache Drill
         Issue Type: Bug
         Components: Execution - RPC
   Affects Versions: 1.7.0
           Reporter: Chun Chang

Running DRILL functional automation, we often see queries fail randomly with the following unexpected connection-close error.

{noformat}
Execution Failures:
/root/drillAutomation/framework/framework/resources/Functional/ctas/ctas_flatten/10rows/filter5.q
Query: select * from dfs.ctas_flatten.`filter5_10rows_ctas`
Failed with exception
java.sql.SQLException: CONNECTION ERROR: Connection /10.10.100.171:36185 <--> drillats4.qa.lab/10.10.100.174:31010 (user client) closed unexpectedly. Drillbit down?

[Error Id: 3d5dad8e-80d0-4c7f-9012-013bf01ce2b7 ]
	at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
	at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:321)
	at oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
	at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:172)
	at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:210)
	at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:99)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: oadd.org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: Connection /10.10.100.171:36185 <--> drillats4.qa.lab/10.10.100.174:31010 (user client) closed unexpectedly. Drillbit down?

[Error Id: 3d5dad8e-80d0-4c7f-9012-013bf01ce2b7 ]
	at oadd.org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
	at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
	at oadd.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
	at oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
	at oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
	at oadd.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
	at oadd.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
	at oadd.io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
	at oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
	at oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
	at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
	at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
	at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
	at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	... 1 more
{noformat}
[GitHub] drill pull request #513: DRILL-4607: - Fix unittest failure. Janino cannot c...
GitHub user parthchandra opened a pull request:

    https://github.com/apache/drill/pull/513

DRILL-4607: Fix unittest failure. Janino cannot compile a function that uses generics, so replaced the implementation of StringFunctions.Split to not use any.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/parthchandra/drill DRILL-4607

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/513.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #513

commit 130e565e00527c5632cd4b61cce19da734f70148
Author: Parth Chandra
Date: 2016-06-03T00:23:13Z

    DRILL-4607: Fix unittest failure. Janino cannot compile a function that uses generics; so replaced the implementation of StringFunctions.Split to not use any.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] drill issue #513: DRILL-4607: - Fix unittest failure. Janino cannot compile ...
Github user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/513

+1
[GitHub] drill pull request #513: DRILL-4607: - Fix unittest failure. Janino cannot c...
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/513
[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
GitHub user parthchandra opened a pull request:

    https://github.com/apache/drill/pull/514

DRILL-4694: CTAS in JSON format produces extraneous NULL fields

Changed the behavior of JSON CTAS to skip fields if the value is null. Added an option "store.json.writer.skip_null_fields" to enable the old behavior.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/parthchandra/drill DRILL-4694

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/514.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #514

commit 68ff4950f8302725aadf72a50074f6eef735738b
Author: Parth Chandra
Date: 2016-06-02T00:19:03Z

    DRILL-4694: CTAS in JSON format produces extraneous NULL fields

    Changed behavior of JSON CTAS to skip fields if the value is null. Added an option "store.json.writer.skip_null_fields" to enable old behavior.
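To illustrate the behavior this PR changes (a hand-rolled sketch, not the actual Drill JSON record writer): with skipping enabled, a null-valued field is simply omitted from the output object; with it disabled, the field is written as an explicit `null`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal JSON-object writer sketch showing skip-vs-keep for null fields.
public class SkipNullJson {
    static String write(Map<String, Object> record, boolean skipNullFields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : record.entrySet()) {
            if (skipNullFields && e.getValue() == null) continue; // omit null field
            if (!first) sb.append(",");
            first = false;
            sb.append("\"").append(e.getKey()).append("\":");
            sb.append(e.getValue() == null ? "null" : "\"" + e.getValue() + "\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("a", "1");
        rec.put("b", null);
        System.out.println(write(rec, true));  // field b omitted
        System.out.println(write(rec, false)); // field b written as null
    }
}
```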
[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/514#discussion_r65795019

--- Diff: exec/java-exec/src/main/codegen/templates/JsonOutputRecordWriter.java ---
@@ -61,7 +62,13 @@
   @Override
   public void startField() throws IOException {
+    <#if mode.prefix = "Nullable" >
--- End diff --

should be ==
[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/514#discussion_r65795030

--- Diff: exec/java-exec/src/main/codegen/templates/JsonOutputRecordWriter.java ---
@@ -120,7 +127,13 @@
   public void writeField() throws IOException {
 <#elseif mode.prefix == "Repeated" >
     gen.write${typeName}(i, reader);
 <#else>
+<#if mode.prefix = "Nullable" >
--- End diff --

same as above
[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/514#discussion_r65795093

--- Diff: exec/java-exec/src/test/resources/json/ctas_alltypes_map_out.json ---
@@ -0,0 +1,41 @@
+{
--- End diff --

minor point: in TestJsonReader lately we were embedding the data generation in the code instead of creating new files, but for small files like these I suppose this is fine.
[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/514#discussion_r65795257

--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestCTASJson.java ---
@@ -0,0 +1,129 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill;
+
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.ExecConstants;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestCTASJson extends PlanTestBase {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestCTASJson.class);
+
+  static final String WORKING_PATH = TestTools.getWorkingPath();
+  static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+
+  @Test
+  /**
+   * Test a source json file that contains records that are maps with fields of all types.
+   * Some records have missing fields. CTAS should skip the missing fields.
+   */
+  public void testctas_alltypes_map() throws Exception {
+    String testName = "ctas_alltypes_map";
+    test("use dfs_test.tmp");
+    test("alter session set store.format = 'json' ");
+    test("alter session set store.json.writer.skip_null_fields = true"); // DEFAULT
+    test("create table " + testName + "_json as select * from cp.`json/" + testName + ".json`");
+
+    final String query = "select * from `" + testName + "_json` t1 ";
+
+    testBuilder()
+      .sqlQuery(query)
+      .ordered()
+      .jsonBaselineFile("json/" + testName + ".json")
+      .build()
+      .run();
+
+    test("drop table " + testName + "_json" );
+  }
+
+  @Test
+  /**
+   * Test a source json file that contains records that are maps with fields of all types.
+   * Some records have missing fields. CTAS should NOT skip the missing fields.
+   */
+  public void testctas_alltypes_map_noskip() throws Exception {
+    String testName = "ctas_alltypes_map";
+    test("use dfs_test.tmp");
+    test("alter session set store.format = 'json' ");
+    test("alter session set store.json.writer.skip_null_fields = false"); // CHANGE from default
+    test("create table " + testName + "_json as select * from cp.`json/" + testName + ".json`");
+
+    final String query = "select * from `" + testName + "_json` t1 ";
+
+    testBuilder()
+      .sqlQuery(query)
+      .ordered()
+      .jsonBaselineFile("json/" + testName + "_out.json")
+      .build()
+      .run();
+
+    test("drop table " + testName + "_json" );
+  }
+
+  @Test
+  /**
+   * Test a source json file that contains records that are maps with fields of all types.
+   * Some records have missing fields. CTAS should skip the missing fields.
+   */
+  public void testctas_alltypes_repeatedmap() throws Exception {
+    String testName = "ctas_alltypes_repeated_map";
+    test("use dfs_test.tmp");
+    test("alter session set store.format = 'json' ");
+    test("alter session set store.json.writer.skip_null_fields = true"); // DEFAULT
+    test("create table " + testName + "_json as select * from cp.`json/" + testName + ".json`");
+
+    final String query = "select * from `" + testName + "_json` t1 ";
+
+    testBuilder()
+      .sqlQuery(query)
+      .ordered()
+      .jsonBaselineFile("json/" + testName + ".json")
+      .build()
+      .run();
+
+    test("drop table " + testName + "_json" );
+  }
+
+  @Test
+  /**
+   * Test a source json file that contains records that are maps with fields of all types.
+   * Some records have missing fields. CTAS should NOT skip the missing fields.
+   */
+  public void testctas_alltypes_repeated_map_noskip() throws Exception {
+    String testName = "ctas_alltypes_repeated_map";
+    test("use dfs_test.tmp");
+    test("alter session set store.for