[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/514#discussion_r65795093 --- Diff: exec/java-exec/src/test/resources/json/ctas_alltypes_map_out.json --- @@ -0,0 +1,41 @@ +{ --- End diff -- minor point: in TestJsonReader lately we were embedding the data generation in the code instead of creating new files, but for small files like these I suppose this is fine. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/514#discussion_r65795030 --- Diff: exec/java-exec/src/main/codegen/templates/JsonOutputRecordWriter.java --- @@ -120,7 +127,13 @@ public void writeField() throws IOException { <#elseif mode.prefix == "Repeated" > gen.write${typeName}(i, reader); <#else> +<#if mode.prefix = "Nullable" > --- End diff -- same as above
[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...
GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/514

DRILL-4694: CTAS in JSON format produces extraneous NULL fields

Changed behavior of JSON CTAS to skip fields if the value is null. Added an option "store.json.writer.skip_null_fields" to enable old behavior.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-4694

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/514.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #514

commit 68ff4950f8302725aadf72a50074f6eef735738b
Author: Parth Chandra
Date: 2016-06-02T00:19:03Z

DRILL-4694: CTAS in JSON format produces extraneous NULL fields
Changed behavior of JSON CTAS to skip fields if the value is null. Added an option "store.json.writer.skip_null_fields" to enable old behavior.
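A minimal sketch of the behavior change described above, assuming a flat record modeled as an ordered `Map<String, Object>`. The class `NullSkippingJsonWriter` and its `skipNullFields` flag are hypothetical stand-ins for the `store.json.writer.skip_null_fields` option, not Drill's actual JSON record writer; string escaping and nested types are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical writer that serializes one flat record as a JSON object. */
final class NullSkippingJsonWriter {
    private final boolean skipNullFields; // stand-in for store.json.writer.skip_null_fields

    NullSkippingJsonWriter(boolean skipNullFields) {
        this.skipNullFields = skipNullFields;
    }

    String write(Map<String, Object> record) {
        StringBuilder out = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> field : record.entrySet()) {
            Object value = field.getValue();
            if (value == null && skipNullFields) {
                continue; // new behavior: omit the field entirely
            }
            if (!first) {
                out.append(", ");
            }
            first = false;
            out.append('"').append(field.getKey()).append("\": ");
            if (value == null) {
                out.append("null"); // old behavior: emit an explicit null
            } else if (value instanceof Number || value instanceof Boolean) {
                out.append(value);
            } else {
                out.append('"').append(value).append('"'); // no escaping in this sketch
            }
        }
        return out.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("name", null);
        record.put("city", "Paris");
        System.out.println(new NullSkippingJsonWriter(true).write(record));
        System.out.println(new NullSkippingJsonWriter(false).write(record));
    }
}
```

With the flag set to true, a record whose `name` is null is written without a `name` key at all; with it set to false, the key is kept with an explicit `null`, which corresponds to the extraneous NULL fields described in the PR title.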
[GitHub] drill issue #513: DRILL-4607: - Fix unittest failure. Janino cannot compile ...
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/513 +1
[jira] [Created] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
Jinfeng Ni created DRILL-4707: Summary: Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result Key: DRILL-4707 URL: https://issues.apache.org/jira/browse/DRILL-4707 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Priority: Critical

On latest master branch:
{code}
select version, commit_id, commit_message from sys.version;
+----------------+------------------------------------------+--------------------------------------------------------------------------------+
| version        | commit_id                                | commit_message                                                                 |
+----------------+------------------------------------------+--------------------------------------------------------------------------------+
| 1.7.0-SNAPSHOT | 3186217e5abe3c6c2c7e504cdb695567ff577e4c | DRILL-4607: Add a split function that allows to separate string by a delimiter |
+----------------+------------------------------------------+--------------------------------------------------------------------------------+
{code}

If a query has two column names that conflict under the case-insensitive policy, Drill will either hit a memory leak or return an incorrect result.

Q1.
{code}
select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (131072)
Allocator(op:0:0:1:Project) 100/131072/2490368/100 (res/actual/peak/limit)

Fragment 0:0
{code}

Q2: returns only one column in the result.
{code}
select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
+------+
| XYZ  |
+------+
| 0    |
| 1    |
| 1    |
| 1    |
| 4    |
| 0    |
| 3    |
{code}

The cause of the problem seems to be that the Project operator treats the two incoming columns as identical (since Drill treats column names case-insensitively during execution). The planner should make sure that conflicting columns are resolved, since execution is name-based.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
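One way a planner could resolve such conflicts before execution, sketched under the assumption that output names are compared case-insensitively: lower-case each alias with `Locale.ROOT` and, on a collision, append a numeric suffix until the name is unique. The class `AliasDeduplicator` and its suffix scheme are hypothetical illustrations, not Drill's actual planner code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AliasDeduplicator {
    /** Returns aliases that are unique under case-insensitive comparison. */
    static List<String> uniquify(List<String> aliases) {
        Set<String> seen = new HashSet<>(); // lower-cased names already emitted
        List<String> result = new ArrayList<>();
        for (String alias : aliases) {
            String candidate = alias;
            int suffix = 0;
            // Keep appending a counter until the lower-cased name no longer collides.
            while (!seen.add(candidate.toLowerCase(Locale.ROOT))) {
                candidate = alias + suffix++;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        // Aliases from Q2: "XYZ" and "xyz" collide under case-insensitive rules.
        System.out.println(uniquify(Arrays.asList("XYZ", "xyz")));
    }
}
```

Renaming keeps both projected columns distinct by name, so a name-based Project would no longer treat them as the same column; whether the planner should rename, reject the query, or use some other scheme is a design choice this sketch does not settle.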
[jira] [Created] (DRILL-4706) Fragment planning causes Drillbits to read remote chunks when local copies are available
Kunal Khatua created DRILL-4706: Summary: Fragment planning causes Drillbits to read remote chunks when local copies are available Key: DRILL-4706 URL: https://issues.apache.org/jira/browse/DRILL-4706 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.6.0 Environment: CentOS, RHEL Reporter: Kunal Khatua

When a table (data size 70 GB) of 160 Parquet files (each with a single row group that fits within one chunk) is stored on a 10-node setup with replication=3, a pure data scan query reads about 2% of the data remotely. Even after the metadata cache is created, the planner still assigns the SCAN fragments sub-optimally, so some of the data is served from a remote server.
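For illustration only, a simplified greedy assignment, assuming each Parquet file fits in a single block and that every node should scan roughly the same number of files: each file goes to the least-loaded node that holds a local replica and still has spare capacity, and falls back to the least-loaded node overall (a remote read) only when every replica holder is full. `LocalityAssigner` and its capacity rule are hypothetical; this is not Drill's actual parallelizer and ignores width limits and other cost factors.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LocalityAssigner {
    /**
     * Greedily assigns each file to a node.
     *
     * @param replicas for each file, the ids of the nodes holding a local copy
     * @param nodes    number of nodes, with ids 0..nodes-1
     * @return for each file, the id of the node chosen to scan it
     */
    static int[] assign(List<Set<Integer>> replicas, int nodes) {
        int capacity = (replicas.size() + nodes - 1) / nodes; // ceil(files / nodes)
        int[] load = new int[nodes];
        int[] owner = new int[replicas.size()];
        for (int f = 0; f < replicas.size(); f++) {
            int best = -1;
            // Prefer the least-loaded replica holder that still has spare capacity.
            for (int node : replicas.get(f)) {
                if (load[node] < capacity && (best < 0 || load[node] < load[best])) {
                    best = node;
                }
            }
            if (best < 0) {
                // Every replica holder is full: fall back to the least-loaded node (remote read).
                best = 0;
                for (int n = 1; n < nodes; n++) {
                    if (load[n] < load[best]) {
                        best = n;
                    }
                }
            }
            load[best]++;
            owner[f] = best;
        }
        return owner;
    }

    /** Fraction of files scanned by a node that holds no local replica. */
    static double remoteFraction(List<Set<Integer>> replicas, int[] owner) {
        int remote = 0;
        for (int f = 0; f < owner.length; f++) {
            if (!replicas.get(f).contains(owner[f])) {
                remote++;
            }
        }
        return owner.length == 0 ? 0.0 : (double) remote / owner.length;
    }

    public static void main(String[] args) {
        List<Set<Integer>> replicas = Arrays.<Set<Integer>>asList(
            new HashSet<Integer>(Arrays.asList(0, 1)),
            new HashSet<Integer>(Arrays.asList(0, 1)),
            new HashSet<Integer>(Arrays.asList(0, 2)));
        int[] owner = assign(replicas, 3);
        System.out.println(Arrays.toString(owner) + " remote=" + remoteFraction(replicas, owner));
    }
}
```

A single greedy pass like this can still miss a fully local assignment when replicas are unevenly spread; a matching-based assignment would avoid that at a higher planning cost.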
[jira] [Created] (DRILL-4705) Simple TSV file with quotes causes read error
Alexey Minakov created DRILL-4705: Summary: Simple TSV file with quotes causes read error Key: DRILL-4705 URL: https://issues.apache.org/jira/browse/DRILL-4705 Project: Apache Drill Issue Type: Bug Affects Versions: 1.6.0 Environment: Mac OS X 10.11.3 Reporter: Alexey Minakov Attachments: test.tsv

A simple TSV file with quotes causes the following error:

Error: DATA_READ ERROR: Error processing input: Cannot use newline character within quoted string, line=3, char=98. Content parsed: [ ]

Failure while reading file file:/users/alexeyminakov/drill/apache-drill-1.6.0/sample-data/test.tsv. Happened at or shortly before byte position 98.
Fragment 0:0

[Error Id: 7b664e41-89cf-49c8-8843-68a713a1fc24 on 192.168.6.199:31010] (state=,code=0)

Full console output: http://pastebin.com/qA7nDumz
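A minimal sketch of why a stray double quote can break a tab-separated reader, assuming the reader treats `"` as a quote character: a field that begins with a quote is read until the closing quote, and reaching the end of the line first is an error. With quote handling disabled, the same line simply splits on tabs. `TsvLineSplitter` is a hypothetical class written for this illustration, not Drill's text reader, and its error message is its own.

```java
import java.util.ArrayList;
import java.util.List;

final class TsvLineSplitter {
    /**
     * Splits one TSV line on tabs. When honorQuotes is true, a '"' at the start of a
     * field opens a quoted section (tabs inside it are kept) and the next '"' closes it;
     * quotes elsewhere are literal. Doubled ("") escapes are not handled in this sketch.
     */
    static List<String> split(String line, boolean honorQuotes) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (honorQuotes && c == '"' && (inQuotes || field.length() == 0)) {
                inQuotes = !inQuotes; // opening or closing quote
            } else if (c == '\t' && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            // The quoted field would have to continue past the end of the line.
            throw new IllegalStateException("unterminated quoted field at char " + line.length());
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "1\t\"unclosed quote\tok";
        System.out.println(split(line, false)); // quotes treated as ordinary characters
        try {
            split(line, true);
        } catch (IllegalStateException e) {
            System.out.println("quote-aware parse failed: " + e.getMessage());
        }
    }
}
```

Quotes in the middle of a field (for example `he said "hi"`) stay literal in both modes; only a field that opens with a quote and never closes it on the same line triggers the failure.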
Re: Precedence of List and Map
As I understand it, the current algorithm is as follows: to use any function in Drill, we must find the best match for that function via the cast rules and the precedence map. To implement the count function for complex data types, we just need Map and List to be available in precedenceMap and in the cast rules.

Kind regards
Vitalii

2016-06-02 18:10 GMT+00:00 Jacques Nadeau:

> Why do we need any precedence information for implementing new specific
> type functions?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Jun 2, 2016 at 9:34 AM, Vitalii Diravka
> wrote:
>
> > Thanks for the reply.
> >
> > It is necessary for implementing the count function on complex data types.
> > That's why I'm interested only in "precedenceMap" now.
> > I'm going to add simple cast rules for Map and List:
> > rules.put(MinorType.MAP, Sets.newHashSet(MinorType.MAP));
> > rules.put(MinorType.LIST, Sets.newHashSet(MinorType.LIST));
> > since casting from any other type isn't supported now.
> >
> > I agree with placing Map and List at the end of "precedenceMap", just
> > before the Union type.
> > Does it matter whether Map or List comes first in that position?
> >
> > Kind regards
> > Vitalii
> >
> > 2016-06-01 17:57 GMT+00:00 Aman Sinha:
> >
> > > What are the implicit casting rules for promoting a data type to a List
> > > or Map? It seems to me the reverse mapping is more useful: casting a List
> > > or Map to a VARCHAR is possible, so for instance I can do a join between
> > > a Map containing {x: 1, y: 2} and a Varchar containing the exact same
> > > string. To handle this you would add the mapping to the
> > > ResolverTypePrecedence.secondaryImplicitCastRules.
> > >
> > > If there is a valid promotion to List or Map in the precedenceMap, since
> > > these are complex types I would think it belongs at the end, just before
> > > the UNION type (since Union is the superset).
> > >
> > > On Wed, Jun 1, 2016 at 9:24 AM, Vitalii Diravka <
> > > vitalii.dira...@gmail.com> wrote:
> > >
> > > > Hi all!
> > > >
> > > > I need to add the List and Map data types to "precedenceMap" in the
> > > > "ResolverTypePrecedence" class, and I am interested in the precedence
> > > > values for these data types.
> > > > What are your thoughts on this?
> > > >
> > > > You can see the entire current precedence map below.
> > > >
> > > > > precedenceMap = new HashMap<MinorType, Integer>();
> > > > > precedenceMap.put(MinorType.NULL, i += 2);        // NULL is legal to implicitly be promoted to any other type
> > > > > precedenceMap.put(MinorType.FIXEDBINARY, i += 2); // Fixed-length is promoted to var length
> > > > > precedenceMap.put(MinorType.VARBINARY, i += 2);
> > > > > precedenceMap.put(MinorType.FIXEDCHAR, i += 2);
> > > > > precedenceMap.put(MinorType.VARCHAR, i += 2);
> > > > > precedenceMap.put(MinorType.FIXED16CHAR, i += 2);
> > > > > precedenceMap.put(MinorType.VAR16CHAR, i += 2);
> > > > > precedenceMap.put(MinorType.BIT, i += 2);
> > > > > precedenceMap.put(MinorType.TINYINT, i += 2);     // type with few bytes is promoted to type with more bytes ==> no data loss.
> > > > > precedenceMap.put(MinorType.UINT1, i += 2);       // signed is legal to implicitly be promoted to unsigned.
> > > > > precedenceMap.put(MinorType.SMALLINT, i += 2);
> > > > > precedenceMap.put(MinorType.UINT2, i += 2);
> > > > > precedenceMap.put(MinorType.INT, i += 2);
> > > > > precedenceMap.put(MinorType.UINT4, i += 2);
> > > > > precedenceMap.put(MinorType.BIGINT, i += 2);
> > > > > precedenceMap.put(MinorType.UINT8, i += 2);
> > > > > precedenceMap.put(MinorType.MONEY, i += 2);
> > > > > precedenceMap.put(MinorType.FLOAT4, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL9, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL18, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL28DENSE, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL28SPARSE, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL38DENSE, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL38SPARSE, i += 2);
> > > > > precedenceMap.put(MinorType.FLOAT8, i += 2);
> > > > > precedenceMap.put(MinorType.DATE, i += 2);
> > > > > precedenceMap.put(MinorType.TIMESTAMP, i += 2);
> > > > > precedenceMap.put(MinorType.TIMETZ, i += 2);
> > > > > precedenceMap.put(MinorType.TIMESTAMPTZ, i += 2);
> > > > > precedenceMap.put(MinorType.TIME, i += 2);
> > > > > precedenceMap.put(MinorType.INTERVALDAY, i += 2);
> > > > > precedenceMap.put(MinorType.INTERVALYEAR, i += 2);
> > > > > precedenceMap.put(MinorType.INTERVAL, i += 2);
> > > > > precedenceMap.put(MinorType.UNION, i += 2);
> > > >
> > > > Kind regards
> > > > Vitalii
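A simplified sketch of the resolution idea discussed in this thread, assuming that binding an argument to a parameter costs the precedence distance between the two types, that a cast is allowed only when the target's cast rules list the source type, and that only upward promotions are considered. Only a handful of types are modeled, MAP and LIST sit just before UNION as proposed, and `MiniTypeResolver`, `castCost`, `bestMatch`, and the local `Type` enum are hypothetical; Drill's actual `ResolverTypePrecedence` and cast-rule logic is more involved.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class MiniTypeResolver {
    // Constants are declared in ascending precedence order; MAP and LIST go just before UNION.
    enum Type { NULL, VARCHAR, INT, BIGINT, FLOAT8, MAP, LIST, UNION }

    static final Map<Type, Integer> PRECEDENCE = new EnumMap<>(Type.class);
    static final Map<Type, Set<Type>> CAST_RULES = new EnumMap<>(Type.class);

    static {
        int i = 0;
        for (Type t : Type.values()) {
            PRECEDENCE.put(t, i += 2); // same "i += 2" spacing as the quoted precedenceMap
        }
        // target type -> source types that may be implicitly cast to it
        CAST_RULES.put(Type.VARCHAR, EnumSet.of(Type.NULL, Type.VARCHAR));
        CAST_RULES.put(Type.INT, EnumSet.of(Type.NULL, Type.INT));
        CAST_RULES.put(Type.BIGINT, EnumSet.of(Type.NULL, Type.INT, Type.BIGINT));
        CAST_RULES.put(Type.FLOAT8, EnumSet.of(Type.NULL, Type.INT, Type.BIGINT, Type.FLOAT8));
        // Proposed rules from the thread: MAP and LIST accept only themselves.
        CAST_RULES.put(Type.MAP, EnumSet.of(Type.MAP));
        CAST_RULES.put(Type.LIST, EnumSet.of(Type.LIST));
        // UNION is left out of this sketch's cast rules.
    }

    /** Cost of binding an argument of type 'from' to a parameter of type 'to', or -1 if disallowed. */
    static int castCost(Type from, Type to) {
        if (from == to) {
            return 0;
        }
        Set<Type> allowed = CAST_RULES.get(to);
        if (allowed == null || !allowed.contains(from)) {
            return -1;
        }
        int distance = PRECEDENCE.get(to) - PRECEDENCE.get(from);
        return distance > 0 ? distance : -1; // only upward promotions in this sketch
    }

    /** Picks the cheapest single-argument overload, or null if none applies. */
    static Type bestMatch(Type arg, Type... overloads) {
        Type best = null;
        int bestCost = Integer.MAX_VALUE;
        for (Type candidate : overloads) {
            int cost = castCost(arg, candidate);
            if (cost >= 0 && cost < bestCost) {
                best = candidate;
                bestCost = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestMatch(Type.INT, Type.BIGINT, Type.FLOAT8));
        System.out.println(bestMatch(Type.MAP, Type.VARCHAR, Type.MAP));
        System.out.println(bestMatch(Type.MAP, Type.LIST, Type.VARCHAR));
    }
}
```

In this sketch a MAP argument binds only to a MAP overload and a LIST argument only to a LIST overload, so the relative order of MAP and LIST affects neither case; the precedence values only come into play for promotions that the cast rules actually allow.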
[GitHub] drill issue #512: Drill 4573 fix issue with unicode chars
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/512 @jcmcote , thanks for the new PR. I'll take a look tomorrow, and let you know my feedback.