[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...

2016-06-03 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/514#discussion_r65795093
  
--- Diff: exec/java-exec/src/test/resources/json/ctas_alltypes_map_out.json 
---
@@ -0,0 +1,41 @@
+{
--- End diff --

minor point: in TestJsonReader we have lately been embedding the data generation 
in the code instead of creating new files... but for small files like these I 
suppose this is fine. 
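As a hedged sketch of the style mentioned above, a test can generate its small JSON input inline rather than checking a new resource file into the tree; `InlineTestData` and `writeInput` below are hypothetical names for illustration, not Drill's actual test helpers:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class InlineTestData {
    // Write a small JSON test input from an inline string instead of
    // adding a new file under src/test/resources.
    static Path writeInput(Path dir, String name, String json) throws IOException {
        Path file = dir.resolve(name);
        Files.write(file, json.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drill-test");
        Path f = writeInput(dir, "alltypes.json", "{\"a\": 1, \"b\": null}\n");
        // Read the generated input back, as a test's query would
        System.out.println(Files.readAllLines(f).get(0));
    }
}
```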


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...

2016-06-03 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/514#discussion_r65795030
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/JsonOutputRecordWriter.java ---
@@ -120,7 +127,13 @@ public void writeField() throws IOException {
   <#elseif mode.prefix == "Repeated" >
 gen.write${typeName}(i, reader);
   <#else>
+<#if mode.prefix = "Nullable" >
--- End diff --

same as above




[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...

2016-06-03 Thread parthchandra
GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/514

DRILL-4694: CTAS in JSON format produces extraneous NULL fields

   Changed behavior of JSON CTAS to skip fields if the value is null. Added 
an option "store.json.writer.skip_null_fields" to enable old behavior.
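A minimal sketch of the skip-null idea, assuming a flat row represented as a map; this is illustrative plain-Java string building, not Drill's generated JsonOutputRecordWriter, and `toJson` is a hypothetical helper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonNullSkipSketch {
    // Serialize a flat row to JSON, optionally omitting null-valued fields.
    static String toJson(Map<String, Object> row, boolean skipNullFields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() == null && skipNullFields) {
                continue; // new behavior: drop the field entirely
            }
            if (!first) sb.append(",");
            first = false;
            sb.append("\"").append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v == null) sb.append("null");               // old behavior
            else if (v instanceof String) sb.append("\"").append(v).append("\"");
            else sb.append(v);
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1);
        row.put("b", null);
        row.put("c", "x");
        System.out.println(toJson(row, true));   // {"a":1,"c":"x"}
        System.out.println(toJson(row, false));  // {"a":1,"b":null,"c":"x"}
    }
}
```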

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-4694

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/514.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #514


commit 68ff4950f8302725aadf72a50074f6eef735738b
Author: Parth Chandra 
Date:   2016-06-02T00:19:03Z

DRILL-4694: CTAS in JSON format produces extraneous NULL fields
   Changed behavior of JSON CTAS to skip fields if the value is null. Added 
an option "store.json.writer.skip_null_fields" to enable old behavior.






[GitHub] drill issue #513: DRILL-4607: - Fix unittest failure. Janino cannot compile ...

2016-06-03 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/513
  
+1




[jira] [Created] (DRILL-4707) Conflicting column names under case-insensitive policy lead to either memory leak or incorrect result

2016-06-03 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-4707:
-

 Summary: Conflicting column names under case-insensitive policy 
lead to either memory leak or incorrect result
 Key: DRILL-4707
 URL: https://issues.apache.org/jira/browse/DRILL-4707
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni
Priority: Critical


On latest master branch:

{code}
select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
| version         | commit_id                                 | commit_message                                                                  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
| 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: Add a split function that allows to separate string by a delimiter  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
{code}

If a query has two column names that conflict under the case-insensitive policy, 
Drill will either hit a memory leak or return an incorrect result.

Q1.

{code}
select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (131072)
Allocator(op:0:0:1:Project) 100/131072/2490368/100 (res/actual/peak/limit)


Fragment 0:0
{code}

Q2: returns only one column in the result. 
{code}
select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
+------+
| XYZ  |
+------+
| 0    |
| 1    |
| 1    |
| 1    |
| 4    |
| 0    |
| 3    |
{code}

The cause of the problem seems to be that the Project operator treats the two 
incoming columns as identical (since Drill uses case-insensitive column names 
during execution). 

The planner should make sure that conflicting columns are resolved, since 
execution is name-based. 
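A hedged sketch of the failure mode: when output columns are registered by case-insensitive name, one of the conflicting aliases silently replaces the other. `register` is a hypothetical stand-in for illustration, not Drill's actual Project implementation:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CaseInsensitiveCollision {
    // Register output columns keyed by lower-cased name, as a name-based,
    // case-insensitive engine might; "XYZ" and "xyz" collide on one key.
    static Map<String, Integer> register(List<String> names) {
        Map<String, Integer> byName = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            byName.put(names.get(i).toLowerCase(), i); // later alias wins
        }
        return byName;
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = register(Arrays.asList("XYZ", "xyz"));
        // Only one of the two projected columns survives
        System.out.println(cols.size()); // prints 1
    }
}
```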




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4706) Fragment planning causes Drillbits to read remote chunks when local copies are available

2016-06-03 Thread Kunal Khatua (JIRA)
Kunal Khatua created DRILL-4706:
---

 Summary: Fragment planning causes Drillbits to read remote chunks 
when local copies are available
 Key: DRILL-4706
 URL: https://issues.apache.org/jira/browse/DRILL-4706
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.6.0
 Environment: CentOS, RHEL
Reporter: Kunal Khatua


When a table (data size 70 GB) of 160 Parquet files (each having a single 
row group and fitting within one chunk) is available on a 10-node setup with 
replication=3, a pure data-scan query causes about 2% of the data to be read 
remotely. 
Even with a metadata cache created, the planner selects a sub-optimal plan for 
executing the SCAN fragments, such that some of the data is served from a 
remote server. 





[jira] [Created] (DRILL-4705) Simple TSV file with quotes causes read error

2016-06-03 Thread Alexey Minakov (JIRA)
Alexey Minakov created DRILL-4705:
-

 Summary: Simple TSV file with quotes causes read error
 Key: DRILL-4705
 URL: https://issues.apache.org/jira/browse/DRILL-4705
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.6.0
 Environment: Mac OS X 10.11.3
Reporter: Alexey Minakov
 Attachments: test.tsv

A simple TSV file with quotes causes a read error:

Error: DATA_READ ERROR: Error processing input: Cannot use newline character within quoted string, line=3, char=98. Content parsed: [ ]

Failure while reading file file:/users/alexeyminakov/drill/apache-drill-1.6.0/sample-data/test.tsv. 
Happened at or shortly before byte position 98.
Fragment 0:0

[Error Id: 7b664e41-89cf-49c8-8843-68a713a1fc24 on 192.168.6.199:31010] 
(state=,code=0)



Full console output: http://pastebin.com/qA7nDumz
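A hedged sketch of the kind of check that appears to trigger this error: a scanner that rejects a newline inside a quoted field, as the reported message suggests Drill's text reader does. `QuotedFieldCheck` is illustrative, not Drill's actual reader:

```java
public class QuotedFieldCheck {
    // Scan a delimited-text buffer and fail if a newline occurs inside a
    // quoted field, mirroring the DATA_READ error in the report above.
    static void check(String content) {
        boolean inQuotes = false;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == '\n' && inQuotes) {
                throw new IllegalStateException(
                    "Cannot use newline character within quoted string, char=" + i);
            }
        }
    }

    public static void main(String[] args) {
        check("a\t\"b c\"\n");      // fine: newline is outside the quotes
        try {
            check("a\t\"b\nc\"\n"); // newline inside quotes -> error
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```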





Re: Precedence of List and Map

2016-06-03 Thread Vitalii Diravka
As I understand it, the current algorithm is as follows:
to use any function in Drill, we must find the best match for that function
via the cast rules and the precedence map.
To implement the count function for complex data types, we just need Map and
List to be present in precedenceMap and in the cast rules.

Kind regards
Vitalii

2016-06-02 18:10 GMT+00:00 Jacques Nadeau :

> Why do we need any precedence information for implementing new specific
> type functions?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Jun 2, 2016 at 9:34 AM, Vitalii Diravka  >
> wrote:
>
> > Thanks for the reply.
> >
> > It is necessary for implementing the count function on complex data types.
> > That's why I'm interested only in "precedenceMap" now.
> > I'm going to add simple cast rules for Map and List:
> > rules.put(MinorType.MAP, Sets.newHashSet(MinorType.MAP));
> > rules.put(MinorType.LIST, Sets.newHashSet(MinorType.LIST));
> > since a cast from any other type isn't supported now.
> >
> > I agree with placing Map and List at the end of "precedenceMap", just
> > before the Union type.
> > Does it matter whether Map or List comes first in that position?
> >
> >
> >
> > Kind regards
> > Vitalii
> >
> > 2016-06-01 17:57 GMT+00:00 Aman Sinha :
> >
> > > What are the implicit casting rules for promoting a data type to a List
> > or
> > > Map ?  It seems to me the reverse mapping is more useful:  casting a
> List
> > > or Map to a VARCHAR is possible, so for instance I can do a join
> between
> > a
> > > Map containing {x: 1, y: 2}  and a Varchar containing the exact same
> > > string.  To handle this you would add the mapping to the
> > > ResolverTypePrecedence.secondaryImplicitCastRules.
> > >
> > > If there is a valid promotion to List or Map in the precedenceMap,
> since
> > > these are complex types I would think it belongs to the end just before
> > the
> > > UNION type (since Union is the superset).
> > >
> > > On Wed, Jun 1, 2016 at 9:24 AM, Vitalii Diravka <
> > vitalii.dira...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi all!
> > > >
> > > > I need to add List and Map data types into "precedenceMap" in the
> > > > "ResolverTypePrecedence" class.
> > > > And I am interested in precedence value of these data types.
> > > > What are your thoughts about it?
> > > >
> > > >
> > > > You can see all current precedence map below.
> > > >
> > > >
> > > > > precedenceMap = new HashMap();
> > > > > precedenceMap.put(MinorType.NULL, i += 2);   // NULL is legal
> to
> > > > > implicitly be promoted to any other type
> > > > > precedenceMap.put(MinorType.FIXEDBINARY, i += 2); // Fixed-length
> is
> > > > > promoted to var length
> > > > > precedenceMap.put(MinorType.VARBINARY, i += 2);
> > > > > precedenceMap.put(MinorType.FIXEDCHAR, i += 2);
> > > > > precedenceMap.put(MinorType.VARCHAR, i += 2);
> > > > > precedenceMap.put(MinorType.FIXED16CHAR, i += 2);
> > > > > precedenceMap.put(MinorType.VAR16CHAR, i += 2);
> > > > > precedenceMap.put(MinorType.BIT, i += 2);
> > > > > precedenceMap.put(MinorType.TINYINT, i += 2);   //type with few
> bytes
> > > is
> > > > > promoted to type with more bytes ==> no data loss.
> > > > > precedenceMap.put(MinorType.UINT1, i += 2); //signed is legal
> to
> > > > > implicitly be promoted to unsigned.
> > > > > precedenceMap.put(MinorType.SMALLINT, i += 2);
> > > > > precedenceMap.put(MinorType.UINT2, i += 2);
> > > > > precedenceMap.put(MinorType.INT, i += 2);
> > > > > precedenceMap.put(MinorType.UINT4, i += 2);
> > > > > precedenceMap.put(MinorType.BIGINT, i += 2);
> > > > > precedenceMap.put(MinorType.UINT8, i += 2);
> > > > > precedenceMap.put(MinorType.MONEY, i += 2);
> > > > > precedenceMap.put(MinorType.FLOAT4, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL9, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL18, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL28DENSE, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL28SPARSE, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL38DENSE, i += 2);
> > > > > precedenceMap.put(MinorType.DECIMAL38SPARSE, i += 2);
> > > > > precedenceMap.put(MinorType.FLOAT8, i += 2);
> > > > > precedenceMap.put(MinorType.DATE, i += 2);
> > > > > precedenceMap.put(MinorType.TIMESTAMP, i += 2);
> > > > > precedenceMap.put(MinorType.TIMETZ, i += 2);
> > > > > precedenceMap.put(MinorType.TIMESTAMPTZ, i += 2);
> > > > > precedenceMap.put(MinorType.TIME, i += 2);
> > > > > precedenceMap.put(MinorType.INTERVALDAY, i+= 2);
> > > > > precedenceMap.put(MinorType.INTERVALYEAR, i+= 2);
> > > > > precedenceMap.put(MinorType.INTERVAL, i+= 2);
> > > > > precedenceMap.put(MinorType.UNION, i += 2);
> > > >
> > > >
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > >
> >
>
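The placement discussed in the thread above can be sketched as follows; `MinorType` here is a hypothetical stand-in for Drill's real enum (in org.apache.drill.common.types), and only a few types are shown:

```java
import java.util.EnumMap;
import java.util.Map;

public class PrecedenceSketch {
    // Illustrative subset of Drill's MinorType enum.
    enum MinorType { VARCHAR, INT, MAP, LIST, UNION }

    // Build a precedence map with the complex types placed at the end,
    // just before UNION, as proposed in the thread (UNION is the superset).
    static Map<MinorType, Integer> build() {
        Map<MinorType, Integer> precedenceMap = new EnumMap<>(MinorType.class);
        int i = 0;
        precedenceMap.put(MinorType.VARCHAR, i += 2);
        precedenceMap.put(MinorType.INT, i += 2);
        precedenceMap.put(MinorType.MAP, i += 2);   // complex types near the end
        precedenceMap.put(MinorType.LIST, i += 2);
        precedenceMap.put(MinorType.UNION, i += 2); // superset type ranks highest
        return precedenceMap;
    }

    public static void main(String[] args) {
        Map<MinorType, Integer> m = build();
        // UNION outranks MAP and LIST, so they can still be promoted to it
        System.out.println(m.get(MinorType.UNION) > m.get(MinorType.LIST)); // true
    }
}
```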


[GitHub] drill issue #512: Drill 4573 fix issue with unicode chars

2016-06-03 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/512
  
@jcmcote , thanks for the new PR. I'll take a look tomorrow, and let you 
know my feedback. 

