[jira] [Created] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread John Omernik (JIRA)
John Omernik created DRILL-4130:
---

 Summary: Ability to set settings at Table or View level rather 
than SESSION or SYSTEM
 Key: DRILL-4130
 URL: https://issues.apache.org/jira/browse/DRILL-4130
 Project: Apache Drill
  Issue Type: Improvement
  Components: Metadata
Affects Versions: 1.3.0
 Environment: All
Reporter: John Omernik
 Fix For: Future


A number of Drill settings that control how data is read can only be set at a 
coarse level of granularity (SESSION or SYSTEM), which can have unintended 
consequences for reads of other data. A few examples include:

store.json.read_numbers_as_double
and
store.json.all_text_mode

(There are likely more, these are some I've worked with)

The documentation at https://drill.apache.org/docs/json-data-model/ describes 
how these settings help when reading certain types of data, and indeed some 
queries fail with a suggestion to change them. 

A few points here. First, the documentation suggests ALTER SYSTEM commands. 
This is not ideal: it changes the default way Drill handles data for all users, 
and not all users will (or should) have the privileges to run this command. At 
a minimum, the documentation should show ALTER SESSION (or explain the 
difference more clearly). 

Second, even ALTER SESSION affects reads of all JSON files for that session, 
when in reality the reason for the setting is to be able to read one specific 
table that has poorly formed JSON. Issuing a command that alters how Drill 
reads all JSON in order to read one table could have unintended consequences, 
especially for a user who just wants to read the data and issues commands 
without thinking them through. 

As an administrator, I see two use cases here. In the first, I have a table of 
poorly formed JSON that requires one of these settings, and I can't change the 
source. Can I create a view so that all reads of this table use the more 
permissive setting? Setting these options in a view would be very helpful from 
an administrator's perspective for known bad data sources: users would not have 
to think about it and could get on with their exploration. 

The other use case is the ability for a user to set a session-level option that 
applies only to the table being read, for example: alter session set 
"%tablename%.store.json.read_numbers_as_double = true" (with the error messages 
suggesting this form by default). That way, the user can issue the command 
without downstream consequences for other tables read in the same session. 

Either case is valuable to an administrator, and could help prevent data read 
issues. 
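The table-scoped session option proposed above can be sketched as a lookup order. This is only an illustration in Python, not Drill code: the function name `resolve_option`, the table names, and the rule that a table-scoped key beats a plain session key are all assumptions made for the example.

```python
# Hypothetical sketch: none of these names exist in Drill.
def resolve_option(name, table, session_opts, system_opts):
    """Return the value of option `name` for a read of `table`.

    Assumed lookup order: table-scoped session key, plain session key,
    then the system default.
    """
    scoped_key = f"{table}.{name}"
    candidates = (
        (session_opts, scoped_key),
        (session_opts, name),
        (system_opts, name),
    )
    for scope, key in candidates:
        if key in scope:
            return scope[key]
    raise KeyError(name)


system_opts = {"store.json.read_numbers_as_double": False}
# The user overrides the option for one table only.
session_opts = {"bad_json.store.json.read_numbers_as_double": True}

option = "store.json.read_numbers_as_double"
print(resolve_option(option, "bad_json", session_opts, system_opts))
print(resolve_option(option, "good_json", session_opts, system_opts))
```

Reads of the hypothetical `bad_json` table pick up the permissive setting, while every other table in the session keeps the system default.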





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4129) Ability to Secure Storage plugins

2015-11-25 Thread John Omernik (JIRA)
John Omernik created DRILL-4129:
---

 Summary: Ability to Secure Storage plugins
 Key: DRILL-4129
 URL: https://issues.apache.org/jira/browse/DRILL-4129
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.3.0
 Environment: All
Reporter: John Omernik
 Fix For: Future


As more storage plugins connect to data stores that have their own 
authentication schemes (and thus have to embed credentials in the plugin for 
access), Drill needs a way to put security around these plugins. There are two 
approaches, and perhaps both are needed. One is to challenge the user for 
credentials during the session and pass those credentials to the underlying 
storage system. This would involve caching and may or may not be usable in all 
cases.

The other is to provide a way to secure storage plugins, similar to how we 
secure views (i.e. using filesystem permissions). There was some discussion on 
the user list; I have copied one of my posts here as a potential idea: 

Then I think the idea of securing each storage plugin may be a good idea.  Just 
an off the cuff idea: What if we had an option to enable security for storage 
plugins (an opt in process) that specified a filesystem location as a root 
location for storage plugins. 

Each storage plugin created would get a directory under that root.  

STORAGE_PLUGIN_SECURITY_ROOT="maprfs://data/storage_plugins"


Then if I had a Mongo plugin named labmongo, a JDBC plugin named labmysql, and 
an HBase plugin named labhbase, it would create world-readable directories as 
follows:

/data/storage_plugins/labmongo
/data/storage_plugins/labmysql
/data/storage_plugins/labhbase

These would be "world readable" so as to be "visible". If you didn't want them 
to be visible to users, you could make the root permissions more restrictive; 
only users who can read the directories would have them shown in "show 
databases".

In each of those directories there would be an automatically created directory 
called "security" or "default".

Permissions and ownership (for impersonation) for the plugin would be set by 
setting the filesystem permissions on that directory (default/security)

Then you could create views for the storage plugin itself that would be located 
in the root:
/data/storage_plugins/labmongo/view_limited.json
/data/storage_plugins/labmongo/view_other_limited.json

And use permissions on those views like we do with permissions on filesystem 
locations. 

In addition, this model would allow us to create workspaces that are specific 
to certain tables within the storage plugin (because now we'd have a place to 
store those workspaces), and those workspaces could have permissions too.  

I can see potential pitfalls here; however, this gives flexibility and it 
matches the security model for the filesystem plugin, so people wouldn't have 
one way to do security for filesystem plugins and another for non-filesystem 
plugins. It could help increase adoption and ease people into using it through 
familiarity. 
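The visibility rule described above (a plugin shows up only for users who can read its directory) can be sketched as follows. This is a rough Python illustration, not Drill code: `visible_plugins` is a hypothetical helper, and a local temporary directory stands in for the maprfs security root.

```python
import os
import tempfile


def visible_plugins(security_root):
    """List plugin names whose directory the current user can read and enter.

    Hypothetical helper: a plugin is "visible" when its directory under
    the security root is readable and traversable by the calling user.
    """
    names = []
    for entry in sorted(os.listdir(security_root)):
        path = os.path.join(security_root, entry)
        if os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK):
            names.append(entry)
    return names


# Local stand-in for STORAGE_PLUGIN_SECURITY_ROOT (assumption for the demo).
with tempfile.TemporaryDirectory() as root:
    for plugin in ("labmongo", "labmysql", "labhbase"):
        # Each plugin gets its own directory with a "security" subdirectory.
        os.makedirs(os.path.join(root, plugin, "security"))
    print(visible_plugins(root))
```

A real implementation would have to evaluate permissions for the impersonated user on the distributed filesystem rather than for the local process.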






[jira] [Created] (DRILL-4135) Update Vectors & Operators to transfer ownership

2015-11-25 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4135:
-

 Summary: Update Vectors & Operators to transfer ownership
 Key: DRILL-4135
 URL: https://issues.apache.org/jira/browse/DRILL-4135
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Flow
Reporter: Jacques Nadeau
Assignee: Steven Phillips








[jira] [Created] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-25 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4134:
-

 Summary: Incorporate remaining patches from DRILL-1942 Allocator 
refactor
 Key: DRILL-4134
 URL: https://issues.apache.org/jira/browse/DRILL-4134
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Updated] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly

2015-11-25 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4131:
--
Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-4133

> Update RPC layer to use child allocators of the RootAllocator rather than 
> using the PooledByteBufAllocatorL directly
> 
>
> Key: DRILL-4131
> URL: https://issues.apache.org/jira/browse/DRILL-4131
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>






[jira] [Updated] (DRILL-4137) Metadata Cache not being leveraged

2015-11-25 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-4137:
-
Attachment: fewtypes.parquet

> Metadata Cache not being leveraged
> --
>
> Key: DRILL-4137
> URL: https://issues.apache.org/jira/browse/DRILL-4137
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: fewtypes.parquet
>
>
> git.commit.id.abbrev=367d74a
> The query below does not use the metadata cache file (the plan reports 
> usedMetadataFile=false):
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select * from fewtypes;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/metadata_caching/fewtypes/fewtypes.parquet]], 
> selectionRoot=/drill/testdata/metadata_caching/fewtypes/fewtypes.parquet, 
> numFiles=1, usedMetadataFile=false, columns=[`*`]]])
> {code}
> I have attached the data set used.





[jira] [Commented] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs

2015-11-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027740#comment-15027740
 ] 

ASF GitHub Bot commented on DRILL-4124:
---

Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/281#issuecomment-159748623
  
+1


> Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise 
> in logs
> ---
>
> Key: DRILL-4124
> URL: https://issues.apache.org/jira/browse/DRILL-4124
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-25 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027796#comment-15027796
 ] 

Jacques Nadeau commented on DRILL-4134:
---

Posted here: https://github.com/apache/drill/pull/283

> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Updated] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-25 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4134:
--
Fix Version/s: 1.4.0

> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Assigned] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-11-25 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau reassigned DRILL-4134:
-

Assignee: Jacques Nadeau

> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Created] (DRILL-4136) Enhance queue support to take query cost & available cluster resources into account

2015-11-25 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4136:
---

 Summary: Enhance queue support to take query cost & available 
cluster resources into account
 Key: DRILL-4136
 URL: https://issues.apache.org/jira/browse/DRILL-4136
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Affects Versions: 1.3.0
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


Current queue support relies on a distributed semaphore around a fixed, 
predefined number. This semaphore limits the number of queries Drill can run 
concurrently. Presently, we define small and large queues, classify queries 
against a threshold, and use a separate semaphore for each queue. 

This issue proposes an enhanced queueing or query dispatch mechanism where a 
query is granted execution based on its cost and the availability of system 
resources (CPU, I/O, memory, etc.). Enhancing cost planning and introducing 
distributed resource management should be addressed later to fully benefit from 
this enhancement. The proposal is a non-blocking, asynchronous mechanism that 
assumes eventual consistency around available system resources.
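A minimal sketch of the proposed idea, assuming a query cost can be expressed as CPU and memory demand. The `Resources` and `Admission` names are hypothetical and the code is not taken from Drill; it only shows a non-blocking admit/release cycle over a locally cached (and therefore possibly stale) view of available resources.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu: float
    memory_mb: int


class Admission:
    """Non-blocking admission control (hypothetical sketch).

    `available` is this node's view of free cluster resources. In an
    eventually consistent design it may lag behind the true state.
    """

    def __init__(self, available):
        self.available = available

    def try_admit(self, cost):
        # Never block: either grant execution now or report "not yet".
        if cost.cpu > self.available.cpu:
            return False
        if cost.memory_mb > self.available.memory_mb:
            return False
        self.available = Resources(
            self.available.cpu - cost.cpu,
            self.available.memory_mb - cost.memory_mb,
        )
        return True

    def release(self, cost):
        # Return the resources once the query finishes.
        self.available = Resources(
            self.available.cpu + cost.cpu,
            self.available.memory_mb + cost.memory_mb,
        )


admission = Admission(Resources(cpu=4.0, memory_mb=1024))
small = Resources(cpu=2.0, memory_mb=512)
large = Resources(cpu=3.0, memory_mb=100)
print(admission.try_admit(small))  # fits within the free resources
print(admission.try_admit(large))  # rejected: not enough CPU left
admission.release(small)
print(admission.try_admit(large))  # fits once the first query is done
```

Unlike the fixed-count semaphores, the decision here depends on what each query is expected to consume, so several cheap queries can run where one expensive query would be held back.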





[jira] [Created] (DRILL-4133) Improve Allocator accounting and operator level memory ownership

2015-11-25 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4133:
-

 Summary: Improve Allocator accounting and operator level memory 
ownership
 Key: DRILL-4133
 URL: https://issues.apache.org/jira/browse/DRILL-4133
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau








[jira] [Created] (DRILL-4137) Metadata Cache not being leveraged

2015-11-25 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4137:


 Summary: Metadata Cache not being leveraged
 Key: DRILL-4137
 URL: https://issues.apache.org/jira/browse/DRILL-4137
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Reporter: Rahul Challapalli
Priority: Critical


git.commit.id.abbrev=367d74a

The query below does not use the metadata cache file (the plan reports 
usedMetadataFile=false):

{code}
0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select * from fewtypes;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(*=[$0])
00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/metadata_caching/fewtypes/fewtypes.parquet]], 
selectionRoot=/drill/testdata/metadata_caching/fewtypes/fewtypes.parquet, 
numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{code}

I have attached the data set used.
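For anyone scripting this check in a regression test, the flag can be pulled out of the plan text. The helper below is a hypothetical Python sketch, not part of the Drill test framework; it relies only on the `usedMetadataFile=` token visible in the plan above.

```python
import re


def used_metadata_file(plan_text):
    """Return True/False for the usedMetadataFile flag, or None if absent."""
    match = re.search(r"usedMetadataFile=(true|false)", plan_text)
    if match is None:
        return None
    return match.group(1) == "true"


plan = (
    "00-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
    "[path=maprfs:///drill/testdata/metadata_caching/fewtypes/fewtypes.parquet]], "
    "selectionRoot=/drill/testdata/metadata_caching/fewtypes/fewtypes.parquet, "
    "numFiles=1, usedMetadataFile=false, columns=[`*`]]])"
)
print(used_metadata_file(plan))
```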





[jira] [Created] (DRILL-4132) Ability to submit simple type of physical plan directly to EndPoint DrillBit for execution

2015-11-25 Thread Yuliya Feldman (JIRA)
Yuliya Feldman created DRILL-4132:
-

 Summary: Ability to submit simple type of physical plan directly 
to EndPoint DrillBit for execution
 Key: DRILL-4132
 URL: https://issues.apache.org/jira/browse/DRILL-4132
 Project: Apache Drill
  Issue Type: New Feature
  Components: Execution - Flow, Execution - RPC
Reporter: Yuliya Feldman
Assignee: Yuliya Feldman


Today, Drill query execution is optimistic and stateful (at least due to data 
exchanges): if any stage of query execution fails, the whole query fails. If a 
query is just a simple scan, filter pushdown, and project, where no data 
exchange happens between DrillBits, there is no need to fail the whole query 
when one DrillBit fails, as the minor fragments running on that DrillBit can be 
rerun on another DrillBit. There are probably multiple ways to achieve this. 
This JIRA is to open discussion on: 
1. agreement that we need to support the above use case 
2. means of achieving it.
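One possible shape for the rerun logic, as a rough Python sketch rather than a proposal for the actual design: `run_with_failover` and `fake_execute` are hypothetical names, and a `ConnectionError` stands in for whatever signal indicates that a DrillBit has failed.

```python
def run_with_failover(fragment, drillbits, execute):
    """Run `fragment` on the first DrillBit that succeeds.

    Only safe for fragments with no data exchange (simple scan, filter,
    project), since such a fragment can be rerun from scratch elsewhere.
    Returns a (drillbit, result) pair.
    """
    errors = []
    for bit in drillbits:
        try:
            return bit, execute(bit, fragment)
        except ConnectionError as exc:
            # This DrillBit is considered failed: try the next one.
            errors.append((bit, exc))
    raise RuntimeError(f"fragment {fragment!r} failed on all DrillBits: {errors}")


def fake_execute(bit, fragment):
    # Stand-in for submitting a physical plan fragment to an endpoint.
    if bit == "drillbit-1":
        raise ConnectionError("drillbit-1 is down")
    return f"{fragment} done"


print(run_with_failover("scan-0", ["drillbit-1", "drillbit-2"], fake_execute))
```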





[jira] [Commented] (DRILL-4137) Metadata Cache not being leveraged

2015-11-25 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027917#comment-15027917
 ] 

Rahul Challapalli commented on DRILL-4137:
--

marked it as critical since this is a regression

> Metadata Cache not being leveraged
> --
>
> Key: DRILL-4137
> URL: https://issues.apache.org/jira/browse/DRILL-4137
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: fewtypes.parquet
>
>


[jira] [Commented] (DRILL-4137) Metadata Cache not being leveraged

2015-11-25 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027991#comment-15027991
 ] 

Rahul Challapalli commented on DRILL-4137:
--

I have to dig to find the specific commit. I ran it with a build roughly 1 week 
old and this issue was not present.

Also this test is part of our regression tests and is failing consistently

> Metadata Cache not being leveraged
> --
>
> Key: DRILL-4137
> URL: https://issues.apache.org/jira/browse/DRILL-4137
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: fewtypes.parquet
>
>


[jira] [Commented] (DRILL-4137) Metadata Cache not being leveraged

2015-11-25 Thread Suresh Ollala (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027941#comment-15027941
 ] 

Suresh Ollala commented on DRILL-4137:
--

[~rkins] Rahul, which release is this a regression from?

> Metadata Cache not being leveraged
> --
>
> Key: DRILL-4137
> URL: https://issues.apache.org/jira/browse/DRILL-4137
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: fewtypes.parquet
>
>


[jira] [Commented] (DRILL-4047) Select with options

2015-11-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028119#comment-15028119
 ] 

ASF GitHub Bot commented on DRILL-4047:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/246


> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a 
> Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E





[jira] [Commented] (DRILL-4063) Missing files/classes needed for S3a access

2015-11-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028116#comment-15028116
 ] 

ASF GitHub Bot commented on DRILL-4063:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/265


> Missing files/classes needed for S3a access
> ---
>
> Key: DRILL-4063
> URL: https://issues.apache.org/jira/browse/DRILL-4063
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: All
>Reporter: Nathan Griffith
>Assignee: Abhijit Pol
>  Labels: aws, aws-s3, s3, storage
>
> Specifying
> {code}
> "connection": "s3a://"
> {code}
> results in the following error:
> {code}
> Error: SYSTEM ERROR: ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> {code}
> I can fix this by dropping in these files from the hadoop binary tarball:
> hadoop-aws-2.6.2.jar
> aws-java-sdk-1.7.4.jar
> And then adding this to my core-site.xml:
> {code:xml}
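> <property>
>   <name>fs.s3a.access.key</name>
>   <value>ACCESSKEY</value>
> </property>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <value>SECRETKEY</value>
> </property>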
> {code}





[jira] [Commented] (DRILL-4056) Avro deserialization corrupts data

2015-11-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028118#comment-15028118
 ] 

ASF GitHub Bot commented on DRILL-4056:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/266


> Avro deserialization corrupts data
> --
>
> Key: DRILL-4056
> URL: https://issues.apache.org/jira/browse/DRILL-4056
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: Ubuntu 15.04 - Oracle Java
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
> Fix For: 1.3.0
>
> Attachments: test.zip
>
>
> I have an Avro file that supports the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
> When I select 10 rows from this file I get:
> +---------------------+
> |       EXPR$0        |
> +---------------------+
> | Gæst                |
> | Voksen              |
> | Voksen              |
> | Invitation KIF KBH  |
> | Invitation KIF KBH  |
> | Ordinarie pris KBH  |
> | Ordinarie pris KBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> +---------------------+
> The bug is that the field values are incorrectly deserialized: the value 
> from the previous row is retained if the subsequent row is shorter.
> The SQL query:
> "select s.classification.variant variant from dfs. as s limit 10;"
> That way "Ordinarie pris" becomes "Ordinarie pris KBH" because the 
> previous row had the value "Invitation KIF KBH".
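The symptom reported above is what a reused buffer with a stale length would produce. The Python sketch below only reproduces that symptom; it is an assumption about the class of bug, not Drill's actual Avro reader code, and `read_values` is a hypothetical function.

```python
def read_values(rows, reset_length):
    """Copy each row into one reused buffer and decode it back.

    With reset_length=False the decoded length is never shrunk, so the
    tail of a longer previous value leaks into a shorter one.
    """
    buf = bytearray(64)
    length = 0
    out = []
    for row in rows:
        data = row.encode("utf-8")
        buf[: len(data)] = data  # overwrite the front of the buffer
        if reset_length or len(data) > length:
            length = len(data)
        out.append(buf[:length].decode("utf-8"))
    return out


rows = ["Invitation KIF KBH", "Ordinarie pris", "Biljetter 200 kr"]
print(read_values(rows, reset_length=False))
print(read_values(rows, reset_length=True))
```

With `reset_length=False` the second and third values come back as "Ordinarie pris KBH" and "Biljetter 200 krBH", matching the corrupted rows in the report; resetting the length for every row returns the input unchanged.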





[jira] [Commented] (DRILL-4103) Add additional metadata to Parquet files generated by Drill

2015-11-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028117#comment-15028117
 ] 

ASF GitHub Bot commented on DRILL-4103:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/264


> Add additional metadata to Parquet files generated by Drill
> ---
>
> Key: DRILL-4103
> URL: https://issues.apache.org/jira/browse/DRILL-4103
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Jacques Nadeau
>Assignee: Julien Le Dem
> Fix For: 1.3.0
>
>
> For future compatibility efforts, it would be good for us to automatically 
> add metadata to Drill generated Parquet files. At a minimum, we should add 
> information about the fact that Drill generated the files and the version of 
> Drill that generated the files.





[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread Tomer Shiran (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027443#comment-15027443
 ] 

Tomer Shiran commented on DRILL-4130:
-

Maybe we should deprecate/remove the session variables and only offer these as 
SELECT options?

Most other properties related to reading a file (field delimiter, extract CSV 
headers, etc.) are actually format options (which will be available as SELECT 
options soon), so I think having these session/system variables is inconsistent 
in the first place. Thoughts?



> Ability to set settings at Table or View level rather than SESSION or SYSTEM
> 
>
> Key: DRILL-4130
> URL: https://issues.apache.org/jira/browse/DRILL-4130
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: 1.3.0
> Environment: All
>Reporter: John Omernik
>  Labels: administration, settings
> Fix For: Future
>
>


[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027432#comment-15027432
 ] 

Julian Hyde commented on DRILL-4130:


Suppose that there is a system property P, and table T has overridden it, and 
the current session has overridden it also. It's not clear to me whether the 
table's setting or the session's setting should win.

You seem to have in mind that the table's setting would win, and no doubt you 
have a use case in mind where it makes sense that the table's setting would win.

But there are other properties where the user would legitimately expect the 
session to override the table. If we implement this feature as written we would 
violate the principle of least surprise.

> Ability to set settings at Table or View level rather than SESSION or SYSTEM
> 
>
> Key: DRILL-4130
> URL: https://issues.apache.org/jira/browse/DRILL-4130
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: 1.3.0
> Environment: All
>Reporter: John Omernik
>  Labels: administration, settings
> Fix For: Future
>
>
> There are a number of settings within drill for handling data that due to low 
> level of granularity there may be unintended data reading consequences. A few 
> examples include:
> store.json.read_numbers_as_double
> and
> store.json.all_text_mode
> (There are likely more; these are the ones I've worked with.)
>
> The documentation at https://drill.apache.org/docs/json-data-model/ explains
> that these settings can help when reading certain types of data, and indeed
> some queries fail with a suggestion to change these settings.
>
> A few points here. First, the documentation suggests alter system commands.
> This is not ideal: it changes the default way Drill handles data for all
> users, and not all users will (or should) have the privileges to run this
> command. At a minimum, the documentation should show alter session (or
> explain the difference more clearly).
>
> But even alter session affects reads of all JSON files for that session,
> when in reality the reason for the setting is to read one specific table
> that has poorly formed JSON. Issuing a command that alters how Drill reads
> all JSON in order to read one table of JSON could therefore have unintended
> consequences, especially for a user who just wants to read things and
> issues commands without thinking them through.
>
> Now, as an administrator, I see two use cases. The first: I have a table of
> poorly formed JSON that requires one of these settings, and I can't change
> the source. Can I create a view so that all reads of this table use the more
> permissive setting? Setting these options in a view would be very helpful
> from an administrator's perspective for known bad data sources. It keeps
> users from having to think about it and lets them do their exploration.
>
> The other use case is the ability for a user to set a session-level option
> that applies only to the table being read, e.g. alter session set
> "%tablename%.store.json.read_numbers_as_double = true" (and have the error
> messages use that as the default suggestion). That way the user can issue
> the command without downstream consequences in their session while reading
> other tables.
>
> Either case is valuable to an administrator and could help prevent data-read
> issues.
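
The table-scoped session key proposed in the description can be illustrated with a small lookup sketch. This is not Drill code: the function `resolve_option`, the lookup order, and the table names `dfs.tmp.bad_json` and `dfs.tmp.good_json` are assumptions made up for illustration. Only the option name `store.json.read_numbers_as_double` comes from the issue.

```python
from typing import Any, Mapping


def resolve_option(
    name: str,
    table: str,
    session: Mapping[str, Any],
    system: Mapping[str, Any],
) -> Any:
    """Return the value of option `name` for reads of `table`.

    Assumed lookup order (hypothetical, not Drill behaviour):
    1. a session key prefixed with the table name,
    2. the plain session key,
    3. the system default.
    """
    scoped_key = f"{table}.{name}"
    if scoped_key in session:
        return session[scoped_key]
    if name in session:
        return session[name]
    return system[name]


# System default stays strict; the session overrides it for one table only.
system_defaults = {"store.json.read_numbers_as_double": False}
session_options = {
    "dfs.tmp.bad_json.store.json.read_numbers_as_double": True,
}

option = "store.json.read_numbers_as_double"
print(resolve_option(option, "dfs.tmp.bad_json", session_options, system_defaults))
print(resolve_option(option, "dfs.tmp.good_json", session_options, system_defaults))
```

Under these assumptions, only reads of the one badly formed table see the permissive value, and every other table in the session keeps the system default.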



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027464#comment-15027464
 ] 

Julian Hyde commented on DRILL-4130:


I agree: properties set at the query level would clearly override those set at 
the session level. The principle of least surprise is restored.

My philosophy is that one should be able to set any property at any level above 
where it is actually used. If you set it at a high level (e.g. set field 
delimiter at system level) it merely becomes the default for where it is used 
at a lower level. Some properties only apply at high levels (say system) and it 
should be illegal to override them at lower levels.
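
A minimal sketch of the layering described in this comment, assuming four levels (system, session, query, table) ordered from broadest to narrowest. The class `OptionStore`, the `Level` enum, and the option name `hypothetical.system_only` are invented for illustration and are not part of Drill. For simplicity the store models a single session, query, and table context.

```python
from enum import IntEnum
from typing import Any, Dict


class Level(IntEnum):
    # Ordered from broadest scope to narrowest scope.
    SYSTEM = 0
    SESSION = 1
    QUERY = 2
    TABLE = 3


class OptionStore:
    def __init__(self, used_at: Dict[str, Level]) -> None:
        # used_at maps each option to the narrowest level where it is used.
        self._used_at = used_at
        self._values: Dict[Level, Dict[str, Any]] = {lvl: {} for lvl in Level}

    def set(self, level: Level, name: str, value: Any) -> None:
        # An option may be set at the level where it is used or at any
        # broader level; setting it at a narrower level is rejected.
        if level > self._used_at[name]:
            raise ValueError(
                f"{name} cannot be set below {self._used_at[name].name} level"
            )
        self._values[level][name] = value

    def get(self, name: str) -> Any:
        # The narrowest level holding a value wins; broader levels are defaults.
        for level in reversed(list(Level)):
            if name in self._values[level]:
                return self._values[level][name]
        raise KeyError(name)


store = OptionStore({
    "store.json.all_text_mode": Level.TABLE,
    "hypothetical.system_only": Level.SYSTEM,
})
store.set(Level.SYSTEM, "store.json.all_text_mode", False)  # system default
store.set(Level.TABLE, "store.json.all_text_mode", True)    # table override
print(store.get("store.json.all_text_mode"))

try:
    store.set(Level.SESSION, "hypothetical.system_only", 1)
except ValueError as err:
    print(err)
```

In this sketch a value set at system level only acts as a default, and an attempt to override a system-only option at session level raises an error instead of being silently accepted.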



[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027270#comment-15027270
 ] 

Jacques Nadeau commented on DRILL-4130:
---

I believe the right way to accommodate this will be to add these types of 
options to the SELECT WITH OPTIONS functionality. This will allow query-level 
setting of these values.



[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027489#comment-15027489
 ] 

Jacques Nadeau commented on DRILL-4130:
---

I also agree: get table options out of the session level. System makes sense 
for system defaults.



[jira] [Updated] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly

2015-11-25 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4131:
--
Summary: Update RPC layer to use child allocators of the RootAllocator 
rather than using the PooledByteBufAllocatorL directly  (was: Update RPC layer 
to child allocators of the RootAllocator rather than using the 
PooledByteBufAllocatorL directly)

> Update RPC layer to use child allocators of the RootAllocator rather than 
> using the PooledByteBufAllocatorL directly
> 
>
> Key: DRILL-4131
> URL: https://issues.apache.org/jira/browse/DRILL-4131
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Reporter: Jacques Nadeau
>






[jira] [Updated] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.

2015-11-25 Thread Edmon Begoli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edmon Begoli updated DRILL-3726:

Fix Version/s: 1.4.0 (was: Future)

> Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
> 
>
> Key: DRILL-3726
> URL: https://issues.apache.org/jira/browse/DRILL-3726
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.1.0
> Environment: Linux RHEL 6.6, OSX 10.9
> Reporter: Edmon Begoli
> Fix For: 1.4.0
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> When we query the last attribute of a text file, we get missing characters.
> Looking at the row through Drill, a \r is included at the end of the last
> attribute.
> Looking in a text editor, it's not embedded into that attribute.
> I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only
> the LF, resulting in the CR becoming part of the last attribute.
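
The symptom described above can be reproduced outside Drill with a short sketch. This is plain Python, not Drill's text reader. The function names and the sample data are made up for illustration, and the sketch ignores quoting and escaping.

```python
from typing import List


def split_rows_lf_only(data: bytes, delimiter: bytes = b",") -> List[List[bytes]]:
    # Treats only LF (0x0a) as the row terminator, so the CR (0x0d) of a
    # CRLF pair stays attached to the last field of each row.
    return [line.split(delimiter) for line in data.split(b"\n") if line]


def split_rows_crlf_aware(data: bytes, delimiter: bytes = b",") -> List[List[bytes]]:
    # Strips one trailing CR before splitting, so CRLF and LF both end a row.
    rows = []
    for line in data.split(b"\n"):
        if line.endswith(b"\r"):
            line = line[:-1]
        if line:
            rows.append(line.split(delimiter))
    return rows


sample = b"id,name\r\n1,alice\r\n"
print(split_rows_lf_only(sample))     # [[b'id', b'name\r'], [b'1', b'alice\r']]
print(split_rows_crlf_aware(sample))  # [[b'id', b'name'], [b'1', b'alice']]
```

With LF-only splitting, the last field of every row carries a trailing \r, which matches the behaviour reported in this issue.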


