[jira] [Created] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM
John Omernik created DRILL-4130:
---
Summary: Ability to set settings at Table or View level rather than SESSION or SYSTEM
Key: DRILL-4130
URL: https://issues.apache.org/jira/browse/DRILL-4130
Project: Apache Drill
Issue Type: Improvement
Components: Metadata
Affects Versions: 1.3.0
Environment: All
Reporter: John Omernik
Fix For: Future

There are a number of settings within Drill for handling data that, because they can only be set at a coarse level of granularity, may have unintended consequences for how data is read. A few examples include:

store.json.read_numbers_as_double
and
store.json.all_text_mode

(There are likely more; these are the ones I've worked with.)

The documentation at https://drill.apache.org/docs/json-data-model/ outlines how these settings can help when reading certain types of data, and indeed some queries fail with a suggestion to change them.

A few points here.

1. The documentation suggests alter system commands. This is not ideal, as it changes the default way Drill handles data for all users, AND not all users will (or should) have the privileges to run this command. At a minimum, the documentation should show alter session (or explain the difference more clearly).

But even alter session affects reads of all JSON files in that session, when in reality the reason for the setting is to be able to read a specific table that has poorly formed JSON. Issuing a command that changes how Drill reads all JSON in order to read one JSON table could therefore have unintended consequences, especially for a user who just wants to read data and issues commands without thinking them through.

As an administrator, I see two use cases here. The first: I have a table of poorly formed JSON that requires one of these settings, and I can't change the source. Can I create a view so that all reads of this table are done with the more permissive setting? Setting these in a view would be very helpful from an administrator's perspective for known bad data sources: it keeps users from having to think about it and lets them do their exploration.

The second use case is the ability for a user to set a session-level read option that applies only to the table being read:

alter session set "%tablename%.store.json.read_numbers_as_double = true"

(with the error messages suggesting that form by default). That way, the user can issue the command without downstream consequences in their session while reading other tables.

Either case is valuable to an administrator and could help prevent data read issues.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
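A minimal sketch of the second use case, assuming a hypothetical in-memory option store: a session value keyed as `<table>.<option>` (the `%tablename%.` form proposed above) is consulted first, then the plain session value, then the system default. `ScopedOptions` and its methods are illustrative only and are not Drill's option-manager API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical option store illustrating table-scoped session overrides. */
class ScopedOptions {
  private final Map<String, String> system = new HashMap<>();
  private final Map<String, String> session = new HashMap<>();

  void setSystem(String key, String value) { system.put(key, value); }

  void setSession(String key, String value) { session.put(key, value); }

  /**
   * Resolves an option for a read of the given table, in this order:
   * 1. session value scoped to the table ("<table>.<key>"),
   * 2. unscoped session value,
   * 3. system value.
   */
  Optional<String> resolve(String table, String key) {
    String scoped = session.get(table + "." + key);
    if (scoped != null) {
      return Optional.of(scoped);
    }
    String sessionValue = session.get(key);
    if (sessionValue != null) {
      return Optional.of(sessionValue);
    }
    return Optional.ofNullable(system.get(key));
  }
}
```

With `bad_json.store.json.read_numbers_as_double` set to `true` in the session, reads of `bad_json` see `true` while every other table still sees the system default. Table names that themselves contain dots would make this simple key format ambiguous, so a real design would likely need a structured key.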
[jira] [Created] (DRILL-4129) Ability to Secure Storage plugins
John Omernik created DRILL-4129:
---
Summary: Ability to Secure Storage plugins
Key: DRILL-4129
URL: https://issues.apache.org/jira/browse/DRILL-4129
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Other
Affects Versions: 1.3.0
Environment: All
Reporter: John Omernik
Fix For: Future

With more storage plugins hitting other data stores that have their own authentication schemes (and thus having to embed credentials in the plugin for access), Drill needs the ability to put security around these plugins.

There are two approaches, and perhaps both are needed. One is to challenge the user during the session for credentials and pass those credentials to the underlying storage system. This would involve caching and may not be usable in all cases. The other is to provide a way to secure storage plugins similar to how we secure views (i.e., using filesystem permissions).

There was some discussion on the user list; I copied one of my posts here as a potential idea:

Then I think the idea of securing each storage plugin may be a good idea. Just an off-the-cuff idea: what if we had an option to enable security for storage plugins (an opt-in process) that specified a filesystem location as a root location for storage plugins? Each storage plugin created would get a directory under that root:

STORAGE_PLUGIN_SECURITY_ROOT="maprfs://data/storage_plugins"

Then if I had a Mongo plugin named labmongo, a JDBC plugin named labmysql, and an HBase plugin named labhbase, it would create directories that would be world readable, as such:

/data/storage_plugins/labmongo
/data/storage_plugins/labmysql
/data/storage_plugins/labhbase

These would be "world readable" so as to be "visible". If you didn't want them to be visible to users, you could make the root permissions more restrictive; only users who can read them would have them shown in "show databases".

In each of those directories, a directory called "security" or "default" would be created automatically. Permissions and ownership (for impersonation) for the plugin would be set through the filesystem permissions on that directory (default/security).

Then you could create views for the storage plugin itself, located in the plugin's root directory:

/data/storage_plugins/labmongo/view_limited.json
/data/storage_plugins/labmongo/view_other_limited.json

and use permissions on those views as we do with permissions on filesystem locations.

In addition, this model would allow us to create workspaces that are specific to certain tables within the storage plugin (because now we'd have a place to store those workspaces), and those workspaces could have permissions too.

I can see potential pitfalls here; however, this gives flexibility, and it matches the security model of the filesystem plugin, so people wouldn't have one way to do security for filesystem plugins and another for non-filesystem plugins. It could help increase adoption and ease people into using it through familiarity.
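A sketch of the proposed directory-per-plugin layout, assuming a hypothetical `PluginSecurityLayout` helper. A caller-supplied `canRead(user, path)` predicate stands in for real filesystem permission checks (on a cluster these would presumably go through the Hadoop FileSystem API with impersonation). Plugin visibility follows the plugin directory, and permission to query follows the `security` directory, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical helper for the proposed directory-per-plugin security model. */
class PluginSecurityLayout {
  private final String root;

  PluginSecurityLayout(String root) {
    // Drop a trailing slash so that paths join cleanly.
    this.root = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
  }

  /** Directory created for each plugin, e.g. /data/storage_plugins/labmongo. */
  String pluginDir(String plugin) {
    return root + "/" + plugin;
  }

  /** Directory whose permissions and owner govern access through the plugin. */
  String securityDir(String plugin) {
    return pluginDir(plugin) + "/security";
  }

  /** Plugins shown by "show databases": those whose directory the user can read. */
  List<String> visiblePlugins(String user, List<String> plugins,
                              BiPredicate<String, String> canRead) {
    List<String> visible = new ArrayList<>();
    for (String plugin : plugins) {
      if (canRead.test(user, pluginDir(plugin))) {
        visible.add(plugin);
      }
    }
    return visible;
  }

  /** Whether the user may query through the plugin at all. */
  boolean canQuery(String user, String plugin, BiPredicate<String, String> canRead) {
    return canRead.test(user, securityDir(plugin));
  }
}
```

Keeping the permission check behind a predicate mirrors the point of the proposal: the same filesystem permission model already used for views would decide access, rather than a separate mechanism for non-filesystem plugins.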
[jira] [Created] (DRILL-4135) Update Vectors & Operators to transfer ownership
Jacques Nadeau created DRILL-4135:
---
Summary: Update Vectors & Operators to transfer ownership
Key: DRILL-4135
URL: https://issues.apache.org/jira/browse/DRILL-4135
Project: Apache Drill
Issue Type: Sub-task
Components: Execution - Flow
Reporter: Jacques Nadeau
Assignee: Steven Phillips
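To illustrate what transferring ownership means for memory accounting, here is a toy model, assuming each operator has its own hypothetical `OperatorLedger` tracking the bytes it owns. Handing a buffer to a downstream operator moves its size from one ledger to the other without copying the data. This is not Drill's allocator or value-vector transfer API.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy per-operator ledger (hypothetical; not Drill's BufferAllocator). */
class OperatorLedger {
  final String name;
  // Buffers owned by this operator, keyed by identity, with their sizes.
  private final Map<byte[], Integer> owned = new IdentityHashMap<>();
  private long bytes;

  OperatorLedger(String name) { this.name = name; }

  byte[] allocate(int size) {
    byte[] buf = new byte[size];
    owned.put(buf, size);
    bytes += size;
    return buf;
  }

  long ownedBytes() { return bytes; }

  /** Moves accounting for buf to target; the buffer itself is not copied. */
  void transferTo(byte[] buf, OperatorLedger target) {
    Integer size = owned.remove(buf);
    if (size == null) {
      throw new IllegalStateException(name + " does not own this buffer");
    }
    bytes -= size;
    target.owned.put(buf, size);
    target.bytes += size;
  }
}
```

The check in `transferTo` captures the main invariant: a buffer is charged to exactly one owner at a time, so a second transfer from the old owner is rejected.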
[jira] [Created] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor
Jacques Nadeau created DRILL-4134:
---
Summary: Incorporate remaining patches from DRILL-1942 Allocator refactor
Key: DRILL-4134
URL: https://issues.apache.org/jira/browse/DRILL-4134
Project: Apache Drill
Issue Type: Sub-task
Reporter: Jacques Nadeau
[jira] [Updated] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly
[ https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-4131:
---
Issue Type: Sub-task (was: Improvement)
Parent: DRILL-4133

> Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly
> ---
>
> Key: DRILL-4131
> URL: https://issues.apache.org/jira/browse/DRILL-4131
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Execution - Flow
> Reporter: Jacques Nadeau
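A minimal model of the child-allocator idea, assuming a hypothetical `Accountant` class: every reservation made through a child is also charged against its parent, so the root sees the total and any limit along the way can reject the request. This sketches the intent of the issue, not the real `RootAllocator` implementation, and it is not thread-safe.

```java
/** Hypothetical hierarchical memory accountant. */
class Accountant {
  private final Accountant parent; // null for the root
  private final long limit;
  private long used;

  Accountant(Accountant parent, long limit) {
    this.parent = parent;
    this.limit = limit;
  }

  Accountant newChild(long childLimit) {
    return new Accountant(this, childLimit);
  }

  long used() { return used; }

  /**
   * Reserves bytes here and in every ancestor. All-or-nothing: a level only
   * records the reservation after all of its ancestors have accepted it.
   */
  boolean reserve(long bytes) {
    if (used + bytes > limit) {
      return false;
    }
    if (parent != null && !parent.reserve(bytes)) {
      return false;
    }
    used += bytes;
    return true;
  }

  /** Returns bytes to this level and every ancestor. */
  void release(long bytes) {
    used -= bytes;
    if (parent != null) {
      parent.release(bytes);
    }
  }
}
```

For example, with a root limited to 1000 bytes and an RPC child already holding 500, a second child asking for 600 is refused by the root even though its own limit would allow it.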
[jira] [Updated] (DRILL-4137) Metadata Cache not being leveraged
[ https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli updated DRILL-4137:
---
Attachment: fewtypes.parquet
[jira] [Commented] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs
[ https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027740#comment-15027740 ]

ASF GitHub Bot commented on DRILL-4124:
---
Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/281#issuecomment-159748623

+1

> Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs
> ---
>
> Key: DRILL-4124
> URL: https://issues.apache.org/jira/browse/DRILL-4124
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
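The pattern named in the issue title can be sketched as follows: close every resource, keep the first exception as the primary one, and attach later ones with `Throwable.addSuppressed`, so that a single stack trace is logged instead of several. `CloseUtil.closeAll` is an illustrative helper, not Drill's `AutoCloseables` utility.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (hypothetical), not Drill's AutoCloseables API. */
final class CloseUtil {
  private CloseUtil() {}

  /** Closes every resource in order; throws the first failure with later ones suppressed. */
  static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
    Exception first = null;
    for (AutoCloseable resource : resources) {
      if (resource == null) {
        continue;
      }
      try {
        resource.close();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e); // keeps the detail without a second log entry
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  static void closeAll(AutoCloseable... resources) throws Exception {
    closeAll(Arrays.asList(resources));
  }
}
```

Every resource still gets a `close()` call even after an earlier one fails, which is the main reason to prefer this over stopping at the first exception.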
[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor
[ https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027796#comment-15027796 ]

Jacques Nadeau commented on DRILL-4134:
---
Posted here: https://github.com/apache/drill/pull/283

> Incorporate remaining patches from DRILL-1942 Allocator refactor
> ---
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Execution - Flow
> Reporter: Jacques Nadeau
> Assignee: Jacques Nadeau
> Fix For: 1.4.0
[jira] [Updated] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor
[ https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-4134:
---
Fix Version/s: 1.4.0
[jira] [Assigned] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor
[ https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau reassigned DRILL-4134:
---
Assignee: Jacques Nadeau
[jira] [Created] (DRILL-4136) Enhance queue support to take query cost & available cluster resources into account
Hanifi Gunes created DRILL-4136:
---
Summary: Enhance queue support to take query cost & available cluster resources into account
Key: DRILL-4136
URL: https://issues.apache.org/jira/browse/DRILL-4136
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Flow
Affects Versions: 1.3.0
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes

Current queue support relies on a distributed semaphore around a fixed, predefined number, which indicates how many queries Drill can run concurrently. Presently, we define small and large queues, classify queries into one or the other based on a threshold, and use a separate semaphore for each queue.

This issue proposes an enhanced queueing or query dispatch mechanism in which a query is granted execution based on its cost and the availability of system resources (CPU, I/O, memory, etc.). Enhancing cost planning and introducing distributed resource management should be addressed later to fully benefit from this enhancement.

The proposal is a non-blocking, asynchronous mechanism that assumes the view of available system resources is eventually consistent.
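A rough sketch of non-blocking, cost-based admission, assuming a single hypothetical resource pool (for example memory in bytes) and a query cost expressed in the same unit. A real dispatcher would track CPU, I/O and memory per node and learn about freed resources asynchronously. Instead of blocking on a semaphore with a fixed permit count, `tryAdmit` uses a compare-and-set loop and returns immediately.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical cost-based admission controller (single resource dimension). */
class CostBasedAdmission {
  private final AtomicLong available;

  CostBasedAdmission(long capacity) {
    this.available = new AtomicLong(capacity);
  }

  /** Admits the query if its estimated cost fits in the remaining capacity; never blocks. */
  boolean tryAdmit(long cost) {
    while (true) {
      long current = available.get();
      if (cost > current) {
        return false; // caller may queue the query and retry later
      }
      if (available.compareAndSet(current, current - cost)) {
        return true;
      }
      // Another thread changed the pool concurrently; re-read and try again.
    }
  }

  /** Returns a finished query's resources to the pool. */
  void release(long cost) {
    available.addAndGet(cost);
  }

  long available() {
    return available.get();
  }
}
```

Unlike the small/large semaphore split, several cheap queries can share the capacity that one expensive query would otherwise hold.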
[jira] [Created] (DRILL-4133) Improve Allocator accounting and operator level memory ownership
Jacques Nadeau created DRILL-4133:
---
Summary: Improve Allocator accounting and operator level memory ownership
Key: DRILL-4133
URL: https://issues.apache.org/jira/browse/DRILL-4133
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Flow
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau
[jira] [Created] (DRILL-4137) Metadata Cache not being leveraged
Rahul Challapalli created DRILL-4137:
---
Summary: Metadata Cache not being leveraged
Key: DRILL-4137
URL: https://issues.apache.org/jira/browse/DRILL-4137
Project: Apache Drill
Issue Type: Bug
Components: Metadata
Reporter: Rahul Challapalli
Priority: Critical

git.commit.id.abbrev=367d74a

The query below does not leverage the metadata cache (the plan reports usedMetadataFile=false):

{code}
0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select * from fewtypes;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(*=[$0])
00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/metadata_caching/fewtypes/fewtypes.parquet]], selectionRoot=/drill/testdata/metadata_caching/fewtypes/fewtypes.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{code}

I have attached the data set used.
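When triaging a regression like this, it can help to check the plan text programmatically. A small sketch, assuming the plan is available as a string (for example, the `text` column of `explain plan for ...`), that extracts the `usedMetadataFile` flag shown in the `ParquetGroupScan` above. `PlanInspector` is a hypothetical helper, not part of Drill.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for inspecting Drill plan text. */
final class PlanInspector {
  private static final Pattern USED_METADATA =
      Pattern.compile("usedMetadataFile=(true|false)");

  private PlanInspector() {}

  /** Returns the usedMetadataFile flag, or empty if the plan contains no such attribute. */
  static Optional<Boolean> usedMetadataFile(String planText) {
    Matcher m = USED_METADATA.matcher(planText);
    return m.find() ? Optional.of(Boolean.parseBoolean(m.group(1))) : Optional.empty();
  }
}
```

A regression test could assert that this returns `Optional.of(true)` for a directory whose metadata cache has been built, which is the behaviour the report says was lost.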
[jira] [Created] (DRILL-4132) Ability to submit simple type of physical plan directly to EndPoint DrillBit for execution
Yuliya Feldman created DRILL-4132:
---
Summary: Ability to submit simple type of physical plan directly to EndPoint DrillBit for execution
Key: DRILL-4132
URL: https://issues.apache.org/jira/browse/DRILL-4132
Project: Apache Drill
Issue Type: New Feature
Components: Execution - Flow, Execution - RPC
Reporter: Yuliya Feldman
Assignee: Yuliya Feldman

Today, Drill query execution is optimistic and stateful (at least due to data exchanges): if any stage of query execution fails, the whole query fails. If a query is just a simple scan, filter push-down, and project, where no data exchange happens between DrillBits, there is no need to fail the whole query when one DrillBit fails, because the minor fragments running on that DrillBit can be rerun on another DrillBit.

There are probably multiple ways to achieve this. This JIRA is to open a discussion on:
1. agreement that we need to support the above use case;
2. the means of achieving it.
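One possible shape for the retry in the exchange-free case, sketched with hypothetical types: minor fragments are mapped to DrillBit endpoints, and when an endpoint fails, only its fragments are reassigned, round-robin, to the surviving endpoints. This illustrates the idea under discussion; it is not an agreed design and ignores data locality.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reassignment of minor fragments after a DrillBit failure. */
final class FragmentReassigner {
  private FragmentReassigner() {}

  /**
   * Given fragment id -> endpoint assignments, moves the fragments of the
   * failed endpoint to the remaining endpoints in round-robin order.
   */
  static Map<Integer, String> reassign(Map<Integer, String> assignments,
                                       String failed, List<String> endpoints) {
    List<String> survivors = new ArrayList<>(endpoints);
    survivors.remove(failed);
    if (survivors.isEmpty()) {
      throw new IllegalStateException("no surviving DrillBits to rerun on");
    }
    Map<Integer, String> result = new LinkedHashMap<>();
    int next = 0;
    for (Map.Entry<Integer, String> e : assignments.entrySet()) {
      if (e.getValue().equals(failed)) {
        // Only fragments from the failed DrillBit are rerun elsewhere.
        result.put(e.getKey(), survivors.get(next++ % survivors.size()));
      } else {
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }
}
```

Fragments on healthy DrillBits keep their assignment, which only works because, in this query shape, no fragment depends on data exchanged with the failed one.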
[jira] [Commented] (DRILL-4137) Metadata Cache not being leveraged
[ https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027917#comment-15027917 ]

Rahul Challapalli commented on DRILL-4137:
---
Marked it as critical since this is a regression.
[jira] [Commented] (DRILL-4137) Metadata Cache not being leveraged
[ https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027991#comment-15027991 ]

Rahul Challapalli commented on DRILL-4137:
---
I have to dig to find the specific commit. I ran it with a build roughly one week old, and this issue was not present. Also, this test is part of our regression tests and is failing consistently.
[jira] [Commented] (DRILL-4137) Metadata Cache not being leveraged
[ https://issues.apache.org/jira/browse/DRILL-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027941#comment-15027941 ]

Suresh Ollala commented on DRILL-4137:
---
[~rkins] Rahul, this is a regression from which release?
[jira] [Commented] (DRILL-4047) Select with options
[ https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028119#comment-15028119 ]

ASF GitHub Bot commented on DRILL-4047:
---
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/246

> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E
[jira] [Commented] (DRILL-4063) Missing files/classes needed for S3a access
[ https://issues.apache.org/jira/browse/DRILL-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028116#comment-15028116 ]

ASF GitHub Bot commented on DRILL-4063:
---
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/265

> Missing files/classes needed for S3a access
> ---
>
> Key: DRILL-4063
> URL: https://issues.apache.org/jira/browse/DRILL-4063
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Other
> Affects Versions: 1.3.0
> Environment: All
> Reporter: Nathan Griffith
> Assignee: Abhijit Pol
> Labels: aws, aws-s3, s3, storage
>
> Specifying
> {code}
> "connection": "s3a://"
> {code}
> results in the following error:
> {code}
> Error: SYSTEM ERROR: ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
> {code}
> I can fix this by dropping in these files from the hadoop binary tarball:
> hadoop-aws-2.6.2.jar
> aws-java-sdk-1.7.4.jar
> And then adding this to my core-site.xml:
> {code:xml}
> <property>
>   <name>fs.s3a.access.key</name>
>   <value>ACCESSKEY</value>
> </property>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <value>SECRETKEY</value>
> </property>
> {code}
[jira] [Commented] (DRILL-4056) Avro deserialization corrupts data
[ https://issues.apache.org/jira/browse/DRILL-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028118#comment-15028118 ]

ASF GitHub Bot commented on DRILL-4056:
---
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/266

> Avro deserialization corrupts data
> ---
>
> Key: DRILL-4056
> URL: https://issues.apache.org/jira/browse/DRILL-4056
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Other
> Affects Versions: 1.3.0
> Environment: Ubuntu 15.04 - Oracle Java
> Reporter: Stefán Baxter
> Assignee: Jason Altekruse
> Fix For: 1.3.0
>
> Attachments: test.zip
>
> I have an Avro file that supports the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
> When I select 10 rows from this file I get:
> +--------------------+
> | EXPR$0             |
> +--------------------+
> | Gæst               |
> | Voksen             |
> | Voksen             |
> | Invitation KIF KBH |
> | Invitation KIF KBH |
> | Ordinarie pris KBH |
> | Ordinarie pris KBH |
> | Biljetter 200 krBH |
> | Biljetter 200 krBH |
> | Biljetter 200 krBH |
> +--------------------+
> The bug is that the field values are incorrectly deserialized: when a row's value is shorter than the previous row's, the trailing characters of the previous value are retained.
> The SQL query:
> "select s.classification.variant variant from dfs. as s limit 10;"
> That way, "Ordinarie pris" becomes "Ordinarie pris KBH" because the previous row had the value "Invitation KIF KBH".
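The symptom above (a shorter value keeping the tail of the previous, longer one) is the typical result of decoding a reused byte buffer without honouring the current value's length. A self-contained illustration of that failure mode and its fix; `ReusedBufferDemo` is hypothetical and not taken from Drill's Avro reader.

```java
import java.nio.charset.StandardCharsets;

/** Illustrates the reused-buffer bug pattern (hypothetical; not Drill code). */
final class ReusedBufferDemo {
  private byte[] buffer = new byte[0];
  private int length;

  /** Copies the next value into a reused buffer, growing it only when needed. */
  void read(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    if (bytes.length > buffer.length) {
      buffer = new byte[bytes.length];
    }
    System.arraycopy(bytes, 0, buffer, 0, bytes.length);
    length = bytes.length;
  }

  /** Buggy: decodes the whole buffer, so stale bytes from a longer value leak in. */
  String decodeBuggy() {
    return new String(buffer, StandardCharsets.UTF_8);
  }

  /** Correct: decodes only the bytes written for the current value. */
  String decode() {
    return new String(buffer, 0, length, StandardCharsets.UTF_8);
  }
}
```

Feeding it "Invitation KIF KBH", then "Ordinarie pris", then "Biljetter 200 kr" makes `decodeBuggy()` return "Ordinarie pris KBH" and then "Biljetter 200 krBH", the same corrupted values shown in the report, while `decode()` returns the original strings.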
[jira] [Commented] (DRILL-4103) Add additional metadata to Parquet files generated by Drill
[ https://issues.apache.org/jira/browse/DRILL-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028117#comment-15028117 ]

ASF GitHub Bot commented on DRILL-4103:
---
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/264

> Add additional metadata to Parquet files generated by Drill
> ---
>
> Key: DRILL-4103
> URL: https://issues.apache.org/jira/browse/DRILL-4103
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Reporter: Jacques Nadeau
> Assignee: Julien Le Dem
> Fix For: 1.3.0
>
> For future compatibility efforts, it would be good for us to automatically add metadata to Drill generated Parquet files. At a minimum, we should add information about the fact that Drill generated the files and the version of Drill that generated the files.
[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM
[ https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027443#comment-15027443 ]

Tomer Shiran commented on DRILL-4130:
---
Maybe we should deprecate/remove the session variables and have these only as SELECT options? Most other properties related to reading a file (field delimiter, extracting CSV headers, etc.) are actually format options (which will soon be available as SELECT options), so I think having these session/system variables is inconsistent in the first place. Thoughts?
[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM
[ https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027432#comment-15027432 ]

Julian Hyde commented on DRILL-4130:
---
Suppose that there is a system property P, table T has overridden it, and the current session has also overridden it. It's not clear to me whether the table's setting or the session's setting should win. You seem to have in mind that the table's setting would win, and no doubt you have a use case in mind where that makes sense. But there are other properties where the user would legitimately expect the session to override the table. If we implement this feature as written, we would violate the principle of least surprise.
[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM
[ https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027464#comment-15027464 ] Julian Hyde commented on DRILL-4130: I agree: properties set at the query level would clearly override those set at the session level. The principle of least surprise is restored. My philosophy is that one should be able to set any property at any level above where it is actually used. If you set it at a high level (e.g. set field delimiter at system level) it merely becomes the default for where it is used at a lower level. Some properties only apply at high levels (say system) and it should be illegal to override them at lower levels. > Ability to set settings at Table or View level rather than SESSION or SYSTEM > > > Key: DRILL-4130 > URL: https://issues.apache.org/jira/browse/DRILL-4130 > Project: Apache Drill > Issue Type: Improvement > Components: Metadata >Affects Versions: 1.3.0 > Environment: All >Reporter: John Omernik > Labels: administration, settings > Fix For: Future
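Julian's rule combines two ideas. First, a value set at a broader level is only the default for narrower levels, so the most specific setting wins. Second, some options may be restricted so they cannot be overridden below a certain level. A minimal Python sketch of that rule follows. The `Level` ordering, the `Options` class, and the option name `example.system_only` are all hypothetical and do not represent Drill's actual option manager.

```python
from enum import IntEnum


class Level(IntEnum):
    """Scopes from broadest to narrowest; this ordering is an assumption."""
    SYSTEM = 0
    SESSION = 1
    QUERY = 2


class Options:
    """Hypothetical layered option store, not Drill's actual option manager."""

    def __init__(self):
        self._defs = {}  # name -> (built-in default, narrowest settable Level)
        self._scopes = {level: {} for level in Level}

    def define(self, name, default, narrowest=Level.QUERY):
        self._defs[name] = (default, narrowest)

    def set(self, level, name, value):
        _, narrowest = self._defs[name]
        # Some properties only make sense at broad levels; reject narrower ones.
        if level > narrowest:
            raise ValueError(f"{name} cannot be set at {level.name} level")
        self._scopes[level][name] = value

    def get(self, name):
        # Walk from the narrowest scope outwards: a broader setting is
        # just the default for the narrower ones.
        for level in sorted(Level, reverse=True):
            if name in self._scopes[level]:
                return self._scopes[level][name]
        return self._defs[name][0]


opts = Options()
opts.define("store.json.all_text_mode", False)
opts.define("example.system_only", 4, narrowest=Level.SYSTEM)  # hypothetical

opts.set(Level.SESSION, "store.json.all_text_mode", True)
opts.set(Level.QUERY, "store.json.all_text_mode", False)
print(opts.get("store.json.all_text_mode"))  # False: the query level wins

try:
    opts.set(Level.SESSION, "example.system_only", 2)
except ValueError as err:
    print(err)  # example.system_only cannot be set at SESSION level
```

A per-table or per-view level, as requested in the issue, would slot in between SESSION and QUERY under the same rules.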
[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM
[ https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027270#comment-15027270 ] Jacques Nadeau commented on DRILL-4130: --- I believe the right way to accommodate this will be to add these types of options to the SELECT WITH OPTIONS functionality. This will allow query-level setting of these values.
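The appeal of query-level options is isolation. A value supplied with one query shadows the session value for that query only, and the session is left unchanged afterwards. The Python sketch below models only that scoping behavior using `collections.ChainMap`. It does not model any Drill SQL syntax, and `effective_options` is a hypothetical helper.

```python
from collections import ChainMap


def effective_options(session: dict, query_overrides: dict) -> ChainMap:
    """Layer one query's overrides on top of the session options.

    Lookups check the query layer first, then the session. Writes to a
    ChainMap go to its first mapping, so anything set while the query runs
    stays in the (copied) query layer and the session dict is untouched.
    """
    return ChainMap(dict(query_overrides), session)


session = {
    "store.json.read_numbers_as_double": False,
    "store.json.all_text_mode": False,
}

opts = effective_options(session, {"store.json.read_numbers_as_double": True})
print(opts["store.json.read_numbers_as_double"])  # True: query-level override
print(opts["store.json.all_text_mode"])           # False: inherited from session

opts["store.json.all_text_mode"] = True           # a write during the query
print(session["store.json.all_text_mode"])        # False: session unchanged
```

Once the query finishes, its overlay is simply discarded, so nothing leaks into later reads in the same session.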
[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM
[ https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027489#comment-15027489 ] Jacques Nadeau commented on DRILL-4130: --- I also agree: get table options out of the session level. System makes sense for system defaults.
[jira] [Updated] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly
[ https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-4131: -- Summary: Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly (was: Update RPC layer to child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly) > Update RPC layer to use child allocators of the RootAllocator rather than > using the PooledByteBufAllocatorL directly > > > Key: DRILL-4131 > URL: https://issues.apache.org/jira/browse/DRILL-4131 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Reporter: Jacques Nadeau
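DRILL-4131 proposes routing RPC-layer allocations through child allocators of the RootAllocator instead of calling PooledByteBufAllocatorL directly. The accounting-only Python model below shows what such a hierarchy typically buys: each child enforces its own limit, the root sees the total, and a child can detect leaks when it is closed. These benefits are assumptions about the motivation. The `Allocator` class and its methods are hypothetical and are not Drill's BufferAllocator API.

```python
class Allocator:
    """Hypothetical accounting-only allocator tree (no real memory is used).

    Every byte a child reserves is also charged to each ancestor, so the
    root sees the total while each child enforces its own limit.
    """

    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit
        self.parent = parent
        self.allocated = 0

    def new_child(self, name, limit):
        return Allocator(name, limit, parent=self)

    def _reserve(self, size):
        if self.allocated + size > self.limit:
            raise MemoryError(f"{self.name}: limit {self.limit} exceeded")
        if self.parent is not None:
            self.parent._reserve(size)  # charge ancestors before committing here
        self.allocated += size

    def allocate(self, size):
        self._reserve(size)

    def release(self, size):
        node = self
        while node is not None:
            node.allocated -= size
            node = node.parent

    def close(self):
        # Outstanding bytes at close time indicate a leak in this component.
        if self.allocated:
            raise RuntimeError(f"{self.name}: {self.allocated} bytes leaked")


root = Allocator("root", limit=1024)
rpc = root.new_child("rpc", limit=256)

rpc.allocate(200)
print(root.allocated)  # 200: the root sees the child's usage

try:
    rpc.allocate(100)  # would exceed the rpc child's 256-byte limit
except MemoryError as err:
    print(err)  # rpc: limit 256 exceeded

rpc.release(200)
rpc.close()  # no error: everything was released
```

With direct use of a shared pooled allocator, none of these per-component checks would have a natural place to live.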
[jira] [Updated] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
[ https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edmon Begoli updated DRILL-3726: Fix Version/s: (was: Future) 1.4.0 > Drill is not properly interpreting CRLF (0d0a). CR gets read as content. > > > Key: DRILL-3726 > URL: https://issues.apache.org/jira/browse/DRILL-3726 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: Linux RHEL 6.6, OSX 10.9 >Reporter: Edmon Begoli > Fix For: 1.4.0 > > Original Estimate: 120h > Remaining Estimate: 120h > > When we query the last attribute of a text file, we get missing characters. > Looking at the row through Drill, a \r is included at the end of the last > attribute. > Looking in a text editor, it's not embedded into that attribute. > I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only > the LF, resulting in the CR becoming part of the last attribute.
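The reporter's hypothesis is that only LF is treated as the record terminator, so the CR of each CRLF pair ends up in the last field. The Python sketch below contrasts LF-only splitting with splitting that treats CRLF as a single terminator. It assumes comma-separated input and ignores quoting and embedded newlines. It is not Drill's text reader, and `split_records` is a hypothetical helper.

```python
def split_records(data: bytes, sep: bytes = b","):
    """Split delimited text into records of fields.

    Both LF and CRLF end a record: a CR immediately before the LF is treated
    as part of the terminator and removed, so it never reaches the last field.
    Quoting and embedded newlines are deliberately ignored in this sketch.
    """
    lines = data.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # input ended with a terminator (or was empty)
    records = []
    for line in lines:
        if line.endswith(b"\r"):
            line = line[:-1]  # drop the CR half of a CRLF terminator
        records.append(line.split(sep))
    return records


data = b"id,name\r\n1,alice\r\n2,bob\r\n"

# LF-only splitting leaves the CR in the last field, as described in the report.
naive = [line.split(b",") for line in data.split(b"\n") if line]
print(naive[1])                # [b'1', b'alice\r']
print(split_records(data)[1])  # [b'1', b'alice']
```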