[GitHub] drill pull request #539: DRILL-4759:Drill throwing array index out of bound ...
GitHub user ppadma closed the pull request at:

    https://github.com/apache/drill/pull/539

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] drill pull request #539: DRILL-4759:Drill throwing array index out of bound ...
GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/539

    DRILL-4759:Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #539

commit f5c60a88eb656b4eb383ba31a088330f45fc2f80
Author: Padma Penumarthy
Date:   2016-06-28T23:16:45Z

    Fix for MD-904

commit 3ba14375d4c64c5b346ef6ce565638804f520eac
Author: Padma Penumarthy
Date:   2016-06-28T23:16:45Z

    Update Fix for MD-904

commit 7b837eb37ef0937a5087edd9738712610f79c737
Author: Padma Penumarthy
Date:   2016-06-29T00:22:13Z

    Merge branch 'master' of https://github.com/ppadma/drill

commit a3c20a44d8407af6cb4b01243e7bd799ece2f6d7
Author: Padma Penumarthy
Date:   2016-06-30T18:31:38Z

    Merge branch 'master' of https://github.com/apache/drill

commit 2c335fdddad1052871fa31843bd70df1e092c14b
Author: Padma Penumarthy
Date:   2016-07-01T00:36:22Z

    Fix for MD-904: Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.

commit 950d4a8198145b72fabf4a6ea658a83f1201f058
Author: Padma Penumarthy
Date:   2016-07-01T00:52:38Z

    MD-904:Drill throwing array index out of bound exception when reading a parquet file written by map reduce program

commit 8914c21216f54600b833895c124d222fc058d307
Author: Padma Penumarthy
Date:   2016-07-01T05:51:47Z

    Updated fix for MD-904

commit 1d5d95a3a4edf44fb0d2cc08e41b9e9b60be8c75
Author: Padma Penumarthy
Date:   2016-07-01T05:57:38Z

    Updated fix for MD-904
[jira] [Created] (DRILL-4759) Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.
Padma Penumarthy created DRILL-4759:
---

             Summary: Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.
                 Key: DRILL-4759
                 URL: https://issues.apache.org/jira/browse/DRILL-4759
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.7.0
            Reporter: Padma Penumarthy
            Assignee: Padma Penumarthy
             Fix For: 1.8.0

An ArrayIndexOutOfBoundsException is thrown while reading the bigInt data type from dictionary-encoded parquet data.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Implement "DROP TABLE IIF EXISTS" statement
Sounds good. We need to clearly document this, specifically the Hive UDF IF syntax issue.

-Neeraja

On Fri, Jul 1, 2016 at 11:07 AM, Vitalii Diravka wrote:

> Agree with the last decision to use the "IF EXISTS" statement and the `if` UDF
> with backticks.
> It is an acceptable option.
>
> Thank you for the valuable advice.
>
> Kind regards
> Vitalii
>
> 2016-06-30 17:22 GMT+00:00 John Omernik :
>
> > I agree with Julian. If we can backtick quote Hive's if and have an option
> > for Hive users, it would be nice. But Hive made a mess, and there is
> > precedent for IF. This makes sense from a cluster administration perspective,
> > and even being a Hive user, as long as I had an option (with backticks) to
> > allow me to move forward, I'd understand and accept the required changes.
> >
> > John
> >
> >
> > On Thu, Jun 30, 2016 at 12:00 PM, Julian Hyde wrote:
> >
> > > Even though it’s not standard, several other databases have DROP TABLE …
> > > IF EXISTS (MySQL [1]; Postgres [2] and SQL Server 2016 [3] put the “IF
> > > EXISTS” before the table name). I know there are problems with the IF
> > > keyword clashing with the Hive “IF” function, but I think it would be crazy
> > > to do “IIF EXISTS”.
> > >
> > > I’d block Hive’s “IF” function, frankly. They screwed up. No need to
> > > propagate their mess into Drill.
> > >
> > > Julian
> > >
> > > [1] http://dev.mysql.com/doc/refman/5.7/en/drop-table.html
> > >
> > > [2] https://www.postgresql.org/docs/8.2/static/sql-droptable.html
> > >
> > > [3] https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/03/drop-if-exists-new-thing-in-sql-server-2016/
> > >
> > > > On Jun 30, 2016, at 5:06 AM, Khurram Faraaz wrote:
> > > >
> > > > I looked at the SQL standard and I did not find that IF EXISTS is a part of
> > > > the DROP TABLE syntax, please see below.
> > > >
> > > > INTERNATIONAL STANDARD
> > > > ISO/IEC 9075-2
> > > > Fourth edition 2011-12-15
> > > >
> > > > Format
> > > > <drop table statement> ::=
> > > >     DROP TABLE <table name> <drop behavior>
> > > >
> > > > <drop behavior> ::=
> > > >     CASCADE
> > > >   | RESTRICT
> > > >
> > > > On Thu, Jun 30, 2016 at 3:44 PM, Arina Yelchiyeva <
> > > > arina.yelchiy...@gmail.com> wrote:
> > > >
> > > >> To sum up, currently we are facing two options:
> > > >>
> > > >> 1. Add IF as keyword.
> > > >> Pros:
> > > >> DROP TABLE / VIEW IF EXISTS will work
> > > >> Cons:
> > > >> if function (loaded from Hive) will stop working. In this case users will
> > > >> have two options:
> > > >> a) surround if with backticks (ex: select `if`(condition, option1, option2)
> > > >> from table)
> > > >> b) replace if function with case statement
> > > >>
> > > >> 2. Use IIF instead of IF
> > > >> Pros:
> > > >> if function will work, no backward compatibility issues.
> > > >> Cons:
> > > >> uncommon syntax for IF EXISTS statement
> > > >>
> > > >> So far neither of these options seems to be ideal.
> > > >>
> > > >> Kind regards
> > > >> Arina
> > > >>
> > > >>
> > > >> On Wed, Jun 29, 2016 at 8:56 PM Paul Rogers wrote:
> > > >>
> > > >>> Hi Vitalii,
> > > >>>
> > > >>> This will be a nice improvement. Your question about “IIF” vs. “IF” is in
> > > >>> the context of one small enhancement. But, it raises a larger question
> > > >>> (which is beyond the scope of your project, but is worth discussing anyway.)
> > > >>>
> > > >>> That larger issue is that we really should modify the Drill SQL parser to
> > > >>> better handle keywords vs. identifiers. That is, the following
> > > >>> “pathological” statement should be valid:
> > > >>>
> > > >>> SELECT select, from FROM from, where WHERE from.select = where.from;
> > > >>>
> > > >>> This seems very confusing to us humans. But, to the SQL grammar the above
> > > >>> is unambiguous. SQL syntax determines where a keyword is valid. All other
> > > >>> uses of that keyword can easily be interpreted as an identifier. Further,
> > > >>> the location of the identifier determines whether to interpret it as a
> > > >>> column, table, schema, function, etc. For example, a keyword will never
> > > >>> appear in a select list, from list or where expression. Technically, we
> > > >>> could introduce distinct name spaces for keywords, columns, tables,
> > > >>> functions and so on.
> > > >>>
> > > >>> Without this change we run two risks:
> > > >>>
> > > >>> 1. We can’t use proper SQL syntax when we need it (as in your project.)
> > > >>> 2. We risk breaking queries when we add new keywords (as in the dynamic
> > > >>> UDF project.)
> > > >>>
> > > >>> This is
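
Editor's illustration of the keyword-versus-identifier point raised in the thread above. This is a minimal sketch, not Drill's parser (which is not shown in this thread): a toy recursive-descent parser for a single hypothetical statement shape, `SELECT names FROM names WHERE a.b = c.d`. The class name `Parser` and its methods are invented for illustration. It treats a word as a keyword only where the grammar expects one, so the "pathological" statement parses.

```python
import re


class Parser:
    """Toy parser for: SELECT name, ... FROM name, ... WHERE t.c = t.c [;]

    Hypothetical sketch only. A word is a keyword solely because of where it
    appears; everywhere else it is accepted as an ordinary identifier.
    """

    def __init__(self, sql):
        # Words, plus the few punctuation marks this toy grammar uses.
        self.tokens = re.findall(r"[A-Za-z_]\w*|[,.=;]", sql)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected):
        # Consume one specific keyword or symbol (case-insensitive).
        tok = self.peek()
        if tok is None or tok.upper() != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def name(self):
        # Any word is a valid identifier here, reserved or not.
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"expected a name, got {tok!r}")
        self.pos += 1
        return tok

    def name_list(self):
        # A list continues only after a comma, so the next clause keyword
        # is never mistaken for another list item.
        names = [self.name()]
        while self.peek() == ",":
            self.take(",")
            names.append(self.name())
        return names

    def column_ref(self):
        table = self.name()
        self.take(".")
        return (table, self.name())

    def parse(self):
        self.take("SELECT")
        columns = self.name_list()
        self.take("FROM")
        tables = self.name_list()
        self.take("WHERE")
        left = self.column_ref()
        self.take("=")
        right = self.column_ref()
        if self.peek() == ";":
            self.take(";")
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return {"columns": columns, "tables": tables, "where": (left, "=", right)}
```

Calling `Parser("SELECT select, from FROM from, where WHERE from.select = where.from;").parse()` yields the columns `select` and `from` and the tables `from` and `where`. A real SQL grammar (expressions, aliases without AS, function calls) leaves far less room for this trick, which is presumably why the thread settled on backticks instead.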
[jira] [Created] (DRILL-4758) Option for Lazy/Late Materialization of columns during query with Parquet
John Omernik created DRILL-4758:
---

             Summary: Option for Lazy/Late Materialization of columns during query with Parquet
                 Key: DRILL-4758
                 URL: https://issues.apache.org/jira/browse/DRILL-4758
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Parquet
    Affects Versions: 1.6.0
            Reporter: John Omernik

On tables stored as Parquet with lots of columns, it appears that all columns requested in the select statement are materialized for every row, regardless of the where clause filter.

For example, on a table with 100 columns,

    select field1 from table where id = 123 and client BETWEEN 10 and 100

will return in 30 seconds over a large amount of data (2 TB) and return no rows.

However,

    select * from table where id = 123 and client BETWEEN 10 and 100

will take 15 minutes to run on the same amount of data, while still returning no rows.

If an option (perhaps it should be the default) to only materialize rows that match the filter were present, it would provide a huge boon to performance.

Now, if this were an issue because tables with a small number of columns would now have an extra step, one option would be to use table options (select with options) to make it so queries to certain tables would have this option, and queries to other tables would not.

This is up for discussion, but I think the first step is to discuss how something like this could be achieved.

This is an item also being looked at by the Impala project on Parquet files. (IMPALA-2017)
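
Editor's illustration of the late-materialization idea requested above. This is a minimal sketch under stated assumptions, not Drill's Parquet reader: a toy in-memory columnar table that counts how many cell values are read. All names (`CountingTable`, `eager_scan`, `lazy_scan`) are hypothetical. The eager scan reads every needed column for every row before filtering; the lazy scan reads only the filter columns first and fetches the remaining projected columns for rows that pass.

```python
from typing import Callable, Dict, List, Sequence


class CountingTable:
    """Toy columnar table (hypothetical) that counts cell reads."""

    def __init__(self, columns: Dict[str, List[int]]):
        self.columns = columns
        self.num_rows = len(next(iter(columns.values())))
        self.cells_read = 0

    def read(self, column: str, row: int) -> int:
        # Stand-in for decoding one value from a column chunk.
        self.cells_read += 1
        return self.columns[column][row]


def eager_scan(
    table: CountingTable,
    filter_cols: Sequence[str],
    predicate: Callable[[Dict[str, int]], bool],
    projected: Sequence[str],
) -> List[Dict[str, int]]:
    # Materialize every needed column for every row, then apply the filter.
    needed = list(dict.fromkeys(list(filter_cols) + list(projected)))
    out = []
    for row in range(table.num_rows):
        record = {c: table.read(c, row) for c in needed}
        if predicate(record):
            out.append({c: record[c] for c in projected})
    return out


def lazy_scan(
    table: CountingTable,
    filter_cols: Sequence[str],
    predicate: Callable[[Dict[str, int]], bool],
    projected: Sequence[str],
) -> List[Dict[str, int]]:
    # Read only the filter columns first; fetch the remaining projected
    # columns only for rows that satisfy the predicate.
    out = []
    for row in range(table.num_rows):
        record = {c: table.read(c, row) for c in filter_cols}
        if predicate(record):
            for c in projected:
                if c not in record:
                    record[c] = table.read(c, row)
            out.append({c: record[c] for c in projected})
    return out
```

With a selective filter such as `id = 123 and client BETWEEN 10 and 100`, both scans return the same rows, but the lazy scan's read count grows with the number of filter columns rather than the number of projected columns. The trade-off noted in the report still applies: for narrow tables the two-phase read adds a step without saving much.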