[GitHub] drill pull request #539: DRILL-4759:Drill throwing array index out of bound ...

2016-07-01 Thread ppadma
Github user ppadma closed the pull request at:

https://github.com/apache/drill/pull/539


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #539: DRILL-4759:Drill throwing array index out of bound ...

2016-07-01 Thread ppadma
GitHub user ppadma opened a pull request:

https://github.com/apache/drill/pull/539

DRILL-4759:Drill throwing array index out of bound exception when reading a 
parquet file written by map reduce program.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ppadma/drill master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #539


commit f5c60a88eb656b4eb383ba31a088330f45fc2f80
Author: Padma Penumarthy 
Date:   2016-06-28T23:16:45Z

Fix for MD-904

commit 3ba14375d4c64c5b346ef6ce565638804f520eac
Author: Padma Penumarthy 
Date:   2016-06-28T23:16:45Z

Update Fix for MD-904

commit 7b837eb37ef0937a5087edd9738712610f79c737
Author: Padma Penumarthy 
Date:   2016-06-29T00:22:13Z

Merge branch 'master' of https://github.com/ppadma/drill

commit a3c20a44d8407af6cb4b01243e7bd799ece2f6d7
Author: Padma Penumarthy 
Date:   2016-06-30T18:31:38Z

Merge branch 'master' of https://github.com/apache/drill

commit 2c335fdddad1052871fa31843bd70df1e092c14b
Author: Padma Penumarthy 
Date:   2016-07-01T00:36:22Z

Fix for MD-904: Drill throwing array index out of bound exception when 
reading a parquet file written by map reduce program.

commit 950d4a8198145b72fabf4a6ea658a83f1201f058
Author: Padma Penumarthy 
Date:   2016-07-01T00:52:38Z

MD-904:Drill throwing array index out of bound exception when reading a 
parquet file written by map reduce program

commit 8914c21216f54600b833895c124d222fc058d307
Author: Padma Penumarthy 
Date:   2016-07-01T05:51:47Z

Updated fix for MD-904

commit 1d5d95a3a4edf44fb0d2cc08e41b9e9b60be8c75
Author: Padma Penumarthy 
Date:   2016-07-01T05:57:38Z

Updated fix for MD-904






[jira] [Created] (DRILL-4759) Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.

2016-07-01 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-4759:
---

 Summary: Drill throwing array index out of bound exception when 
reading a parquet file written by map reduce program.
 Key: DRILL-4759
 URL: https://issues.apache.org/jira/browse/DRILL-4759
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.7.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
 Fix For: 1.8.0


An ArrayIndexOutOfBoundsException is thrown while reading the BIGINT data type 
from dictionary-encoded Parquet data. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Implement "DROP TABLE IIF EXISTS" statement

2016-07-01 Thread Neeraja Rentachintala
sounds good.
We need to clearly document this - specifically the Hive UDF IF syntax issue.

-Neeraja
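
A minimal sketch of the two behaviours agreed on in this thread. It runs
against SQLite through Python's sqlite3 module purely as a stand-in, since
SQLite already accepts DROP TABLE IF EXISTS; it is not Drill, and the table
name "events" and its columns are made up for illustration. It shows that
IF EXISTS makes the drop safe to repeat, and that a CASE expression can
replace an if(condition, option1, option2) call, which is option (b) from
Arina's summary quoted below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A plain DROP TABLE fails when the table does not exist.
try:
    cur.execute("DROP TABLE events")
    plain_drop_failed = False
except sqlite3.OperationalError:
    plain_drop_failed = True

# DROP TABLE IF EXISTS is a no-op for a missing table.
cur.execute("DROP TABLE IF EXISTS events")

# Hypothetical table used only for this illustration.
cur.execute("CREATE TABLE events (id INTEGER, client INTEGER)")
cur.executemany("INSERT INTO events VALUES (?, ?)", [(1, 5), (2, 50)])

# CASE expression standing in for if(client > 10, 'large', 'small').
rows = cur.execute(
    "SELECT id, CASE WHEN client > 10 THEN 'large' ELSE 'small' END "
    "FROM events ORDER BY id"
).fetchall()

# Dropping twice in a row is fine with IF EXISTS.
cur.execute("DROP TABLE IF EXISTS events")
cur.execute("DROP TABLE IF EXISTS events")
conn.close()
```

The backtick form (select `if`(...)) is specific to how Drill quotes
identifiers and is not exercised here.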

On Fri, Jul 1, 2016 at 11:07 AM, Vitalii Diravka wrote:

> I agree with the latest decision to use the "IF EXISTS" statement and the
> `if` UDF with backticks.
> It is an acceptable option.
>
> Thank you for the valuable advice.
>
> Kind regards
> Vitalii
>
> 2016-06-30 17:22 GMT+00:00 John Omernik :
>
> > I agree with Julian. If we can backtick quote Hive's if and have an option
> > for Hive users, it would be nice. But Hive made a mess, and there is
> > precedent for IF.  This makes sense from a cluster administration
> > perspective, and even being a Hive user, as long as I had an option (with
> > backticks) to allow me to move forward, I'd understand and accept the
> > required changes.
> >
> > John
> >
> >
> > On Thu, Jun 30, 2016 at 12:00 PM, Julian Hyde  wrote:
> >
> > > Even though it’s not standard, several other databases have DROP TABLE …
> > > IF EXISTS (MySQL [1]; Postgres [2] and SQL Server 2016 [3] put the “IF
> > > EXISTS” before the table name). I know there are problems with the IF
> > > keyword clashing with the Hive “IF” function, but I think it would be
> > > crazy to do “IIF EXISTS”.
> > >
> > > I’d block Hive’s “IF” function, frankly. They screwed up. No need to
> > > propagate their mess into Drill.
> > >
> > > Julian
> > >
> > > [1] http://dev.mysql.com/doc/refman/5.7/en/drop-table.html
> > >
> > > [2] https://www.postgresql.org/docs/8.2/static/sql-droptable.html
> > >
> > > [3] https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/03/drop-if-exists-new-thing-in-sql-server-2016/
> > >
> > > > On Jun 30, 2016, at 5:06 AM, Khurram Faraaz wrote:
> > > >
> > > > > I looked at the SQL standard and I did not find that IF EXISTS is a
> > > > > part of DROP TABLE syntax, please see below.
> > > >
> > > > INTERNATIONAL STANDARD
> > > > ISO/IEC 9075-2
> > > > Fourth edition 2011-12-15
> > > >
> > > >
> > > > > Format
> > > > > <drop table statement> ::=
> > > > >   DROP TABLE <table name> <drop behavior>
> > > > >
> > > > > <drop behavior> ::=
> > > > >     CASCADE
> > > > >   | RESTRICT
> > > >
> > > > On Thu, Jun 30, 2016 at 3:44 PM, Arina Yelchiyeva <
> > > > arina.yelchiy...@gmail.com> wrote:
> > > >
> > > >> To sum up currently we are facing two options:
> > > >>
> > > >> 1. Add IF as keyword.
> > > >> Pros:
> > > >> DROP TABLE / VIEW IF EXISTS will work
> > > >> Cons:
> > > > >> if function (loaded from Hive) will stop working. In this case users
> > > > >> will have two options:
> > > > >> a) surround if with backticks
> > > > >> (ex: select `if`(condition,option1,option2) from table)
> > > > >> b) replace if function with case statement
> > > >>
> > > >> 2. Use IIF instead of IF
> > > >> Pros:
> > > >> if function will work, no backward compatibility issues.
> > > >> Cons:
> > > >> uncommon syntax for IF EXISTS statement
> > > >>
> > > > >> So far neither of these options seems to be ideal.
> > > >>
> > > >> Kind regards
> > > >> Arina
> > > >>
> > > >>
> > > > >> On Wed, Jun 29, 2016 at 8:56 PM Paul Rogers wrote:
> > > >>
> > > >>> Hi Vitalii,
> > > >>>
> > > > >>> This will be a nice improvement. Your question about “IIF” vs. “IF”
> > > > >>> is in the context of one small enhancement. But, it raises a larger
> > > > >>> question (which is beyond the scope of your project, but is worth
> > > > >>> discussing anyway.)
> > > >>>
> > > > >>> That larger issue is that we really should modify the Drill SQL
> > > > >>> parser to better handle keywords vs. identifiers. That is, the
> > > > >>> following “pathological” statement should be valid:
> > > > >>>
> > > > >>> SELECT select, from FROM from, where WHERE from.select = where.from;
> > > >>>
> > > > >>> This seems very confusing to us humans. But, to the SQL grammar the
> > > > >>> above is unambiguous. SQL syntax determines where a keyword is valid.
> > > > >>> All other uses of that keyword can easily be interpreted as an
> > > > >>> identifier. Further, the location of the identifier determines
> > > > >>> whether to interpret it as a column, table, schema, function, etc.
> > > > >>> For example, a keyword will never appear in a select list, from list
> > > > >>> or where expression. Technically, we could introduce distinct name
> > > > >>> spaces for keywords, columns, tables, functions and so on.
> > > >>>
> > > >>> Without this change we run two risks:
> > > >>>
> > > > >>> 1. We can’t use proper SQL syntax when we need it (as in your
> > > > >>> project.)
> > > > >>> 2. We risk breaking queries when we add new keywords (as in the
> > > > >>> dynamic UDF project.)
> > > >>>
> > > >>> This is 

[jira] [Created] (DRILL-4758) Option for Lazy/Late Materialization of columns during query with Parquet

2016-07-01 Thread John Omernik (JIRA)
John Omernik created DRILL-4758:
---

 Summary: Option for Lazy/Late Materialization of columns during 
query with Parquet
 Key: DRILL-4758
 URL: https://issues.apache.org/jira/browse/DRILL-4758
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Parquet
Affects Versions: 1.6.0
Reporter: John Omernik


On tables stored as Parquet with lots of columns, it appears that all columns 
requested in the select statement are materialized for every row, regardless of 
the where clause filter. 

For example, on a table with 100 columns, 

select field1 from table where id = 123 and client BETWEEN 10 and 100 

will finish in 30 seconds on a large amount of data (2 TB) and return no rows. 

However, 

select * from table where id = 123 and client BETWEEN 10 and 100 

will take 15 minutes to run on the same amount of data, while still returning 
no rows.  

If there were an option (perhaps it should be the default) to materialize 
columns only for rows that match the filter, it would be a huge boon to 
performance. 

Now, if this were an issue because tables with a small number of columns would 
now have an extra step, one option would be to use table options (select with 
options) so that queries to certain tables would have this option, and queries 
to other tables would not.  This is up for discussion, but I think the first 
step is to discuss how something like this could be achieved.  The Impala 
project is also looking at this for Parquet files (IMPALA-2017). 
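
A minimal sketch of the idea, assuming a toy in-memory column store rather 
than Drill's Parquet reader. The function names eager_scan and late_scan, the 
"cells read" counter, and the generated data are all hypothetical. It 
contrasts materializing every column for every row before filtering with 
reading only the filter columns first and fetching the remaining columns for 
matching rows only. 

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a columnar file: column name -> list of values.
Table = Dict[str, List[int]]
Row = Dict[str, int]


def eager_scan(table: Table, row_id: int, lo: int, hi: int) -> Tuple[List[Row], int]:
    """Materialize every column of every row, then apply the filter."""
    n = len(table["id"])
    rows = [{name: col[i] for name, col in table.items()} for i in range(n)]
    cells_read = n * len(table)
    matches = [r for r in rows if r["id"] == row_id and lo <= r["client"] <= hi]
    return matches, cells_read


def late_scan(table: Table, row_id: int, lo: int, hi: int) -> Tuple[List[Row], int]:
    """Read only the filter columns, then fetch other columns for matching rows."""
    ids, clients = table["id"], table["client"]
    n = len(ids)
    cells_read = 2 * n  # only the two filter columns are read for every row
    hits = [i for i in range(n) if ids[i] == row_id and lo <= clients[i] <= hi]
    others = [name for name in table if name not in ("id", "client")]
    matches = []
    for i in hits:
        row = {"id": ids[i], "client": clients[i]}
        for name in others:
            row[name] = table[name][i]
            cells_read += 1
        matches.append(row)
    return matches, cells_read


# 100 columns in total: id, client, and field1..field98 (made-up data).
n_rows = 1000
table: Table = {
    "id": list(range(n_rows)),
    "client": [i % 200 for i in range(n_rows)],
}
for c in range(1, 99):
    table[f"field{c}"] = [i * c for i in range(n_rows)]

# Same shape as the query above: id = 123 and client BETWEEN 10 and 100.
eager_rows, eager_cells = eager_scan(table, 123, 10, 100)
late_rows, late_cells = late_scan(table, 123, 10, 100)
```

With this data the filter matches no rows, yet the eager scan touches every 
cell of all 100 columns while the late scan touches only the two filter 
columns. The trade-off raised above still applies: for narrow tables the 
second pass is extra work. 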


