[jira] [Resolved] (DRILL-3739) NPE on select from Hive for HBase table

2015-12-30 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3739.

   Resolution: Fixed
Fix Version/s: (was: 1.4.0)
   1.5.0

Fixed in 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a

> NPE on select from Hive for HBase table
> ---
>
> Key: DRILL-3739
> URL: https://issues.apache.org/jira/browse/DRILL-3739
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: ckran
>    Assignee: Mehant Baid
>Priority: Critical
> Fix For: 1.5.0
>
>
> For a table in HBase or MapR-DB with metadata created in Hive so that it can 
> be accessed through Beeline or Hue, a query from Drill fails with
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> NullPointerException [Error Id: 1cfd2a36-bc73-4a36-83ee-ac317b8e6cdb]





[jira] [Resolved] (DRILL-2419) UDF that returns string representation of expression type

2015-12-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2419.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.3.0

Fixed in eb6325dc9b59291582cd7d3c3e5d02efd5d15906. 



> UDF that returns string representation of expression type
> -
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Steven Phillips
> Fix For: 1.3.0
>
>
> Suggested name: typeof (credit goes to Aman)





Moving directory based pruning to fire earlier

2015-11-23 Thread Mehant Baid
As part of DRILL-3996, Jinfeng mentioned that he plans to move the directory 
based pruning rule earlier than column based pruning. I want to expand on that 
a little, provide the motivation and gather thoughts/feedback.


Currently both the directory based pruning and the column based pruning 
are fired in the same planning phase and are based on Drill logical rels. 
This is not optimal in the case where data is organized in such a way 
that both directory based pruning and column based pruning can be 
applied (when the data is organized with a nested directory structure 
and the individual files contain partition columns). As part of 
creating the Drill logical scan we read the footers of all the files 
involved. If the directory based pruning rule is fired earlier (a rule 
that fires based on Calcite logical rels) then we will be able to prune 
out unnecessary directories and save the work of reading the footers of 
these files.


Thanks
Mehant
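
A toy sketch of the ordering argument above: if the dir0/dir1 filter is applied 
before any footer is read, footers are fetched only for files under the 
surviving directories. The DirLister and FooterReader types are hypothetical 
placeholders, not Drill planner code.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PruneOrderingSketch {
  interface DirLister { List<String> listFiles(String directory); }
  interface FooterReader { Object readFooter(String file); }

  static List<Object> planScan(List<String> directories, Predicate<String> dirFilter,
                               DirLister lister, FooterReader footers) {
    List<Object> result = new ArrayList<>();
    for (String dir : directories) {
      if (!dirFilter.test(dir)) {
        continue;                                  // directory-based pruning fires first
      }
      for (String file : lister.listFiles(dir)) {
        result.add(footers.readFooter(file));      // footers read only for surviving dirs
      }
    }
    return result;
  }
}
{code}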



Re: Moving directory based pruning to fire earlier

2015-11-23 Thread Mehant Baid
Currently all rules based on Calcite logical rels and Drill logical rels 
are put together and are fired together. As part of DRILL-3996, Jinfeng 
will break it down into different phases. I should be able to take 
advantage of this and move the directory based partition pruning to fire 
based on Calcite rels.


Thanks
Mehant

On 11/23/15 10:58 AM, Hanifi GUNES wrote:

The general idea of multi-phase pruning makes sense to me. I am wondering,
though, are we referring to introducing a new planning phase before the
logical or separating out the logic so as to make directory pruning kick
off ahead of column partitioning?

2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.meh...@gmail.com>:


As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>
Jinfeng mentioned that he plans to move the directory based pruning rule
earlier than column based pruning. I want to expand on that a little,
provide the motivation and gather thoughts/ feedback.

Currently both the directory based pruning and the column based pruning is
fired in the same planning phase and are based on Drill logical rels. This
is not optimal in the case where data is organized in such a way that both
directory based pruning and column based pruning can be applied (when the
data is organized with a nested directory structure plus the individual
files contain partition columns). As part of creating the Drill logical
scan we read the footers of all the files involved. If the directory based
pruning rule is fired earlier (rule to fire based on calcite logical rels)
then we will be able to prune out unnecessary directories and save the work
of reading the footers of these files.

Thanks
Mehant






[jira] [Created] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available

2015-11-03 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-4025:
--

 Summary: Don't invoke getFileStatus() when metadata cache is 
available
 Key: DRILL-4025
 URL: https://issues.apache.org/jira/browse/DRILL-4025
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Mehant Baid
Assignee: Mehant Baid


Currently we invoke getFileStatus() to list all the files under a directory 
even when we have the metadata cache file. The information is already present 
in the cache so we don't need to perform this operation.
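
A minimal sketch of the idea, assuming a Hadoop FileSystem and a metadata cache 
file named .drill.parquet_metadata; the cache file name and the cache reader 
below are assumptions for illustration, not Drill's actual code.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class FileSelectionSketch {
  private static final String CACHE_FILE = ".drill.parquet_metadata";  // assumed name

  static List<String> filesForScan(FileSystem fs, Path dir) throws IOException {
    Path cache = new Path(dir, CACHE_FILE);
    if (fs.exists(cache)) {
      // The file list is already recorded in the cache produced by
      // REFRESH TABLE METADATA, so skip the directory listing entirely.
      return readFileListFromCache(fs, cache);
    }
    List<String> files = new ArrayList<>();
    for (FileStatus status : fs.listStatus(dir)) {   // the listing we want to avoid
      if (status.isFile()) {
        files.add(status.getPath().toString());
      }
    }
    return files;
  }

  // Placeholder: in reality this would parse the metadata cache file.
  static List<String> readFileListFromCache(FileSystem fs, Path cache) throws IOException {
    return Collections.emptyList();
  }
}
{code}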





[jira] [Created] (DRILL-3941) Add timing instrumentation around Partition Pruning

2015-10-15 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3941:
--

 Summary: Add timing instrumentation around Partition Pruning
 Key: DRILL-3941
 URL: https://issues.apache.org/jira/browse/DRILL-3941
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid


We seem to be spending a chunk of time doing partition pruning; it would be good 
to log timing information to indicate the amount of time we spend doing pruning. 
A little more granularity to indicate the time taken to build the filter tree 
and the time spent in the interpreter would also be good.
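
A minimal sketch of the kind of instrumentation being proposed, assuming Guava's 
Stopwatch and SLF4J are available and that the work splits into a filter-tree 
build step and an interpreter step; the method names are placeholders, not 
Drill's PruneScanRule.

{code}
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Stopwatch;

class PruneTimingSketch {
  private static final Logger logger = LoggerFactory.getLogger(PruneTimingSketch.class);

  void doOnMatch() {
    Stopwatch total = Stopwatch.createStarted();

    Stopwatch step = Stopwatch.createStarted();
    buildFilterTree();                                  // placeholder step
    long filterTreeMs = step.elapsed(TimeUnit.MILLISECONDS);

    step.reset().start();
    evaluateWithInterpreter();                          // placeholder step
    long interpreterMs = step.elapsed(TimeUnit.MILLISECONDS);

    logger.info("Partition pruning took {} ms (filter tree: {} ms, interpreter: {} ms)",
        total.elapsed(TimeUnit.MILLISECONDS), filterTreeMs, interpreterMs);
  }

  private void buildFilterTree() {}
  private void evaluateWithInterpreter() {}
}
{code}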





Re: [VOTE] Release Apache Drill 1.2.0 RC3

2015-10-14 Thread Mehant Baid

+1.

Built from source and ran unit tests on mac.
Ran a few sample queries in embedded and distributed mode.
Ran some basic sanity tests for drop table.
Verified checksums (md5, sha1)

Thanks
Mehant

On 10/14/15 1:08 PM, Venki Korukanti wrote:

+1.

Built from source
Installed on 3 node cluster
Ran few queries from sqlline and WebUI
Ran few checks and queries to verify HTTPS on Web UI and Hive native
parquet reader are working.

Thanks
Venki

On Wed, Oct 14, 2015 at 1:04 PM, Parth Chandra  wrote:


+1.

Downloaded source. Verified checksums
Built from source (MacOS)
Built C++ client from source (MacOS)
Tested multiple parallel queries (both sync and async APIs) via C++ client
query submitter. Tested cancel from the c++client.

Looks good.




On Mon, Oct 12, 2015 at 7:28 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:


Hi all,

I propose a fourth release candidate of Apache Drill 1.2.0

The tarball artifacts are hosted at [1] and the maven artifacts are hosted
at [2].

The vote will be open for the next 72 hours ending at 8AM Pacific,
October 15, 2015.

[ ] +1
[ ] +0
[ ] -1

Here is my vote:

+1

thanks,
Hakim

[1] http://people.apache.org/~adeneche/apache-drill-1.2.0-rc3/
[2] https://repository.apache.org/content/repositories/orgapachedrill-1009

--

Abdelhakim Deneche

Software Engineer

   






[jira] [Created] (DRILL-3817) Refresh metadata does not work when used with sub schema

2015-09-21 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3817:
--

 Summary: Refresh metadata does not work when used with sub schema  
 Key: DRILL-3817
 URL: https://issues.apache.org/jira/browse/DRILL-3817
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


refresh table metadata dfs.tmp.`lineitem` does not work; it hits the following 
exception:

org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
org.apache.calcite.sql.SqlBasicCall cannot be cast to 
org.apache.calcite.sql.SqlIdentifier

If the sub schema is removed, it works:
refresh table metadata dfs.`/tmp/lineitem`





Re: Refresh Table Metadata : Cache file owner

2015-09-21 Thread mehant baid
Is impersonation enabled when you perform the refresh?

On Monday, September 21, 2015, rahul challapalli wrote:

> Hi,
>
> With the newly checked-in refresh metadata cache feature, I see that the
> cache file is always created as the user who started the drillbit process
> and has nothing to do with the user who has issued the "refresh table
> metadata" command. Can someone from the dev verify this?
>
> - Rahul
>


[jira] [Resolved] (DRILL-3535) Drop table support

2015-09-14 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3535.

Resolution: Fixed

Fixed in 2a191847154203871454b229d8ef322766aa9ee4

> Drop table support
> --
>
> Key: DRILL-3535
> URL: https://issues.apache.org/jira/browse/DRILL-3535
> Project: Apache Drill
>  Issue Type: New Feature
>    Reporter: Mehant Baid
>    Assignee: Mehant Baid
>
> Umbrella JIRA to track support for "Drop table" feature.





[jira] [Resolved] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase

2015-09-14 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3045.

Resolution: Fixed

Fixed in dfa47da669dc2455389151c4e4071d405030c7a9

> Drill is not partition pruning due to internal off-heap memory limit for 
> planning phase
> ---
>
> Key: DRILL-3045
> URL: https://issues.apache.org/jira/browse/DRILL-3045
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.0.0
>Reporter: Victoria Markman
>Assignee: Mehant Baid
> Fix For: 1.2.0
>
> Attachments: DRILL-3045.patch
>
>
> The symptom is: we are running a simple query of the form "select x from t 
> where dir0='xyz' and dir1='2015-01-01';" partition pruning works for a while 
> and then it stops working.
> The query does run (since we don't fail the query in the case when we fail to 
> prune) and returns correct results. 
> drillbit.log
> {code}
> 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
> o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 5000 
> due to memory limit. Current allocation: 16776840
> java.lang.Exception: null
>   at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:220)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:231)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:333)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
>  [optiq-core-0.9-drill-r20.jar:na]
>   at 
> org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661)
>  [optiq-core-0.9-drill-r20.jar:na]
>   at 
> net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) 
> [optiq-core-0.9-drill-r20.jar:na]
>   at 
> net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:275) 
> [optiq-core-0.9-drill-r20.jar:na]
>   at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:206)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_65]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_65]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
> o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
> java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:334)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
>  ~[drill-java-ex

Re: Review Request 37896: DRILL-3719: Adding negative sign in front of EXTRACT triggers Assertion Error

2015-08-28 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37896/#review96926
---



exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java (line 
1107)
https://reviews.apache.org/r/37896/#comment152618

shouldn't the test case expression be: 
-Extract(day from birth_date) and not include the multiplication 
explicitly?


- Mehant Baid


On Aug. 28, 2015, 5:40 p.m., Sean Hsuan-Yi Chu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37896/
 ---
 
 (Updated Aug. 28, 2015, 5:40 p.m.)
 
 
 Review request for drill, Aman Sinha, Jinfeng Ni, and Mehant Baid.
 
 
 Bugs: DRILL-3719
 https://issues.apache.org/jira/browse/DRILL-3719
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Expand -expression as -1 * expression in DrillOptiq
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
  11b9c9e 
   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 
 6b74ecf 
 
 Diff: https://reviews.apache.org/r/37896/diff/
 
 
 Testing
 ---
 
 on the way
 
 
 Thanks,
 
 Sean Hsuan-Yi Chu
 




[jira] [Created] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter

2015-08-22 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3690:
--

 Summary: Partitioning pruning produces wrong results when there 
are nested expressions in the filter
 Key: DRILL-3690
 URL: https://issues.apache.org/jira/browse/DRILL-3690
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Blocker
 Fix For: 1.2.0


Consider the following query:
select 1 from foo where dir0 not in (1994) and dir1 not in (1995);

The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 1995)))
In FindPartitionCondition we rewrite the filter to cherry-pick the partition 
column conditions so the interpreter can evaluate them. However, when the 
expression contains more than two levels of nesting (in this case AND(NOT(=))) 
the expression does not get rewritten correctly: it gets rewritten as 
AND(=($1, 1994), =($2, 1995)). NOT is missing from the rewritten expression, 
producing wrong results.
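
A minimal sketch of the fix idea, assuming the Apache Calcite rex API; the class 
below is illustrative only and is not Drill's FindPartitionCondition.

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

class PartitionConditionSketch {

  /** Returns the partition-only part of the expression, or null if nothing survives. */
  static RexNode extract(RexBuilder builder, RexNode node) {
    switch (node.getKind()) {
      case AND: {
        List<RexNode> kept = new ArrayList<>();
        for (RexNode operand : ((RexCall) node).getOperands()) {
          RexNode child = extract(builder, operand);
          if (child != null) {
            kept.add(child);
          }
        }
        return kept.isEmpty() ? null : RexUtil.composeConjunction(builder, kept, false);
      }
      case NOT: {
        // The crux of the bug above: the NOT wrapper has to be re-applied to the
        // rewritten child, otherwise AND(NOT(=), NOT(=)) degrades to AND(=, =).
        RexNode child = extract(builder, ((RexCall) node).getOperands().get(0));
        return child == null ? null : builder.makeCall(SqlStdOperatorTable.NOT, child);
      }
      default:
        return referencesOnlyPartitionColumns(node) ? node : null;
    }
  }

  // Placeholder for Drill's check against dir0, dir1, ... partition columns.
  static boolean referencesOnlyPartitionColumns(RexNode node) {
    return true;
  }
}
{code}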







Hangout starting in 5 minutes!

2015-08-18 Thread Mehant Baid

Come join the Drill community hangout as we discuss what has been happening 
lately and what is in the pipeline. All are welcome, whether you know about 
Drill, want to know more, or just want to listen in.

Link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Thanks



Meeting minutes from today's hangout (08/18)

2015-08-18 Thread Mehant Baid

Meeting minutes from today's hangout (08/18)

Attendees: Jacques, Andrew, Aman, Parth, Hsuan, Daniel, Kris, Hakim and 
Mehant


- JDBC storage plugin:
* Basic queries work, join pushdown, filter pushdown works.
* Jacques to add more tests to this
* Needs an initial review and more cleanup in the record reader logic
* Need support for more data types

- JDBC-all shading
* Jacques close to getting it to work. Right ordering between 
shading and proguard needs to be figured out.


- Travis CI/ CI
* Travis has about 3G of memory with a 15 minute time limit and 
currently the way the build and unit tests are laid out this will not work.
* Folks at Dremio are planning to spend some time looking at tests 
and making sure we don't start and shut down DrillBit for each test 
class that extends BaseTestQuery. CI can be revisited once this is done.


- RPC offloading
* Sudheesh, Parth and Jacques to have a quick sync up and analyze 
the performance impact of the patch.


- Window function memory leak
* Initial thought of Jacques is that the logic for memory being 
released by Netty back to the OS is not getting triggered. Hakim to 
provide a local reproduction with a single window function and sync up.


Thanks
Mehant




Re: [DISCUSS] Drop table support

2015-08-06 Thread Mehant Baid
I think there has been reasonable agreement as to what is to be 
supported in the first iteration of this feature. I have summarized the 
decisions made on this list in a document. If you have any more 
suggestions please get them in by today.


https://docs.google.com/document/d/1XFdNMXnCZ4cLFcg1gHRutBo_hZ9WzvCuKx3Fithd4-k/edit?usp=sharing

Thanks
Mehant
On 8/5/15 1:49 PM, Neeraja Rentachintala wrote:

Another question/comment.

Does Drill need to manage concurrency for the Drop table i.e how do you
deal with users trying to read the data while somebody is dropping. Does it
need to implement some kind of locking.

I have some thoughts on that but would like to know others think - Drill is
not (yet) a transactional system but rather an interactive query layer on
variety of stores. The couple of most common use cases I can think of in
this context  are - a user doing analytics/exploration and as part of it he
would create some intermediate tables, insert data into them and drop the
tables or BI tools generating these intermediate tables for processing
queries. Both these do not have the concurrency issue..
Additionally given that the data is externally managed, there could always
be other processes adding and deleting files and Drill doesn't even have
control over them.
Overall, I think the first phase of DROP implementation might be ok not to
have these locking/concurrency checks.

Thoughts?

-Neeraja





On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com wrote:


What you are suggesting makes sense in the case when security is enabled.
So when Drill is accessing the file system it will impersonate the user who
issued the command and drop will happen if the user has sufficient
permissions.

However when security isn't enabled, Drill will be accessing the file
system as the Drill user itself which is most likely to be a super user who
has permissions to delete most files. To prevent any catastrophic drops
checking for homogenous file formats makes sure that at least the directory
being dropped is something that can be read by Drill. This will prevent any
accidental drops (like dropping the home directory etc, because its likely
to have file formats that cannot be read by Drill). This will not prevent
against malicious behavior (for handling this security should be enabled).

Thanks
Mehant

On 8/5/15 11:43 AM, Ted Dunning wrote:


Is any check really necessary?

Can't we just say that for data sources that are file-like that drop is a
rough synonym for rm? If you have permission to remove files and
directories, you can do it.  If you don't, it will fail, possibly half
done. I have never seen a bug filed against rm to add more elaborate
semantics, so why is it so necessary for Drill to have elaborate semantics
here?



On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote:

The homogenous check- Will it be just checking for types are homogenous or

if they are actually types that can be read by drill?
Also, is there a good way to determine if a file can be read by drill?
And
will there be a perf hit if there are large number of files?

Regards
Ramana


On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com
wrote:

I agree, it is definitely restrictive. We can lift the restriction for

being able to drop a table (when security is off) only if the Drill user
owns it. I think the check for homogenous files should give us enough
confidence that we are not deleting a non Drill directory.

Thanks
Mehant


On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:

Ted, thats fair point on the recovery part.

Regarding the other point by Mehant (copied below) ,there is an implication
that user can drop only Drill managed tables (i.e created as Drill user)
when security is not enabled. I think this check is too restrictive (also
unintuitive). Drill doesn't have the concept of external/managed tables and
a user (impersonated user if security is enabled or Drillbit service user
if no security is enabled) should be able to drop the table if they have
permissions to do so. The above design proposes a check to verify if the
files that need to be deleted are readable by Drill and I believe is a good
validation to have.

/The above check is in the case when security is not enabled. Meaning we
are executing as the Drill user. If we are running as the Drill user (which
might be root or a super user) its likely that this user has permissions to
delete most files and checking for permissions might not suffice. So when
security isn't enabled the proposal is to delete only those files that are
owned (created) by the Drill user./


On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
wrote:

On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 


nrentachint...@maprtech.com wrote:

Also will there any mechanism to recover once you accidentally drop?


yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots.

Seriously, recovery of data due to user error

Re: [DISCUSS] Drop table support

2015-08-05 Thread Mehant Baid
I agree, it is definitely restrictive. We can lift the restriction for 
being able to drop a table (when security is off) only if the Drill user 
owns it. I think the check for homogenous files should give us enough 
confidence that we are not deleting a non Drill directory.


Thanks
Mehant

On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:

Ted, thats fair point on the recovery part.

Regarding the other point by Mehant (copied below) ,there is an implication
that user can drop only Drill managed tables (i.e created as Drill user)
when security is not enabled. I think this check is too restrictive (also
unintuitive). Drill doesn't have the concept of external/managed tables and
a user (impersonated user if security is enabled or Drillbit service user
if no security is enabled) should be able to drop the table if they have
permissions to do so. The above design proposes a check to verify if the
files that need to be deleted are readable by Drill and I believe is a good
validation to have.

/The above check is in the case when security is not enabled. Meaning we
are executing as the Drill user. If we are running as the Drill user (which
might be root or a super user) its likely that this user has permissions to
delete most files and checking for permissions might not suffice. So when
security isn't enabled the proposal is to delete only those files that are
owned (created) by the Drill user./


On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote:


On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 
nrentachint...@maprtech.com wrote:


Also will there any mechanism to recover once you accidentally drop?


yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots.

Seriously, recovery of data due to user error is a platform thing.  How can
we recover from turning off the cluster?  From removing a disk on an Oracle
node?

I don't think that this is Drill's business.





Re: [DISCUSS] Drop table support

2015-08-05 Thread Mehant Baid
What you are suggesting makes sense in the case when security is 
enabled. So when Drill is accessing the file system it will impersonate 
the user who issued the command and drop will happen if the user has 
sufficient permissions.


However when security isn't enabled, Drill will be accessing the file 
system as the Drill user itself which is most likely to be a super user 
who has permissions to delete most files. To prevent any catastrophic 
drops checking for homogenous file formats makes sure that at least the 
directory being dropped is something that can be read by Drill. This 
will prevent any accidental drops (like dropping the home directory etc, 
because its likely to have file formats that cannot be read by Drill). 
This will not prevent against malicious behavior (for handling this 
security should be enabled).


Thanks
Mehant
On 8/5/15 11:43 AM, Ted Dunning wrote:

Is any check really necessary?

Can't we just say that for data sources that are file-like that drop is a
rough synonym for rm? If you have permission to remove files and
directories, you can do it.  If you don't, it will fail, possibly half
done. I have never seen a bug filed against rm to add more elaborate
semantics, so why is it so necessary for Drill to have elaborate semantics
here?



On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote:


The homogenous check- Will it be just checking for types are homogenous or
if they are actually types that can be read by drill?
Also, is there a good way to determine if a file can be read by drill? And
will there be a perf hit if there are large number of files?

Regards
Ramana


On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com
wrote:


I agree, it is definitely restrictive. We can lift the restriction for
being able to drop a table (when security is off) only if the Drill user
owns it. I think the check for homogenous files should give us enough
confidence that we are not deleting a non Drill directory.

Thanks
Mehant


On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:


Ted, thats fair point on the recovery part.

Regarding the other point by Mehant (copied below) ,there is an implication
that user can drop only Drill managed tables (i.e created as Drill user)
when security is not enabled. I think this check is too restrictive (also
unintuitive). Drill doesn't have the concept of external/managed tables and
a user (impersonated user if security is enabled or Drillbit service user
if no security is enabled) should be able to drop the table if they have
permissions to do so. The above design proposes a check to verify if the
files that need to be deleted are readable by Drill and I believe is a good
validation to have.

/The above check is in the case when security is not enabled. Meaning we
are executing as the Drill user. If we are running as the Drill user (which
might be root or a super user) its likely that this user has permissions to
delete most files and checking for permissions might not suffice. So when
security isn't enabled the proposal is to delete only those files that are
owned (created) by the Drill user./


On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
wrote:

On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 

nrentachint...@maprtech.com wrote:

Also will there any mechanism to recover once you accidentally drop?

yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots.

Seriously, recovery of data due to user error is a platform thing.  How can
we recover from turning off the cluster?  From removing a disk on an Oracle
node?

I don't think that this is Drill's business.






Re: Review Request 36875: DRILL-3554: Union over TIME and TIMESTAMP values throws SchemaChangeException

2015-08-03 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36875/#review93962
---

Ship it!


Ship It!

- Mehant Baid


On Aug. 2, 2015, 11:46 p.m., Sean Hsuan-Yi Chu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36875/
 ---
 
 (Updated Aug. 2, 2015, 11:46 p.m.)
 
 
 Review request for drill and Mehant Baid.
 
 
 Bugs: DRILL-3554
 https://issues.apache.org/jira/browse/DRILL-3554
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Given types timestamp and time, implicit casting
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/resolver/ResolverTypePrecedence.java
  ea3155d 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java
  f861586 
   exec/java-exec/src/test/java/org/apache/drill/TestImplicitCasting.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/36875/diff/
 
 
 Testing
 ---
 
 Unit, Functional, tpch
 
 
 Thanks,
 
 Sean Hsuan-Yi Chu
 




[jira] [Created] (DRILL-3593) Reorganize classes that are exposed to storage plugins

2015-08-02 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3593:
--

 Summary: Reorganize classes that are exposed to storage plugins
 Key: DRILL-3593
 URL: https://issues.apache.org/jira/browse/DRILL-3593
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


Based on the discussion on DRILL-3500 we want to reorganize some of the 
classes/interfaces (QueryContext, PlannerSettings, OptimizerRulesContext ...) 
present at planning time and decide what is to be exposed to storage plugins. 





[jira] [Resolved] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-08-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3500.

Resolution: Fixed

Fixed in f8197cfe1bc3671aa6878ef9d1869b2fe8e57331

 Provide additional information while registering storage plugin optimizer 
 rules
 ---

 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


 Currently all the optimizer rules internal to Drill have access to 
 QueryContext. This is used by a few rules like PruneScanRule which invoke the 
 interpreter to perform partition pruning. However the rules that belong to 
 specific storage plugins don't have access to this information. This JIRA 
 aims to do the following
 1. Add a new interface OptimizerRulesContext that will be implemented by 
 QueryContext. It will contain all the information needed by the rules. This 
 context will be passed to the storage plugin method while getting the 
 optimizer rules specific to that storage plugin.
 2. Restrict existing internal rules to only accept OptimizerRulesContext 
 instead of QueryContext so information in QueryContext has better 
 encapsulation.





[DISCUSS] Drop table support

2015-07-30 Thread mehant baid
Based on the discussion in the hangout I wanted to start a thread around
Drop table support.

A couple of high-level points about what is planned to be supported:

1. In the first iteration Drop table will only support dropping tables in
the file system and not dropping tables in Hive/HBase or other storage
plugins.
2. Since Drop table is potentially risky we want to be pessimistic about
dropping tables.

There are two broad scenarios while dealing with Drop table - Security
enabled and Security Disabled. In both cases we would like to follow the
below workflow

1. Check if the table being dropped can be consumed by Drill.
* Meaning, do all the files in the directories conform to a format that
Drill can read (Parquet, JSON, CSV etc.)? Jacques pointed out a bug in this
logic: if one of the files in the directory conforms to a format that Drill
can read we create a DrillTable, and then error out if we encounter other
files we cannot read.
* The above point can in the worst case entail reading the entire file
system, if a user issues a drop table command on the root of the file
system. But it is more likely that we will soon encounter a file that Drill
cannot read and abort the Drop with an error.
* Another minor clarification: we consider a directory to be consumable
by Drill only if it contains file formats that are homogenous and can be
read by Drill. For example, we should fail if a user is trying to delete
a directory that contains both JSON and Parquet files.

2. Once we have confirmed that the table requested to be dropped contains
homogenous files which can be read by Drill, we delve into the file
permissions.
* If security is enabled, we impersonate the user issuing the command
and drop the directory (succeeds if FS allows and user has correct
permissions).
* If security is not enabled, we only drop the directory if all the
files are owned by the user Drillbit is running as (being pessimistic about
drop). We should collect this information when checking for homogenous
files.

Open Questions:

Views: How do we handle views that were created on top of the dropped
table? Following are a couple of scenarios we might want to explore:
* Views are treated as a different entity, and it is useful for the user
to still have the view definition in place when the dropped table is
replaced with a new set of files with the exact same schema, so the
existing view definition suffices. AFAIK, Oracle and SQL Server have this
model and don't drop the views if the base table is dropped.
* Once the table is dropped, the view definition is no longer needed
and hence should be dropped automatically. We can probably punt on this
till we have dotdrill files. With dotdrill files we can maintain some
information to indicate the views on this table and can drop the views
implicitly. But given that some of the popular databases don't do this, we
might want to conform to the standard behavior.

Thanks
Mehant
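
A rough sketch of the homogeneity check described in step 1 above, assuming a
Hadoop FileSystem; the extension list and the recursive structure are
assumptions for illustration, not Drill's implementation.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HomogeneityCheckSketch {

  /** Returns the single extension used under dir, or throws if contents are mixed/unreadable. */
  static String checkHomogeneous(FileSystem fs, Path dir) throws IOException {
    String seen = null;
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDirectory()) {
        seen = merge(seen, checkHomogeneous(fs, status.getPath()), status.getPath());
      } else {
        String name = status.getPath().getName();
        String ext = name.substring(name.lastIndexOf('.') + 1).toLowerCase();
        if (!ext.equals("parquet") && !ext.equals("json") && !ext.equals("csv")) {
          throw new IOException("Not a Drill-readable file: " + status.getPath());
        }
        seen = merge(seen, ext, status.getPath());
      }
    }
    return seen;
  }

  private static String merge(String current, String found, Path where) throws IOException {
    if (found == null) {
      return current;
    }
    if (current != null && !current.equals(found)) {
      throw new IOException("Mixed file formats under " + where + "; refusing to drop");
    }
    return found;
  }
}
{code}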


Re: Review Request 36630: DRILL-3503: Make PruneScanRule pluggable

2015-07-30 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36630/
---

(Updated July 31, 2015, 3:35 a.m.)


Review request for drill and Aman Sinha.


Changes
---

Rebased on latest master. Moved a change from the patch for 3121 here.


Bugs: DRILL-3503
https://issues.apache.org/jira/browse/DRILL-3503


Repository: drill-git


Description
---

Added an interface to abstract away partitioning scheme away from the partition 
pruning rule. Removed some of the redundant logic in PruneScanRule.


Diffs (updated)
-

  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
 8307dff 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java
 9ad14b1 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionDescriptor.java
 127e70a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PartitionDescriptor.java
 35fdae9 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PartitionLocation.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushPartitionFilterIntoScan.java
 b83cedd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
 daa7276 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/ParquetPruneScanRule.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 5b5e4bc 

Diff: https://reviews.apache.org/r/36630/diff/


Testing
---


Thanks,

Mehant Baid



Re: [DISCUSS] Drop table support

2015-07-30 Thread Mehant Baid

Answers inline.

On 7/30/15 4:56 PM, Neeraja Rentachintala wrote:

Few questions/comments inline.

On Thu, Jul 30, 2015 at 2:53 PM, mehant baid baid.meh...@gmail.com wrote:


  Based on the discussion in the hangout I wanted to start a thread around
Drop table support.

Couple of high level points about what is planned to be supported

1. In the first iteration Drop table will only support dropping tables in
the file system and not dropping tables in Hive/ Hbase or other storage
plugins.
2. Since Drop table is potentially risky we want to be pessimistic about
dropping tables.

There are two broad scenarios while dealing with Drop table - Security
enabled and Security Disabled. In both cases we would like to follow the
below workflow

1. Check if the table being dropped can be consumed by Drill.


[Neeraja] I am assuming if security is enabled, this is done with the
impersonated user identity. is this accurate.
/This is orthogonal to security/ file permissions. We want to make sure 
the directory we are dropping only contains homogenous file formats that 
Drill can read (eg: only .parquet, .json etc)./

 * Meaning do all the files in the directories conform to a format that
Drill can read (parquet, json, csv etc). Jacques pointed out that if there
is a bug in this logic where if one of the files in the directory conforms
to a format that Drill can read we create a DrillTable and error out if we
encounter other files we cannot read.


[Neeraja] What does it mean to create DrillTable here?
/I leaked a bit of existing implementation detail here. //The point I 
was trying to make was that the check for homogenous files in a 
directory applies to select and drop. /



 * The above point can in the worst case entail reading the entire file
system, if a user issues a drop table command on the root of the file
system. But its more likely that we will encounter a file that Drill cannot
read soon and abort the Drop with an error.
 * Another minor clarification is we consider only those directories to
be consumable by Drill if they contain file formats that are homogenous and
can be read by Drill. For eg: we should fail if a user is trying to delete
a directory that contains both JSON and Parquet files.

2. Once we have confirmed that the table requested to be dropped contains
homogenous files which can be read by Drill, we delve into the file
permissions.
 * If security is enabled, we impersonate the user issuing the command
and drop the directory (succeeds if FS allows and user has correct
permissions).
 * If security is not enabled, we only drop the directory if all the
files are owned by the user Drillbit is running as (being pessimistic about
drop). We should collect this information when checking for homogenous
files.


[Neeraja] Why do we need this check. How is this different from the
impersonated user scenario.
/The above check is in the case when security is not enabled. Meaning we 
are executing as the Drill user. If we are running as the Drill user 
(which might be root or a super user) its likely that this user has 
permissions to delete most files and checking for permissions might not 
suffice. So when security isn't enabled the proposal is to delete only 
those files that are owned (created) by the Drill user./



Open Questions:

Views: How do we handle views that were created on top of the dropped
table. Following are a couple of scenarios we might want to explore
 * Views are treated as a different entity and its useful for the user
to have a view definition still in place as the dropped table will be
replaced with new set of files with the exact schema and existing view
definition suffices. AFAIK, Oracle and SQL Server have this model and don't
drop the views if the base table is dropped.
 * Once the table is dropped, the view definition is no longer needed
and hence should be dropped automatically. We can probably punt on this
till we have dotdrill files. With dotdrill files we can maintain some
information to indicate the views on this table and can drop the views
implicitly. But given that some of the popular databases don't do this, we
might want to conform to the standard behavior.


[Neeraja] Agree with the recommendation here. It seems we can go with a
simpler approach here i.e treat views as different entity

Also will there any mechanism to recover once you accidentally drop?


Thanks
Mehant





Re: Review Request 36809: DRILL-3121: Hive partition pruning

2015-07-30 Thread Mehant Baid


 On July 30, 2015, 11:50 p.m., Aman Sinha wrote:
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java,
   line 104
  https://reviews.apache.org/r/36809/diff/2/?file=1023977#file1023977line104
 
  Do you want this to be case-sensitive comparison ?

The new files are essentially a subset of the old files so case-sensitive 
comparison should suffice.


 On July 30, 2015, 11:50 p.m., Aman Sinha wrote:
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionLocation.java,
   line 38
  https://reviews.apache.org/r/36809/diff/2/?file=1023978#file1023978line38
 
  This will throw IOBE if supplied max nesting level is less than 
  mostDirs.length.

This should not be the case, since in the case of hive, we get max partition 
hierarchy from the partition metadata itself as opposed to in the case of dfs 
where we set a maximum hierarchy of 10. I have added an assert.


 On July 30, 2015, 11:50 p.m., Aman Sinha wrote:
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionDescriptor.java,
   line 89
  https://reviews.apache.org/r/36809/diff/2/?file=1023985#file1023985line89
 
  It wasn't clear why the Hive changes affected this.. is this based on a 
  different patch from before ?

While creating a patch I missed this minor change, it belongs in 3503. Moved it.


 On July 30, 2015, 11:50 p.m., Aman Sinha wrote:
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java,
   line 68
  https://reviews.apache.org/r/36809/diff/2/?file=1023984#file1023984line68
 
  Are there any Hive PP unit tests that reference subdirectories ? Would 
  be good to add couple of tests.

In testRangeFilter() and testRangeFilterWithDisjunct() column 'c' is the top 
level partition and column 'd' is the next level partition. Is that what you 
were referring to or am I missing something?


- Mehant


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36809/#review93660
---


On July 31, 2015, 3:32 a.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36809/
 ---
 
 (Updated July 31, 2015, 3:32 a.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Add support for interpreter based partition pruning for hive tables. Also 
 removes the old partition pruning logic.
 
 
 Diffs
 -
 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
  8307dff 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionLocation.java
  PRE-CREATION 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/HivePushPartitionFilterIntoScan.java
  6ab1a78 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDataTypeUtility.java
  PRE-CREATION 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java
  088fb74 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java
  fb827cc 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveTable.java
  99101cc 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java
  c846328 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DirPathBuilder.java
  892e8cb 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushPartitionFilterIntoScan.java
  b83cedd 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/PartitionPruningUtil.java
  05ccfb9 
 
 Diff: https://reviews.apache.org/r/36809/diff/
 
 
 Testing
 ---
 
 Pending unit tests.
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 36809: DRILL-3121: Hive partition pruning

2015-07-30 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36809/
---

(Updated July 31, 2015, 3:32 a.m.)


Review request for drill and Aman Sinha.


Changes
---

Rebased on latest master, addressed review comments.


Repository: drill-git


Description
---

Add support for interpreter based partition pruning for hive tables. Also 
removes the old partition pruning logic.


Diffs (updated)
-

  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
 8307dff 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionLocation.java
 PRE-CREATION 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/HivePushPartitionFilterIntoScan.java
 6ab1a78 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDataTypeUtility.java
 PRE-CREATION 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java
 088fb74 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java
 fb827cc 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveTable.java
 99101cc 
  
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java
 c846328 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DirPathBuilder.java
 892e8cb 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushPartitionFilterIntoScan.java
 b83cedd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/PartitionPruningUtil.java
 05ccfb9 

Diff: https://reviews.apache.org/r/36809/diff/


Testing
---

Pending unit tests.


Thanks,

Mehant Baid



[DISCUSS] Insert into Table support

2015-07-27 Thread Mehant Baid
I wanted to start a conversation around supporting the Insert into 
Table feature. As of 1.2 we initially want to support inserting into a 
table with Parquet files. Support for Json, CSV and other sources will 
follow as future enhancements.


Aman, Jinfeng, Neeraja and I had an initial discussion about this and 
Neeraja provided a good summary of our discussion (pasted below) also 
stating some of the requirements for this feature.


 A ) Support Insert into a non-partitioned table
-

Ex: INSERT INTO T1 [col1, col2, col3]  SELECT col4, col5, col6 from T2
(Source table: T2, Target table T1)
Requirements:

1. Target table column list specification is optional for Insert statement
2. When specified, the column list in the Insert statement should
   contain all the columns present in the target table (i.e No support
   for partial insert)
3. The column names specified for the source table do not need to match
   to the target table column names. Match is performed based on ordinal.
4.   # of Source table columns specified must be same as # of target
   table columns
5. Types of specified source table columns must match to the types of
   target table columns
6. Specification of * is not allowed in the Select table syntax
7. Select table syntax can specify constant values for one or more columns


 B ) Support insert into a partitioned table
--

Ex: INSERT INTO T1 col1, col2,col3  partition by col1,col2 SELECT 
col4,col,col6 from T2


 * Target column specification is required when inserting data into an
   already partitioned table
 * Requirements A.3-A.7 above apply for insert into partitioned tables
   as well
 * A partition by clause along with one or more columns is required
 * All the columns specified in partition by clause must exist in the
   target column list
 * Partition by columns specified do not need to match to the list of
   columns that the original table partitioned with (i.e if the
   original table is partitioned with col1, col2,  new data during
   insert can be partitioned by col3 or just with col1 or col2..)


A couple of open questions from the design perspective are:

1. How do we perform validation: validation of data types, number of 
columns being inserted, etc. In addition to validation we need to make 
sure that when we insert into an existing table we insert data with the 
existing column names (the select column list can have different names). 
This poses problems around needing to know the metadata at planning 
time; two approaches that have been floating around are
* DotDrill files: We can store metadata, partitioning columns 
and other useful information here and we can perform validation during 
planning time. However the challenges with introducing DotDrill files 
include
 - consistency between metadata and the actual data 
(Nothing preventing users to copy files directly).
 - security around DotDrill files (can be dealt in the same 
way we perform security checks for drill tables in hdfs)
 - interface to change the DotDrill file, in the case we 
need to add a column to the table or add a new partition etc.


* Explicit Syntax/ No metadata approach: Another approach is to 
avoid DotDrill files and use explicit syntax to glean as much 
information as possible from the SQL statement itself. Some of the 
challenges with this approach are
 - Gathering metadata information: Since we have no idea 
what the existing schema is we would need to perform a mini scan to 
learn the schema at planning time to be able to perform some validation. 
The problem with this approach is how do we determine how many files we 
need to read in order to learn the schema? If we use a sample set and 
not all the files have the same schema,
we could have non-deterministic results based on the 
sample of files read. Also reading all the files and merging the schema 
seems like an expensive cost to pay.
 - From the user's perspective, while inserting into a 
partitioned table, user will have to specify the partitioning columns 
again in the Insert statement, despite having specified the partition 
columns in the CTAS.


2. What is a reasonable assumption for a Drill table in terms of 
changing schema? Is having the exact same schema for all files in a table 
too rigid an assumption at this point?


One thing to remember with DotDrill files is also the repercussions on 
Drop table, Show tables, Describe table etc., i.e. they might make it 
easier to support these operations.


Thanks
Mehant
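
A toy illustration of the ordinal-based validation from requirements A.3-A.5
above; the types and method below are hypothetical, not Drill planner code.

{code}
import java.util.List;

class InsertValidationSketch {
  /** Types are compared by ordinal; column names from the SELECT are ignored (A.3-A.5). */
  static void validate(List<String> targetColumnTypes, List<String> sourceColumnTypes) {
    if (targetColumnTypes.size() != sourceColumnTypes.size()) {
      throw new IllegalArgumentException("INSERT must supply a value for every target column");
    }
    for (int i = 0; i < targetColumnTypes.size(); i++) {
      if (!targetColumnTypes.get(i).equals(sourceColumnTypes.get(i))) {
        throw new IllegalArgumentException("Type mismatch at ordinal " + i + ": "
            + sourceColumnTypes.get(i) + " cannot be inserted into " + targetColumnTypes.get(i));
      }
    }
  }
}
{code}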


Re: Review Request 36630: DRILL-3503: Make PruneScanRule pluggable

2015-07-25 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36630/
---

(Updated July 25, 2015, 6:18 a.m.)


Review request for drill and Aman Sinha.


Changes
---

Made a few changes to the partition pruning interface.


Bugs: DRILL-3503
https://issues.apache.org/jira/browse/DRILL-3503


Repository: drill-git


Description
---

Added an interface to abstract away partitioning scheme away from the partition 
pruning rule. Removed some of the redundant logic in PruneScanRule.


Diffs (updated)
-

  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
 8307dff 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java
 9ad14b1 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionDescriptor.java
 127e70a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PartitionDescriptor.java
 35fdae9 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PartitionLocation.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushPartitionFilterIntoScan.java
 b83cedd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
 daa7276 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/ParquetPruneScanRule.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 5b5e4bc 

Diff: https://reviews.apache.org/r/36630/diff/


Testing
---


Thanks,

Mehant Baid



[jira] [Created] (DRILL-3534) Insert into table support

2015-07-21 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3534:
--

 Summary: Insert into table support
 Key: DRILL-3534
 URL: https://issues.apache.org/jira/browse/DRILL-3534
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Mehant Baid
 Fix For: 1.2.0


Umbrella JIRA to track the Insert into table feature. More details regarding 
the scope, design etc will follow as things start to materialize. 





[jira] [Created] (DRILL-3535) Drop table support

2015-07-21 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3535:
--

 Summary: Drop table support
 Key: DRILL-3535
 URL: https://issues.apache.org/jira/browse/DRILL-3535
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Mehant Baid


Umbrella JIRA to track support for Drop table feature.





Review Request 36630: DRILL-3503: Make PruneScanRule pluggable

2015-07-20 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36630/
---

Review request for drill and Aman Sinha.


Bugs: DRILL-3503
https://issues.apache.org/jira/browse/DRILL-3503


Repository: drill-git


Description
---

Added an interface to abstract away partitioning scheme away from the partition 
pruning rule. Removed some of the redundant logic in PruneScanRule.


Diffs
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionpruningScheme.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionDescriptor.java
 127e70a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionPruningScheme.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PartitionPruningScheme.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
 daa7276 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/ParquetPruneScanRule.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 5b5e4bc 

Diff: https://reviews.apache.org/r/36630/diff/


Testing
---


Thanks,

Mehant Baid



[jira] [Created] (DRILL-3503) Make PruneScanRule have a pluggable partitioning mechanism

2015-07-16 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3503:
--

 Summary: Make PruneScanRule have a pluggable partitioning mechanism
 Key: DRILL-3503
 URL: https://issues.apache.org/jira/browse/DRILL-3503
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


Currently PruneScanRule performs partition pruning for file system. Some of the 
code relies on certain aspects of how partitioning is done in DFS. This JIRA 
aims to abstract out the behavior of the underlying partition scheme and 
delegate to the specific storage plugin to get that information. 





[jira] [Created] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-07-15 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3500:
--

 Summary: Provide additional information while registering storage 
plugin optimizer rules
 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


Currently all the optimizer rules internal to Drill have access to 
QueryContext. This is used by a few rules like PruneScanRule which invoke the 
interpreter to perform partition pruning. However the rules that belong to 
specific storage plugins don't have access to this information. This JIRA aims 
to do the following

1. Add a new interface OptimizerRulesContext that will be implemented by 
QueryContext. It will contain all the information needed by the rules. This 
context will be passed to the storage plugin method while getting the optimizer 
rules specific to that storage plugin.

2. Restrict existing internal rules to only accept OptimizerRulesContext 
instead of QueryContext so information in QueryContext has better encapsulation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3460) Implement function validation in Drill

2015-07-06 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3460:
--

 Summary: Implement function validation in Drill
 Key: DRILL-3460
 URL: https://issues.apache.org/jira/browse/DRILL-3460
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.3.0


Since the schema of the table is not known during the validation phase of 
Calcite, Drill ends up skipping most of the validation checks in Calcite. 

This causes certain problems at execution time, for example when we fail 
function resolution or function execution due to incorrect types provided to 
the function. The worst manifestation of this problem is in the case when Drill 
tries to apply implicit casting and produces incorrect results. There are cases 
when it's fine to apply the implicit cast but it doesn't make sense for a 
particular function. 

This JIRA aims to provide a new approach for performing validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3056) Numeric literal in an IN list is casted to decimal even when decimal type is disabled

2015-07-06 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3056.

Resolution: Fixed

Even though the record type indicates Decimal type, when the IN list is 
converted we still use the double data type.

 Numeric literal in an IN list is casted to decimal even when decimal type is 
 disabled
 -

 Key: DRILL-3056
 URL: https://issues.apache.org/jira/browse/DRILL-3056
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.2.0


 {code}
 0: jdbc:drill:schema=dfs select * from sys.options where name like 
 '%decimal%';
 +++++++++
 |name|kind|type|   status   |  num_val   | string_val 
 |  bool_val  | float_val  |
 +++++++++
 | planner.enable_decimal_data_type | BOOLEAN| SYSTEM | DEFAULT| 
 null   | null   | false  | null   |
 +++++++++
 1 row selected (0.212 seconds)
 {code}
 For an IN list that contains more than 20 numeric literals, we cast numbers 
 with a decimal point to the decimal type even though the decimal type is 
 disabled:
 {code}
 0: jdbc:drill:schema=dfs explain plan including all attributes for select * 
 from t1 where a1 in 
 (1,2,3,4,5,6,7,8,9,0,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25.0);
 +++
 |text|json|
 +++
 | 00-00Screen : rowType = RecordType(ANY *): rowcount = 10.0, cumulative 
 cost = {24.0 rows, 158.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, id = 4921
 00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 10.0, 
 cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, 
 id = 4920
 00-02Project(T7¦¦*=[$0]) : rowType = RecordType(ANY T7¦¦*): rowcount 
 = 10.0, cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 
 memory}, id = 4919
 00-03  HashJoin(condition=[=($2, $3)], joinType=[inner]) : rowType = 
 RecordType(ANY T7¦¦*, ANY a1, ANY a10, DECIMAL(11, 1) ROW_VALUE): rowcount = 
 10.0, cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 
 memory}, id = 4918
 00-05Project(T7¦¦*=[$0], a1=[$1], a10=[$1]) : rowType = 
 RecordType(ANY T7¦¦*, ANY a1, ANY a10): rowcount = 10.0, cumulative cost = 
 {10.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4915
 00-07  Project(T7¦¦*=[$0], a1=[$1]) : rowType = RecordType(ANY 
 T7¦¦*, ANY a1): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 
 io, 0.0 network, 0.0 memory}, id = 4914
 00-08Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/subqueries/t1]], 
 selectionRoot=/drill/testdata/subqueries/t1, numFiles=1, columns=[`*`]]]) : 
 rowType = (DrillRecordRow[*, a1]): rowcount = 10.0, cumulative cost = {10.0 
 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4913
 00-04HashAgg(group=[{0}]) : rowType = RecordType(DECIMAL(11, 1) 
 ROW_VALUE): rowcount = 1.0, cumulative cost = {2.0 rows, 9.0 cpu, 0.0 io, 0.0 
 network, 17.6 memory}, id = 4917
 00-06  Values : rowType = RecordType(DECIMAL(11, 1) ROW_VALUE): 
 rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 
 0.0 memory}, id = 4916
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3459) Umbrella JIRA for missing cast and convert_from/convert_to functions

2015-07-06 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3459:
--

 Summary: Umbrella JIRA for missing cast and 
convert_from/convert_to functions
 Key: DRILL-3459
 URL: https://issues.apache.org/jira/browse/DRILL-3459
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid


We have a handful of cast functions and convert_from/convert_to functions that 
need to be implemented. Will link all related issues to this umbrella JIRA so 
that we have a consolidated view of what needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3464) Index out of bounds exception while performing concat()

2015-07-06 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3464:
--

 Summary: Index out of bounds exception while performing concat()
 Key: DRILL-3464
 URL: https://issues.apache.org/jira/browse/DRILL-3464
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


We hit IOOB while performing concat() on a single input in DrillOptiq. Below is 
the stack trace:

at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_67]
at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_67]
at 
org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.getDrillFunctionFromOptiqCall(DrillOptiq.java:373)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:106)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:77)
 ~[classes/:na]
at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) ~[classes/:na]
at org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:74) 
~[classes/:na]
at 
org.apache.drill.exec.planner.common.DrillProjectRelBase.getProjectExpressions(DrillProjectRelBase.java:111)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.physical.ProjectPrel.getPhysicalOperator(ProjectPrel.java:57)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.physical.ScreenPrel.getPhysicalOperator(ScreenPrel.java:51)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPop(DefaultSqlHandler.java:392)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:167)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178)
 ~[classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) 
[classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) 
[classes/:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Some questions on UDFs

2015-07-04 Thread mehant baid
For a detailed example on using ComplexWriter interface you can take a look
at the Mappify
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
(kvgen) function. The function itself is very simple however it makes use
of the utility methods in MappifyUtility
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java
and MapUtility
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
which perform most of the work.

Currently we don't have a generic infrastructure to handle errors coming
out of functions. However there is UserException, which when raised will
make sure that Drill does not gobble up the error message in that
exception. So you can probably throw a UserException with the failing input
in your function to make sure it propagates to the user.
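
To make that concrete, here is a minimal hypothetical sketch of raising a
UserException from a UDF's eval() so the failing input reaches the user. The
function name, holder types and the exact UserException builder methods
(functionError()/message()/build()) are assumptions for illustration, not an
actual Drill built-in:

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.Float8Holder;

@FunctionTemplate(name = "checked_sqrt",
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class CheckedSqrt implements DrillSimpleFunc {
  @Param  Float8Holder in;
  @Output Float8Holder out;

  public void setup() { }

  public void eval() {
    if (in.value < 0) {
      // Raise a UserException so the failing input is not gobbled up.
      throw org.apache.drill.common.exceptions.UserException.functionError()
          .message("checked_sqrt: negative input %f", in.value)
          .build();
    }
    out.value = java.lang.Math.sqrt(in.value);
  }
}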

Thanks
Mehant

On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau jacq...@apache.org wrote:

 *Holders are for both input and output.  You can also use ComplexWriter for
 output and FieldReader for input if you want to write or read a complex
 value.

 I don't think we've provided a really clean way to construct a
 Repeated*Holder for output purposes.  You can probably do it by reaching
 into a bunch of internal interfaces in Drill.  However, I would recommend
 using the ComplexWriter output pattern for now.  This will be a little less
 efficient but substantially less brittle.  I suggest you open up a jira for
 using a Repeated*Holder as an output.

 On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote:

  Holders are for input, I think.
 
  Try the different kinds of writers.
 
 
 
  On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote:
 
   Using a repeatedholder as a @param I've got working. I was working on a
   custom aggregator function using DrillAggFunc. In this I can do simple
   things but If I want to build a list values and do something with it in
  the
   final output method I think I need to use RepeatedHolders in the
   @Workspace. To do that I need to create a new one in the setup method.
 I
   can't get one built. They all require a BufferAllocator to be passed in
  to
   build it. I have not found a way to get an allocator yet. Any
  suggestions?
  
   On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
  
If you look at the zip function in
https://github.com/mapr-demos/simple-drill-functions you can have an
example of building a structure.
   
The basic idea is that your output is denoted as
   
@Output
BaseWriter.ComplexWriter writer;
   
The pattern for building a list of lists of integers is like this:
   
 writer.setValueCount(n);
 ...
 BaseWriter.ListWriter outer = writer.rootAsList();
 outer.start(); // [ outer list
 ...
 for (...) {                      // for each inner list
   BaseWriter.ListWriter inner = outer.list();
   inner.start();
   for (...) {                    // for each inner list element
     inner.integer().writeInt(accessor.get(i));
   }
   inner.end();   // ] inner list
 }
 outer.end(); // ] outer list
   
   
   
On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com
  wrote:
   
 I have working aggregation and simple UDFs. I've been trying to document and
 understand each of the options available in a Drill UDF. Understanding the
 different FunctionScope's, the ones that are allowed, the ones that are not.
 The impact of different cost categories. The different steps needed to
 understand handling any of the supported data types and structures in drill.

 Here are a few of my current road blocks. Any pointers would be greatly
 appreciated.

 1. I've been trying to understand how to correctly use RepeatedHolders of
 whatever type. For this discussion let's start with a RepeatedBigIntHolder.
 I'm trying to figure out the best way to create a new one. I have not figured
 out where in the existing drill code someone does this. If I use a
 RepeatedBigIntHolder as a Workspace object it is null to start with. I created
 a new one in the startup section of the udf but the vector was null. I can
 find no reference in creating a new BigIntVector. There is a way to create a
 BigIntVector and I did find an example of creating a new VarCharVector but I
 can't do that using the drill jar files from 1.0. The
 org.apache.drill.common.types.TypeProtos and the
 org.apache.drill.common.types.TypeProtos.MinorType classes do not appear to be
 accessible from the drill jar files.
 2. What is the 

Re: [VOTE] Release Apache Drill 1.1.0 (rc0)

2015-07-02 Thread Mehant Baid

+1 (binding)

* Downloaded src tar-ball, was able to build and run unit tests 
successfully.

* Brought up DrillBit in embedded and distributed mode.
* Ran some TPC-H queries via Sqlline and the web UI.
* Checked the UI for profiles

Looks good.

Thanks
Mehant


On 7/2/15 5:36 PM, Sudheesh Katkam wrote:

+1 (non-binding)

* downloaded binary tar-ball
* ran queries (including cancellations) in embedded mode on Mac; verified 
states in web UI

* downloaded and built from source tar-ball; ran unit tests on Mac
* ran queries (including cancellations) on a 3 node cluster; verified states in 
web UI

* built a Java query submitter that uses the maven artifacts

Thanks,
Sudheesh


On Jul 2, 2015, at 4:06 PM, Hanifi Gunes hgu...@maprtech.com wrote:

- fully built and tested Drill from source on CentOS
- deployed on 3 nodes
- ran concurrent queries
- manually inspected maven repo
- built a Scala query submitter importing jdbc-all artifact from the repo
at [jacques:3]

overall, great job!

+1 (binding)

On Thu, Jul 2, 2015 at 3:16 PM, rahul challapalli 
challapallira...@gmail.com wrote:


+1 (non-binding)

Tested the new CTAS auto partition feature
Published jdbc-all artifact looks good as well

I am able to add the staged jdbc-all package as a dependency to my sample
JDBC app's pom file and I was able to connect to my drill cluster. I think
this is a sufficient test for the published artifact.

Part of the pom file below

<repositories>
  <repository>
    <id>staged-releases</id>
    <url>http://repository.apache.org/content/repositories/orgapachedrill-1001</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.drill.exec</groupId>
    <artifactId>drill-jdbc-all</artifactId>
    <version>1.1.0</version>
  </dependency>
</dependencies>

- Rahul

On Thu, Jul 2, 2015 at 2:02 PM, Parth Chandra pchan...@maprtech.com
wrote:


+1 (binding)

Release looks good.
Built from source (mvn clean install).
Verified src checksum.
Built C++ client, ran multiple parallel queries from C++ client against
drillbit. Tested all datatypes with C++ client.





On Thu, Jul 2, 2015 at 1:49 PM, Hsuan Yi Chu hyi...@maprtech.com

wrote:

+1 (non-binding)

Unit tests passed on mac & linux VM. Tried a few queries on 2-node VM.

All

worked out.

On Thu, Jul 2, 2015 at 1:24 PM, Norris Lee norr...@simba.com wrote:


I built from source on Linux and ran queries against different data
sources/file types through ODBC. Also ran our internal ODBC test

suite.

Looks good.

+1 (non-binding)

Norris

-Original Message-
From: Jinfeng Ni [mailto:jinfengn...@gmail.com]
Sent: Wednesday, July 01, 2015 4:03 PM
To: dev@drill.apache.org
Subject: Re: [VOTE] Release Apache Drill 1.1.0 (rc0)

-  Download the src tar on Mac and Linux and do a mvn full build.
-  Start drill in embedded mode on both Mac and Linux. Run several

TPCH

queries.
-  Tried with the CTAS auto-partitioning feature with some TPCH

dataset.

-  Verified checksum for both the source and binary tar files.

All look good.

+1  (binding).



On Wed, Jul 1, 2015 at 2:16 PM, Abdel Hakim Deneche 

adene...@maprtech.com

wrote:


I've built the src on a Mac and on linux vm machine and both were
successful with all unit tests passing.

I tried the binary release on my Mac: I started an embedded drillbit and ran
some queries, everything seems fine.

LGTM +1 (non binding)

On Wed, Jul 1, 2015 at 11:40 AM, Jacques Nadeau 

jacq...@apache.org

wrote:


Hey Everybody,

I'm happy to propose a new release of Apache Drill, version

1.1.0.

This

is

the first release candidate (rc0).  It covers a total of 162

closed

JIRAs [1].

The tarball artifacts are hosted at [2] and the maven artifacts

(new

for this release) are hosted at [3].

The vote will be open for 72 hours ending at Noon Pacific, July

4,

2015.

[ ] +1
[ ] +0
[ ] -1

thanks,
Jacques

[1]



https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313

820version=12329689

[2] http://people.apache.org/~jacques/apache-drill-1.1.0.rc0/
[3]


https://repository.apache.org/content/repositories/orgapachedrill-10

01/




--

Abdelhakim Deneche

Software Engineer

  http://www.mapr.com/


_campaign=Free%20available




Re: Time for a 1.1 vote soon?

2015-07-01 Thread Mehant Baid
There might be a couple of patches that are ready to be merged (not for 
1.1). I am planning to create a branch for 1.1 and bump up the master 
version, if no one has any objections.


Thanks
Mehant

On 7/1/15 9:36 AM, Abhishek Girish wrote:

I did a fresh clone of Drill master and did a mvn clean install. Build
completed successfully and all tests passed.

On Wed, Jul 1, 2015 at 8:23 AM, Jacques Nadeau jacq...@apache.org wrote:


I got the suite to pass.  I'm going to put it up for vote but I want
everyone to run a clean build on the release artifact and report any
failures.  It is a really bad experience for users to experience these
issues.

On Wed, Jul 1, 2015 at 8:08 AM, Abdel Hakim Deneche adene...@maprtech.com
wrote:


@Sudheesh, the default value for forkCount is 2 so I don't think setting

it

explicitly in the command line makes a difference, or does it ?

On Wed, Jul 1, 2015 at 8:04 AM, Sudheesh Katkam skat...@maprtech.com
wrote:


I build using mvn clean install -DforkCount=2


On Jul 1, 2015, at 7:17 AM, Jacques Nadeau jacq...@apache.org

wrote:

Are people running with standard settings or custom settings?  With

clean

first or without?

I keep running and keep getting failures.  I'll reboot and see if

that

helps.

My last run (mvn clean; mvn install -DskipTests; mvn surefire:test):

Tests in error:
  TestMergingReceiver.testMultipleProvidersMixedSizes:98 »  test timed

out

after...
  TestParquetWriter.testLargeFooter:93-BaseTestQuery.test:309 »  test
timed out...
  TestSpoolingBuffer.testMultipleExchangesSingleThread:50 »  test

timed

out

afte...




On Tue, Jun 30, 2015 at 11:38 PM, Jinfeng Ni jinfengn...@gmail.com

wrote:

I just had a clean run on my Mac with the latest master branch.



On Tue, Jun 30, 2015 at 10:55 PM, Sudheesh Katkam 

skat...@maprtech.com

wrote:


I had a clean run on my Mac with the latest master this afternoon

(and

there are no new commits since then).


On Jun 30, 2015, at 9:52 PM, Steven Phillips 

sphill...@maprtech.com

wrote:

I just had a clean run on my Linux machine.

On Tue, Jun 30, 2015 at 9:32 PM, Parth Chandra 

pchan...@maprtech.com

wrote:


I just completed a clean build and test run (  mvn clean install)

on

my

Mac.
Will try on Linux.




On Tue, Jun 30, 2015 at 9:24 PM, Jacques Nadeau 

jacq...@apache.org

wrote:


I ran twice more.  So here are my three results:

1. Errors from previous email.
2. Tests hung indefinitely (mvn clean install).
3. I had one test failure, (mvn clean; mvn install),

Tests in error:
TestSpoolingBuffer.testMultipleExchangesSingleThread:50 »  test

timed

out

afte...

Is anybody having consistent completions all the way through to

the

distribution module?




On Tue, Jun 30, 2015 at 8:18 PM, Aman Sinha 

asi...@maprtech.com

wrote:

I re-ran on my mac and don't see the failures your are

seeing.  I

got

one

error below related to zookeeper but I believe this is

intermittent.

$mvn install


Tests in error:
TestPStoreProviders.verifyZkStore:55 » Runtime Failure while

accessing

Zookeep...

Tests run: 1310, Failures: 0, Errors: 1, Skipped: 114

$ java -version
java version 1.7.0_45
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

On Tue, Jun 30, 2015 at 6:47 PM, Jacques Nadeau 

jacq...@apache.org

wrote:


I'm seeing failures running the build on master on a mac:

$ java -version
java version 1.7.0_80
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed

mode)

$mvn install

...

Failed tests:
TestDrillbitResilience.memoryLeaksWhenCancelled:890 We are

leaking

1812

bytes expected:0 but was:1812

Tests in error:

TestMergingReceiver.twoBitTwoExchange:84-Object.wait:503-Object.wait:-2

»  t...
TestMergingReceiver.testMultipleProvidersMixedSizes:98 »  test

timed

out

after...
TestSpoolingBuffer.testMultipleExchangesSingleThread:50 »

test

timed

out

afte...


TestJoinNullable.testMergeLOJNullableOneOrderedInputDescNullsLast

»

UserRemote

TestUnionAll.testFilterPushDownOverUnionAll:545-BaseTestQuery.testSqlWithResults:265-BaseTestQuery.testRunAndReturn:278

»
TestUnionAllBaseTestQuery.closeClient:233 » IllegalState

Attempted

to

close a...

On Tue, Jun 30, 2015 at 6:05 PM, Jacques Nadeau 

jacq...@apache.org

wrote:


Agreed.  I'll spin a release.


On Tue, Jun 30, 2015 at 6:01 PM, Parth Chandra 

par...@apache.org

wrote:


Hey guys,

Looks like 1.1 is looking fairly good with about 119 issues

fixed. I

would recommend we start the release process for 1.1.

Parth

On Fri, Jun 26, 2015 at 8:49 AM, Jacques Nadeau 

jacq...@apache.org

wrote:


Hey Guys,

Looks like a number things are being wrapped up so it is

probably

about

time for a 1.1 release. Shall we branch in the next day or

two

and

put

1.1

to a vote?

Jacques



--
Steven Phillips
Software Engineer

mapr.com



--

Abdelhakim Deneche

Software Engineer

   http://www.mapr.com/



[jira] [Created] (DRILL-3429) DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance

2015-06-30 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3429:
--

 Summary: DrillAvgVarianceConvertlet may produce wrong results 
while rewriting stddev, variance
 Key: DRILL-3429
 URL: https://issues.apache.org/jira/browse/DRILL-3429
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


DrillAvgVarianceConvertlet currently rewrites aggregate functions like avg, 
stddev, variance to simple computations. 

Eg: 
Stddev(x) = power(
 (sum(x * x) - sum(x) * sum(x) / count(x))
 / count(x),
 .5)

Consider the case when the input is an integer. Now the rewrite contains 
multiplication and division, which will bind to functions that operate on 
integers. However, the expected result should be a double, and since double has 
more precision than integer we should be operating on doubles during the 
multiplication and division.
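
For illustration only (standalone Java, not Drill's generated code or the 
convertlet itself), this is the kind of truncation integer arithmetic introduces:

{code}
public class StddevPrecisionDemo {
  public static void main(String[] args) {
    long sumX = 7, sumXX = 29, count = 3;  // sum(x), sum(x * x), count(x)
    // All-integer multiply/divide truncates the intermediate result.
    long wrong = (sumXX - sumX * sumX / count) / count;             // 4
    // Promoting to double first keeps the precision the result needs.
    double right = (sumXX - (double) sumX * sumX / count) / count;  // ~4.222
    System.out.println(wrong + " vs " + right);
  }
}
{code}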




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35484: DRILL-2851: set an upper-bound on # of bytes to re-allocate to prevent overflows

2015-06-18 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35484/#review88389
---

Ship it!


Ship It!

- Mehant Baid


On June 17, 2015, 6:54 p.m., Hanifi Gunes wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35484/
 ---
 
 (Updated June 17, 2015, 6:54 p.m.)
 
 
 Review request for drill, Mehant Baid and Venki Korukanti.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-2851: set an upper-bound on # of bytes to re-allocate to prevent 
 overflows
 Vectors
 - set an upper bound on # of bytes to allocate
 - 
 TestValueVector.java  
 - Add unit tests
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/templates/FixedValueVectors.java 
 7103a17108693d47839212c418d11d13fbb8f6f4 
   exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
 bd41e10d3f69e13d0f8c426460af5e9a09d93fd9 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseValueVector.java
  ec409a3fc59616708226aa500ccab1680cd261f6 
   exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java 
 10bdf0752632c7577b9a6eb445c7101ec1a24730 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/record/vector/TestValueVector.java
  037c8c6d3da94acf5c2ca300ce617338cacb0fb0 
 
 Diff: https://reviews.apache.org/r/35484/diff/
 
 
 Testing
 ---
 
 all
 
 
 Thanks,
 
 Hanifi Gunes
 




[jira] [Resolved] (DRILL-3305) DrillOptiq should raise appropriate error message while dealing with unknown RexNode

2015-06-18 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3305.

Resolution: Fixed

Fixed in fb25973b406d856f0edc9332aadd8e7152b27fa8

 DrillOptiq should raise appropriate error message while dealing with unknown 
 RexNode
 

 Key: DRILL-3305
 URL: https://issues.apache.org/jira/browse/DRILL-3305
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.1.0

 Attachments: DRILL-3305.patch


 Currently for certain types of RexNodes (RexOver, RexCorrelVariable) 
 DrillOptiq does not convert them to the equivalent logical expressions. In that 
 case we simply return a NullExpression (minor type: Null) and we error out 
 later in execution when we try to allocate a vector with minor type Null. We 
 should error out early in DrillOptiq that there was a planning issue 
 indicating the particular RexNode that wasn't handled correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2403) TimePrintMillis.toString() misses leading zeros in post-decimal-point part

2015-06-18 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2403.

Resolution: Fixed

Fixed in c2a2377bdc2acaf714c19c0cb509f62c8aeffd19

 TimePrintMillis.toString() misses leading zeros in post-decimal-point part
 --

 Key: DRILL-2403
 URL: https://issues.apache.org/jira/browse/DRILL-2403
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Daniel Barclay (Drill)
Assignee: Mehant Baid
 Fix For: 1.1.0

 Attachments: DRILL-2403.patch


 In org.apache.drill.exec.vector.accessor.sql.TimePrintMillis, the toString() 
 method includes this code:
 baseTime = baseTime + "." + Integer.toString(millis);
 (Consider the result when millis is in the range 1 to 99.)
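
For illustration only (standalone Java, not the actual TimePrintMillis code), 
the missing leading zeros and one way a padded format avoids them:

{code}
public class TimeMillisFormatDemo {
  public static void main(String[] args) {
    int millis = 7;  // any value in 1..99 would lose leading zeros
    String broken = "12:34:56" + "." + Integer.toString(millis);      // "12:34:56.7"  -- wrong
    String padded = "12:34:56" + "." + String.format("%03d", millis); // "12:34:56.007"
    System.out.println(broken + " vs " + padded);
  }
}
{code}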



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3305) DrillOptiq should raise appropriate error message while dealing with unknown RexNode

2015-06-17 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3305:
--

 Summary: DrillOptiq should raise appropriate error message while 
dealing with unknown RexNode
 Key: DRILL-3305
 URL: https://issues.apache.org/jira/browse/DRILL-3305
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid


Currently for certain types of RexNodes (RexOver, RexCorrelVariable) DrillOptiq 
does not convert them to the equivalent logical expressions. In that case we 
simply return a NullExpression (minor type: Null) and we error out later in 
execution when we try to allocate a vector with minor type Null. We should 
error out early in DrillOptiq that there was a planning issue indicating the 
particular RexNode that wasn't handled correctly.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35475: DRILL-3263: read tinyint and smallint columns from Hive as integer

2015-06-16 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35475/#review88173
---

Ship it!


Ship It!

- Mehant Baid


On June 15, 2015, 9:58 p.m., Jason Altekruse wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35475/
 ---
 
 (Updated June 15, 2015, 9:58 p.m.)
 
 
 Review request for drill, Mehant Baid and Venki Korukanti.
 
 
 Bugs: DRILL-3263
 https://issues.apache.org/jira/browse/DRILL-3263
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Smallint and tinyint have been disabled in much of Drill as they were only 
 partly implemented. DRILL-2470 has been opened to track the completion of the 
 tinyint and smallint types. Until this task is complete, this change will 
 enable a wider range of queries to work with standard SQL functions and 
 Drill's implicit cast system. The change is pretty small; it just changes the 
 type exposed from Hive tables with columns of smallint or tinyint to be a 
 regular integer.
 
 
 Diffs
 -
 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveFieldConverter.java
  658dd79 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java
  3c8b9ba 
   
 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
  0da28e0 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/fn/hive/HiveTestUDFImpls.java
  31e4715 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/fn/hive/TestSampleHiveUDFs.java
  86a78e5 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
  27ba9fe 
 
 Diff: https://reviews.apache.org/r/35475/diff/
 
 
 Testing
 ---
 
 Unit tests passing, cluster tests are pending
 
 
 Thanks,
 
 Jason Altekruse
 




[jira] [Resolved] (DRILL-3245) Error message needs to be fixed.

2015-06-04 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3245.

   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed in 287f52db0bd0125e0ad1b3408f22775224ee9494

 Error message needs to be fixed.
 

 Key: DRILL-3245
 URL: https://issues.apache.org/jira/browse/DRILL-3245
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Khurram Faraaz
Assignee: Mehant Baid
 Fix For: 1.1.0

 Attachments: DRILL-3245.patch


 The error message need to be fixed.
 {code}
 0: jdbc:drill:schema=dfs.tmp SELECT SUM(columns[0]) FROM `first_25.csv`;
 Error: SYSTEM ERROR: java.lang.RuntimeException: Only COUNT aggregate 
 function supported for Boolean type
 Fragment 0:0
 [Error Id: ef5abe03-bbaf-4f20-bf86-4e307f86d944 on centos-02.qa.lab:31010] 
 (state=,code=0)
 {code}
 Stack trace from drillbit.log
 {code}
 [Error Id: eb09fc3d-3b10-4525-b7c4-9e6c66059c3f on centos-02.qa.lab:31010]
 org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
 java.lang.RuntimeException: Only COUNT aggregate function supported for 
 Boolean type
 Fragment 0:0
 [Error Id: eb09fc3d-3b10-4525-b7c4-9e6c66059c3f on centos-02.qa.lab:31010]
 at 
 org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
  ~[drill-common-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:324)
  [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:180)
  [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:293)
  [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
  [drill-common-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_45]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_45]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
 Caused by: java.lang.RuntimeException: Only COUNT aggregate function 
 supported for Boolean type
 at 
 org.apache.drill.exec.test.generated.StreamingAggregatorGen47.setupInterior(StreamingAggTemplate.java:60)
  ~[na:na]
 at 
 org.apache.drill.exec.test.generated.StreamingAggregatorGen47.setup(StreamingAggTemplate.java:53)
  ~[na:na]
 at 
 org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.createAggregatorInternal(StreamingAggBatch.java:308)
  ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.createAggregator(StreamingAggBatch.java:246)
  ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:113)
  ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:127)
  ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) 
 ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
  ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) 
 ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:259)
  ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at 
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:253)
  ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 at java.security.AccessController.doPrivileged(Native Method) 
 ~[na:1.7.0_45]
 at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_45]
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
 at 
 org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:253)
  [drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
 ... 4 common frames omitted
 2015-06-02 21:54:02,686 [BitServer-4] INFO  
 o.a.drill.exec.work.foreman.Foreman - State change

Re: Review Request 35030: DRILL-1760: implement count(nested-type)

2015-06-04 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35030/#review86519
---


Minor comment.


exec/java-exec/src/main/codegen/data/CountAggrTypes.tdd
https://reviews.apache.org/r/35030/#comment138575

I think we should move this out of the freemarker template and have the 
implementation directly in a separate class. We are introducing branching in 
the template (which reduces readability) but we are only adding one new 
function.


- Mehant Baid


On June 3, 2015, 10:27 p.m., Hanifi Gunes wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35030/
 ---
 
 (Updated June 3, 2015, 10:27 p.m.)
 
 
 Review request for drill and Mehant Baid.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-1760: implement count(nested-type)
 
 CountAggrTypes.java & CountAggregateFunctions.java
 - Introduced count over nested type
 
 Vectors & readers
 - Implemented isSet/isNull to behave as expected since these methods are now 
 used by count(complex-type)
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/data/CountAggrTypes.tdd 
 53e25f73ed88846cb05ad95b6aeaa722409f7bd4 
   exec/java-exec/src/main/codegen/templates/CountAggregateFunctions.java 
 71ac6a7dc831de9917e9468c2630f2242a576aeb 
   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 
 7b2b78d80254abee8d380586fb4be64fee335b24 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
  d5a0d6268378d958c2e8b50826e6b78bd0c1850f 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java
  d0f38c2a397aac7eaad247c39b4b856c89c970a0 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
  a97847ba07e2543b122009a06eafccf06b89b43a 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedListReaderImpl.java
  36e9beedbbb037564962d868b276e5d9d0c14140 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedMapReaderImpl.java
  b2fe7b7fc532bfd0b52559864404906107132ea9 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/impl/SingleMapReaderImpl.java
  1b39775f35403ad526756cc7fe5e08d2c393a99e 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestCountFunctions.java
  PRE-CREATION 
   exec/java-exec/src/test/resources/functions/count-data.json PRE-CREATION 
   exec/java-exec/src/test/resources/parquet/count-data.parquet PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/35030/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Hanifi Gunes
 




Re: Review Request 34838: DRILL-3155: Part 1

2015-06-02 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34838/
---

(Updated June 2, 2015, 8:14 p.m.)


Review request for drill and Hanifi Gunes.


Changes
---

Addressed review comments.


Repository: drill-git


Description
---

This patch is a simple refactoring. Moved the classes related to complex 
vectors into the appropriate package.


Diffs (updated)
-

  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 00a78fd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java
 b8d040c 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/Flattener.java
 323bf43 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/VectorHolder.java 
e602fd7 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedWidthRepeatedReader.java
 2b929a4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 0cbd480 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/AllocationHelper.java 
eddefd0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
 d5a0d62 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/ContainerVectorLike.java
 95e3365 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedFixedWidthVectorLike.java
 450c673 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedMutator.java 
8e097e4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedValueVector.java
 95a7252 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedVariableWidthVectorLike.java
 ac8589e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/ContainerVectorLike.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
 a5553b2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 a97847b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/34838/diff/


Testing
---


Thanks,

Mehant Baid



Re: Review Request 34839: DRILL-3155: Part 2

2015-06-02 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34839/
---

(Updated June 2, 2015, 9:49 p.m.)


Review request for drill and Hanifi Gunes.


Changes
---

Addressed review comments


Repository: drill-git


Description
---

While allocating memory for composite vectors, if one of the allocations fails we 
need to release all the allocated memory up to that point.


Diffs (updated)
-

  exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 90ec6be 
  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
  exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java b3389e2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
 3c01939 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 a97847b 

Diff: https://reviews.apache.org/r/34839/diff/


Testing
---


Thanks,

Mehant Baid



Re: Review Request 34839: DRILL-3155: Part 2

2015-06-01 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34839/
---

(Updated June 1, 2015, 8:54 p.m.)


Review request for drill and Hanifi Gunes.


Repository: drill-git


Description
---

While allocating memory for composite vectors, if one of the allocations fails we 
need to release all the allocated memory up to that point.


Diffs (updated)
-

  exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 90ec6be 
  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
  exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java b3389e2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
 3c01939 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 a97847b 

Diff: https://reviews.apache.org/r/34839/diff/


Testing
---


Thanks,

Mehant Baid



Review Request 34838: DRILL-3155: Part 1

2015-05-30 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34838/
---

Review request for drill and Hanifi Gunes.


Repository: drill-git


Description
---

This patch is a simple refactoring. Moved the classes related to complex 
vectors into the appropriate package.


Diffs
-

  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 00a78fd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java
 b8d040c 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/Flattener.java
 323bf43 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/VectorHolder.java 
e602fd7 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedWidthRepeatedReader.java
 2b929a4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 0cbd480 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/AllocationHelper.java 
eddefd0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
 d5a0d62 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/ContainerVectorLike.java
 95e3365 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedFixedWidthVectorLike.java
 450c673 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedMutator.java 
8e097e4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedValueVector.java
 95a7252 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedVariableWidthVectorLike.java
 ac8589e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/ContainerVectorLike.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
 a5553b2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 a97847b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMutator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/34838/diff/


Testing
---


Thanks,

Mehant Baid



Review Request 34839: DRILL-3155: Part 2

2015-05-30 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34839/
---

Review request for drill and Hanifi Gunes.


Repository: drill-git


Description
---

While allocating memory for composite vectors, if one of the allocations fails we 
need to release all the allocated memory up to that point.


Diffs
-

  exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 90ec6be 
  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
  exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java b3389e2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
 3c01939 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 a97847b 

Diff: https://reviews.apache.org/r/34839/diff/


Testing
---


Thanks,

Mehant Baid



Re: Review Request 34499: DRILL-3032: repeated vectors should handle late type & instantiate its children upon construction

2015-05-22 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34499/#review85012
---

Ship it!


Ship It!

- Mehant Baid


On May 22, 2015, 9:19 p.m., Hanifi Gunes wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34499/
 ---
 
 (Updated May 22, 2015, 9:19 p.m.)
 
 
 Review request for drill and Mehant Baid.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-3032: repeated vectors should handle late type & instantiate its 
 children upon construction
 
 MaterializedField.java
 - remove unused imports
 
 BaseRepeatedValueVector.java
 - repeated types should not attempt to create a child vector of late type
 
 RepeatedListVector.java
 - pass entire field rather than type to recursively instantiate a list vector
 
 The rest
 - minor code refactoring
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/record/MaterializedField.java
  64ba8611b36377084d3912f997ea428715ed2cf8 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java
  e5f4be1e960462f27f5c9477a3225fb7767cfde0 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
  bcf0793751443ffed5879d36e09dc97ac4f2591f 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/VectorDescriptor.java
  9a29848cf88bcba9e8d9ef57eec08f28a0ba9b4f 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
  b5de8b1e2081e13c36fe563002dad00341367b6e 
   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 
 75bbc13eb47b3b1dfe95b3a24f2ecfd4b72f845b 
   exec/java-exec/src/test/resources/join/join-left-drill-3032.json 
 PRE-CREATION 
   exec/java-exec/src/test/resources/join/join-right-drill-3032.json 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34499/diff/
 
 
 Testing
 ---
 
 all
 
 
 Thanks,
 
 Hanifi Gunes
 




Re: Review Request 34499: DRILL-3032: repeated vectors should handle late type & instantiate its children upon construction

2015-05-21 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34499/#review84807
---


Can you add a unit test like the one mentioned in the JIRA report for this bug.


exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
https://reviews.apache.org/r/34499/#comment136192

In which case are we creating a vector with Late type? Shouldn't we have 
materialized the field earlier?



exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
https://reviews.apache.org/r/34499/#comment136195

store children's size in a local variable instead of making multiple calls.


- Mehant Baid


On May 20, 2015, 9:39 p.m., Hanifi Gunes wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34499/
 ---
 
 (Updated May 20, 2015, 9:39 p.m.)
 
 
 Review request for drill and Mehant Baid.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-3032: repeated vectors should handle late type & instantiate its 
 children upon construction
 
 MaterializedField.java
 - remove unused imports
 
 BaseRepeatedValueVector.java
 - repeated types should not attempt to create a child vector of late type
 
 RepeatedListVector.java
 - pass entire field rather than type to recursively instantiate a list vector
 
 The rest
 - minor code refactoring
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/record/MaterializedField.java
  64ba8611b36377084d3912f997ea428715ed2cf8 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java
  e5f4be1e960462f27f5c9477a3225fb7767cfde0 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
  bcf0793751443ffed5879d36e09dc97ac4f2591f 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/VectorDescriptor.java
  9a29848cf88bcba9e8d9ef57eec08f28a0ba9b4f 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
  b5de8b1e2081e13c36fe563002dad00341367b6e 
 
 Diff: https://reviews.apache.org/r/34499/diff/
 
 
 Testing
 ---
 
 all
 
 
 Thanks,
 
 Hanifi Gunes
 




[jira] [Created] (DRILL-3155) Variable width vectors leak memory

2015-05-20 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3155:
--

 Summary: Variable width vectors leak memory
 Key: DRILL-3155
 URL: https://issues.apache.org/jira/browse/DRILL-3155
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.1.0


While allocating memory for variable width vectors we first allocate the 
necessary memory for the actual data, followed by the memory needed for the 
offset vector. However, if the first allocation for the data buffer succeeds and 
the one for the offset vector fails, we don't release the buffer allocated for 
the data, causing a memory leak.
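
A minimal sketch of the rollback pattern this implies, assuming a
BufferAllocator-style buffer() API; the helper below is illustrative, not the
actual vector template code:

{code}
import io.netty.buffer.DrillBuf;
import org.apache.drill.exec.memory.BufferAllocator;

// Hypothetical helper: if the second allocation fails, release the first buffer so nothing leaks.
final class SafeVariableWidthAlloc {
  static DrillBuf[] allocateDataAndOffsets(BufferAllocator allocator, int dataBytes, int offsetBytes) {
    final DrillBuf data = allocator.buffer(dataBytes);
    if (data == null) {
      return null;  // first allocation failed; nothing to roll back
    }
    DrillBuf offsets = null;
    try {
      offsets = allocator.buffer(offsetBytes);
    } finally {
      if (offsets == null) {
        data.release();  // second allocation failed (threw or returned null): roll back the first
      }
    }
    return offsets == null ? null : new DrillBuf[] { data, offsets };
  }
}
{code}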



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3086) Out of memory error is not propagated from HashJoinBatch

2015-05-14 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3086:
--

 Summary: Out of memory error is not propagated from HashJoinBatch
 Key: DRILL-3086
 URL: https://issues.apache.org/jira/browse/DRILL-3086
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.1.0


If we hit an OutOfMemoryException in HashJoinBatch in any of the following 
methods: buildSchema(), innerNext(), etc., we don't propagate this error back to 
the client; instead we get an IllegalStateException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3093) Leaking RawBatchBuffer

2015-05-14 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3093:
--

 Summary: Leaking RawBatchBuffer
 Key: DRILL-3093
 URL: https://issues.apache.org/jira/browse/DRILL-3093
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3082) Safety asserts for Streaming Aggregate

2015-05-13 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3082:
--

 Summary: Safety asserts for Streaming Aggregate
 Key: DRILL-3082
 URL: https://issues.apache.org/jira/browse/DRILL-3082
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid


As I was debugging DRILL-3069, Steven mentioned he had a patch to safeguard 
against dropping rows in StreamingAggBatch.  It might be useful to get this 
patch as it would cause asserts in such scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-1980) Create table with a Cast to interval day results in a file which cannot be read

2015-05-12 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-1980.

Resolution: Pending Closed

Fixed in d10769f478900ff1868d206086874bdd67a45e7d

 Create table with a Cast to interval day results in a file which cannot be 
 read
 ---

 Key: DRILL-1980
 URL: https://issues.apache.org/jira/browse/DRILL-1980
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 0.7.0
Reporter: Ramana Inukonda Nagaraj
Assignee: Mehant Baid
 Fix For: 1.0.0

 Attachments: DRILL-1980.patch, alltypes.json, 
 parquet_all_types.parquet


 Created a parquet file from a json file with all types listed in it.
 {code}
 0: jdbc:drill: CREATE TABLE parquet_all_types AS SELECT cast( INT_col as 
 int) INT_col,cast( BIGINT_col as bigint) BIGINT_col,cast( DECIMAL9_col as 
 decimal) DECIMAL9_col,cast( DECIMAL18_col as decimal(18,9)) 
 DECIMAL18_col,cast( DECIMAL28SPARSE_col as decimal(28, 14)) 
 DECIMAL28SPARSE_col,cast( DECIMAL38SPARSE_col as decimal(38, 19)) 
 DECIMAL38SPARSE_col,cast( DATE_col as date) DATE_col,cast( TIME_col as time) 
 TIME_col,cast( TIMESTAMP_col as timestamp) TIMESTAMP_col,cast( FLOAT4_col as 
 float) FLOAT4_col,cast( FLOAT8_col as double) FLOAT8_col,cast( BIT_col as 
 boolean) BIT_col,cast( VARCHAR_col as varchar(65000)) VARCHAR_col,cast( 
 VAR16CHAR_col as varchar(65000)) VAR16CHAR_col,cast( VARBINARY_col as 
 varbinary(65000)) VARBINARY_col,cast( INTERVALYEAR_col as interval year) 
 INTERVALYEAR_col,cast( INTERVALDAY_col as interval day) INTERVALDAY_col FROM 
 `/user/root/alltypes.json`;
 ++---+
 |  Fragment  | Number of records written |
 ++---+
 | 0_0| 8 |
 ++---+
 1 row selected (0.595 seconds)
 {code}
 Tried reading created parquet file from drill. Fails with
 {code}
 0: jdbc:drill: explain plan for select * from 
 `/parquet_all_types/0_0_0.parquet`;
 Query failed: Query failed: Unexpected exception during fragment 
 initialization: Internal error: Error while applying rule DrillTableRule, 
 args [rel#6060:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, 
 root, /parquet_all_types/0_0_0.parquet])]
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34028: DRILL-3020: Copy cause exception's message to thrown SQLException's message.

2015-05-12 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34028/#review83400
---

Ship it!


Ship It!

- Mehant Baid


On May 11, 2015, 6:06 p.m., Daniel Barclay wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34028/
 ---
 
 (Updated May 11, 2015, 6:06 p.m.)
 
 
 Review request for drill, Mehant Baid and Parth Chandra.
 
 
 Bugs: DRILL-3020
 https://issues.apache.org/jira/browse/DRILL-3020
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Changed SQLException construction to set SQLException's message to the cause 
 exception's toString().
 
 Also:
 - Narrowed one SQLException to SQLNonTransientConnectionException (not other
   two, since unclear whether transient, non-transient, or varied).
 - Clarified/simplified exception messages.
 - Fixed message typo.
 
 
 Diffs
 -
 
   exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillConnectionImpl.java 
 30279e6 
   exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillCursor.java 30c85eb 
 
 Diff: https://reviews.apache.org/r/34028/diff/
 
 
 Testing
 ---
 
 Manually tested in SQLLine.
 
 Ran regular tests; no new failures.
 
 
 Thanks,
 
 Daniel Barclay
 




Re: Review Request 34024: DRILL-3010: Convert bad command error messages into UserExceptions in SqlHandlers

2015-05-11 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34024/#review83207
---

Ship it!


Ship It!

- Mehant Baid


On May 10, 2015, 4:23 p.m., Venki Korukanti wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34024/
 ---
 
 (Updated May 10, 2015, 4:23 p.m.)
 
 
 Review request for drill and Mehant Baid.
 
 
 Bugs: DRILL-3010
 https://issues.apache.org/jira/browse/DRILL-3010
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Please see for details: https://issues.apache.org/jira/browse/DRILL-3010
 
 
 Diffs
 -
 
   common/src/main/java/org/apache/drill/common/exceptions/UserException.java 
 9283339 
   exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java 
 9e2f210 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/AbstractSqlHandler.java
  96fd877 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java
  e9ac1e1 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DescribeTableHandler.java
  c76914b 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ShowFileHandler.java
  7062375 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ShowTablesHandler.java
  3d42f76 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java
  7ae5e0d 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/UseSchemaHandler.java
  e17e275 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ViewHandler.java
  c59c3a2 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserSession.java 
 9f1a695 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 
 33ddea5 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
  916564d 
   exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java f8ec090 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/impersonation/TestImpersonationMetadata.java
  411660f 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
  958cf1a 
   exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestCTAS.java 
 5fff956 
   exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestInfoSchema.java 
 8bcbc7a 
   exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestViewSupport.java 
 0fc1f32 
   protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java 
 a229450 
   protocol/src/main/java/org/apache/drill/exec/proto/beans/DrillPBError.java 
 873ffa4 
   protocol/src/main/protobuf/UserBitShared.proto a17dbc7 
 
 Diff: https://reviews.apache.org/r/34024/diff/
 
 
 Testing
 ---
 
 Existing unit tests have good coverage of the various expected bad-command 
 messages. Converted those to test for UserException with the expected error 
 message.
 
 
 Thanks,
 
 Venki Korukanti
 




Re: Review Request 34064: DRILL-1980

2015-05-11 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34064/
---

(Updated May 12, 2015, 4:37 a.m.)


Review request for drill and Jason Altekruse.


Changes
---

Updated patch with tests


Repository: drill-git


Description
---

Add support for being able to perform CTAS with interval data type.
Also add support to be able to read interval type from parquet
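
For context, a minimal sketch of the on-disk layout this targets (illustrative helper only, not part of the patch): Parquet's INTERVAL logical type annotates a FIXED_LEN_BYTE_ARRAY(12) whose twelve bytes hold three little-endian unsigned 32-bit counts (months, days, milliseconds), which is why a zero-length FIXED_LEN_BYTE_ARRAY in the footer is invalid.

```
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: packs/unpacks the 12-byte Parquet INTERVAL value
// (little-endian months, days, milliseconds).
public class ParquetIntervalSketch {

  static byte[] pack(int months, int days, int millis) {
    return ByteBuffer.allocate(12)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putInt(months).putInt(days).putInt(millis)
        .array();
  }

  static int[] unpack(byte[] twelveBytes) {
    ByteBuffer buf = ByteBuffer.wrap(twelveBytes).order(ByteOrder.LITTLE_ENDIAN);
    return new int[] { buf.getInt(), buf.getInt(), buf.getInt() };
  }

  public static void main(String[] args) {
    int[] parts = unpack(pack(0, 10, 0));   // e.g. interval '10' day
    System.out.printf("months=%d days=%d millis=%d%n", parts[0], parts[1], parts[2]);
  }
}
```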


Diffs (updated)
-

  exec/java-exec/src/main/codegen/templates/ParquetOutputRecordWriter.java 
0d24041 
  exec/java-exec/src/main/codegen/templates/ParquetTypeHelper.java 6ac488d 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 5291855 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java
 70b2342 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedByteAlignedReader.java
 fe0234b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
 c2221d6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetToDrillTypeConverter.java
 8ab5fea 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java
 c6367ae 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 958cf1a 

Diff: https://reviews.apache.org/r/34064/diff/


Testing (updated)
---

Added unit tests


Thanks,

Mehant Baid



Re: Review Request 34022: DRILL-2870: Fix return type of aggregate functions to be nullable (part 2)

2015-05-09 Thread Mehant Baid


 On May 10, 2015, 12:15 a.m., Aman Sinha wrote:
  exec/java-exec/src/main/codegen/templates/AggrTypeFunctions3.java, line 93
  https://reviews.apache.org/r/34022/diff/1/?file=954683#file954683line93
 
  I can see the reason but want to clarify: previously we were 
  incrementing the nonNullCount, now we are setting it to 1 since we only 
  care about the presence of at least 1 non-null value, right ?

Yes that is correct
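
A minimal sketch of the pattern being discussed, with plain Java standing in for the generated template code (names are illustrative):

```
// Track only the presence of at least one non-null input; emit NULL otherwise.
public class NullableSumSketch {
  private long sum;
  private int nonNullCount;   // 0 or 1: has any non-null value been seen?

  void add(Long value) {
    if (value == null) {
      return;                 // nulls affect neither the sum nor the flag
    }
    sum += value;
    nonNullCount = 1;         // presence flag; no need to keep incrementing
  }

  Long output() {
    return nonNullCount == 1 ? sum : null;   // empty or all-null input -> NULL
  }

  public static void main(String[] args) {
    NullableSumSketch agg = new NullableSumSketch();
    System.out.println(agg.output());   // null: nothing aggregated yet
    agg.add(3L);
    agg.add(null);
    System.out.println(agg.output());   // 3
  }
}
```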


- Mehant


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34022/#review83166
---


On May 9, 2015, 11:34 p.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34022/
 ---
 
 (Updated May 9, 2015, 11:34 p.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 This patch modifies aggregate functions so when we perform an aggregate 
 (other than count) on an empty set we get null (for both required and 
 optional input types). 
 
 Tdd file changes:
 Modified tdd files so that the output type of aggregate functions is optional
 
 Template file changes:
 Maintain a nonNullCount to indiciate if the output should be null or not.
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/data/AggrBitwiseLogicalTypes.tdd 2b72abd 
   exec/java-exec/src/main/codegen/data/AggrTypes1.tdd 8952417 
   exec/java-exec/src/main/codegen/data/AggrTypes2.tdd ee64daf 
   exec/java-exec/src/main/codegen/data/AggrTypes3.tdd 0c3a358 
   
 exec/java-exec/src/main/codegen/templates/AggrBitwiseLogicalTypeFunctions.java
  b159421 
   exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java 19a6d46 
   exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java 6701f09 
   exec/java-exec/src/main/codegen/templates/AggrTypeFunctions3.java c005446 
   exec/java-exec/src/main/codegen/templates/DateIntervalAggrFunctions1.java 
 e934167 
   exec/java-exec/src/main/codegen/templates/IntervalAggrFunctions2.java 
 b29fa08 
   exec/java-exec/src/main/codegen/templates/VarCharAggrFunctions1.java 
 53474ea 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java
  01db7c2 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestAgg.java
  b39566a 
   exec/java-exec/src/test/resources/parquet/alltypes_required.parquet 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34022/diff/
 
 
 Testing
 ---
 
 Added unit tests.
 
 
 Thanks,
 
 Mehant Baid
 




Review Request 34021: DRILL-2870: Return type of aggregate functions (part 1)

2015-05-09 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34021/
---

Review request for drill and Aman Sinha.


Repository: drill-git


Description
---

Have broken down the final patch into two for easier review. This is a minor 
refactoring patch.

This patch simply moves the count aggregate function to a different template than 
the existing aggregate functions, the reason being that count is the only aggregate 
function with a required output type (irrespective of the input type). This 
will eliminate some of the conditional logic in the templates.


Diffs
-

  exec/java-exec/src/main/codegen/config.fmpp 8db120d 
  exec/java-exec/src/main/codegen/data/AggrTypes1.tdd 8952417 
  exec/java-exec/src/main/codegen/data/CountAggrTypes.tdd PRE-CREATION 
  exec/java-exec/src/main/codegen/data/DecimalAggrTypes1.tdd 5ac299c 
  exec/java-exec/src/main/codegen/templates/CountAggregateFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/34021/diff/


Testing
---

All unit tests and functional tests pass


Thanks,

Mehant Baid



[jira] [Created] (DRILL-3006) CTAS with interval data type creates invalid parquet file

2015-05-09 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3006:
--

 Summary: CTAS with interval data type creates invalid parquet file
 Key: DRILL-3006
 URL: https://issues.apache.org/jira/browse/DRILL-3006
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Reporter: Mehant Baid
Assignee: Steven Phillips


Used the below CTAS statement:
create table t6 as select interval '10' day  interval_day_col from 
cp.`employee.json` limit 1;

When I query the table 't6'  the following exception is encountered:

Caused by: java.io.IOException: Failure while trying to get footer for file 
file:/tmp/t6/0_0_0.parquet
at 
org.apache.drill.exec.store.parquet.FooterGatherer$FooterReader.convertToIOException(FooterGatherer.java:120)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.TimedRunnable.getValue(TimedRunnable.java:67) 
~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:136) 
~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.FooterGatherer.getFooters(FooterGatherer.java:95)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooterHelper(ParquetGroupScan.java:229)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.access$000(ParquetGroupScan.java:79)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan$1.run(ParquetGroupScan.java:206)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan$1.run(ParquetGroupScan.java:204)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
~[na:1.7.0_67]
at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_67]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 ~[hadoop-common-2.4.1.jar:na]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:204)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
... 21 common frames omitted
Caused by: java.lang.IllegalArgumentException: Invalid FIXED_LEN_BYTE_ARRAY 
length: 0
at parquet.Preconditions.checkArgument(Preconditions.java:50) 
~[parquet-common-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at parquet.schema.Types$PrimitiveBuilder.build(Types.java:320) 
~[parquet-column-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at parquet.schema.Types$PrimitiveBuilder.build(Types.java:250) 
~[parquet-column-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at parquet.schema.Types$Builder.named(Types.java:228) 
~[parquet-column-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at 
parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:640)
 ~[parquet-hadoop-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:601)
 ~[parquet-hadoop-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at 
parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543)
 ~[parquet-hadoop-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:529)
 ~[parquet-hadoop-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at 
parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:480)
 ~[parquet-hadoop-1.6.0rc3-drill-r0.3.jar:1.6.0rc3-drill-r0.3]
at 
org.apache.drill.exec.store.parquet.FooterGatherer.readFooter(FooterGatherer.java:161)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.FooterGatherer$FooterReader.runInner(FooterGatherer.java:115)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.FooterGatherer$FooterReader.runInner(FooterGatherer.java:102)
 ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:47) 
~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
at 
org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:107) 
~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
... 30 common frames omitted

When I run parquet tools (parquet-schema or parquet-meta) on the parquet file I 
get a similar error: Invalid FIXED_LEN_BYTE_ARRAY length

Re: Review Request 33897: DRILL-2602: Throw an error on schema change during streaming aggregation

2015-05-08 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33897/#review83024
---

Ship it!


Ship It!

- Mehant Baid


On May 8, 2015, 4:55 p.m., abdelhakim deneche wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33897/
 ---
 
 (Updated May 8, 2015, 4:55 p.m.)
 
 
 Review request for drill and Jason Altekruse.
 
 
 Bugs: DRILL-2602
 https://issues.apache.org/jira/browse/DRILL-2602
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Updated both ExternalSortBatch and StreamingAggBatch to throw a proper 
 UNSUPPORTED user exception.
 Here is the output when using a Stream aggregate:
 ```
 Query failed: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support 
 sorts with changing schemas
 
 Fragment 0:0
 
 [Error Id: 43fea1a6-1ae2-4c17-970e-8e168e347241 on 172.30.1.91:31010]
 ```
 And here is the error message for a hash aggregate:
 ```
 Query failed: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support 
 schema changes
 
 Fragment 0:0
 
 [Error Id: 6f849343-0a9d-4681-b101-17d7e1e32917 on 172.30.1.91:31010]
 ```
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
  b753574 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
  c1c5cb9 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
  e88bc67 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/SingleBatchSorterTemplate.java
  75892f9 
 
 Diff: https://reviews.apache.org/r/33897/diff/
 
 
 Testing
 ---
 
 all unit tests are passing along with customer/tpch100
 
 
 Thanks,
 
 abdelhakim deneche
 




Re: Review Request 33833: DRILL-2848: Part 2: Provide option to disable decimal type

2015-05-06 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33833/
---

(Updated May 6, 2015, 6:54 p.m.)


Review request for drill, Jason Altekruse and Jinfeng Ni.


Changes
---

Addressed review comments


Repository: drill-git


Description
---

This patch adds an option to enable/disable the decimal data type. It disables 
casting to decimal and reading decimal from Parquet and Hive.
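
A minimal sketch of the guard this option enables (the option name and error text are assumptions for illustration, not the exact Drill identifiers):

```
// Readers and planner rules check the option before accepting a DECIMAL type
// and fail with a clear, documented error while the type is disabled.
public class DecimalGuardSketch {
  static final String ENABLE_DECIMAL_OPTION = "planner.enable_decimal_data_type"; // assumed name
  static final String DECIMAL_DISABLED_MSG =
      "Decimal data type is disabled. Set option " + ENABLE_DECIMAL_OPTION + " to true to enable it.";

  static void checkDecimalAllowed(boolean decimalEnabled) {
    if (!decimalEnabled) {
      throw new UnsupportedOperationException(DECIMAL_DISABLED_MSG);
    }
  }

  public static void main(String[] args) {
    checkDecimalAllowed(true);        // enabled: proceeds
    try {
      checkDecimalAllowed(false);     // simulates the default (disabled) setting
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```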


Diffs (updated)
-

  
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBasePushFilterIntoScan.java
 f1f3a0b 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java
 8c400ea 
  
contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoPushDownFilterForScan.java
 4fd80bd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillConstExecutor.java
 92e5678 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 441f2e3 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillParseContext.java
 be4474f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 c8be019 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/FilterPrel.java
 b631cdc 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/FlattenPrel.java
 e206951 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 8f089c4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectAllowDupPrel.java
 cc215f8 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrel.java
 35fa5be 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
 c918723 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java
 1636a25 
  
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
 33b2a4c 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaPushFilterIntoRecordGenerator.java
 0cf12b4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 11d0042 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetToDrillTypeConverter.java
 7c3eeb8 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java
 389c1f6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java
 921d134 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetRecordMaterializer.java
 574df40 
  
exec/java-exec/src/main/java/org/apache/drill/exec/work/ExecErrorConstants.java 
PRE-CREATION 
  exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java c627ff2 
  exec/java-exec/src/test/java/org/apache/drill/TestDisabledFunctionality.java 
504524d 
  exec/java-exec/src/test/java/org/apache/drill/TestFrameworkTest.java 31a7a64 
  exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java 67131c1 
  
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastEmptyStrings.java
 3e05c0e 
  
exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/TestConstantFolding.java
 2c23df4 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 5670e1e 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestWriter.java
 5991046 
  
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/columnreaders/TestColumnReaderFactory.java
 9ae6b78 
  
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet2/TestDrillParquetReader.java
 782191f 
  exec/jdbc/src/test/java/org/apache/drill/jdbc/test/JdbcTestQueryBase.java 
d4eec1e 
  
exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestAggregateFunctionsQuery.java
 f04c2af 

Diff: https://reviews.apache.org/r/33833/diff/


Testing
---

Added negative tests. 

Modified existing unit tests to use the newly added parameter.


Thanks,

Mehant Baid



[jira] [Created] (DRILL-2963) Exists with empty left batch causes IllegalStateException

2015-05-05 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2963:
--

 Summary: Exists with empty left batch causes IllegalStateException
 Key: DRILL-2963
 URL: https://issues.apache.org/jira/browse/DRILL-2963
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.0.0


In NestedLoopJoinBatch we don't correctly handle the case where we have an empty 
left input batch; we need to return NONE in that case.
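
A minimal sketch of the intended fix, with simplified types standing in for Drill's actual classes:

{code}
// An empty left input must surface as NONE from the join instead of failing later.
public class NestedLoopJoinSketch {
  enum IterOutcome { OK, NONE }

  IterOutcome next(IterOutcome leftUpstream) {
    if (leftUpstream == IterOutcome.NONE) {
      return IterOutcome.NONE;   // empty left batch: report end-of-data directly
    }
    return IterOutcome.OK;       // otherwise continue the usual build/probe flow
  }

  public static void main(String[] args) {
    System.out.println(new NestedLoopJoinSketch().next(IterOutcome.NONE));   // NONE
  }
}
{code}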



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33662: DRILL-2902: Add support for context functions: user (synonyms session_user and system_user) and current_schema

2015-05-05 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33662/#review82541
---

Ship it!


Ship It!

- Mehant Baid


On May 4, 2015, 10:09 p.m., Venki Korukanti wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33662/
 ---
 
 (Updated May 4, 2015, 10:09 p.m.)
 
 
 Review request for drill and Mehant Baid.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Please see https://issues.apache.org/jira/browse/DRILL-2902 for details.
 
 Apart from adding new UDFs, this also refactors the context information stored 
 in PlanFragment into a separate message and refactors QueryDateTimeInfo into 
 ContextInformation to provide one interface for all query-context-related 
 info.
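
A minimal sketch of the shape this refactoring aims for (toy class, not Drill's actual ContextInformation):

```
// One immutable context object carried with the fragment, from which UDFs such as
// user()/session_user()/system_user() and current_schema() read their values.
public class ContextInformationSketch {
  private final long queryStartTime;
  private final String queryUser;
  private final String defaultSchema;

  public ContextInformationSketch(long queryStartTime, String queryUser, String defaultSchema) {
    this.queryStartTime = queryStartTime;
    this.queryUser = queryUser;
    this.defaultSchema = defaultSchema;
  }

  public long getQueryStartTime() { return queryStartTime; }
  public String getQueryUser()     { return queryUser; }      // backs user()/session_user()/system_user()
  public String getCurrentSchema() { return defaultSchema; }  // backs current_schema()

  public static void main(String[] args) {
    ContextInformationSketch ctx =
        new ContextInformationSketch(System.currentTimeMillis(), "alice", "dfs.tmp");
    System.out.println(ctx.getQueryUser() + " / " + ctx.getCurrentSchema());
  }
}
```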
 
 
 Diffs
 -
 
   exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java 
 4576eb4 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/ContextFunctions.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
  9c932d6 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/ops/ContextInformation.java
  PRE-CREATION 
   exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 
 09a7568 
   exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java 
 6414f56 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryDateTimeInfo.java 
 f3cc666 
   exec/java-exec/src/main/java/org/apache/drill/exec/ops/UdfUtilities.java 
 1cdece1 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java
  66ba229 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserSession.java 
 527bac0 
   exec/java-exec/src/main/java/org/apache/drill/exec/util/Utilities.java 
 8efb9e7 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java 
 4249cbe 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestContextFunctions.java
  PRE-CREATION 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java
  04e1980 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestLocalExchange.java
  9758eb0 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/partitionsender/TestPartitionSender.java
  6a6a7e0 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/pop/TestFragmentChecker.java
  32e3bf9 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/security/TestCustomUserAuthenticator.java
  70d43b6 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestExceptionInjection.java
  604f375 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestPauseInjection.java
  508b10c 
   protocol/src/main/java/org/apache/drill/exec/proto/BitControl.java 813d961 
   protocol/src/main/java/org/apache/drill/exec/proto/SchemaBitControl.java 
 5e7562e 
   protocol/src/main/java/org/apache/drill/exec/proto/beans/PlanFragment.java 
 f6fbce1 
   
 protocol/src/main/java/org/apache/drill/exec/proto/beans/QueryContextInformation.java
  PRE-CREATION 
   protocol/src/main/protobuf/BitControl.proto 0424725 
 
 Diff: https://reviews.apache.org/r/33662/diff/
 
 
 Testing
 ---
 
 Added unittests to test the new UDFs added in the patch.
 
 
 Thanks,
 
 Venki Korukanti
 




Re: Review Request 33662: DRILL-2902: Add support for context functions: user (synonyms session_user and system_user) and current_schema

2015-05-05 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33662/#review82537
---



exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/ContextFunctions.java
https://reviews.apache.org/r/33662/#comment133277

Should we check if the allocated buffer has enough space?


- Mehant Baid


On May 4, 2015, 10:09 p.m., Venki Korukanti wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33662/
 ---
 
 (Updated May 4, 2015, 10:09 p.m.)
 
 
 Review request for drill and Mehant Baid.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Please see https://issues.apache.org/jira/browse/DRILL-2902 for details.
 
 Apart from adding new UDFs, this also refactors the context information stored 
 in PlanFragment into a separate message and refactors QueryDateTimeInfo into 
 ContextInformation to provide one interface for all query-context-related 
 info.
 
 
 Diffs
 -
 
   exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java 
 4576eb4 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/ContextFunctions.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
  9c932d6 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/ops/ContextInformation.java
  PRE-CREATION 
   exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 
 09a7568 
   exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java 
 6414f56 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryDateTimeInfo.java 
 f3cc666 
   exec/java-exec/src/main/java/org/apache/drill/exec/ops/UdfUtilities.java 
 1cdece1 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java
  66ba229 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserSession.java 
 527bac0 
   exec/java-exec/src/main/java/org/apache/drill/exec/util/Utilities.java 
 8efb9e7 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java 
 4249cbe 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestContextFunctions.java
  PRE-CREATION 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java
  04e1980 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestLocalExchange.java
  9758eb0 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/partitionsender/TestPartitionSender.java
  6a6a7e0 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/pop/TestFragmentChecker.java
  32e3bf9 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/security/TestCustomUserAuthenticator.java
  70d43b6 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestExceptionInjection.java
  604f375 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestPauseInjection.java
  508b10c 
   protocol/src/main/java/org/apache/drill/exec/proto/BitControl.java 813d961 
   protocol/src/main/java/org/apache/drill/exec/proto/SchemaBitControl.java 
 5e7562e 
   protocol/src/main/java/org/apache/drill/exec/proto/beans/PlanFragment.java 
 f6fbce1 
   
 protocol/src/main/java/org/apache/drill/exec/proto/beans/QueryContextInformation.java
  PRE-CREATION 
   protocol/src/main/protobuf/BitControl.proto 0424725 
 
 Diff: https://reviews.apache.org/r/33662/diff/
 
 
 Testing
 ---
 
 Added unittests to test the new UDFs added in the patch.
 
 
 Thanks,
 
 Venki Korukanti
 




Review Request 33834: DRILL-2848: Part 1: ParquetToDrillTypeConverter hygiene patch

2015-05-04 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33834/
---

Review request for drill and Jason Altekruse.


Repository: drill-git


Description
---

This patch simply cleans up redundant logic in ParquetToDrillTypeConverter and 
refactors the common logic for determining the minor type into a single function.


Diffs
-

  common/src/main/java/org/apache/drill/common/util/CoreDecimalUtility.java 
302652e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetToDrillTypeConverter.java
 7c3eeb8 

Diff: https://reviews.apache.org/r/33834/diff/


Testing
---


Thanks,

Mehant Baid



Review Request 33833: DRILL-2848: Part 2: Provide option to disable decimal type

2015-05-04 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33833/
---

Review request for drill, Jason Altekruse and Jinfeng Ni.


Repository: drill-git


Description
---

This patch adds an option to enable/disable the decimal data type. It disables 
casting to decimal and reading decimal from Parquet and Hive.


Diffs
-

  
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBasePushFilterIntoScan.java
 f1f3a0b 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java
 8c400ea 
  
contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoPushDownFilterForScan.java
 4fd80bd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillConstExecutor.java
 92e5678 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 441f2e3 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillParseContext.java
 be4474f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 c8be019 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/FilterPrel.java
 b631cdc 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/FlattenPrel.java
 e206951 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 8f089c4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectAllowDupPrel.java
 cc215f8 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrel.java
 35fa5be 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
 c918723 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java
 1636a25 
  
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
 33b2a4c 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaPushFilterIntoRecordGenerator.java
 0cf12b4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 11d0042 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetToDrillTypeConverter.java
 7c3eeb8 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java
 389c1f6 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java
 921d134 
  
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetRecordMaterializer.java
 574df40 
  
exec/java-exec/src/main/java/org/apache/drill/exec/work/ExecErrorConstants.java 
PRE-CREATION 
  exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java c627ff2 
  exec/java-exec/src/test/java/org/apache/drill/TestDisabledFunctionality.java 
504524d 
  exec/java-exec/src/test/java/org/apache/drill/TestFrameworkTest.java 3abd193 
  exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java 67131c1 
  
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastEmptyStrings.java
 3e05c0e 
  
exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/TestConstantFolding.java
 2c23df4 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 5670e1e 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestWriter.java
 5991046 
  
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/columnreaders/TestColumnReaderFactory.java
 9ae6b78 
  
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet2/TestDrillParquetReader.java
 782191f 
  exec/jdbc/src/test/java/org/apache/drill/jdbc/test/JdbcTestQueryBase.java 
5c0a0e5 
  
exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestAggregateFunctionsQuery.java
 aa68e9f 

Diff: https://reviews.apache.org/r/33833/diff/


Testing
---

Added negative tests. 

Modified existing unit tests to use the newly added parameter.


Thanks,

Mehant Baid



Test timeout failures

2015-04-29 Thread Mehant Baid

Hey guys,

I was testing the 0.9 release artifacts and was experiencing timeout 
failures with the fork count set to '1c' (8). However, I can consistently 
get clean runs by reducing the fork count to 2 (0.25c). I was wondering 
if this is an issue experienced by others or just a problem with my 
VM on which I run the tests. I don't think it's an issue that should hold the 
release, but if enough people are seeing this behavior we can apply a 
patch with the reduced fork count.


Thanks
Mehant


[jira] [Created] (DRILL-2906) CTAS with store.format = 'json' returns incorrect results

2015-04-29 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2906:
--

 Summary: CTAS with store.format = 'json' returns incorrect results
 Key: DRILL-2906
 URL: https://issues.apache.org/jira/browse/DRILL-2906
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON, Storage - Writer
Reporter: Mehant Baid
Assignee: Steven Phillips
 Fix For: 1.0.0


Performing a CTAS with 'store.format' = 'json' and then querying the table results 
in projecting an additional field '*' with null values. Below is a simple repro:

0: jdbc:drill:zk=local create table t as select timestamp '1980-10-01 
00:00:00' from cp.`employee.json` limit 1;
++---+
|  Fragment  | Number of records written |
++---+
| 0_0| 1 |
++---+
1 row selected (0.314 seconds)
0: jdbc:drill:zk=local select * from t;
+++
|   EXPR$0   | *  |
+++
| 1980-10-01 00:00:00.0 | null   |
+++

Notice that in the above result set we get an extra column '*' with a null value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2753) Implicit cast fails when comparing a double column and a varchar literal

2015-04-27 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2753.

Resolution: Fixed

Fixed in e33ffa2197306ba833f0d5ea867969781cd733cc

 Implicit cast fails when comparing a double column and a varchar literal
 

 Key: DRILL-2753
 URL: https://issues.apache.org/jira/browse/DRILL-2753
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Abhishek Girish
Assignee: Mehant Baid
Priority: Critical
 Fix For: 1.0.0

 Attachments: DRILL-2753.patch


 Query fails when an implicit cast is used between a column of double data 
 type and a varchar literal. 
 *Drill:*
 {code:sql}
  select ss_customer_sk, ss_ticket_number from store_sales  where ss_promo_sk 
  = '50' order by ss_promo_sk limit 1;
 ++--+
 | ss_customer_sk | ss_ticket_number |
 ++--+
 | 53792  | 44   |
 ++--+
 1 row selected (1.045 seconds)
  select ss_customer_sk, ss_ticket_number from store_sales  where  
  ss_wholesale_cost = '38.19'  order by ss_promo_sk limit 1;
 Query failed: RemoteRpcException: Failure while running fragment., 38.19 [ 
 d8f86a4f-226a-4e30-bb23-5b20ae5294e0 on abhi7.qa.lab:31010 ]
 [ d8f86a4f-226a-4e30-bb23-5b20ae5294e0 on abhi7.qa.lab:31010 ]
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
 *Postgres:*
 {code:sql}
 # select ss_customer_sk, ss_ticket_number from store_sales  where  
 ss_wholesale_cost = '38.19'  order by ss_promo_sk limit 1;
  ss_customer_sk | ss_ticket_number
 +--
   44923 |   148425
 (1 row)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2411) Scalar SUM/AVG over empty result set returns no rows instead of NULL

2015-04-27 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2411.

Resolution: Fixed

Fixed as part of DRILL-2277 in 3689522d4a7035a966f19695a678c6881fdaeba6

 Scalar SUM/AVG over empty result set returns no rows instead of NULL
 

 Key: DRILL-2411
 URL: https://issues.apache.org/jira/browse/DRILL-2411
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Victoria Markman
Assignee: Mehant Baid
Priority: Critical
 Fix For: 1.0.0


 Queries below should return NULL:
 {code}
 0: jdbc:drill:schema=dfs select sum(a2) from t2 where 1=0;
 ++
 |   EXPR$0   |
 ++
 ++
 No rows selected (0.08 seconds)
 0: jdbc:drill:schema=dfs select avg(a2) from t2 where 1=0;
 ++
 |   EXPR$0   |
 ++
 ++
 No rows selected (0.074 seconds)
 {code}
 When grouped, result is correct:
 {code}
 0: jdbc:drill:schema=dfs select a2, sum(a2) from t2 where 1=0 group by a2;
 +++
 | a2 |   EXPR$1   |
 +++
 +++
 No rows selected (0.11 seconds)
 {code}
 I'm not convinced, and it is not very intuitive, that the correct result should 
 be NULL, but this is what Postgres returns and Aman thinks NULL is the correct 
 behavior :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-2848) Disable decimal data type by default

2015-04-22 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2848:
--

 Summary: Disable decimal data type by default
 Key: DRILL-2848
 URL: https://issues.apache.org/jira/browse/DRILL-2848
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Critical
 Fix For: 1.0.0


Due to the difference between the storage format of the decimal data type in 
Parquet and the in-memory format within Drill, using the decimal data type is not 
performant. Also, some of the rules for calculating the scale and precision need 
to be changed. These two concerns will be addressed after the 1.0.0 release, and to 
prevent users from running into them we are disabling the decimal data type by 
default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2511) Assert with full outer join when one of the join predicates is of a required type (nullabe parquet)

2015-04-20 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2511.

Resolution: Fixed

This should be fixed as part of DRILL-2707

 Assert with full outer join when one of the join predicates is of a required 
 type (nullabe parquet)
 ---

 Key: DRILL-2511
 URL: https://issues.apache.org/jira/browse/DRILL-2511
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.0.0

 Attachments: j3.parquet.required, j4.parquet.required


 Columns in tables j3 and j4 are created as 'required' data type:
 {code}
 [Fri Mar 20 11:30:42 root@~/parquet-tools-1.5.1-SNAPSHOT ] # ./parquet-schema 
 ~/0_0_0.parquet
 message root {
   required binary c_varchar (UTF8);
   required int32 c_integer;
   required int64 c_bigint;
   required float c_float;
   required double c_double;
   required int32 c_date (DATE);
   required int32 c_time (TIME);
   required int64 c_timestamp (TIMESTAMP);
   required boolean c_boolean;
   required double d9;
   required double d18;
   required double d28;
   required double d38;
 }
 {code}
 Full outer join on j3/j4 asserts.
 This is happening with the join predicate of every SQL type except boolean.
 {code}
 select * from j3 full outer join j4 on (j3.c_varchar = j4.c_varchar);
 java.lang.AssertionError at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:382)
 at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:408)
 at 
 org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:98)
 at 
 org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137)
 at 
 org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:146)
 at 
 net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
 at sqlline.SqlLine$Rows$Row.init(SqlLine.java:2388)
 at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2504)
 at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
 at sqlline.SqlLine.print(SqlLine.java:1809)
 at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
 at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
 at sqlline.SqlLine.dispatch(SqlLine.java:889)
 at sqlline.SqlLine.begin(SqlLine.java:763)
 at sqlline.SqlLine.start(SqlLine.java:498)
 at sqlline.SqlLine.main(SqlLine.java:460)
 {code}
 Same problem happens if one table's column types are optional and the other 
 one's are required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 33343: DRILL-2823: Implicit cast for comparisons in join conditions

2015-04-19 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/
---

Review request for drill and Aman Sinha.


Repository: drill-git


Description
---

DRILL-2753 aims to remove the comparison function implementations that have 
different data type inputs (Look at the JIRA for more details). As a result we 
need to modify hash join and merge join so that when we have different types in 
the join condition they can apply implicit casts to do the comparison. We have 
resolved the issue of distribution (as part of DRILL-2244) and so different 
data types with the same numeric value will be correctly distributed to the 
same node. 

As part of this change we materialize the expression in the join condition, 
check if the types are different and apply casts if necessary.
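
A minimal sketch of that check, with toy names standing in for Drill's expression and type machinery:

```
import java.util.Arrays;
import java.util.List;

// After materializing both sides of a join condition, find a common type and wrap
// the side that differs in a cast so the generated comparison sees a single type.
public class ImplicitJoinCastSketch {
  // toy precedence order standing in for Drill's implicit-cast rules
  static final List<String> PRECEDENCE = Arrays.asList("INT", "BIGINT", "FLOAT4", "FLOAT8");

  static String commonType(String leftType, String rightType) {
    return PRECEDENCE.indexOf(leftType) >= PRECEDENCE.indexOf(rightType) ? leftType : rightType;
  }

  static String maybeCast(String expr, String exprType, String targetType) {
    return exprType.equals(targetType) ? expr : "cast(" + expr + " as " + targetType + ")";
  }

  public static void main(String[] args) {
    String target = commonType("INT", "BIGINT");   // BIGINT
    System.out.println(maybeCast("t1.a1", "INT", target) + " = "
        + maybeCast("t2.a2", "BIGINT", target));   // cast(t1.a1 as BIGINT) = t2.a2
  }
}
```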


Diffs
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
 9df67d8 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
 7fa79a1 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
 8fce52e 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
 796f6fe 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/33343/diff/


Testing
---

Added unit tests plus ran existing tests with different data types in join 
conditions.


Thanks,

Mehant Baid



Re: Review Request 33343: DRILL-2823: Implicit cast for comparisons in join conditions

2015-04-19 Thread Mehant Baid


 On April 20, 2015, 1:01 a.m., Aman Sinha wrote:
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java,
   line 189
  https://reviews.apache.org/r/33343/diff/1/?file=935063#file935063line189
 
  If we have  int a = bigint b , both sides will be hash distributed as 
  doubles, then when we are doing the join (either hash or merge join), we 
  will ignore that they were distributed as double and just implict cast the 
  int to bigint ...is that correct ? 
  
  Any thoughts on the performance impact ? Previously the comparison 
  function would have taken care of comparing different types, now it must go 
  through implicit casting.

we will ignore that they were distributed as double and just implicitly cast the 
int to bigint ... is that correct? 
Yes, that is the behavior currently. 


The performance impact should be minimal or none, I would think, since even with 
the explicit function implementations we were doing the casting. However, we 
might be creating an extra object (for the holder) during implicit 
casting, but I think that should be ok.


 On April 20, 2015, 1:01 a.m., Aman Sinha wrote:
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java,
   line 80
  https://reviews.apache.org/r/33343/diff/1/?file=935067#file935067line80
 
  It would be good to have a test with join conditions on 2 or more 
  columns that require implicit casting - 
  WHERE t1.a1 = t2.a2 
AND t1.b1 = t2.b2 
AND t1.c1 = t2.c2 
   where a1:int, b1:bigint, b1:float, b2:double, c1:double, c2: varchar 
  with numeric double values.  
  This does not necesarily have to be unit test but see if you can get it 
  included either as unit or functional test.

I have added a unit test with minuscule data. I will 
add a test with more substantial data to the functional suite as well. 

I have added the case with multiple join conditions with the following data 
types on the left and right sides
1. bigint and int
2. double and float 
3. bigint and double

The case that you mention about varchar and double isn't supported, since our 
distribution only takes care of distributing the numeric values as double. This 
behavior is consistent with Postgres where in such cases users are required to 
put an explicit cast. To support such a scenario we would have to hash 
everything as varchar which will not be very efficient.


- Mehant


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/#review80643
---


On April 20, 2015, 4:55 a.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33343/
 ---
 
 (Updated April 20, 2015, 4:55 a.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-2753 aims to remove the comparison function implementations that have 
 different data type inputs (Look at the JIRA for more details). As a result 
 we need to modify hash join and merge join so that when we have different 
 types in the join condition they can apply implicit casts to do the 
 comparison. We have resolved the issue of distribution (as part of 
 DRILL-2244) and so different data types with the same numeric value will be 
 correctly distributed to the same node. 
 
 As part of this change we materialize the expression in the join condition, 
 check if the types are different and apply casts if necessary.
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
  9df67d8 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
  7fa79a1 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
  8fce52e 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
  796f6fe 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java
  PRE-CREATION 
   exec/java-exec/src/test/resources/jsoninput/implicit_cast_join_1.json 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/33343/diff/
 
 
 Testing
 ---
 
 Added unit tests plus ran existing tests with different data types in join 
 conditions.
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 33343: DRILL-2823: Implicit cast for comparisons in join conditions

2015-04-19 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/
---

(Updated April 20, 2015, 4:55 a.m.)


Review request for drill and Aman Sinha.


Changes
---

Addressed review comments.


Repository: drill-git


Description
---

DRILL-2753 aims to remove the comparison function implementations that have 
different data type inputs (Look at the JIRA for more details). As a result we 
need to modify hash join and merge join so that when we have different types in 
the join condition they can apply implicit casts to do the comparison. We have 
resolved the issue of distribution (as part of DRILL-2244) and so different 
data types with the same numeric value will be correctly distributed to the 
same node. 

As part of this change we materialize the expression in the join condition, 
check if the types are different and apply casts if necessary.


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
 9df67d8 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
 7fa79a1 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
 8fce52e 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
 796f6fe 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java
 PRE-CREATION 
  exec/java-exec/src/test/resources/jsoninput/implicit_cast_join_1.json 
PRE-CREATION 

Diff: https://reviews.apache.org/r/33343/diff/


Testing
---

Added unit tests plus ran existing tests with different data types in join 
conditions.


Thanks,

Mehant Baid



[jira] [Created] (DRILL-2823) Merge join should use implicit cast

2015-04-18 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2823:
--

 Summary: Merge join should use implicit cast
 Key: DRILL-2823
 URL: https://issues.apache.org/jira/browse/DRILL-2823
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.0.0


Currently merge join does not use an implicit cast if the two expressions in the 
join condition are of different data types. However, if we have such a situation 
it still works as expected because we have function implementations for numeric 
comparisons with different data types. As part of DRILL-2753 we are getting rid 
of those implementations, so merge join will need to use implicit casts if the 
expressions in the join condition are of different types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2073) Filter on a field in a nested repeated type throws an exception

2015-04-18 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2073.

Resolution: Fixed

Works as expected on latest master.

 Filter on a field in a nested repeated type throws an exception
 ---

 Key: DRILL-2073
 URL: https://issues.apache.org/jira/browse/DRILL-2073
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Reporter: Rahul Challapalli
Assignee: Mehant Baid
 Fix For: 1.0.0

 Attachments: error.log


 git.commit.id.abbrev=3c6d0ef
 Data Set :
 {code}
 {
   "rm": [
     {"rptd": [{"a": "foo"}, {"b": "boo"}]}
   ],
   "rm1": [{"a": "foo"}, {"b": "boo"}]
 }
 {code}
 The below query tries to apply a filter on a field which does not exist. 
 However, the field is still present in a different element of the same array.
 {code}
 select rm[0].rptd[0] from `temp.json` where rm[0].rptd[0].b = 'boo';
 Query failed: Query failed: Failure while running fragment., index: -4, 
 length: 4 (expected: range(0, 16384)) [ 01887113-c758-41bf-96d1-5eede9b1e411 
 on qa-node191.qa.lab:31010 ]
 [ 01887113-c758-41bf-96d1-5eede9b1e411 on qa-node191.qa.lab:31010 ]
 {code}
 The above query should return an empty result. The error only happens 
 when we apply a filter on a nested array element. The below query works fine:
 {code}
 0: jdbc:drill:schema=dfs.drillTestDir select rm1[0].a from `nested.json` 
 where rm1[0].b = 'boo';
 ++
 |   EXPR$0   |
 ++
 ++
 {code}
 Attached the log file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-2824) Function resolution should be deterministic

2015-04-18 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2824:
--

 Summary: Function resolution should be deterministic
 Key: DRILL-2824
 URL: https://issues.apache.org/jira/browse/DRILL-2824
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Critical
 Fix For: 1.0.0


Currently, as part of function resolution, we cost all the possible function 
matches and pick the one with the best cost. However, we simply pick the first 
one with the best cost; there may be multiple 
functions with the same best cost, and based on which function happened to come 
first in the map we would execute different functions on different clusters. 
This JIRA aims to resolve functions in a deterministic way so we pick the same 
function consistently. 
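
A minimal sketch of the intended tie-breaking (toy classes, not Drill's actual resolver):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// When several candidates tie on the best cost, break the tie on a stable key
// (here the signature string) instead of map iteration order, so every node
// resolves to the same function.
public class DeterministicResolverSketch {
  static class Candidate {
    final String signature;
    final int cost;
    Candidate(String signature, int cost) { this.signature = signature; this.cost = cost; }
  }

  static Candidate resolve(List<Candidate> candidates) {
    List<Candidate> sorted = new ArrayList<>(candidates);
    // lowest cost first; equal costs fall back to lexicographic signature order
    sorted.sort(Comparator.comparingInt((Candidate c) -> c.cost)
        .thenComparing(c -> c.signature));
    return sorted.get(0);
  }

  public static void main(String[] args) {
    List<Candidate> candidates = Arrays.asList(
        new Candidate("add(FLOAT8,FLOAT8)", 2),
        new Candidate("add(BIGINT,BIGINT)", 2),
        new Candidate("add(VARCHAR,VARCHAR)", 5));
    System.out.println(resolve(candidates).signature);   // always add(BIGINT,BIGINT)
  }
}
{code}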



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-2781) Protobuf changes for nested loop join

2015-04-14 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2781:
--

 Summary: Protobuf changes for nested loop join
 Key: DRILL-2781
 URL: https://issues.apache.org/jira/browse/DRILL-2781
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Minor
 Fix For: 0.9.0


A couple of the protobuf files were not regenerated as part of the nested loop 
join change. Will regenerate those files and merge them as part of this 
issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33052: DRILL-2611: value vectors should report valid value count

2015-04-14 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33052/#review80041
---

Ship it!


Ship It!

- Mehant Baid


On April 13, 2015, 11:01 p.m., Hanifi Gunes wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33052/
 ---
 
 (Updated April 13, 2015, 11:01 p.m.)
 
 
 Review request for drill, Mehant Baid and Parth Chandra.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-2611: value vectors should report valid value count
 
 Changes
 - unify the behavior of the value count interfaces across VVs -- get/setters
 - ensure the value count reported reflects the underlying state of the buffer
 - enforce consumers to use getAccessor().get/setValueCount
 - ensure metadata is created based on getAccessor().getValueCount
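
A minimal sketch of the consumer contract in the list above (toy classes, not Drill's vectors): the count a reader sees always comes from the accessor and must reflect what was actually written.

```
import java.util.ArrayList;
import java.util.List;

public class ValueCountSketch {
  private final List<Integer> buffer = new ArrayList<>();
  private int valueCount;                                    // single source of truth

  class Mutator {
    void set(int index, int value) {
      while (buffer.size() <= index) buffer.add(0);
      buffer.set(index, value);
    }
    void setValueCount(int count) { valueCount = count; }    // must be set before reading
  }

  class Accessor {
    int getValueCount() { return valueCount; }               // metadata is built from this
    int get(int index) {
      if (index >= valueCount) throw new IndexOutOfBoundsException("beyond valid values");
      return buffer.get(index);
    }
  }

  public static void main(String[] args) {
    ValueCountSketch vector = new ValueCountSketch();
    Mutator mutator = vector.new Mutator();
    Accessor accessor = vector.new Accessor();
    mutator.set(0, 7);
    mutator.set(1, 11);
    mutator.setValueCount(2);
    System.out.println(accessor.getValueCount() + " values, first = " + accessor.get(0));
  }
}
```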
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/templates/ComplexWriters.java 
 576fd8352197ba950be7d7e661fb52dd92b52f2a 
   exec/java-exec/src/main/codegen/templates/FixedValueVectors.java 
 e9ec220dc653db1e1acb0538bbbc1207fb4ee194 
   exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 
 075316e4f3ac5327e7893688c5e88cfee98e50bc 
   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 
 c7cf8e6fe18f1b9813ae22495ac79a447f61cfff 
   exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
 edb851eb10be43d889ce5fd98d9bde036707870a 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReader.java
  759327a307aefd51dc69ea4282a7d58d6309e142 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedByteAlignedReader.java
  c2af964fd606924587fe2093b3ccb1ec1de922af 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedWidthRepeatedReader.java
  f20d7655c76237fc5d3a95760f00ca2400a8df07 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableColumnReader.java
  16519a851a18924fb59753c456a2da4076d5d245 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
  8087118e1de8ef1b043c80ab4fc85215284670dc 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLengthColumnReaders.java
  7464f30179059a728f5c30f37c17adf0f332604c 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseDataValueVector.java
  d48ea99237bb822cafc8b835c3af0f4789c6eb29 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseValueVector.java
  81d3a8623fb86068d8c81f08e1d38d37b856e26c 
   exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java 
 d8bd9723db9f2ecd1466b1144345ca371f68a3bb 
 
 Diff: https://reviews.apache.org/r/33052/diff/
 
 
 Testing
 ---
 
 unit, reg, sf100
 
 
 Thanks,
 
 Hanifi Gunes
 




Re: Review Request 33035: DRILL-2719: ValueVector#getBuffers(clear) must consistently clear vectors & retain buffers

2015-04-14 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33035/#review80039
---

Ship it!


Ship It!

- Mehant Baid


On April 13, 2015, 10:07 p.m., Hanifi Gunes wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33035/
 ---
 
 (Updated April 13, 2015, 10:07 p.m.)
 
 
 Review request for drill, Mehant Baid and Parth Chandra.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 DRILL-2719: ValueVector#getBuffers(clear) must consistently clear vectors & 
 retain buffers
 
 BaseDataValueVector
 - getBuffers now relies on getBufferSize while determining the buffers to return
 - getBuffers maintains the reference count of the underlying buffers while 
 clearing the vector
 - getBufferSize relies on the value count reported by the accessor while 
 determining the buffer size
 - replaced DeadBuf references with an empty buffer; the underlying buffer should 
 now never be *null*
 
 Templates & VV subtypes
 - ensure getBuffers conforms to VV#getBuffers
 
 TestEmptyPopulator
 - make mock allocator return an empty buffer when requested
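
A toy sketch of the buffer ownership rule described above (illustrative only, not the actual DrillBuf/ValueVector implementation): when getBuffers is asked to clear, the vector retains each buffer on behalf of the caller before releasing its own reference, and it never leaves the buffer reference null.

{code}
// Toy sketch only: not the real DrillBuf reference counting.
import java.util.concurrent.atomic.AtomicInteger;

final class ToyBuf {
  final AtomicInteger refCnt = new AtomicInteger(1);
  void retain()  { refCnt.incrementAndGet(); }
  void release() { refCnt.decrementAndGet(); }
}

final class ToyVector {
  private ToyBuf buf = new ToyBuf();

  ToyBuf[] getBuffers(boolean clear) {
    ToyBuf[] out = { buf };
    if (clear) {
      out[0].retain();     // the caller now owns a reference to the buffer
      clear();             // the vector drops its own reference
    }
    return out;
  }

  void clear() {
    buf.release();
    buf = new ToyBuf();    // replace with an empty buffer, never null
  }
}
{code}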
 
 
 Diffs
 -
 
   exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 
 075316e4f3ac5327e7893688c5e88cfee98e50bc 
   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 
 c7cf8e6fe18f1b9813ae22495ac79a447f61cfff 
   exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
 edb851eb10be43d889ce5fd98d9bde036707870a 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseDataValueVector.java
  d48ea99237bb822cafc8b835c3af0f4789c6eb29 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
  b0783afe57317dcd8dc7a2a8d967dcdb1f305edb 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
  c0f529961343145d67f835a95f58c4eaf2fae2a4 
   
 exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/TestEmptyPopulator.java
  8426a6abbf0c20be6f81bc82521f31ed7cde2557 
 
 Diff: https://reviews.apache.org/r/33035/diff/
 
 
 Testing
 ---
 
 unit and beyond
 
 
 Thanks,
 
 Hanifi Gunes
 




[jira] [Resolved] (DRILL-2781) Protobuf changes for nested loop join

2015-04-14 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2781.

Resolution: Fixed

fixed in 5441e72c0d97e8ccd7c196f5a9f6f23fdc8d2b32

 Protobuf changes for nested loop join
 -

 Key: DRILL-2781
 URL: https://issues.apache.org/jira/browse/DRILL-2781
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Minor
 Fix For: 0.9.0

 Attachments: DRILL-2781.patch


 A couple of the protobuf files were not regenerated as part of the nested
 loop join change. Will regenerate those files and merge them as part of
 this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-2771) Right outer join with a map projection throws exception

2015-04-13 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2771:
--

 Summary: Right outer join with a map projection throws exception
 Key: DRILL-2771
 URL: https://issues.apache.org/jira/browse/DRILL-2771
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 0.9.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2707) Projecting a required varchar column after a Full Outer Join results in an IOOBException

2015-04-13 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2707.

Resolution: Fixed

Fixed in 3b5a87e89e55c14bb792ae2d0e7429e34cb035a9

 Projecting a required varchar column after a Full Outer Join results in an 
 IOOBException 
 -

 Key: DRILL-2707
 URL: https://issues.apache.org/jira/browse/DRILL-2707
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Rahul Challapalli
Assignee: Mehant Baid
 Fix For: 0.9.0

 Attachments: DRILL-2707.patch, fewtypes.parquet, fewtypes_null.json


 git.commit.id.abbrev=a53e123
 I tried to project a required varchar column after a FOJ. Below is what I see 
 {code}
  0: jdbc:drill:schema=dfs_eea> select
  . . . . . . . . . . . . . . > p.varchar_col
  . . . . . . . . . . . . . . > from dfs.`cross-sources`.`fewtypes.parquet` p
  . . . . . . . . . . . . . . > full outer join 
  dfs.`cross-sources`.`fewtypes_null.json` o
  . . . . . . . . . . . . . . > on p.int_col=o.int_col;
 +-+
 | varchar_col |
 +-+
 java.lang.IndexOutOfBoundsException: index: 180, length: 10 (expected: 
 range(0, 180))
   at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1143)
   at 
 io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
   at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
   at 
 io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:25)
   at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:596)
   at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:596)
   at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:596)
   at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:596)
   at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:387)
   at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:411)
   at 
 org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:108)
   at 
 org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137)
   at 
 org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:165)
   at 
 net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
   at sqlline.SqlLine$Rows$Row.<init>(SqlLine.java:2388)
   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2504)
   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
   at sqlline.SqlLine.print(SqlLine.java:1809)
   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
   at sqlline.SqlLine.dispatch(SqlLine.java:889)
   at sqlline.SqlLine.begin(SqlLine.java:763)
   at sqlline.SqlLine.start(SqlLine.java:498)
   at sqlline.SqlLine.main(SqlLine.java:460)
 {code}
  Not sure if this is a client-specific issue, as there is no exception in the
  drillbit log files.
  However, if I project a varchar column (nullable) from a json file after a
  FOJ, there seem to be no issues:
 {code}
  0: jdbc:drill:schema=dfs_eea> select
  . . . . . . . . . . . . . . > o.varchar_col
  . . . . . . . . . . . . . . > from dfs.`cross-sources`.`fewtypes.parquet` p
  . . . . . . . . . . . . . . > full outer join 
  dfs.`cross-sources`.`fewtypes_null.json` o
  . . . . . . . . . . . . . . > on p.int_col=o.int_col;
 +-+
 | varchar_col |
 +-+
 | jllkjsdhfg  |
 | null|
 | gfdstweopiu |
 | gjklhsdfgkjhkASDF |
 | oieoiutriotureWERTgwgEWRg |
 | gjkdfkjglfd |
 | ioerutklsdfASDgerGWEr |
 | lkjgfiurtoUYFHfahui |
 | IOUfiuodsfIUfjkh |
 | iweuoHUIhUwer |
 | null|
 | dfgoiuert   |
 | uitreo  |
 | uigoMnvjjkdf |
 | NvvdfHVG|
 | null|
 | null|
 | uiuikjk |
 | null|
 | hjiwgh  |
 | null|
 | jhgduitweriuoert |
 | KfijUIwre   |
 | Nhkhuivb|
 | null|
 | null|
 +-+
 26 rows selected (0.212 seconds)
 {code}
 I attached the parquet and json files used. Let me know if you need anything 
 more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2771) Right outer join with a map projection throws exception

2015-04-13 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2771.

Resolution: Fixed

Fixed in 3b5a87e89e55c14bb792ae2d0e7429e34cb035a9

 Right outer join with a map projection throws exception
 ---

 Key: DRILL-2771
 URL: https://issues.apache.org/jira/browse/DRILL-2771
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 0.9.0

 Attachments: DRILL-2771.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33030: DRILL-2685: Unique-ify local Hive metastore directory per test JVM

2015-04-10 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33030/#review79754
---

Ship it!


Ship It!

- Mehant Baid


On April 10, 2015, 1:40 a.m., Venki Korukanti wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33030/
 ---
 
 (Updated April 10, 2015, 1:40 a.m.)
 
 
 Review request for drill, Hanifi Gunes and Mehant Baid.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Please see DRILL-2685 for details.
 
 
 Diffs
 -
 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/HiveTestBase.java
  1c7e16d 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
  a76128f 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/sql/hive/TestViewSupportOnHiveTables.java
  14ab506 
   
 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java
  657da61 
 
 Diff: https://reviews.apache.org/r/33030/diff/
 
 
 Testing
 ---
 
 Ran concurrent tests by setting forkCount > 1
 
 
 Thanks,
 
 Venki Korukanti
 




Re: Review Request 32945: DRILL-2715: Implement nested loop join operator

2015-04-09 Thread Mehant Baid


 On April 9, 2015, 1:22 a.m., Aman Sinha wrote:
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java,
   lines 93-94
  https://reviews.apache.org/r/32945/diff/1/?file=920309#file920309line93
 
  Will this loop work for left outer join?  Suppose the right input is
  empty, we still want to produce all left rows, e.g. in the following
  example query:
SELECT * FROM t1 
 LEFT OUTER JOIN
(SELECT * FROM t2 WHERE 1=0) 
ON 1=1  /* for  cartesian join */

Yes, currently we will not output any rows. But as discussed, we wouldn't use
NLJ for generic outer joins anyway, since the filter above would remove the
rows that don't satisfy the join condition. For outer joins we will use NLJ only
with scalar subqueries.
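
For reference, a toy sketch of the loop shape being discussed (the emit calls are illustrative placeholders, not the generated NLJ code): the operator emits the cross product and a Filter above applies the join condition, so an empty right input produces no output rows.

{code}
// Toy sketch only: shows why an empty right input yields zero output rows.
final class NljLoopSketch {
  static int populateOutgoing(int leftRecordCount, int rightRecordCount) {
    int outIndex = 0;
    for (int l = 0; l < leftRecordCount; l++) {
      for (int r = 0; r < rightRecordCount; r++) { // rightRecordCount == 0 => no rows
        // emitLeft(l, outIndex); emitRight(r, outIndex);  // copy projected columns
        outIndex++;
      }
    }
    return outIndex;
  }

  public static void main(String[] args) {
    System.out.println(populateOutgoing(5, 0)); // 0: all left rows are dropped
    System.out.println(populateOutgoing(5, 3)); // 15: full cartesian product
  }
}
{code}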


- Mehant


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32945/#review79455
---


On April 9, 2015, 9:03 p.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32945/
 ---
 
 (Updated April 9, 2015, 9:03 p.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 This patch implements the nested loop join operator. The main changes are in 
 the files NestedLoopJoinBatch and NestedLoopJoinTemplate. This patch only 
 contains the execution changes. Planning patch will be posted in a separate 
 review request by Aman.
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
  27b0ecb 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
  e6a89d0 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/NestedLoopJoinPOP.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoin.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatchCreator.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/record/ExpandableHyperContainer.java
  90310e2 
   protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java 
 9a9d196 
   protocol/src/main/protobuf/UserBitShared.proto 5e44655 
 
 Diff: https://reviews.apache.org/r/32945/diff/
 
 
 Testing
 ---
 
 The tests are dependent on the planning changes, hence not uploaded as part 
 of this patch. However 
 https://github.com/mehant/drill/blob/notin_1/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java
  is a working branch that contains a bunch of tests (planning & execution) 
 added for nested loop join.
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 32945: DRILL-2715: Implement nested loop join operator

2015-04-09 Thread Mehant Baid


 On April 8, 2015, 10:55 p.m., Hanifi Gunes wrote:
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java,
   line 39
  https://reviews.apache.org/r/32945/diff/1/?file=920309#file920309line39
 
  Is this instance variable needed when we are holding the left instance? Can
  we localize it?

I will make a local copy of this member for performance, but I think keeping it
as a class member is fine here. It is used in two cases: 1. The first time we
invoke outputRecords(), we use the record count to decide whether we need to
invoke populateOutgoingBatch(); otherwise we would still iterate over all the
records on the right side. 2. Once we have processed one left batch, it also
tells us whether we need to end processing. If I remove it I would have to
maintain some other state, such as an 'isFirst' flag, in the template, which
would not serve the purpose any better.
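
As a side note, a generic illustration of the local-copy point (not Drill code; the field and method names are made up): reading a frequently used field into a local variable before a tight loop can help the JIT avoid re-reading the field on every iteration.

{code}
// Generic illustration only.
final class LocalCopySketch {
  private int leftRecordCount = 1_000_000;

  long sumReadingField() {
    long sum = 0;
    for (int i = 0; i < leftRecordCount; i++) { // loop bound read from the field
      sum += i;
    }
    return sum;
  }

  long sumReadingLocalCopy() {
    final int count = leftRecordCount;          // copy the field into a local once
    long sum = 0;
    for (int i = 0; i < count; i++) {
      sum += i;
    }
    return sum;
  }
}
{code}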


 On April 8, 2015, 10:55 p.m., Hanifi Gunes wrote:
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java,
   line 98
  https://reviews.apache.org/r/32945/diff/1/?file=920309#file920309line98
 
  Afaik any non-long integer arithmetic results in an int by default. We
  may save some CPU cycles if we declare nextRightRecordToProcess as a short
  and get rid of the bit masking here.
  
  Also, is there any reason for not altering the method signature to
  emitRight(compositeIndex, recordIndex, outIndex) instead of making this
  computation for each tuple?

Good catch.
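
For context, a sketch of the composite-index arithmetic being discussed, assuming the usual hyper-batch layout of batch index in the upper 16 bits and record index in the lower 16 bits (the method names are illustrative, not the generated code):

{code}
// Illustrative only: composite index = (batchIndex << 16) | recordIndex.
final class CompositeIndexSketch {
  static int compose(int batchIndex, int recordIndex) {
    return (batchIndex << 16) | (recordIndex & 0xFFFF);
  }

  static int batchIndex(int compositeIndex)  { return compositeIndex >>> 16; }
  static int recordIndex(int compositeIndex) { return compositeIndex & 0xFFFF; }

  public static void main(String[] args) {
    int composite = compose(3, 42);
    System.out.println(batchIndex(composite));  // 3
    System.out.println(recordIndex(composite)); // 42
  }
}
{code}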


- Mehant


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32945/#review79434
---


On April 9, 2015, 9:03 p.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32945/
 ---
 
 (Updated April 9, 2015, 9:03 p.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 This patch implements the nested loop join operator. The main changes are in 
 the files NestedLoopJoinBatch and NestedLoopJoinTemplate. This patch only 
 contains the execution changes. Planning patch will be posted in a separate 
 review request by Aman.
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
  27b0ecb 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
  e6a89d0 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/NestedLoopJoinPOP.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoin.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatchCreator.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/record/ExpandableHyperContainer.java
  90310e2 
   protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java 
 9a9d196 
   protocol/src/main/protobuf/UserBitShared.proto 5e44655 
 
 Diff: https://reviews.apache.org/r/32945/diff/
 
 
 Testing
 ---
 
 The tests are dependent on the planning changes, hence not uploaded as part 
 of this patch. However 
 https://github.com/mehant/drill/blob/notin_1/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java
  is a working branch that contains a bunch of tests (planning & execution) 
 added for nested loop join.
 
 
 Thanks,
 
 Mehant Baid
 




[jira] [Created] (DRILL-2715) Implement nested loop join operator

2015-04-07 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2715:
--

 Summary: Implement nested loop join operator
 Key: DRILL-2715
 URL: https://issues.apache.org/jira/browse/DRILL-2715
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 0.9.0


For certain types of queries, such as those with scalar sub queries and others
with a 'not in' clause, Calcite produces plans with a nested loop join. This
JIRA covers the changes required to implement the execution side of the nested
loop join operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 32945: DRILL-2715: Implement nested loop join operator

2015-04-07 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32945/
---

Review request for drill and Aman Sinha.


Repository: drill-git


Description
---

This patch implements the nested loop join operator. The main changes are in 
the files NestedLoopJoinBatch and NestedLoopJoinTemplate. This patch only 
contains the execution changes. Planning patch will be posted in a separate 
review request by Aman.


Diffs
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
 27b0ecb 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
 e6a89d0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/NestedLoopJoinPOP.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/ExpandableHyperContainerContext.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoin.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatchCreator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java
 PRE-CREATION 
  protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java 9a9d196 
  protocol/src/main/protobuf/UserBitShared.proto 5e44655 

Diff: https://reviews.apache.org/r/32945/diff/


Testing
---

The tests are dependent on the planning changes, hence not uploaded as part of 
this patch. However 
https://github.com/mehant/drill/blob/notin_1/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java
 is a working branch that contains a bunch of tests (planning & execution) 
added for nested loop join.


Thanks,

Mehant Baid



[jira] [Resolved] (DRILL-2511) Assert with full outer join when one of the join predicates is of a required type (nullabe parquet)

2015-04-03 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2511.

Resolution: Cannot Reproduce
  Assignee: Victoria Markman  (was: Mehant Baid)

The uploaded files don't contain required types. However, I created a parquet
file with required types, tried a similar query, and it seems to work on the
latest master. Could you verify whether this is still reproducing for you?

 Assert with full outer join when one of the join predicates is of a required 
 type (nullabe parquet)
 ---

 Key: DRILL-2511
 URL: https://issues.apache.org/jira/browse/DRILL-2511
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Victoria Markman
 Fix For: 0.9.0

 Attachments: t3.parquet, t4.parquet


 Columns in tables j3 and j4 are created with the 'required' data type:
 {code}
 [Fri Mar 20 11:30:42 root@~/parquet-tools-1.5.1-SNAPSHOT ] # ./parquet-schema 
 ~/0_0_0.parquet
 message root {
   required binary c_varchar (UTF8);
   required int32 c_integer;
   required int64 c_bigint;
   required float c_float;
   required double c_double;
   required int32 c_date (DATE);
   required int32 c_time (TIME);
   required int64 c_timestamp (TIMESTAMP);
   required boolean c_boolean;
   required double d9;
   required double d18;
   required double d28;
   required double d38;
 }
 {code}
 Full outer join on j3/j4 asserts.
 This is happening with the join predicate of every SQL type except boolean.
 {code}
 select * from j3 full outer join j4 on (j3.c_varchar = j4.c_varchar);
 java.lang.AssertionError at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:382)
 at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:408)
 at 
 org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:98)
 at 
 org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137)
 at 
 org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:146)
 at 
 net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
 at sqlline.SqlLine$Rows$Row.<init>(SqlLine.java:2388)
 at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2504)
 at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
 at sqlline.SqlLine.print(SqlLine.java:1809)
 at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
 at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
 at sqlline.SqlLine.dispatch(SqlLine.java:889)
 at sqlline.SqlLine.begin(SqlLine.java:763)
 at sqlline.SqlLine.start(SqlLine.java:498)
 at sqlline.SqlLine.main(SqlLine.java:460)
 {code}
 The same problem happens if one table's column types are optional and the
 other's are required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

2015-04-03 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
---

(Updated April 3, 2015, 8:13 a.m.)


Review request for drill and Aman Sinha.


Changes
---

Updated the patch to address Aman's review comment: do not use hashing as
double for aggregates; use it only for joins and distribution.


Repository: drill-git


Description
---

If the return types of the two expressions in the join condition are different,
we need to inject a cast on one of the sides so that the comparison functions
work as expected.
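
A sketch of the related hashing idea (illustrative hash only, not the actual Hash64AsDouble implementation): for joins and distribution, numeric keys of different types are widened to double before hashing, so equal values hash to the same partition regardless of their declared type.

{code}
// Illustrative only: stand-in mixing, not Drill's hash64 implementation.
final class HashAsDoubleSketch {
  static long hashAsDouble(double value) {
    long bits = Double.doubleToLongBits(value);
    bits ^= (bits >>> 33);
    bits *= 0xff51afd7ed558ccdL;
    bits ^= (bits >>> 33);
    return bits;
  }

  public static void main(String[] args) {
    int intKey = 42;
    long bigintKey = 42L;
    float floatKey = 42.0f;
    // all three widen to the same double, so they hash identically
    System.out.println(hashAsDouble(intKey) == hashAsDouble(bigintKey)); // true
    System.out.println(hashAsDouble(intKey) == hashAsDouble(floatKey));  // true
  }
}
{code}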


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java
 57154ed 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java
 b9ec956 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
 84a2956 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
 f7c144f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java 
d8652f2 
  exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab 

Diff: https://reviews.apache.org/r/31160/diff/


Testing
---

Added unit tests with hash join and merge join


Thanks,

Mehant Baid



[jira] [Resolved] (DRILL-2546) Implicit Cast : Joining 2 tables on columns which are float and double should succeed when the values are equal

2015-04-03 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2546.

Resolution: Invalid

Certain floating point values cannot be represented exactly in binary, so an
approximation is stored for both float and double. However, the float and
double approximations of the same value are not identical, so when a float is
implicitly promoted to double the equality comparison fails even though both
columns started from the same value. This behavior is consistent with Postgres,
and in general it is not a good idea to join on a float column and a double
column. Here is a Stack Overflow link discussing a similar issue:
http://stackoverflow.com/questions/16627813/why-is-comparing-a-float-to-a-double-inconsistent-in-java
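
A minimal Java demonstration of the point above (self-contained, not Drill code):

{code}
public class FloatDoubleCompare {
  public static void main(String[] args) {
    float f = 0.1f;
    double d = 0.1d;
    // 0.1 is not exactly representable; the float and double approximations differ
    System.out.println(f == d);        // false: f widens to 0.10000000149011612
    System.out.println((double) f);    // 0.10000000149011612
    System.out.println(d);             // 0.1
    // exactly representable values still compare equal across the two types
    System.out.println(0.5f == 0.5d);  // true
  }
}
{code}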

 Implicit Cast : Joining 2 tables on columns which are float and double should 
 succeed when the values are equal
 ---

 Key: DRILL-2546
 URL: https://issues.apache.org/jira/browse/DRILL-2546
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Execution - Relational Operators
Reporter: Rahul Challapalli
Assignee: Mehant Baid
 Fix For: 0.9.0

 Attachments: fewtypes_null.json, fewtypes_null.parquet


 git.commit.id.abbrev=f1b59ed
 I attached 2 files which contain the same values.
 The query below is not doing an implicit cast between double and float:
 {code}
 select count(*) from dfs.`cross-sources`.`fewtypes_null.parquet` p
 . . . . . . . . . . . . . . > inner join 
 dfs.`cross-sources`.`fewtypes_null.json` o
 . . . . . . . . . . . . . . > on p.float_col=o.float_col;
 ++
 |   EXPR$0   |
 ++
 | 1  |
 ++
 1 row selected (0.148 seconds)
 {code}
 However, if we do an explicit cast, we get the right result:
 {code}
 select count(*) from dfs.`cross-sources`.`fewtypes_null.parquet` p
 . . . . . . . . . . . . . . > inner join 
 dfs.`cross-sources`.`fewtypes_null.json` o
 . . . . . . . . . . . . . . > on p.float_col=cast(o.float_col as float);
 ++
 |   EXPR$0   |
 ++
 | 17 |
 ++
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >