[jira] [Resolved] (DRILL-2138) Failure while querying MapR-DB Views/Tables

2015-02-02 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish resolved DRILL-2138.

Resolution: Later

> Failure while querying MapR-DB Views/Tables
> ---
>
> Key: DRILL-2138
> URL: https://issues.apache.org/jira/browse/DRILL-2138
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Abhishek Girish
> Assignee: Jacques Nadeau
> Priority: Critical
>
> I'm hitting an issue while querying views on top of MapR-DB tables. Issuing a 
> simple select * on any one of the views succeeds after which any subsequent 
> query fails, irrespective of the schema. 
> > select * from customer limit 1;
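> +-----------+------------+-------------------+-------------+-----------------+-----------+--------------+----------------------------------------------------------------+
> | c_custkey | c_name     | c_address         | c_nationkey | c_phone         | c_acctbal | c_mktsegment | c_comment                                                      |
> +-----------+------------+-------------------+-------------+-----------------+-----------+--------------+----------------------------------------------------------------+
> | 1         | Customer#1 | IVhzIApeRb ot,c,E | 15          | 25-989-741-2988 | 711.56    | BUILDING     | to the even, regular platelets. regular, ironic epitaphs nag e |
> +-----------+------------+-------------------+-------------+-----------------+-----------+--------------+----------------------------------------------------------------+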
> 1 row selected (1.113 seconds)
> > select * from customer limit 1;
> Query failed: SqlValidatorException: Table 'customer' not found
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> > use dfs.tmp;
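> +------+-------------------------------------+
> | ok   | summary                             |
> +------+-------------------------------------+
> | true | Default schema changed to 'dfs.tmp' |
> +------+-------------------------------------+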
> 1 row selected (0.179 seconds)
> > show files;
> Query failed: IOException: Filesystem closed
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> Log:
> 2015-02-01 18:39:29,437 [2b311c9e-2130-ead7-d1bc-df1148f48cff:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - State change requested.  PENDING --> 
> FAILED
> org.apache.drill.exec.planner.sql.QueryInputException: Failure handling SQL.
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:149)
>  ~[drill-java-exec-0.7.0-r3-SNAPSHOT-rebuffed.jar:0.7.0-r3-SNAPSHOT]
>   ...
> Caused by: java.io.IOException: Filesystem closed
> at com.mapr.fs.MapRFileSystem.checkOpen(MapRFileSystem.java:1359) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:481) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:567) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at 
> com.mapr.fs.MapRFileSystem.listMapRStatus(MapRFileSystem.java:1276) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.listStatus(MapRFileSystem.java:1334) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.listStatus(MapRFileSystem.java:66) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1519) 
> ~[hadoop-common-2.4.1-mapr-1408.jar:na]
> ...
> ...
> 2015-02-01 18:39:29,443 [2b311c9e-2130-ead7-d1bc-df1148f48cff:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - Error 
> e55832c9-35a8-4287-9eb4-758b1b241cb1: IOException: Filesystem closed
> org.apache.drill.exec.planner.sql.QueryInputException: Failure handling SQL.
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:149)
>  ~[drill-java-exec-0.7.0-r3-SNAPSHOT-rebuffed.jar:0.7.0-r3-SNAPSHOT]
> ...
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> Caused by: java.io.IOException: Filesystem closed
> at com.mapr.fs.MapRFileSystem.checkOpen(MapRFileSystem.java:1359) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:481) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:567) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at 
> com.mapr.fs.MapRFileSystem.listMapRStatus(MapRFileSystem.java:1276) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.listStatus(MapRFileSystem.java:1334) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at com.mapr.fs.MapRFileSystem.listStatus(MapRFileSystem.java:66) 
> ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1519) 
> ~[hadoop-common-2.4.1-mapr-1408.jar:na]
>...
> 

[jira] [Created] (DRILL-2139) Star is not expanded correctly in "select distinct" query

2015-02-02 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-2139:
---

 Summary: Star is not expanded correctly in "select distinct" query
 Key: DRILL-2139
 URL: https://issues.apache.org/jira/browse/DRILL-2139
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni


{code}
0: jdbc:drill:schema=dfs> select distinct * from t1;
+------+
| *    |
+------+
| null |
+------+
1 row selected (0.14 seconds)


0: jdbc:drill:schema=dfs> select distinct * from `test.json`;
+------+
| *    |
+------+
| null |
+------+
1 row selected (0.163 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-2140) RPC Error querying JSON with empty nested maps

2015-02-02 Thread Andries Engelbrecht (JIRA)
Andries Engelbrecht created DRILL-2140:
--

 Summary: RPC Error querying JSON with empty nested maps
 Key: DRILL-2140
 URL: https://issues.apache.org/jira/browse/DRILL-2140
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - RPC
Affects Versions: 0.7.0
 Environment: Centos 4 node MapR cluster
Reporter: Andries Engelbrecht
Assignee: Jacques Nadeau


When querying a large number of documents spread across multiple directories, 
each containing multiple JSON files, where some documents lack the top-level map 
that is used in a predicate, Drill produces an RPC error in the log.

Query
select t.retweeted_status.`user`.name as name, 
count(t.retweeted_status.favorited) as rt_count from `./nfl` t where 
t.retweeted_status.`user`.name is not null group by 
t.retweeted_status.`user`.name order by count(t.retweeted_status.favorited) 
desc limit 10;

Screen Error
Query failed: Query failed: Failure while running fragment., index: 0, length: 
1 (expected: range(0, 0)) [ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on 
se-node13.se.lab:31010 ]
[ b96e3bfa-74c9-4b78-886b-9a2c3fc4ea9b on se-node13.se.lab:31010 ]

Drillbit log attached





Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

2015-02-02 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/
---

Review request for drill, Jinfeng Ni and Mehant Baid.


Bugs: DRILL-2092
https://issues.apache.org/jira/browse/DRILL-2092


Repository: drill-git


Description
---

For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a,  
Calcite generates a plan with 2 subqueries, each of which corresponds to an 
aggregate function with group-by, and the 2 subqueries are joined in the 
outer query block on the group-by columns.  This join-back uses 'IS NOT 
DISTINCT FROM' rather than equality, e.g. here's an extract from the Explain: 
  DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
FROM($1, $6))], joinType=[inner])

IS NOT DISTINCT FROM differs from '=' in terms of null handling.  null = null 
evaluates to NULL (unknown), which a join condition treats as not matching, 
whereas null IS NOT DISTINCT FROM null is TRUE.  This patch handles the nulls 
for this comparator during both planning and execution.
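
The difference in null handling can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical and do not come from the patch, and a Java null stands in for SQL NULL (and for the unknown truth value).

```java
// Hypothetical illustration; not taken from the Drill patch.
final class NullSafeComparison {

    // SQL '=': the result is unknown (modeled as Java null) if either operand is NULL.
    static Boolean sqlEquals(Integer left, Integer right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    // SQL 'IS NOT DISTINCT FROM': always TRUE or FALSE; two NULLs count as the same value.
    static boolean isNotDistinctFrom(Integer left, Integer right) {
        if (left == null || right == null) {
            return left == right; // true only when both are null
        }
        return left.equals(right);
    }

    // A join keeps a pair of rows only when its condition evaluates to TRUE.
    static boolean joinMatches(Boolean condition) {
        return Boolean.TRUE.equals(condition);
    }

    public static void main(String[] args) {
        System.out.println(joinMatches(sqlEquals(null, null)));
        System.out.println(isNotDistinctFrom(null, null));
        System.out.println(isNotDistinctFrom(1, null));
    }
}
```

With a plain equality join-back, groups whose key is NULL would find no partner row; the null-safe comparison keeps them.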


Diffs
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
 fd6a3e2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 4af0292 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
 14bc094 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
 d9a7277 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java
 d5473f2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java
 f6b7ef6 
  
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java
 2b3ff50 

Diff: https://reviews.apache.org/r/30503/diff/


Testing
---

Added new unit tests.  Ran unit tests, functional suite and tpch 100.


Thanks,

Aman Sinha



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-02-02 Thread Chris Westin


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java,
> >  line 121
> > 
> >
> > Make the file prefix a command-line argument so that people besides 
> > yourself can run this.
> 
> abdelhakim deneche wrote:
> Argh, I forgot to remove GenerateTestData again!
> I just used this to generate the data used in the tests, it was never 
> intended to be part of the final code.
> Sorry about that

I would check it in. We might need it again in the future. What if something 
changes and we have to re-generate the test data?


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java,
> >  line 32
> > 
> >
> > logger should be private.
> 
> Jacques Nadeau wrote:
> I disagree.  Our current standard is package private.  If you think we 
> should change this throughout the code, we should have a discussion but my 
> preference is to maintain the current standard until we decide upon a new one.

Loggers identify their source in log messages thanks to the class argument 
given to getLogger(). They're meant to be associated with a class in a 
one-to-one manner -- why else would getLogger() have this parameter?

There are no uses of the pattern "otherclass.logger...", so there's no reason 
for them to be package private. However, I have come across a few uses where a 
derived class uses the logger from its base class, and this is confusing. This 
has sent me looking in the wrong place for the source of the message, so we 
shouldn't do it. I've assumed it was accidental, and slipped by because the 
base class's logger wasn't private, so the author was able to use it without 
realizing it. Making them private will prevent that, and ensure that log 
messages correctly identify their real source.

Because we have no written standard that describes this, and because it goes 
against common best practice elsewhere, I've been converting these to private 
wherever I've come across them. In only a couple of cases has this made me add 
new loggers where a derived class was accidentally using its base class's 
logger.
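
A minimal sketch of the accidental reuse described above. Everything here is hypothetical: the class names are invented, and java.util.logging stands in for the project's logging API so that the example has no dependencies.

```java
import java.util.logging.Logger;

// Hypothetical classes; java.util.logging stands in for the project's logging API.
final class LoggerVisibilityDemo {
    public static void main(String[] args) {
        // Shows which class name each derived class reports in its log output.
        System.out.println(DerivedOperator.loggerNameInUse());
        System.out.println(FixedDerivedOperator.loggerNameInUse());
    }
}

class BaseOperator {
    // Package-private: a subclass in the same package can pick this up by accident.
    static final Logger logger = Logger.getLogger(BaseOperator.class.getName());
}

class DerivedOperator extends BaseOperator {
    // No logger is declared here, so 'logger' resolves to BaseOperator.logger.
    static String loggerNameInUse() {
        return logger.getName();
    }
}

class FixedBaseOperator {
    private static final Logger logger = Logger.getLogger(FixedBaseOperator.class.getName());
}

class FixedDerivedOperator extends FixedBaseOperator {
    // The base logger is private, so this class has to declare its own.
    private static final Logger logger = Logger.getLogger(FixedDerivedOperator.class.getName());

    static String loggerNameInUse() {
        return logger.getName();
    }
}
```

In the first pair the derived class logs under the base class's name; in the second pair, omitting the derived logger would be a compile error rather than a silent misattribution.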


- Chris


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review70296
---


On Jan. 28, 2015, 7:50 p.m., abdelhakim deneche wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30051/
> ---
> 
> (Updated Jan. 28, 2015, 7:50 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Bugs: DRILL-1908
> https://issues.apache.org/jira/browse/DRILL-1908
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> In order to fix DRILL-1487 a complete rewrite of the 
> StreamingWindowFrameRecordBatch was needed. This patch adds a new 
> WindowFrameRecordBatch that correctly handles window functions with or 
> without order by clauses.
> This code still lacks support for frame clauses and may be optimized to 
> reduce unneeded frame computations.
> 
> 
> Diffs
> -
> 
>   
> common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java
>  28424a5 
>   common/src/main/java/org/apache/drill/common/logical/data/Window.java 
> 6dba77c 
>   contrib/data/pom.xml 86075f2 
>   contrib/data/window-test-data/pom.xml PRE-CREATION 
>   exec/java-exec/pom.xml 90734a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 
> 190c13f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 
> 5288f5d 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java
>  17738ee 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/OverFinder.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java
>  9b8929f 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java
>  26d23f2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java
>  e2c7e9e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java
>  9588cef 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/e

[jira] [Created] (DRILL-2141) Data type error in group by and order by for JSON

2015-02-02 Thread Andries Engelbrecht (JIRA)
Andries Engelbrecht created DRILL-2141:
--

 Summary: Data type error in group by and order by for JSON
 Key: DRILL-2141
 URL: https://issues.apache.org/jira/browse/DRILL-2141
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.7.0
Reporter: Andries Engelbrecht
Assignee: Daniel Barclay (Drill/MapR)
 Attachments: drillbit.log

When doing group by and order by on complex nested JSON, I get data type errors.

Query:
select t.retweeted_status.`user`.name as name, count(t.retweeted_status.id) as 
rt_count from `./nfl` t where t.retweeted_status.`user`.name is not null group 
by t.retweeted_status.`user`.name order by count(t.retweeted_status.id) desc 
limit 10;

Screen output:
Query failed: Query failed: Failure while running fragment., Failure while 
reading vector.  Expected vector class of 
org.apache.drill.exec.vector.NullableIntVector but was holding vector class 
org.apache.drill.exec.vector.NullableVarCharVector. [ 
c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]
[ c6ea670f-5fa0-491c-acfb-5ccd128ec324 on drilldemo:31010 ]


java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
query.
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
at sqlline.SqlLine.print(SqlLine.java:1809)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
at sqlline.SqlLine.dispatch(SqlLine.java:889)
at sqlline.SqlLine.begin(SqlLine.java:763)
at sqlline.SqlLine.start(SqlLine.java:498)
at sqlline.SqlLine.main(SqlLine.java:460)

Drill log attached





Re: Review Request 30466: DRILL-133: LocalExchange (planning and parallelization)

2015-02-02 Thread Venki Korukanti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30466/
---

(Updated Feb. 2, 2015, 7:21 p.m.)


Review request for drill, Chris Westin, Jacques Nadeau, and Steven Phillips.


Repository: drill-git


Description
---

In this patch LocalExchange contains only the multiplexing exchange. I am 
currently working on the demultiplexing exchange (there are a few failures in 
execution that I am debugging). Sending this patch to get initial feedback.

Brief overview of changes:
1. Traverse the PRel tree after all optimizations. Whenever a 
HashToRandomExchangePrel is encountered, insert a MuxExchange before the 
HashToRandomExchangePrel and a DemuxExchange after it.
2. Parallelization changes: 
   i) Traverse the physical operator tree and divide it into Fragments.
   ii) Based on the affinity of the sending Exchange, set parallelization 
dependencies between fragments. 
   iii) Start parallelizing from the leaf fragments (fragments that have no 
other fragments depending on them for parallelization info). Stats collection 
includes collecting parallelization info (minWidth, maxWidth, affinityMap) and 
cost.
3. Change the Receiver to accept a set of (minorFragmentId, DrillbitEndpoint) 
pairs as the sender list. This also involved a few changes in DataCollector.
4. Change SingleSender to accept a custom minorFragmentId instead of the default 
minorFragmentId of 0.
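
Step 1 of the overview can be sketched as a bottom-up tree rewrite. This is a simplified illustration, not the code in the patch: Node is a hypothetical stand-in for a Prel, and it assumes that "before" means the sending side (the exchange's input) and "after" means the receiving side (the exchange's parent).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LocalExchangeSketch {

    // Hypothetical stand-in for a plan node; 'inputs' are the nodes feeding it.
    static final class Node {
        final String kind;
        final List<Node> inputs;

        Node(String kind, List<Node> inputs) {
            this.kind = kind;
            this.inputs = inputs;
        }

        @Override
        public String toString() {
            return inputs.isEmpty() ? kind : kind + inputs;
        }
    }

    // Rebuilds the tree bottom-up, wrapping every HashToRandomExchange it finds.
    static Node insertLocalExchanges(Node node) {
        List<Node> rewritten = new ArrayList<>();
        for (Node input : node.inputs) {
            rewritten.add(insertLocalExchanges(input));
        }
        if (!node.kind.equals("HashToRandomExchange")) {
            return new Node(node.kind, rewritten);
        }
        // Mux on the sending side (the exchange's input), Demux on the receiving side.
        Node mux = new Node("MuxExchange", rewritten);
        Node exchange = new Node(node.kind, Collections.singletonList(mux));
        return new Node("DemuxExchange", Collections.singletonList(exchange));
    }

    public static void main(String[] args) {
        Node scan = new Node("Scan", Collections.<Node>emptyList());
        Node exchange = new Node("HashToRandomExchange", Collections.singletonList(scan));
        Node agg = new Node("HashAgg", Collections.singletonList(exchange));
        System.out.println(insertLocalExchanges(agg));
    }
}
```

The sketch inserts both wrappers as step 1 describes; a version matching the current state of the patch would insert only the MuxExchange.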


Diffs
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/EndpointAffinity.java
 df31f74 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractExchange.java
 73280ea 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScan.java
 5d0d9bf 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/Exchange.java 
7be7f20 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/GroupScan.java 
23860a3 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/HasAffinity.java
 52462db 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/Receiver.java 
0c67770 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/Store.java 
94411ea 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/BroadcastExchange.java
 73a1d20 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToMergeExchange.java
 f62d922 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToRandomExchange.java
 fac374b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/MergingReceiverPOP.java
 f5dca1a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/MuxExchange.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/OrderedPartitionExchange.java
 8e1526a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/Screen.java 
58c8e29 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/SingleMergeExchange.java
 2914112 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/SingleSender.java
 4a11a51 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/UnionExchange.java
 cfc21ac 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/UnorderedReceiver.java
 3a4dd0e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/SingleSenderCreator.java
 6db9f4a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Fragment.java
 ac63bde 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/MakeFragmentsVisitor.java
 8756e5b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java
 961b603 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/ParallelizationInfo.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/PlanningSet.java
 8cc6c85 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java
 434cdd4 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Stats.java 
eda364b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/StatsCollector.java
 41ff678 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Wrapper.java
 86b395e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MuxExchangePrel.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 faa8546 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/InsertLocalExchangeVisitor.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
 79603eb 
  
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
 f20627d 
  
exec/java-exec/src/

Re: Review Request 30305: Add IO wait time stats for Parquet and Json input files

2015-02-02 Thread Steven Phillips

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30305/#review70608
---

Ship it!


Ship It!


exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java


remove



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java


remove


- Steven Phillips


On Jan. 27, 2015, 4:06 a.m., Venki Korukanti wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30305/
> ---
> 
> (Updated Jan. 27, 2015, 4:06 a.m.)
> 
> 
> Review request for drill, Jacques Nadeau and Steven Phillips.
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> See DRILL-2080 for details.
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillFile.java 
> 009cd00 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillUtil.java 
> 63b22e9 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
>  d68a5b5 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ShowFileHandler.java
>  ff3542d 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/BasicFormatMatcher.java
>  2ba2910 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFSDataInputStream.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java
>  cf8937f 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
>  db6c0c7 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatCreator.java
>  e5c0487 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java
>  27f83f0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
>  7b9d52c 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
>  9cc1808 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyGroupScan.java
>  b505535 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/shim/DrillFileSystem.java
>  d3f9134 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/shim/DrillInputStream.java
>  8c3b5ae 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/shim/DrillOutputStream.java
>  8e56232 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/shim/FileSystemCreator.java
>  a5ad257 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/shim/fallback/FallbackFileSystem.java
>  959529a 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONFormatPlugin.java
>  d41243d 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java
>  0070d18 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java
>  b64a032 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java
>  109033a 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
>  8ddf5fd 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
>  dc1d892 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java
>  25f383f 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/local/FilePStore.java
>  40f25e7 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/local/LocalPStoreProvider.java
>  ac53a61 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/zk/ZkPStore.java 
> a597381 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/zk/ZkPStoreProvider.java
>  f8fa2bc 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/store/dfs/TestDrillFileSystem.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30305/diff/
> 
> 
> Testing
> ---
> 
> Added a unittest for DrillFileSystem. Need to add couple of tests that 
> involve querying actual JSON/Parquet files. Will add those in next patch.
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>



Join us for the Drill Google hangout tomorrow 10am Pacific

2015-02-02 Thread Jason Altekruse
Hello Drillers,

Please join us tomorrow at 10am Pacific for our community meeting. If you
are new to Drill, have questions about the current work being done
throughout the community, or you just want to listen in, anyone is welcome
to participate. The link is always available from the website under
"Community -> Get Involved", I have copied it below as well.

https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

As I have done for the last few sessions, I encourage anyone thinking of
attending to suggest topics they would like to discuss.

Thanks,
Jason Altekruse


[jira] [Created] (DRILL-2142) Refactor complex value vector classes

2015-02-02 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2142:
--

 Summary: Refactor complex value vector classes
 Key: DRILL-2142
 URL: https://issues.apache.org/jira/browse/DRILL-2142
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid


Currently we have one base class for the complex vectors. 
AbstractContainerVector is the base class for RepeatedListVector, 
RepeatedMapVector and MapVector. However, a lot of the map-related 
functionality in AbstractContainerVector is not needed by RepeatedListVector. 
In this patch I would like to move all the map-related logic to a new base 
class called AbstractMapVector, which would be used by the RepeatedMap and Map 
classes. 





[jira] [Created] (DRILL-2143) Remove RecordBatch from setup method of

2015-02-02 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-2143:
--

 Summary: Remove RecordBatch from setup method of 
 Key: DRILL-2143
 URL: https://issues.apache.org/jira/browse/DRILL-2143
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse








Re: Review Request 30466: DRILL-133: LocalExchange (planning and parallelization)

2015-02-02 Thread Chris Westin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30466/#review70618
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/EndpointAffinity.java


It looks like endpoint should be final.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/EndpointAffinity.java


If I created this for a particular endpoint, why would I want to be able to 
change it?

If I do change it, why doesn't that invalidate the affinity value?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/EndpointAffinity.java


What does it mean for the two endpoint values not to be equal? How do you 
expect this to be used?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/EndpointAffinity.java


Can this condition ever occur? Is it actually possible to reach 
POSITIVE_INFINITY by adding to the affinity as in addAffinity() (and the 
condition in there also looks suspicious)?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractExchange.java


Why are these not final (and set in an AbstractExchange constructor)?

In the MuxExchange below, the sender major fragment id is referenced, but 
has never been set. It looks like these aren't both used. Perhaps they 
shouldn't be declared in this class, but each class that does need them should 
declare whichever one it needs itself.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScan.java


This will make a new (empty) list each time it is called. Shouldn't the 
class be hanging on to its currently known operator affinity list, even an 
empty one?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/Exchange.java


It would be helpful to expand on this comment with some of the reasons 
these might have affinity with each other. In my mind, they would always want 
to be together (if possible) to avoid any network transit, so I'm having a hard 
time with finding a reason they wouldn't want to have affinity.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/MergingReceiverPOP.java


From these declarations, we can't tell what kind of List senders is; it 
might be an ArrayList, or it might be a LinkedList. In the latter case, get(i) 
is expensive, because it will iterate down the list to get to each item. 
Because of that, we should iterate over that list instead, something like this:

int i = 0;
for(DrillbitEndpoint de : senders) {
  this.senders.put(i, de);
  ++i;
}
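
A self-contained version of the loop above, for experimenting. The class and method names are hypothetical, and plain strings stand in for DrillbitEndpoint values.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class SenderIndexing {

    // Maps position -> element in a single pass, without calling List.get(i).
    static <T> Map<Integer, T> indexByPosition(List<T> senders) {
        Map<Integer, T> byIndex = new HashMap<>();
        int i = 0;
        for (T sender : senders) {
            byIndex.put(i, sender);
            ++i;
        }
        return byIndex;
    }

    public static void main(String[] args) {
        // Strings stand in for DrillbitEndpoint values.
        List<String> senders = new LinkedList<>();
        senders.add("node-a:31010");
        senders.add("node-b:31010");
        System.out.println(indexByPosition(senders));
    }
}
```

The iterator visits each element once whatever the List implementation, whereas get(i) on a LinkedList walks the list on every call.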



exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/MuxExchange.java


"this." isn't necessary here.

I don't see any code that actually sets this variable.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/UnorderedReceiver.java


Why don't we make this map final?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/UnorderedReceiver.java


This code is the same as that in MergingReceiverPOP (which has one 
additional different line at its end); should it be pulled up into 
AbstractReceiver?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/UnorderedReceiver.java


How does this get used? Would it be dangerous if the caller modified the 
map? Should we return a copy?
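
The two usual ways to keep callers from modifying an internal map can be sketched as follows. This assumes a hypothetical receiver class; the names and the String value type are placeholders, not the actual UnorderedReceiver code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ReceiverSketch {
    private final Map<Integer, String> senders = new HashMap<>();

    ReceiverSketch(Map<Integer, String> initial) {
        senders.putAll(initial);
    }

    // Option 1: read-only view; callers see later changes but cannot make any.
    Map<Integer, String> sendersView() {
        return Collections.unmodifiableMap(senders);
    }

    // Option 2: independent snapshot; callers may modify it freely.
    Map<Integer, String> sendersCopy() {
        return new HashMap<>(senders);
    }

    public static void main(String[] args) {
        Map<Integer, String> initial = new HashMap<>();
        initial.put(0, "node-a:31010");
        ReceiverSketch receiver = new ReceiverSketch(initial);

        // Changing the snapshot leaves the receiver's own map untouched.
        receiver.sendersCopy().put(1, "node-b:31010");
        System.out.println(receiver.sendersView());
    }
}
```

The view avoids allocating on every call; the copy is the safer choice if the internal map can change while a caller is still iterating.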


- Chris Westin


On Feb. 2, 2015, 7:21 p.m., Venki Korukanti wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30466/
> ---
> 
> (Updated Feb. 2, 2015, 7:21 p.m.)
> 
> 
> Review request for drill, Chris Westin, Jacques Nadeau, and Steven Phillips.
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> In this patch LocalExchange contains only multiplexing exchange. Currently 
> working on demultiplexing exchange (there are few failures in executions and 
> currently debugging those). Sending this patch to get initial feedback.
> 
> Brief overview of changes:
> 1. Traverse the PRel tree after all optimizations. Whenever a 
> HashToRandomExchangePrel is encountered insert a MuxExchange before 
> HashToRandomExchangePrel and DemuxExchange after 

[jira] [Created] (DRILL-2144) Refactor repeated map to be structured like other repeated types

2015-02-02 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-2144:
--

 Summary: Refactor repeated map to be structured like other 
repeated types
 Key: DRILL-2144
 URL: https://issues.apache.org/jira/browse/DRILL-2144
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse


Currently there is a lot of shared code between map vector and repeated map 
vector. We should instead use the pattern from the other repeated types, where 
a repeated wrapper is added around the scalar type.





jdbc unit tests -- connection caching

2015-02-02 Thread Hanifi Gunes
Folks,

I have been working on DRILL-2127 to enable connection caching in the jdbc
test suites. The patch provides a convenience method for creating a
connection via JdbcTest#connect. The caching happens transparently. Devs
should not be concerned about explicitly caching a connection. As a good
practice, however, we should still write clean, integrant unit tests,
freeing the resources after use. This should include closing any
connections created in the test case imho.

I just wanted to inform everyone about the patch so that new jdbc unit
tests could hopefully leverage the caching which seems to reduce test
runtime substantially. On my setup jdbc tests went from ~6:30 down to ~4:30
mins.

Let me know.

Regards.
-Hanifi


Re: Review Request 30466: DRILL-133: LocalExchange (planning and parallelization)

2015-02-02 Thread Chris Westin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30466/#review70639
---



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Fragment.java


But addOperator() above will silently drop this on the floor if root is 
already set. Should that give an error similar to the above?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java


This class has no member variables? Perhaps the constructor should be 
private so that no one else creates more instances, and they are always forced 
to use the singleton?
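
A minimal sketch of the private-constructor suggestion above. The class name and 
the visit() method are hypothetical stand-ins, not Drill's actual Materializer:

```java
// Hypothetical stateless class; illustrates the singleton pattern only.
final class StatelessVisitorSketch {
  // The single shared instance. The class has no member variables,
  // so sharing one instance is safe.
  static final StatelessVisitorSketch INSTANCE = new StatelessVisitorSketch();

  // Private constructor: code outside this class cannot create more instances.
  private StatelessVisitorSketch() {
  }

  // Placeholder for the real visiting logic (assumed for illustration).
  String visit(String operatorName) {
    return "visited:" + operatorName;
  }
}
```

Callers would then always go through StatelessVisitorSketch.INSTANCE.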



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/ParallelizationInfo.java


It looks like the affinityMap can be final.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/PlanningSet.java


It looks like the fragmentMap should be final.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Put the result of context.getOptions() into a local and reuse that 
throughout this function.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Given how you're using it, roots should be a HashSet.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Put stats.getParallelizationInfo() into a local, and then reuse that 
throughout the rest of this function.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Move the declaration of this list down to just before the while() loop 
where it is used, below.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Why do you create the affinedEPs copy of the endpointAffinityMap values, 
when you could just use endpointAffinityMap.values() directly here?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Why does this need to be sorted? What is the sort key?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


This is a little unusual, because most code would normally not expect the 
size of a collection to change like this, and many would use
final int count = endpoints.size();
while(count < ) {
  ...
  }
A comment here would be really helpful, something like "keep adding 
endpoints until we have the same number as the slot count."



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Removing random items from a list is going to cause a list traversal for 
each item in the list. In this case, it would be better to create "all" as an 
empty list, and then iterate over activeEndpoints, and only add it to all if it 
is not in the endpointAffinityMap() (which is a hash lookup to test).
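
The suggestion above could be sketched roughly as follows. This assumes the 
affinity map is keyed by endpoint; the class and method names are hypothetical 
and the element type is left generic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EndpointFilterSketch {
  // Builds the list of endpoints that have no affinity entry.
  // Each membership test is a hash lookup, so the whole pass is linear,
  // instead of one list traversal per removed element.
  static <E> List<E> withoutAffinity(List<E> activeEndpoints, Map<E, ?> endpointAffinityMap) {
    List<E> all = new ArrayList<>();
    for (E endpoint : activeEndpoints) {
      if (!endpointAffinityMap.containsKey(endpoint)) {
        all.add(endpoint);
      }
    }
    return all;
  }
}
```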



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Don't create this empty list if we don't need it.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


Don't create this empty list if we don't need it.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizer.java


I don't understand what this loop is doing. It seems to leave 
sendingEndpoints set to the last parallelized sendingFragment's endpoints. What 
is significant about the last one it finds?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Wrapper.java


Can this be final?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Wrapper.java


Can this be final?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Wrapper.java


this. isn't necessary here.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Wrapper.java


[jira] [Created] (DRILL-2145) Add mysql as the new data sources of drill

2015-02-02 Thread xiaobing liu (JIRA)
xiaobing liu created DRILL-2145:
---

 Summary: Add mysql as the new data sources of drill
 Key: DRILL-2145
 URL: https://issues.apache.org/jira/browse/DRILL-2145
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 0.8.0
 Environment: CentOS 6.5
JDK 1.7.0_71
Reporter: xiaobing liu


Add mysql as the new data sources of drill





Re: Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

2015-02-02 Thread Jinfeng Ni

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java


Would it be better to use "equals" instead of "startsWith"? If someone puts any 
string that starts with "=" in the physical plan, will the code treat it as 
EQUALS?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java


Same as the startsWith("=").



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java


Instead of converting the condition into a string first and then calling split(), can 
we check that the condition is a SqlBinaryOperator and its SqlKind is 
IS_NOT_DISTINCT_FROM? String comparison can cause issues 
when the SQL parser/planner uses a different string for that operator.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java


Same comment as in line 125.



exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java


Because of this join transformation, Drill will not be able to handle the 
case without "group by", because the plan will turn into a cartesian join.

select a1, count(distinct a1), sum(b1) from t1;

Maybe we should log a new issue for this particular case?


- Jinfeng Ni


On Feb. 2, 2015, 10:15 a.m., Aman Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> ---
> 
> (Updated Feb. 2, 2015, 10:15 a.m.)
> 
> 
> Review request for drill, Jinfeng Ni and Mehant Baid.
> 
> 
> Bugs: DRILL-2092
> https://issues.apache.org/jira/browse/DRILL-2092
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY 
> a, Calcite generates a plan with 2 subqueries, each of which 
> corresponds to an aggregate function with group-by, and the 2 subqueries are 
> joined in the outer query block on the group-by columns. This join-back uses 
> 'IS NOT DISTINCT FROM' rather than equality, e.g. here's an extract from the 
> Explain: 
>   DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT 
> DISTINCT FROM($1, $6))], joinType=[inner])
> 
> IS NOT DISTINCT FROM differs from '=' in its null handling: null = null does 
> not evaluate to TRUE, so two null keys never match under '=', whereas null IS 
> NOT DISTINCT FROM null is TRUE. This patch handles nulls for this comparator 
> during both planning and execution.
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
>  fd6a3e2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
>  4af0292 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
>  14bc094 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
>  d9a7277 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java
>  d5473f2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java
>  f6b7ef6 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java
>  2b3ff50 
> 
> Diff: https://reviews.apache.org/r/30503/diff/
> 
> 
> Testing
> ---
> 
> Added new unit tests.  Ran unit tests, functional suite and tpch 100.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>
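
The null handling described in the quoted description can be sketched in plain 
Java. This only illustrates the SQL semantics; it is not Drill's implementation, 
and all names are hypothetical:

```java
import java.util.Objects;

final class NullComparisonSketch {
  // SQL '=': the result is UNKNOWN (modeled here as a null Boolean)
  // when either operand is null.
  static Boolean sqlEquals(Object a, Object b) {
    if (a == null || b == null) {
      return null;
    }
    return a.equals(b);
  }

  // IS NOT DISTINCT FROM: always TRUE or FALSE; two nulls compare as the same.
  static boolean isNotDistinctFrom(Object a, Object b) {
    return Objects.equals(a, b);
  }

  // A join keeps a pair of rows only when the condition is TRUE,
  // so UNKNOWN behaves like "no match".
  static boolean joinMatches(Boolean condition) {
    return Boolean.TRUE.equals(condition);
  }
}
```

With these helpers, joinMatches(sqlEquals(null, null)) is false while 
isNotDistinctFrom(null, null) is true, which is why the join-back on group-by 
columns needs the latter to keep the null group.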



Drill-2145: Add mysql as the new data sources of drill

2015-02-02 Thread LIU Xiaobing
Hi all,

Our team will get this work done in 1 day. I have created an issue in
JIRA. Please pay close attention to it.

Regards


[jira] [Created] (DRILL-2146) Flatten Function on two layer deep field fails to plan

2015-02-02 Thread Sean Hsuan-Yi Chu (JIRA)
Sean Hsuan-Yi Chu created DRILL-2146:


 Summary: Flatten Function on two layer deep field fails to plan
 Key: DRILL-2146
 URL: https://issues.apache.org/jira/browse/DRILL-2146
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu


Use the same data in DRILL-2012 to run this query:

select flatten(j.batters.batter) bb 
from dfs_test.`%s` j where j.type = 'donut'

fails to plan.





[jira] [Created] (DRILL-2147) Refactor ValueVector design

2015-02-02 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-2147:
---

 Summary: Refactor ValueVector design
 Key: DRILL-2147
 URL: https://issues.apache.org/jira/browse/DRILL-2147
 Project: Apache Drill
  Issue Type: Bug
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


The overall design of value vectors has become unclear and inconsistent with 
additions from multiple contributors over time. We also need proper 
documentation of the abstractions so that they are communicated consistently to 
developers. 

There are many instances that indicate possible design issues.

For instance, ValueVector implements Iterator. This seems to 
assume that all vectors are somewhat hierarchical, which does not truly capture 
scalar vectors, as they have no children.

Similarly, RepeatedVector has the following interface definition:
{code:title=RepeatedVector}
interface RepeatedVector {
  RepeatedFixedWidthVector.RepeatedAccessor getAccessor()
}
{code}

Yet, RepeatedFixedWidthVector implements RepeatedVector as follows
{code:title=RepeatedFixedWidthVector}
interface RepeatedFixedWidthVector extends ValueVector, RepeatedVector {
  interface RepeatedAccessor extends Accessor {...}
  interface RepeatedMutator extends Mutator {...}
}
{code}

A super-type that is aware of its sub-type hints at a need for redesign.

More examples could be given: some method names are not self-explanatory, are 
wrongly named, or seem to be misplaced. There are a couple more places where the 
design does not capture the nature of vectors, such as missing abstractions 
for Repeated vs Composite vectors. We should consider a design refactoring.

This is an umbrella issue for tracking ValueVector design refactoring.





Re: Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

2015-02-02 Thread Aman Sinha


> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java,
> >  line 35
> > 
> >
> > Would it be better to use "equals" instead of "startsWith"? If someone puts 
> > any string that starts with "=" in the physical plan, will the code treat it as 
> > EQUALS?

The comparison expression is actually something like '=($2, $4)' which is why I 
was not able to use exact equality. But I will make this checking smarter..
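
One possible way to make the check stricter while still working on the string 
form, as a rough sketch only. It assumes conditions look like '=($2, $4)' or 
'IS NOT DISTINCT FROM($0, $5)' as shown in this thread; the helper names are 
hypothetical and are not part of JoinUtils:

```java
final class ComparatorNameSketch {
  // Returns the text before the first '(', e.g. "=" for "=($2, $4)".
  static String operatorName(String condition) {
    int paren = condition.indexOf('(');
    String name = paren < 0 ? condition : condition.substring(0, paren);
    return name.trim();
  }

  // Exact match on the operator name, so "==(...)" or "=>(...)" is not EQUALS.
  static boolean isEquals(String condition) {
    return "=".equals(operatorName(condition));
  }

  static boolean isNotDistinctFrom(String condition) {
    return "IS NOT DISTINCT FROM".equals(operatorName(condition));
  }
}
```

This still depends on the printed form of the operator, so checking the 
operator kind directly, as suggested for JoinPrel, would be more robust where 
the expression object is available.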


> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java,
> >  line 42
> > 
> >
> > Same as the startsWith("=").

see above..


> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java,
> >  line 127
> > 
> >
> > Instead of converting the condition into a string first and then calling split(), 
> > can we check that the condition is a SqlBinaryOperator and its SqlKind is 
> > IS_NOT_DISTINCT_FROM? String comparison can cause 
> > issues when the SQL parser/planner uses a different string for that 
> > operator.

Let me look into it...should be doable.


> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java,
> >  line 68
> > 
> >
> > Because of this join transformation, Drill will not be able to 
> > handle the case without "group by", because the plan will turn into a cartesian join.
> > 
> > select a1, count(distinct a1), sum(b1) from t1;
> > 
> > Maybe we should log a new issue for this particular case?

There might be an open JIRA .. I will check.


- Aman


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
---


On Feb. 2, 2015, 6:15 p.m., Aman Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> ---
> 
> (Updated Feb. 2, 2015, 6:15 p.m.)
> 
> 
> Review request for drill, Jinfeng Ni and Mehant Baid.
> 
> 
> Bugs: DRILL-2092
> https://issues.apache.org/jira/browse/DRILL-2092
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY 
> a, Calcite generates a plan with 2 subqueries, each of which 
> corresponds to an aggregate function with group-by, and the 2 subqueries are 
> joined in the outer query block on the group-by columns. This join-back uses 
> 'IS NOT DISTINCT FROM' rather than equality, e.g. here's an extract from the 
> Explain: 
>   DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT 
> DISTINCT FROM($1, $6))], joinType=[inner])
> 
> IS NOT DISTINCT FROM differs from '=' in its null handling: null = null does 
> not evaluate to TRUE, so two null keys never match under '=', whereas null IS 
> NOT DISTINCT FROM null is TRUE. This patch handles nulls for this comparator 
> during both planning and execution.
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
>  fd6a3e2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
>  4af0292 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
>  14bc094 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
>  d9a7277 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java
>  d5473f2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java
>  f6b7ef6 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java
>  2b3ff50 
> 
> Diff: https://reviews.apache.org/r/30503/diff/
> 
> 
> Testing
> ---
> 
> Added new unit tests.  Ran unit tests, functional suite and tpch 100.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>



[jira] [Created] (DRILL-2148) Wrong result with grouping on a column of date type with streaming aggregation

2015-02-02 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-2148:
---

 Summary: Wrong result with grouping on a column of date type with 
streaming aggregation
 Key: DRILL-2148
 URL: https://issues.apache.org/jira/browse/DRILL-2148
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical


Disable hash aggregation and run the query below:
{code}

alter system set `planner.enable_hashagg` = false;

select
    c_date,
    COUNT(*)
from
    t1
group by
    c_date
order by
    c_date;

{code}

You will get a wrong result. Because NULLs are sorted in the middle (see 
DRILL-2084), they are folded into one of the unrelated groups.
We might have the same problem with merge join on date, time and timestamp 
columns.
Attached is a parquet file that was used in this query.
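
For context, a minimal sketch of why streaming aggregation depends on the input 
order. It does not reproduce Drill's operator or this exact bug; it only shows 
that groups are formed from consecutive runs of equal keys, so a sort that puts 
NULLs in the wrong place changes the groups. All names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class StreamingCountSketch {
  // One output group: a key (possibly null) and the number of consecutive rows with it.
  static final class Group {
    final String key;
    final long count;

    Group(String key, long count) {
      this.key = key;
      this.count = count;
    }

    @Override
    public String toString() {
      return key + "=" + count;
    }
  }

  // Counts consecutive runs of equal keys, the way a streaming aggregate
  // emits a group each time the key changes. Correct only if equal keys
  // are adjacent in the input, i.e. the input is properly sorted.
  static List<Group> countRuns(List<String> keys) {
    List<Group> out = new ArrayList<>();
    if (keys.isEmpty()) {
      return out;
    }
    String current = keys.get(0);
    long count = 1;
    for (int i = 1; i < keys.size(); i++) {
      String key = keys.get(i);
      if (Objects.equals(key, current)) {
        count++;
      } else {
        out.add(new Group(current, count));
        current = key;
        count = 1;
      }
    }
    out.add(new Group(current, count));
    return out;
  }
}
```

If a NULL lands between two rows with the same date, that date is emitted as 
two separate groups. If the comparison additionally treats the NULL as equal to 
its neighbor, the NULL rows are counted in an unrelated group.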






[jira] [Resolved] (DRILL-2008) select a || b from ... fails while select cast(a as varchar) || cast(b as varchar) from ...

2015-02-02 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-2008.
--
Resolution: Duplicate

Solved by 2054

> select a || b from ... fails while select cast(a as varchar) || cast(b as 
> varchar) from ...
> ---
>
> Key: DRILL-2008
> URL: https://issues.apache.org/jira/browse/DRILL-2008
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 0.8.0
>
>
> The first query would fail at planning by optiq/calcite. 
> For example,
> select n_name || n_name from cp.`tpch/nation.parquet`;
> org.apache.drill.exec.rpc.RpcException: Query failed: Unexpected exception 
> during fragment initialization: null
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.batchArrived(QueryResultHandler.java:79)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:93)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:52)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:34)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:58)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:194)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:173)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:161)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>   at java.lang.Thread.run(Thread.java:745)


