Re: Review Request 34541: DRILL-3147: tpcds-sf1-parquet query 73 causes memory leak

2015-06-01 Thread Jason Altekruse

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34541/#review85986
---

Ship it!



- Jason Altekruse


On May 27, 2015, 2:19 p.m., abdelhakim deneche wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34541/
> ---
> 
> (Updated May 27, 2015, 2:19 p.m.)
> 
> 
> Review request for drill, Chris Westin and Jacques Nadeau.
> 
> 
> Bugs: DRILL-3147
> https://issues.apache.org/jira/browse/DRILL-3147
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> - BaseRawBatchBuffer methods enqueue() and kill() are now synchronized
> - The TestTpcdsSf1Leaks test reproduces the leak; it is ignored by default because 
> it requires a large dataset
> 
> 
> Diffs
> -
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/rpc/data/DataServer.java 
> 80d2d6e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/BaseRawBatchBuffer.java
>  11b6cc8 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/RootFragmentManager.java
>  b770a33 
>   exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java a07f621 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/server/TestTpcdsSf1Leaks.java
>  PRE-CREATION 
>   exec/java-exec/src/test/resources/tpcds-sf1/q73.sql PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34541/diff/
> 
> 
> Testing
> ---
> 
> still need to run the tests!
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>
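A minimal sketch of why enqueue() and kill() need to exclude each other, assuming the leak comes from a batch being queued while (or just after) the buffer is killed. This is Python rather than Drill's Java, and the `RawBatchBuffer` and `Batch` classes below are hypothetical stand-ins, not the code in the patch.

```python
import threading


class Batch:
    """Stand-in for a raw batch that holds memory until released."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


class RawBatchBuffer:
    """Hypothetical buffer whose enqueue() and kill() share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queue = []
        self._killed = False

    def enqueue(self, batch):
        with self._lock:
            if self._killed:
                # Arrived after kill(): release now instead of queueing,
                # otherwise nobody would ever free this batch.
                batch.release()
                return False
            self._queue.append(batch)
            return True

    def kill(self):
        with self._lock:
            self._killed = True
            for batch in self._queue:
                batch.release()
            self._queue.clear()
```

Because both methods take the same lock, a batch is either queued before kill() runs (and released by it) or seen after the killed flag is set (and released on arrival). Without the lock, a batch could slip into the queue after kill() had already drained it.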



Re: Review Request 34541: DRILL-3147: tpcds-sf1-parquet query 73 causes memory leak

2015-06-01 Thread Sudheesh Katkam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34541/#review85997
---


Ship it (non-binding)

- Sudheesh Katkam


On May 27, 2015, 2:19 p.m., abdelhakim deneche wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34541/
> ---
> 
> (Updated May 27, 2015, 2:19 p.m.)
> 
> 
> Review request for drill, Chris Westin and Jacques Nadeau.
> 
> 
> Bugs: DRILL-3147
> https://issues.apache.org/jira/browse/DRILL-3147
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> - BaseRawBatchBuffer methods enqueue() and kill() are now synchronized
> - The TestTpcdsSf1Leaks test reproduces the leak; it is ignored by default because 
> it requires a large dataset
> 
> 
> Diffs
> -
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/rpc/data/DataServer.java 
> 80d2d6e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/BaseRawBatchBuffer.java
>  11b6cc8 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/RootFragmentManager.java
>  b770a33 
>   exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java a07f621 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/server/TestTpcdsSf1Leaks.java
>  PRE-CREATION 
>   exec/java-exec/src/test/resources/tpcds-sf1/q73.sql PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34541/diff/
> 
> 
> Testing
> ---
> 
> still need to run the tests!
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>



[jira] [Created] (DRILL-3228) Implement Embedded Type

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3228:
-

 Summary: Implement Embedded Type
 Key: DRILL-3228
 URL: https://issues.apache.org/jira/browse/DRILL-3228
 Project: Apache Drill
  Issue Type: Task
  Components: Execution - Codegen, Execution - Data Types, Execution - 
Relational Operators, Functions - Drill
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau


An umbrella task for the implementation of embedded types within Drill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3229) Create a new EmbeddedVector

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3229:
-

 Summary: Create a new EmbeddedVector
 Key: DRILL-3229
 URL: https://issues.apache.org/jira/browse/DRILL-3229
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau


EmbeddedVector will leverage a binary encoding to hold type information for 
each individual field.





Re: Review Request 34829: DRILL-3190: Invalid FragmentState transition from CANCELLATION_REQUESTED in QueryManager

2015-06-01 Thread Sudheesh Katkam


> On May 29, 2015, 9:49 p.m., abdelhakim deneche wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentData.java,
> >  line 69
> > 
> >
> > Foreman state management is already split between the Foreman and the 
> > QueryManager. This will make FragmentData handle part of that logic too!
> > 
> > Shouldn't we move this to the QueryManager instead? At least we would 
> > have one less class involved to check.

Will make the change :)


- Sudheesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34829/#review85822
---


On May 29, 2015, 8:44 p.m., Sudheesh Katkam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34829/
> ---
> 
> (Updated May 29, 2015, 8:44 p.m.)
> 
> 
> Review request for drill, abdelhakim deneche, Chris Westin, and Jacques 
> Nadeau.
> 
> 
> Bugs: DRILL-3190
> https://issues.apache.org/jira/browse/DRILL-3190
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> DRILL-3190: Check for transition from CANCELLATION_REQUESTED to non-terminal 
> state in FragmentData
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentData.java
>  ceb77f0 
> 
> Diff: https://reviews.apache.org/r/34829/diff/
> 
> 
> Testing
> ---
> 
> Running unit and regression tests.
> 
> 
> Thanks,
> 
> Sudheesh Katkam
> 
>



[jira] [Created] (DRILL-3230) Local file system plug-in must be disabled in distributed mode

2015-06-01 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-3230:
--

 Summary: Local file system plug-in must be disabled in distributed 
mode
 Key: DRILL-3230
 URL: https://issues.apache.org/jira/browse/DRILL-3230
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - HTTP
Reporter: Abhishek Girish
Assignee: Jacques Nadeau


The local file system plug-in (the "file:///" connection string in the dfs storage 
plug-in) does not behave as expected for either CTAS or querying files when 
Drill is configured in distributed mode (multiple Drillbits across nodes). 

In the case of CTAS, Parquet files are written to a specific node's local file 
system, depending on which Drillbit the client connects to. If the table 
is moderate to large in size, Drill may process it in a distributed manner 
and write data on more than one node, so the data ends up partitioned across 
different nodes. 

In the case of queries, the behavior is again confusing, because it depends on 
which Drillbit the client connects to. The behavior seen is therefore 
inconsistent: queries return only partial results, which depend on the 
Drillbit connected to.

My suggestion is that the local file system plug-in be disabled in 
distributed mode. With multiple Drillbits and a centralized plug-in for the local 
file system, consistent behavior cannot be expected. 

It should either be disabled when distributed mode is detected, or we could add 
support for multiple namespaces (using the IPs of nodes) with local file systems 
(which might still not fix all issues). Or maybe there are other ways to 
resolve this that I am overlooking or not aware of. 

Users have reported many such inconsistencies on the user mailing list.
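
The effect described above can be sketched with a toy model, assuming each node has a private local disk, a distributed CTAS spreads rows across nodes, and a reader only sees the disk of the node it runs on. The `Node`, `ctas`, and `query_from` names are hypothetical; this is a simplification in Python, not how Drill schedules fragments.

```python
class Node:
    """A Drillbit host with its own private local file system."""

    def __init__(self, name):
        self.name = name
        self.local_fs = {}  # path -> list of rows stored on this node


def ctas(nodes, path, rows):
    """Spread rows round-robin; each node writes its share to its own disk."""
    for i, row in enumerate(rows):
        node = nodes[i % len(nodes)]
        node.local_fs.setdefault(path, []).append(row)


def query_from(node, path):
    """A scan running on `node` can only read that node's local files."""
    return list(node.local_fs.get(path, []))


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
ctas(nodes, "/tmp/t1", list(range(9)))
for node in nodes:
    print(node.name, query_from(node, "/tmp/t1"))
```

Each node reports a different third of the table, which is the kind of partial, connection-dependent result the report describes.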






Re: Review Request 34829: DRILL-3190: Invalid FragmentState transition from CANCELLATION_REQUESTED in QueryManager

2015-06-01 Thread Sudheesh Katkam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34829/
---

(Updated June 1, 2015, 5:54 p.m.)


Review request for drill, abdelhakim deneche, Chris Westin, and Jacques Nadeau.


Changes
---

Addressed review comments
+ move fragment state transition logic to QueryManager


Bugs: DRILL-3190
https://issues.apache.org/jira/browse/DRILL-3190


Repository: drill-git


Description (updated)
---

DRILL-3190: Check for transitions from CANCELLATION_REQUESTED state
+ Moved state transition checks to QueryManager


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentData.java
 ceb77f0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java
 71b77c6 

Diff: https://reviews.apache.org/r/34829/diff/


Testing (updated)
---

Successful unit and regression tests.


Thanks,

Sudheesh Katkam
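
A minimal sketch of the kind of check the description refers to: once a fragment is in CANCELLATION_REQUESTED, only a terminal state may follow. The set of state names and the `is_valid_transition` helper are assumptions made for illustration in Python; they are not taken from the QueryManager diff.

```python
from enum import Enum, auto


class FragmentState(Enum):
    # State names are assumptions made for this sketch.
    RUNNING = auto()
    CANCELLATION_REQUESTED = auto()
    FINISHED = auto()
    CANCELLED = auto()
    FAILED = auto()


TERMINAL_STATES = {
    FragmentState.FINISHED,
    FragmentState.CANCELLED,
    FragmentState.FAILED,
}


def is_valid_transition(current, target):
    """Once cancellation is requested, only terminal states may follow."""
    if current is FragmentState.CANCELLATION_REQUESTED:
        return target in TERMINAL_STATES
    # Other transitions are not modeled here and are simply allowed.
    return True
```

With this rule, a late RUNNING status update that arrives after a cancellation request is rejected rather than overwriting the requested cancellation.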



Re: Review Request 34528: DRILL-2746, DRILL-3130: Add a new DrillRule DrillUnionRule

2015-06-01 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34528/#review86015
---

Ship it!



- Aman Sinha


On May 29, 2015, 8:05 p.m., Sean Hsuan-Yi Chu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34528/
> ---
> 
> (Updated May 29, 2015, 8:05 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Jinfeng Ni.
> 
> 
> Bugs: DRILL-2746
> https://issues.apache.org/jira/browse/DRILL-2746
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> Add two DrillRules to push Project and Filter below set operators
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectSetOpTransposeRule.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
>  f7cfbf4 
>   exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java 5f98d90 
>   
> exec/java-exec/src/test/resources/testframework/testUnionAllQueries/testProjectDownOverUnionAllImplicitCasting.tsv
>  PRE-CREATION 
>   
> exec/java-exec/src/test/resources/testframework/testUnionAllQueries/testProjectFiltertPushDownOverUnionAll.tsv
>  PRE-CREATION 
>   
> exec/java-exec/src/test/resources/testframework/testUnionAllQueries/testProjectPushDownOverUnionAllWithProject.tsv
>  PRE-CREATION 
>   
> exec/java-exec/src/test/resources/testframework/testUnionAllQueries/testProjectPushDownOverUnionAllWithoutProject.tsv
>  PRE-CREATION 
>   
> exec/java-exec/src/test/resources/testframework/testUnionAllQueries/testProjectPushDownProjectColumnReorderingAndAlias.tsv
>  PRE-CREATION 
>   
> exec/java-exec/src/test/resources/testframework/testUnionAllQueries/testProjectWithExpressionPushDownOverUnionAll.tsv
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34528/diff/
> 
> 
> Testing
> ---
> 
> Unit test, tpch, functional
> 
> 
> Thanks,
> 
> Sean Hsuan-Yi Chu
> 
>



Re: Review Request 34829: DRILL-3190: Invalid FragmentState transition from CANCELLATION_REQUESTED in QueryManager

2015-06-01 Thread Chris Westin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34829/#review86016
---


Ship it (non-binding)

- Chris Westin


On June 1, 2015, 10:54 a.m., Sudheesh Katkam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34829/
> ---
> 
> (Updated June 1, 2015, 10:54 a.m.)
> 
> 
> Review request for drill, abdelhakim deneche, Chris Westin, and Jacques 
> Nadeau.
> 
> 
> Bugs: DRILL-3190
> https://issues.apache.org/jira/browse/DRILL-3190
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> DRILL-3190: Check for transitions from CANCELLATION_REQUESTED state
> + Moved state transition checks to QueryManager
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentData.java
>  ceb77f0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java
>  71b77c6 
> 
> Diff: https://reviews.apache.org/r/34829/diff/
> 
> 
> Testing
> ---
> 
> Successful unit and regression tests.
> 
> 
> Thanks,
> 
> Sudheesh Katkam
> 
>



Re: Review Request 34829: DRILL-3190: Invalid FragmentState transition from CANCELLATION_REQUESTED in QueryManager

2015-06-01 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34829/#review86032
---

Ship it!



- abdelhakim deneche


On June 1, 2015, 5:54 p.m., Sudheesh Katkam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34829/
> ---
> 
> (Updated June 1, 2015, 5:54 p.m.)
> 
> 
> Review request for drill, abdelhakim deneche, Chris Westin, and Jacques 
> Nadeau.
> 
> 
> Bugs: DRILL-3190
> https://issues.apache.org/jira/browse/DRILL-3190
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> DRILL-3190: Check for transitions from CANCELLATION_REQUESTED state
> + Moved state transition checks to QueryManager
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentData.java
>  ceb77f0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java
>  71b77c6 
> 
> Diff: https://reviews.apache.org/r/34829/diff/
> 
> 
> Testing
> ---
> 
> Successful unit and regression tests.
> 
> 
> Thanks,
> 
> Sudheesh Katkam
> 
>



[jira] [Resolved] (DRILL-2746) Filter is not pushed into subquery past UNION ALL

2015-06-01 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-2746.
--
   Resolution: Fixed
Fix Version/s: (was: 1.2.0)
   1.1.0

> Filter is not pushed into subquery past UNION ALL
> -
>
> Key: DRILL-2746
> URL: https://issues.apache.org/jira/browse/DRILL-2746
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.9.0
>Reporter: Victoria Markman
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.1.0
>
>
> I expected to see the filter pushed to at least the left side of UNION ALL; 
> instead, it is applied after UNION ALL
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from (select a1, b1, c1 
> from t1 union all select a2, b2, c2 from t2 )  where a1 = 10;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(a1=[$0], b1=[$1], c1=[$2])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[=($0, 10)])
> 00-04            UnionAll(all=[true])
> 00-06              Project(a1=[$2], b1=[$1], c1=[$0])
> 00-08                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], 
> selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`, 
> `c1`]]])
> 00-05              Project(a2=[$1], b2=[$0], c2=[$2])
> 00-07                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t2]], 
> selectionRoot=/drill/testdata/predicates/t2, numFiles=1, columns=[`a2`, `b2`, 
> `c2`]]])
> {code}
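
A small sketch of why the rewrite requested in this issue is safe, assuming UNION ALL is plain concatenation and the filter refers to the first output column (a1 on the left side, a2 on the right, matched by position). It is Python with made-up rows, not planner code.

```python
def union_all(left, right):
    """UNION ALL keeps duplicates, so it is just concatenation."""
    return left + right


def filter_rows(rows, predicate):
    return [row for row in rows if predicate(row)]


# Made-up rows shaped like (a, b, c).
t1 = [(10, 1, 1), (20, 2, 2)]
t2 = [(10, 3, 3), (30, 4, 4)]


def first_column_is_10(row):
    return row[0] == 10


# Plan in the report: filter applied after the union.
filtered_after = filter_rows(union_all(t1, t2), first_column_is_10)

# Expected plan: filter pushed into both inputs of the union.
filtered_before = union_all(
    filter_rows(t1, first_column_is_10),
    filter_rows(t2, first_column_is_10),
)

print(filtered_after == filtered_before)
```

Both forms return the same rows; the pushed-down form simply feeds fewer rows into the union.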





[jira] [Resolved] (DRILL-3130) Project can be pushed below union all / union to improve performance

2015-06-01 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-3130.
--
  Resolution: Fixed
   Fix Version/s: 1.1.0
Target Version/s:   (was: Future)

> Project can be pushed below union all / union to improve performance
> 
>
> Key: DRILL-3130
> URL: https://issues.apache.org/jira/browse/DRILL-3130
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.1.0
>
>
> A query such as 
> {code}
> Select a from 
> (select a, b, c, ..., union all select a, b, c, ...)
> {code}
> will perform Union-All over all the specified columns on both sides, 
> even though only one column is requested at the end. Ideally, we 
> should apply a ProjectPushDown rule to Union and Union-All so that they do not 
> generate results which will be discarded at the end.
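
A minimal sketch of the saving, assuming rows are tuples and the outer query keeps only the first column. The `project` and `cells` helpers and the sample rows are hypothetical; the point is only that projecting before the union gives the same answer while moving fewer values through it.

```python
def project(rows, indices):
    """Keep only the columns at the given positions."""
    return [tuple(row[i] for i in indices) for row in rows]


def cells(rows):
    """Number of individual values in a row set."""
    return sum(len(row) for row in rows)


left = [(1, "x", 1.0), (2, "y", 2.0)]
right = [(3, "z", 3.0)]

# Project applied on top of the union: all three columns flow through it.
late_union = left + right
late_result = project(late_union, [0])

# Project pushed below the union: only column `a` flows through it.
early_union = project(left, [0]) + project(right, [0])
early_result = early_union

print(late_result == early_result, cells(late_union), cells(early_union))
```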





[GitHub] drill pull request: Update 030-developing-an-aggregate-function.md

2015-06-01 Thread hsuanyi
Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/78


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: Update 020-develop-a-simple-function.md

2015-06-01 Thread hsuanyi
Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/77




[jira] [Created] (DRILL-3231) Throw better error messages for schema changes

2015-06-01 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-3231:
---

 Summary: Throw better error messages for schema changes
 Key: DRILL-3231
 URL: https://issues.apache.org/jira/browse/DRILL-3231
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.0.0
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes


This task is concerned with making error messages more intelligible, especially 
in the case of schema changes.

{code:title=current error message}
Error: DATA_READ ERROR: Error parsing JSON - You tried to write a BigInt
type when you are using a ValueWriter of type NullableFloat8WriterImpl.
{code}

The proposed message should be non-technical, possibly with more context that 
helps investigate the problem, such as the line and column numbers and the 
column name.





[jira] [Created] (DRILL-3232) Modify existing vectors to allow type promotion

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3232:
-

 Summary: Modify existing vectors to allow type promotion
 Key: DRILL-3232
 URL: https://issues.apache.org/jira/browse/DRILL-3232
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau


Support promoting existing vectors, similar to the supported implicit casting 
rules.

For example:

INT > DOUBLE > STRING > EMBEDDED
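
One way to read the chain above is as a total order in which a vector moves to the widest type it has seen so far. The sketch below assumes that reading; the `promote` and `vector_type` helpers are hypothetical Python, not Drill code.

```python
# Promotion chain from the issue text, narrowest type first.
PROMOTION_ORDER = ["INT", "DOUBLE", "STRING", "EMBEDDED"]
RANK = {name: position for position, name in enumerate(PROMOTION_ORDER)}


def promote(current, incoming):
    """Return the wider of the two types according to the chain."""
    return current if RANK[current] >= RANK[incoming] else incoming


def vector_type(value_types):
    """Type a vector ends up with after seeing values of the given types."""
    result = value_types[0]
    for value_type in value_types[1:]:
        result = promote(result, value_type)
    return result


print(vector_type(["INT", "INT", "DOUBLE"]))
print(vector_type(["INT", "STRING", "DOUBLE"]))
```

Under this reading a vector is never demoted: once it has been promoted to STRING, a later DOUBLE value does not change its type.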






[jira] [Created] (DRILL-3233) Update code generation & function code to support reading and writing embedded type

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3233:
-

 Summary: Update code generation & function code to support reading 
and writing embedded type
 Key: DRILL-3233
 URL: https://issues.apache.org/jira/browse/DRILL-3233
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (DRILL-3234) Drill fails to implicit cast hive tinyint and smallint data as int

2015-06-01 Thread Krystal (JIRA)
Krystal created DRILL-3234:
--

 Summary: Drill fails to implicit cast hive tinyint and smallint 
data as int
 Key: DRILL-3234
 URL: https://issues.apache.org/jira/browse/DRILL-3234
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Affects Versions: 1.0.0
Reporter: Krystal
Assignee: Venki Korukanti


I have the following hive table:
 describe `hive.default`.voter_hive;
+----------------+------------+--------------+
|  COLUMN_NAME   | DATA_TYPE  | IS_NULLABLE  |
+----------------+------------+--------------+
| voter_id       | SMALLINT   | YES          |
| name           | VARCHAR    | YES          |
| age            | TINYINT    | YES          |
| registration   | VARCHAR    | YES          |
| contributions  | DECIMAL    | YES          |
| voterzone      | INTEGER    | YES          |
| create_time    | TIMESTAMP  | YES          |
+----------------+------------+--------------+

If I just include the voter_id and age fields in the select list, the query works 
fine.  However, if I include them in the where clause, the query fails. For 
example:

select voter_id, name, age from voter_hive where age < 30;
Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: 
Failure while trying to materialize incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[castINT(TINYINT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..






[jira] [Created] (DRILL-3235) Enhance JSON reader to leverage EmbeddedType

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3235:
-

 Summary: Enhance JSON reader to leverage EmbeddedType
 Key: DRILL-3235
 URL: https://issues.apache.org/jira/browse/DRILL-3235
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (DRILL-3236) Enhance JSON writer to write EmbeddedType

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3236:
-

 Summary: Enhance JSON writer to write EmbeddedType
 Key: DRILL-3236
 URL: https://issues.apache.org/jira/browse/DRILL-3236
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (DRILL-3237) Come up with enhanced AbstractRecordBatch and AbstractSingleRecordBatch to better handle type promotion and schema change

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3237:
-

 Summary: Come up with enhanced AbstractRecordBatch and 
AbstractSingleRecordBatch to better handle type promotion and schema change
 Key: DRILL-3237
 URL: https://issues.apache.org/jira/browse/DRILL-3237
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[GitHub] drill pull request: Update 020-core-modules.md

2015-06-01 Thread jinfengni
Github user jinfengni closed the pull request at:

https://github.com/apache/drill/pull/76




Re: known issue? Problem reading JSON

2015-06-01 Thread Hanifi Gunes
The fact that count does not fail but select fails is known and will be
there at least until we support heterogeneous types. We also handle these
queries differently in the JSON processor. The select query reads and
vectorizes every single field/column, so field types matter, whereas the
count query does not really read at the field level but simply counts
individual JSON records, which makes it very efficient in time (~90x in a
single very wide record) and memory. That's the reason why your count(*)
query succeeds while select * fails.
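
A rough sketch of the difference, assuming one JSON record per line and a reader that fixes each column's type at the first value it sees. The function names, the sample records, and the error class are hypothetical Python, not the Drill JSON processor.

```python
import json

LINES = [
    '{"id": 1, "price": 1.5}',
    '{"id": 2, "price": 2.0}',
    '{"id": 3, "price": 7}',
    '{"id": 4, "price": "n/a"}',
]


class SchemaChangeError(Exception):
    pass


def count_records(lines):
    """Count records without looking at any field, so types never matter."""
    return sum(1 for line in lines if line.strip())


def read_columns(lines):
    """Build one typed column per field; fail when a field changes type."""
    columns = {}  # field name -> (column type, list of values)
    for number, line in enumerate(lines, start=1):
        for name, value in json.loads(line).items():
            column_type, values = columns.setdefault(name, (type(value), []))
            if type(value) is not column_type:
                raise SchemaChangeError(
                    f"record {number}: field {name!r} is "
                    f"{type(value).__name__}, column is {column_type.__name__}"
                )
            values.append(value)
    return columns
```

Here `count_records(LINES)` succeeds, while `read_columns(LINES)` stops at the third record because `price` switches from a float to an integer, which mirrors the BigInt versus Float8 error in the thread.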

I agree that error messages need a touch. Filed DRILL-3231 to track this.


On Sat, May 30, 2015 at 10:51 PM, Ted Dunning  wrote:

> OK.
>
> But this *is* in a data file that we distribute as part of Drill.
>
> Perhaps a better error message is warranted?
>
> Also, this seems to be a serious limitation that appears only to be fixable
> using a sledge-hammer.
>
>
>
> On Sun, May 31, 2015 at 3:31 AM, Jacques Nadeau 
> wrote:
>
> > The second error is stating that you have a column that is a string in
> > one row and a double in another.
> >
> > On Sat, May 30, 2015 at 3:16 PM, Ted Dunning 
> > wrote:
> >
> > > This seems wrong.  I can count the records in a JSON table, but
> > > select * doesn't work.
> > >
> > > Is this a known issue?
> > >
> > >
> > >
> > > ted:apache-drill-1.0.0$ bin/drill-embedded
> > > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> > > MaxPermSize=512M; support was removed in 8.0
> > > May 31, 2015 12:14:52 AM org.glassfish.jersey.server.ApplicationHandler
> > > initialize
> > > INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29
> > > 01:25:26...
> > > apache drill 1.0.0
> > > "got drill?"
> > > 0: jdbc:drill:zk=local> *select count(*) from
> > > cp.`sales_fact_1997_collapsed.json` ;*
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 86837   |
> > > +---------+
> > > 1 row selected (1.316 seconds)
> > > 0: jdbc:drill:zk=local> *select * from
> > > cp.`sales_fact_1997_collapsed.json` limit 3;*
> > > Error: DATA_READ ERROR: Error parsing JSON - You tried to write a BigInt
> > > type when you are using a ValueWriter of type NullableFloat8WriterImpl.
> > >
> > > File  /sales_fact_1997_collapsed.json
> > > Record  3
> > > Fragment 0:0
> > >
> > > [Error Id: 8a9ac2c1-9764-42fd-bdeb-ec0b5e408438 on 192.168.1.38:31010]
> > > (state=,code=0)
> > > 0: jdbc:drill:zk=local> *ALTER SYSTEM SET
> > > `store.json.read_numbers_as_double` = true;*
> > > +-------+---------------------------------------------+
> > > |  ok   |                   summary                   |
> > > +-------+---------------------------------------------+
> > > | true  | store.json.read_numbers_as_double updated.  |
> > > +-------+---------------------------------------------+
> > > 1 row selected (0.086 seconds)
> > > 0: jdbc:drill:zk=local> *select * from
> > > cp.`sales_fact_1997_collapsed.json` limit 3;*
> > > Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar
> > > type when you are using a ValueWriter of type NullableFloat8WriterImpl.
> > >
> > > File  /sales_fact_1997_collapsed.json
> > > Record  47
> > > Fragment 0:0
> > >
> >
>


Re: known issue? Problem reading JSON

2015-06-01 Thread Hanifi Gunes
* The former query (select) reads and vectorizes every single
field/column, so field types matter, whereas the latter (count) does not
really read at the field level but simply counts individual JSON records,
which makes it very efficient in time (~90x in a single very wide record)
and memory.

On Mon, Jun 1, 2015 at 12:38 PM, Hanifi Gunes  wrote:

> The fact that count does not fail but select fails is known and will be
> there at least until we support heterogeneous types. Also we handle these
> queries differently at JSON processor. The former query does read and
> vectorize every single field/column, thus field type matters whereas the
> latter does not really read at field level but simply counts individual
> JSON records thereby very efficient in time (~90x in a single very wide
> record) and memory. That's the reason why your count(*) query succeeds
> while select(*) fails.
>
> I agree that error messages need a touch. Filed DRILL-3231 to track this.
>
>
> On Sat, May 30, 2015 at 10:51 PM, Ted Dunning 
> wrote:
>
>> OK.
>>
>> But this *is* in a data file that we distribute as part of Drill.
>>
>> Perhaps a better error message is warranted?
>>
>> Also, this seems to be a serious limitation that appears only to be
>> fixable
>> using a sledge-hammer.
>>
>>
>>
>> On Sun, May 31, 2015 at 3:31 AM, Jacques Nadeau 
>> wrote:
>>
>> > The second error is stating that you have a column that is a string in
>> one
>> > row and a double in another.
>> >
>> > On Sat, May 30, 2015 at 3:16 PM, Ted Dunning 
>> > wrote:
>> >
>> > > This seems wrong.  I can count the records in a JSON table, but
>> select *
>> > > doesn't work.
>> > >
>> > > Is this a known issue?
>> > >
>> > >
>> > >
>> > > ted:apache-drill-1.0.0$ bin/drill-embedded
>> > > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
>> > > MaxPermSize=512M; support was removed in 8.0
>> > > May 31, 2015 12:14:52 AM
>> org.glassfish.jersey.server.ApplicationHandler
>> > > initialize
>> > > INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29
>> > > 01:25:26...
>> > > apache drill 1.0.0
>> > > "got drill?"
>> > > 0: jdbc:drill:zk=local> *select count(*) from
>> > > cp.`sales_fact_1997_collapsed.json` ;*
>> > > +-+
>> > > | EXPR$0  |
>> > > +-+
>> > > | 86837   |
>> > > +-+
>> > > 1 row selected (1.316 seconds)
>> > > 0: jdbc:drill:zk=local> *select * from
>> > cp.`sales_fact_1997_collapsed.json`
>> > > limit 3;*
>> > > Error: DATA_READ ERROR: Error parsing JSON - You tried to write a
>> BigInt
>> > > type when you are using a ValueWriter of type
>> NullableFloat8WriterImpl.
>> > >
>> > > File  /sales_fact_1997_collapsed.json
>> > > Record  3
>> > > Fragment 0:0
>> > >
>> > > [Error Id: 8a9ac2c1-9764-42fd-bdeb-ec0b5e408438 on 192.168.1.38:31010
>> ]
>> > > (state=,code=0)
>> > > 0: jdbc:drill:zk=local> *ALTER SYSTEM SET
>> > > `store.json.read_numbers_as_double` = true;*
>> > > +---+-+
>> > > |  ok   |   summary   |
>> > > +---+-+
>> > > | true  | store.json.read_numbers_as_double updated.  |
>> > > +---+-+
>> > > 1 row selected (0.086 seconds)
>> > > 0: jdbc:drill:zk=local> *select * from
>> > cp.`sales_fact_1997_collapsed.json`
>> > > limit 3;*
>> > > Error: DATA_READ ERROR: Error parsing JSON - You tried to write a
>> VarChar
>> > > type when you are using a ValueWriter of type
>> NullableFloat8WriterImpl.
>> > >
>> > > File  /sales_fact_1997_collapsed.json
>> > > Record  47
>> > > Fragment 0:0
>> > >
>> >
>>
>
>


Re: question about correlated arrays and flatten

2015-06-01 Thread Hanifi Gunes
The idea of having functional primitives in Drill sounds really handy. It
would be great if we could support left and right folds as well. I can see
many great use cases for project/map, fold/reduce, zip, and flatten when
combined.
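
A minimal sketch of the zip-then-flatten combination discussed in this thread, written in Python rather than as a Drill UDF. The `zip_arrays` and `flatten` helpers are hypothetical; the `strict` flag covers both behaviors raised below, an error on mismatched lengths or null padding.

```python
from itertools import zip_longest


def zip_arrays(*arrays, strict=True):
    """Combine parallel arrays into one array of per-position groups."""
    lengths = {len(array) for array in arrays}
    if strict and len(lengths) > 1:
        raise ValueError(f"arrays have different lengths: {sorted(lengths)}")
    # With strict=False, shorter arrays are padded with None (null).
    return [list(group) for group in zip_longest(*arrays, fillvalue=None)]


def flatten(row_key, zipped):
    """Emit one output row per element of the zipped array."""
    return [(row_key, *group) for group in zipped]


times = [1, 2, 3]
values = [10.0, 20.0, 30.0]
for row in flatten("sensor-1", zip_arrays(times, values)):
    print(row)
```

Zipping first keeps correlated elements together, so flattening produces one row per position instead of the cartesian product that two separate flatten calls give.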

On Sat, May 30, 2015 at 12:57 AM, Ted Dunning  wrote:

> OK.  I will file a JIRA for a zip function.  No idea if I will be able to
> get one written in the available cracks of time.
>
>
>
> On Fri, May 29, 2015 at 7:17 PM, Steven Phillips 
> wrote:
>
> > I think your use case could be solved by adding a UDF that can combine
> > multiple arrays into a single array. The result of this function could
> > then be handled by our current implementation of flatten.
> >
> > I think this is preferable to enhancing flatten itself to handle it,
> > since flatten is not an ordinary UDF, and thus more difficult to modify
> > and maintain.
> > On Fri, May 29, 2015 at 3:20 PM, Ted Dunning 
> > wrote:
> >
> > > My particular use case can throw an error if the lists are different
> > > lengths.
> > >
> > > I think our real goal should be to have a logically complete set of
> > > simple primitives that allows any sort of back-and-forth conversion of
> > > this kind.
> > >
> > >
> > >
> > >
> > > On Fri, May 29, 2015 at 9:58 AM, Jason Altekruse
> > > <altekruseja...@gmail.com> wrote:
> > >
> > > > I understand what you want to do; unfortunately, we don't have
> > > > support for this right now. A UDF is the best I can suggest at this
> > > > point.
> > > >
> > > > Just to explore the idea a little further for the sake of creating a
> > > > complete feature request, I assume you would just want nulls filled
> > > > in for the cases where the lists were different lengths?
> > > >
> > > > On Fri, May 29, 2015 at 8:58 AM, Ted Dunning 
> > > > wrote:
> > > >
> > > > > Input is here: https://gist.github.com/tdunning/07ce66e7e4d4af41afd7
> > > > >
> > > > > Output is here: https://gist.github.com/tdunning/3aa841c56bfcdc0ab90e
> > > > >
> > > > > log-synth schema for generating input data is here:
> > > > > https://gist.github.com/tdunning/638dd52c00569ffa9582
> > > > >
> > > > >
> > > > > Preferred syntax would be like
> > > > >
> > > > > select flatten(t, v1, v2) from ...
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, May 29, 2015 at 7:04 AM, Neeraja Rentachintala <
> > > > > nrentachint...@maprtech.com> wrote:
> > > > >
> > > > > > Ted,
> > > > > > can you please give an example with a few data elements in a and
> > > > > > b, and the output you expect from the query?
> > > > > >
> > > > > > -Neeraja
> > > > > >
> > > > > > On Fri, May 29, 2015 at 6:43 AM, Ted Dunning <ted.dunn...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I have two arrays.  Their elements are correlated times and
> > > > > > > values.  I would like to flatten them into rows, each with two
> > > > > > > elements.
> > > > > > >
> > > > > > > The query
> > > > > > >
> > > > > > >    select flatten(a), flatten(b) from ...
> > > > > > >
> > > > > > > doesn't work because I get the cartesian product (of course).
> > > > > > > The query
> > > > > > >
> > > > > > >    select flatten(a, b) from ...
> > > > > > >
> > > > > > > also doesn't work because flatten doesn't have a multi-argument
> > > > > > > form.
> > > > > > >
> > > > > > > Going crazy, this query kind of sort of almost works, but not
> > > > > > > really:
> > > > > > >
> > > > > > >  select r.x.`key`, flatten(r.x.`value`)  from (
> > > > > > >
> > > > > > >  select flatten(kvgen(x)) as x from ...) r;
> > > > > > >
> > > > > > > What I really want to see is something like this:
> > > > > > >    select zip(flatten(a), flatten(b)) from ...
> > > > > > >
> > > > > > > Any pointers?  Is my next step to write a UDF?
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >  Steven Phillips
> >  Software Engineer
> >
> >  mapr.com
> >
>


Re: Review Request 34839: DRILL-3155: Part 2

2015-06-01 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34839/
---

(Updated June 1, 2015, 8:54 p.m.)


Review request for drill and Hanifi Gunes.


Repository: drill-git


Description
---

While allocating memory for composite vectors, if one of the allocations fails 
we need to release all the memory allocated up to that point.
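
A minimal sketch of this cleanup rule in plain Java, for illustration only. It is not the code in the patch: `ChildBuffer`, `BudgetAllocator`, and `CompositeAllocation.allocateAll` are hypothetical names that stand in for Drill's child vectors and allocator, and the fixed budget exists only to force a failure part-way through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the memory held by one child vector (not a Drill class).
final class ChildBuffer {
    final int size;
    boolean released = false;

    ChildBuffer(int size) { this.size = size; }

    void release() { released = true; }
}

// Hypothetical allocator with a fixed budget, used only to simulate an allocation failure.
final class BudgetAllocator {
    private int remaining;
    final List<ChildBuffer> handedOut = new ArrayList<>();

    BudgetAllocator(int budget) { this.remaining = budget; }

    ChildBuffer allocate(int size) {
        if (size > remaining) {
            throw new IllegalStateException("out of memory: requested " + size);
        }
        remaining -= size;
        ChildBuffer buffer = new ChildBuffer(size);
        handedOut.add(buffer);
        return buffer;
    }
}

final class CompositeAllocation {
    // Allocates every child buffer; if any allocation throws, the buffers
    // allocated so far are released before the exception propagates.
    static List<ChildBuffer> allocateAll(BudgetAllocator allocator, int... sizes) {
        List<ChildBuffer> allocated = new ArrayList<>();
        boolean success = false;
        try {
            for (int size : sizes) {
                allocated.add(allocator.allocate(size));
            }
            success = true;
            return allocated;
        } finally {
            if (!success) {
                for (ChildBuffer buffer : allocated) {
                    buffer.release();
                }
            }
        }
    }

    public static void main(String[] args) {
        BudgetAllocator allocator = new BudgetAllocator(10);
        try {
            allocateAll(allocator, 4, 4, 4); // the third request exceeds the budget
        } catch (IllegalStateException e) {
            long released = allocator.handedOut.stream().filter(b -> b.released).count();
            System.out.println("allocation failed; buffers released: " + released);
        }
    }
}
```

The `success` flag keeps the cleanup in `finally` from running on the normal path, so buffers are released only when an allocation fails.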


Diffs (updated)
-

  exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 90ec6be 
  exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
  exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java b3389e2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
 3c01939 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
 a97847b 

Diff: https://reviews.apache.org/r/34839/diff/


Testing
---


Thanks,

Mehant Baid



Re: Review Request 34838: DRILL-3155: Part 1

2015-06-01 Thread Hanifi Gunes


> On June 1, 2015, 9:01 p.m., Hanifi Gunes wrote:
> >

A few small things.


- Hanifi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34838/#review86070
---


On May 30, 2015, 7:57 a.m., Mehant Baid wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34838/
> ---
> 
> (Updated May 30, 2015, 7:57 a.m.)
> 
> 
> Review request for drill and Hanifi Gunes.
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> This patch is a simple refactoring. Moved the classes related to complex 
> vectors into the appropriate package.
> 
> 
> Diffs
> -
> 
>   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
>  00a78fd 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java
>  b8d040c 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/Flattener.java
>  323bf43 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/VectorHolder.java 
> e602fd7 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedWidthRepeatedReader.java
>  2b929a4 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
>  0cbd480 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/AllocationHelper.java
>  eddefd0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
>  d5a0d62 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/ContainerVectorLike.java
>  95e3365 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedFixedWidthVectorLike.java
>  450c673 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedMutator.java
>  8e097e4 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedValueVector.java
>  95a7252 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedVariableWidthVectorLike.java
>  ac8589e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/ContainerVectorLike.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
>  a5553b2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
>  a97847b 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMutator.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedValueVector.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34838/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>



Re: Review Request 34838: DRILL-3155: Part 1

2015-06-01 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34838/#review86070
---



exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java


Should drop *public* modifier from interface.



exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMutator.java


This class is dead and should be removed along with its uses. We rely on 
RVV#RepeatedMutator consistently.



exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java


Should drop *public* as well.


- Hanifi Gunes


On May 30, 2015, 7:57 a.m., Mehant Baid wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34838/
> ---
> 
> (Updated May 30, 2015, 7:57 a.m.)
> 
> 
> Review request for drill and Hanifi Gunes.
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> This patch is a simple refactoring. Moved the classes related to complex 
> vectors into the appropriate package.
> 
> 
> Diffs
> -
> 
>   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
>  00a78fd 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java
>  b8d040c 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/Flattener.java
>  323bf43 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/VectorHolder.java 
> e602fd7 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedWidthRepeatedReader.java
>  2b929a4 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
>  0cbd480 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/AllocationHelper.java
>  eddefd0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/BaseRepeatedValueVector.java
>  d5a0d62 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/ContainerVectorLike.java
>  95e3365 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedFixedWidthVectorLike.java
>  450c673 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedMutator.java
>  8e097e4 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedValueVector.java
>  95a7252 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/RepeatedVariableWidthVectorLike.java
>  ac8589e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/ContainerVectorLike.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedFixedWidthVectorLike.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedListVector.java
>  a5553b2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
>  a97847b 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMutator.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedValueVector.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedVariableWidthVectorLike.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34838/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>



[jira] [Created] (DRILL-3238) Cannot Plan Exception is raised when the same window partition is defined in select & window clauses

2015-06-01 Thread Sean Hsuan-Yi Chu (JIRA)
Sean Hsuan-Yi Chu created DRILL-3238:


 Summary: Cannot Plan Exception is raised when the same window 
partition is defined in select & window clauses
 Key: DRILL-3238
 URL: https://issues.apache.org/jira/browse/DRILL-3238
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3239) Join between empty hive tables throws an IllegalStateException

2015-06-01 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-3239:


 Summary: Join between empty hive tables throws an 
IllegalStateException
 Key: DRILL-3239
 URL: https://issues.apache.org/jira/browse/DRILL-3239
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Reporter: Rahul Challapalli
Assignee: Venki Korukanti
 Attachments: error.log

git.commit.id.abbrev=6f54223

Created 2 Hive tables on top of TPC-H data in ORC format. The tables are empty. 
The query below returns 0 rows from Hive. However, it fails with an 
IllegalStateException in Drill:

{code}
select * from customer c, orders o where c.c_custkey = o.o_custkey;
Error: SYSTEM ERROR: java.lang.IllegalStateException: You tried to do a batch 
data read operation when you were in a state of NONE.  You can only do this 
type of operation when you are in a state of OK or OK_NEW_SCHEMA.

Fragment 0:0

[Error Id: 8483cab2-d771-4337-ae65-1db41eb5720d on qa-node191.qa.lab:31010] 
(state=,code=0)
{code}

Below is the Hive DDL I used:
{code}
create table if not exists tpch01_orc.customer (
c_custkey int,
c_name string,
c_address string,
c_nationkey int,
c_phone string,
c_acctbal double,
c_mktsegment string,
c_comment string
)
STORED AS orc
LOCATION '/drill/testdata/Tpch0.01/orc/customer';

create table if not exists tpch01_orc.orders (
o_orderkey int,
o_custkey int,
o_orderstatus string,
o_totalprice double,
o_orderdate date,
o_orderpriority string,
o_clerk string,
o_shippriority int,
o_comment string
)
STORED AS orc
LOCATION '/drill/testdata/Tpch0.01/orc/orders';
{code}

I attached the log files





[jira] [Created] (DRILL-3240) Fetch hadoop maven profile specific Hive version in Hive storage plugin

2015-06-01 Thread Venki Korukanti (JIRA)
Venki Korukanti created DRILL-3240:
--

 Summary: Fetch hadoop maven profile specific Hive version in Hive 
storage plugin
 Key: DRILL-3240
 URL: https://issues.apache.org/jira/browse/DRILL-3240
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Hive, Tools, Build & Test
Affects Versions: 0.4.0
Reporter: Venki Korukanti
Assignee: Venki Korukanti
Priority: Minor
 Fix For: 1.1.0


Currently we always fetch the Apache Hive libs irrespective of the Hadoop 
vendor profile used in {{mvn clean install}}. This JIRA is to allow specifying 
a custom version of Hive in the Hadoop vendor profile.

Note: The Hive storage plugin assumes there are no major differences in Hive 
APIs between different vendor-specific custom Hive builds.





[jira] [Created] (DRILL-3241) Query with window function runs out of direct memory and does not report back to client that it did

2015-06-01 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3241:
---

 Summary: Query with window function runs out of direct memory and 
does not report back to client that it did
 Key: DRILL-3241
 URL: https://issues.apache.org/jira/browse/DRILL-3241
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Chris Westin


Even though the query ran out of memory and was cancelled on the server, the 
client (sqlline) was never notified of the event, so it appears to the user 
that the query is hung.

Configuration:
Single drillbit configured with:
DRILL_MAX_DIRECT_MEMORY="2G"
DRILL_HEAP="1G"
TPCDS100 parquet files

Query:
{code}
select 
  sum(ss_quantity) over(partition by ss_store_sk order by ss_sold_date_sk) 
from store_sales;
{code}

drillbit.log
{code}
2015-06-01 21:42:29,514 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  Connection: /10.10.88.133:31012 <--> /10.10.88.133:38887 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_71]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_71]
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:110) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:600) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.buffer.UnsafeDirectLittleEndian.writeBytes(UnsafeDirectLittleEndian.java:28) ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:4.0.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227) ~[

[jira] [Created] (DRILL-3242) Enhance RPC layer to offload all request work onto a separate thread.

2015-06-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3242:
-

 Summary: Enhance RPC layer to offload all request work onto a 
separate thread.
 Key: DRILL-3242
 URL: https://issues.apache.org/jira/browse/DRILL-3242
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - RPC
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau
 Fix For: 1.1.0


Right now, the app is responsible for ensuring that only very small amounts of 
work are done on the RPC thread.  In some cases, the app doesn't do this 
correctly.  Additionally, in high-load situations these small amounts of work 
become non-trivial.  As such, we need to make the RPC layer protect itself from 
slow requests/responses.
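
One way to picture the offloading idea is the sketch below. It is a hedged illustration using only `java.util.concurrent`, not Drill's RPC code: `OffloadingDispatcher`, `onRequest`, and the `rpc-worker` thread name are hypothetical, and the caller of `onRequest` plays the role of the RPC thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: the RPC thread only hands the request to a worker pool,
// so a slow handler cannot stall the thread that reads from the connection.
final class OffloadingDispatcher<Q, R> {
    private final ExecutorService workers;
    private final Function<Q, R> handler;

    OffloadingDispatcher(ExecutorService workers, Function<Q, R> handler) {
        this.workers = workers;
        this.handler = handler;
    }

    // Intended to be called on the RPC thread; returns immediately with a future.
    CompletableFuture<R> onRequest(Q request) {
        return CompletableFuture.supplyAsync(() -> handler.apply(request), workers);
    }
}

final class OffloadDemo {
    // Runs one request and returns the name of the thread that executed the handler.
    static String handlerThreadName() {
        ExecutorService workers = Executors.newFixedThreadPool(2, runnable -> {
            Thread thread = new Thread(runnable, "rpc-worker");
            thread.setDaemon(true);
            return thread;
        });
        try {
            OffloadingDispatcher<String, String> dispatcher =
                new OffloadingDispatcher<>(workers, request -> Thread.currentThread().getName());
            return dispatcher.onRequest("ping").join();
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("handled on: " + handlerThreadName());
    }
}
```

A real implementation would also have to decide how to bound the work queue and how to preserve response ordering, neither of which this sketch attempts.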





[jira] [Created] (DRILL-3243) Need a better error message - Use of alias in window function definition

2015-06-01 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3243:
-

 Summary: Need a better error message - Use of alias in window 
function definition
 Key: DRILL-3243
 URL: https://issues.apache.org/jira/browse/DRILL-3243
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Khurram Faraaz
Assignee: Chris Westin


We need a better error message when an alias for a window definition is used 
in a query that uses window functions. For example, given OVER(PARTITION BY 
columns[0] ORDER BY columns[1]) tmp, if the alias "tmp" is used in the 
predicate, we need a message that says column "tmp" does not exist, which is 
how Postgres 9.3 behaves.

Postgres 9.3

{code}
postgres=# select count(*) OVER(partition by type order by id) `tmp` from 
airports where tmp is not null;
ERROR:  column "tmp" does not exist
LINE 1: ...ect count(*) OVER(partition by type order by id) `tmp` from ...
 ^
{code}

Drill 1.0
{code}
0: jdbc:drill:schema=dfs.tmp> select count(*) OVER(partition by columns[2] 
order by columns[0]) tmp from `airports.csv` where tmp is not null;
Error: SYSTEM ERROR: java.lang.IllegalArgumentException: Selected column(s) 
must have name 'columns' or must be plain '*'

Fragment 0:0

[Error Id: 66987b81-fe50-422d-95e4-9ce61c873584 on centos-02.qa.lab:31010] 
(state=,code=0)
{code}





Re: question about correlated arrays and flatten

2015-06-01 Thread Ted Dunning
How could we make functional primitives work without lambda?



On Mon, Jun 1, 2015 at 9:55 PM, Hanifi Gunes  wrote:

> Idea of having functional primitives with Drill sounds really handy. It
> would be great if we could support left-right folding as well. I can see
> many great use cases of project/map, fold/reduce, zip, flatten when
> combined.
>
> On Sat, May 30, 2015 at 12:57 AM, Ted Dunning 
> wrote:
>
> > OK.  I will file a JIRA for a zip function.  No idea if I will be able to
> > get one written in the available cracks of time.
> >
> >
> >
> > On Fri, May 29, 2015 at 7:17 PM, Steven Phillips  >
> > wrote:
> >
> > > I think your use case could be solved by adding a UDF that can combine
> > > multiple arrays into a single array. The result of this function could
> > then
> > > be handled by our current implementation of flatten.
> > >
> > > I think this is preferable to enhancing flatten itself to handle it,
> > since
> > > flatten is not an ordinary UDF, and thus more difficult to modify and
> > > maintain.
> > >
> > > On Fri, May 29, 2015 at 3:20 PM, Ted Dunning 
> > > wrote:
> > >
> > > > My particular use case can throw an error if the lists are different
> > > > length.
> > > >
> > > > I think our real goal should be to have a logically complete set of
> > > simple
> > > > primitives that lets any sort of back and forward conversions of this
> > > kind.
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, May 29, 2015 at 9:58 AM, Jason Altekruse <
> > > altekruseja...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > I understand what you want to do, unfortunately we don't have
> support
> > > for
> > > > > this right now. A UDF is the best I can suggest at this point.
> > > > >
> > > > > Just to explore the idea a little further for the sake of creating
> a
> > > > > complete feature request, I assume you would just want nulls filled
> > in
> > > > for
> > > > > the cases where the lists were different lengths?
> > > > >
> > > > > On Fri, May 29, 2015 at 8:58 AM, Ted Dunning <
> ted.dunn...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Input is here:
> > https://gist.github.com/tdunning/07ce66e7e4d4af41afd7
> > > > > >
> > > > > > Output is here:
> > > https://gist.github.com/tdunning/3aa841c56bfcdc0ab90e
> > > > > >
> > > > > > log-synth schema for generating input data is here:
> > > > > > https://gist.github.com/tdunning/638dd52c00569ffa9582
> > > > > >
> > > > > >
> > > > > > Preferred syntax would be like
> > > > > >
> > > > > > select flatten(t, v1, v2) from ...
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, May 29, 2015 at 7:04 AM, Neeraja Rentachintala <
> > > > > > nrentachint...@maprtech.com> wrote:
> > > > > >
> > > > > > > Ted
> > > > > > > can you pls give an example with few data elements in a, b and
> > the
> > > > > > expected
> > > > > > > output you are looking from the query.
> > > > > > >
> > > > > > > -Neeraja
> > > > > > >
> > > > > > > On Fri, May 29, 2015 at 6:43 AM, Ted Dunning <
> > > ted.dunn...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I have two arrays.  Their elements are correlated times and
> > > values.
> > > > > I
> > > > > > > > would like to flatten them into rows, each with two elements.
> > > > > > > >
> > > > > > > > The query
> > > > > > > >
> > > > > > > >select flatten(a), flatten(b) from ...
> > > > > > > >
> > > > > > > > doesn't work because I get the cartesian product (of course).
> > > The
> > > > > > query
> > > > > > > >
> > > > > > > >select flatten(a, b) from ...
> > > > > > > >
> > > > > > > > also doesn't work because flatten doesn't have a
> multi-argument
> > > > form.
> > > > > > > >
> > > > > > > > Going crazy, this query kind of sort of almost works, but not
> > > > really:
> > > > > > > >
> > > > > > > >  select r.x.`key`, flatten(r.x.`value`)  from (
> > > > > > > >
> > > > > > > >  select flatten(kvgen(x)) as x from ...) r;
> > > > > > > >
> > > > > > > > What I really want to see is something like this:
> > > > > > > >select zip(flatten(a), flatten(b)) from ...
> > > > > > > >
> > > > > > > > Any pointers?  Is my next step to write a UDF?
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >  Steven Phillips
> > >  Software Engineer
> > >
> > >  mapr.com
> > >
> >
>
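
For reference, the element-wise zip discussed in the quoted thread can be sketched in plain Java as below. This is only an illustration of the semantics, not a Drill UDF: `ZipSketch.zip` is a hypothetical name, and throwing on mismatched lengths is one of the two behaviors raised in the thread (the other being to fill with nulls).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ZipSketch {
    // Combines correlated lists element-wise: row i holds the i-th element of
    // each input list. Throws if the lists do not all have the same length.
    static List<List<Object>> zip(List<?>... columns) {
        List<List<Object>> rows = new ArrayList<>();
        if (columns.length == 0) {
            return rows;
        }
        int length = columns[0].size();
        for (List<?> column : columns) {
            if (column.size() != length) {
                throw new IllegalArgumentException(
                    "lists have different lengths: " + length + " vs " + column.size());
            }
        }
        for (int i = 0; i < length; i++) {
            List<Object> row = new ArrayList<>(columns.length);
            for (List<?> column : columns) {
                row.add(column.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> times = Arrays.asList(1, 2, 3);
        List<Double> values = Arrays.asList(0.5, 0.7, 0.9);
        // Each printed row pairs one time with its correlated value.
        for (List<Object> row : zip(times, values)) {
            System.out.println(row);
        }
    }
}
```

Applied to two arrays of length n, this yields n rows rather than the n*n rows of the cartesian product.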


Re: Review Request 34839: DRILL-3155: Part 2

2015-06-01 Thread Hanifi Gunes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34839/#review86083
---


Almost there.


exec/java-exec/src/main/codegen/templates/NullableValueVectors.java


This applies to all vectors: we should be consistent across 
allocateNew/Safe in when to trigger clear. I would recommend to clear iff 
allocation fails. All other cases should be handled by consumers.



exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java


We should execute this after both allocations are completed.



exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java


We should move this allocation out of try-finally.



exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java


Minor cosmetic issue: perhaps we should consider returning after finally 
since we declare the variable before try.


- Hanifi Gunes


On June 1, 2015, 8:54 p.m., Mehant Baid wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34839/
> ---
> 
> (Updated June 1, 2015, 8:54 p.m.)
> 
> 
> Review request for drill and Hanifi Gunes.
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> While allocating memory for composite vectors, if one of the allocations fails 
> we need to release all the memory allocated up to that point.
> 
> 
> Diffs
> -
> 
>   exec/java-exec/src/main/codegen/templates/NullableValueVectors.java 90ec6be 
>   exec/java-exec/src/main/codegen/templates/RepeatedValueVectors.java 7b2b78d 
>   exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java 
> b3389e2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
>  3c01939 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java
>  a97847b 
> 
> Diff: https://reviews.apache.org/r/34839/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>



Reminder: Weekly hangout tomorrow(Tuesday) 10am Pacific

2015-06-01 Thread Jason Altekruse
Hello Drillers,

Please join us tomorrow at 10am Pacific for our weekly hangout. As always
the meeting is open to anyone interested in Drill, come to discuss current
issues, upcoming development or just to learn more about what is happening
in the community.

https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

- Jason Altekruse


Re: Review Request 34567: DRILL-3166: Cost model of functions should account for field reader arguments

2015-06-01 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34567/#review86153
---

Ship it!


Ship It!

- Mehant Baid


On May 21, 2015, 10:35 p.m., Hanifi Gunes wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34567/
> ---
> 
> (Updated May 21, 2015, 10:35 p.m.)
> 
> 
> Review request for drill and Mehant Baid.
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> DRILL-3166:  Cost model of functions should account for field reader arguments
> 
> - Updated cost model to add the max cost in case a field reader is encountered
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java
>  ef4bff3eb728ce5fac75ea98c5b3c257d4660307 
> 
> Diff: https://reviews.apache.org/r/34567/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hanifi Gunes
> 
>



Build Error

2015-06-01 Thread bo yang
Hi Guys,

I downloaded the Drill 1.0.0 source code from GitHub and tried to build it, but I
got the following error when running "mvn compile". Any suggestions for fixing this?

[ERROR] Failed to execute goal on project drill-java-exec: Could not
resolve dependencies for project
org.apache.drill.exec:drill-java-exec:jar:1.0.0: Could not find artifact
org.apache.drill:drill-common:jar:tests:1.0.0 in conjars (
http://conjars.org/repo)

Thanks,
Bo


Re: Build Error

2015-06-01 Thread Jacques Nadeau
You need to run mvn install instead of mvn compile.  This is because there
are some special interdependencies between the modules.

On Mon, Jun 1, 2015 at 8:05 PM, bo yang  wrote:

> Hi Guys,
>
> I downloaded the Drill 1.0.0 source code from GitHub and tried to build it, but I
> got the following error when running "mvn compile". Any suggestions for fixing this?
>
> [ERROR] Failed to execute goal on project drill-java-exec: Could not
> resolve dependencies for project
> org.apache.drill.exec:drill-java-exec:jar:1.0.0: Could not find artifact
> org.apache.drill:drill-common:jar:tests:1.0.0 in conjars (
> http://conjars.org/repo)
>
> Thanks,
> Bo
>