Re: Review Request 37027: DRILL-3557: Reading empty CSV file fails with SYSTEM ERROR

2015-08-05 Thread Sean Hsuan-Yi Chu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37027/
---

(Updated Aug. 5, 2015, 3:15 p.m.)


Review request for drill and Parth Chandra.


Changes
---

Addressed comments


Bugs: DRILL-3557
https://issues.apache.org/jira/browse/DRILL-3557


Repository: drill-git


Description
---

Ensure empty CSV's path can be added
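
For reference, a minimal way to exercise the fix is a query over a directory that
contains a zero-length CSV file (the path below is hypothetical); before this patch
such a query failed with the SYSTEM ERROR reported in DRILL-3557:

    SELECT * FROM dfs.`/tmp/directoryWithEmptyCSV`;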


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java
 e233dda 
  exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java e8af325 
  exec/java-exec/src/test/resources/store/text/directoryWithEmpyCSV/empty.csv 
PRE-CREATION 

Diff: https://reviews.apache.org/r/37027/diff/


Testing
---

Unit tests, functional, tpch


Thanks,

Sean Hsuan-Yi Chu



[jira] [Created] (DRILL-3607) small typo in configuring-resources-for-a-shared-drillbit page

2015-08-05 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-3607:
---

 Summary: small typo in configuring-resources-for-a-shared-drillbit 
page
 Key: DRILL-3607
 URL: https://issues.apache.org/jira/browse/DRILL-3607
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Deneche A. Hakim
Assignee: Bridget Bevens
Priority: Minor


In the documentation for [Configuring resources for a shared 
drillbit|https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/]
there is a small typo in the {{planner.width.max_per_node}} section. In the first line 
of this section we can read:
{quote}
Configure the *planner.width.max.per.node* to achieve
{quote}

but it actually should be
{quote}
Configure the *planner.width.max_per_node* to achieve
{quote}
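
For reference, the underscore form is the option name Drill actually accepts; a quick
way to set and check it (the value shown is only an example):
{code}
ALTER SESSION SET `planner.width.max_per_node` = 4;
SELECT * FROM sys.options WHERE name = 'planner.width.max_per_node';
{code}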




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3606) Wrong results - Lead(char-column) without PARTITION BY clause

2015-08-05 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-3606.
-
Resolution: Fixed

Fixed in private branch

 Wrong results - Lead(char-column) without PARTITION BY clause
 -

 Key: DRILL-3606
 URL: https://issues.apache.org/jira/browse/DRILL-3606
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.2.0
 Environment: private-branch-with-new-window-functions
Reporter: Khurram Faraaz
Assignee: Deneche A. Hakim
  Labels: window_function
 Fix For: 1.2.0


 A window function query that uses the LEAD function without a PARTITION BY clause in the 
 window definition returns wrong results on the developer's 
 private branch. This issue may be related to DRILL-3605.
 Results returned by Drill
 {code}
 0: jdbc:drill:schema=dfs.tmp> select lead(col2) over (order by col0) 
 lead_col0 from `fewRowsAllData.parquet`;
 +---+
 | lead_col0 |
 +---+
 | NHIN |
 | INCACO |
 | CACOSCSD |
 | COSCSDWYLA |
 | SCSDWYLAKSCO |
 | SDWYLAKSCONYNY |
 | WYLAKSCONYNYSDGA |
 | LAKSCONYNYSDGAMOIN |
 | KSCONYNYSDGAMOINMNIA |
 | CONYNYSDGAMOINMNIAGAMN |
 | NYNYSDGAMOINMNIAGAMNMNMI |
 | NYSDGAMOINMNIAGAMNMNMIRISD |
 | SDGAMOINMNIAGAMNMNMIRISDINWI |
 | GAMOINMNIAGAMNMNMIRISDINWIMAIA |
 | MOINMNIAGAMNMNMIRISDINWIMAIANDMA |
 | INMNIAGAMNMNMIRISDINWIMAIANDMARIME |
 | MNIAGAMNMNMIRISDINWIMAIANDMARIMEMNCO |
 | IAGAMNMNMIRISDINWIMAIANDMARIMEMNCOOHMO |
 | GAMNMNMIRISDINWIMAIANDMARIMEMNCOOHMOGAVT |
 | MNMNMIRISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNH |
 | MNMIRISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIOR |
 | MIRISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZ |
 | RISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMD |
 | SDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMA |
 | INWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUT |
 | WIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWY |
 | MAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWY |
 | IANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAK |
 | NDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPA |
 | MARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGA |
 | RIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVT |
 | MEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTIN |
 | MNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWV |
 | COOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMN |
 | OHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVT |
 | MOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUT |
 | 
 GAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVT |
 | 
 VTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISC
  |
 | 
 NDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME
  |
 | 
 NHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | RIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | ORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | NCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | AZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | ORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | MDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | HIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | MANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | NYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | UTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | DEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | WYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | OHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | WYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | NHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | AKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | MDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | PAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | MNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | GAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | MOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | VTUTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | UTINWYWVIAMNAZVTIAUTWIVTRISCME |
 | INWYWVIAMNAZVTIAUTWIVTRISCME |
 | WYWVIAMNAZVTIAUTWIVTRISCME |
 | WVIAMNAZVTIAUTWIVTRISCME |
 | IAMNAZVTIAUTWIVTRISCME |
 | MNAZVTIAUTWIVTRISCME |
 | AZVTIAUTWIVTRISCME |
 | VTIAUTWIVTRISCME |
 | IAUTWIVTRISCME |
 | UTWIVTRISCME |
 | WIVTRISCME |
 | VTRISCME |
 | RISCME |
 | SCME |
 | ME |
 | null |
 +---+
 78 rows selected (0.301 seconds)
 {code}
 Results returned by Postgres
 {code}
 postgres=# select lead(col2) over (order by col0) lead_col0 from tbl_alldata;
  

[jira] [Resolved] (DRILL-3605) Wrong results - Lead(char-column)

2015-08-05 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-3605.
-
Resolution: Fixed

Fixed in private branch

 Wrong results - Lead(char-column) 
 --

 Key: DRILL-3605
 URL: https://issues.apache.org/jira/browse/DRILL-3605
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.2.0
 Environment: private-branch-with-new-window-functions
Reporter: Khurram Faraaz
Assignee: Deneche A. Hakim
  Labels: window_function
 Fix For: 1.2.0

 Attachments: fewRowsAllData.parquet


 col2 is of type char(2) in the parquet file.
 Results returned by Drill
 {code}
 0: jdbc:drill:schema=dfs.tmp> select col2, lead(col2) over (partition by col2 
 order by col0) lead_col0 from `fewRowsAllData.parquet`;
 +--+---+
 | col2 | lead_col0 |
 +--+---+
 | AK | null |
 | AZ | AZCACO |
 | AZ | null |
 | CA | null |
 | CO | COCODEGAGAGA |
 | CO | CODEGAGAGAGAHI |
 | CO | null |
 | DE | null |
 | GA | GAGAGAHIIAIAIAIAININ |
 | GA | GAGAHIIAIAIAIAININININ |
 | GA | GAHIIAIAIAIAININININKSLA |
 | GA | null |
 | HI | null |
 | IA | IAIAIAININININKSLAMAMAMAMDMDME |
 | IA | IAIAININININKSLAMAMAMAMDMDMEMEMI |
 | IA | IAININININKSLAMAMAMAMDMDMEMEMIMNMN |
 | IA | null |
 | IN | INININKSLAMAMAMAMDMDMEMEMIMNMNMNMNMNMN |
 | IN | ININKSLAMAMAMAMDMDMEMEMIMNMNMNMNMNMNMOMO |
 | IN | INKSLAMAMAMAMDMDMEMEMIMNMNMNMNMNMNMOMOMONC |
 | IN | null |
 | KS | null |
 | LA | null |
 | MA | MAMAMDMDMEMEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNY |
 | MA | MAMDMDMEMEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOH |
 | MA | null |
 | MD | MDMEMEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPA |
 | MD | null |
 | ME | MEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRI |
 | ME | null |
 | MI | null |
 | MN | MNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUT |
 | MN | MNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUT |
 | MN | MNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVT 
 |
 | MN | 
 MNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVT |
 | MN | 
 MNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWI |
 | MN | null |
 | MO | 
 MOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWY
  |
 | MO | 
 MONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY
  |
 | MO | null |
 | NC | null |
 | ND | 
 NDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | ND | null |
 | NE | null |
 | NH | NHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | NH | NHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | NH | null |
 | NY | NYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | NY | NYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | NY | null |
 | OH | OHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | OH | null |
 | OR | ORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | OR | null |
 | PA | null |
 | RI | RIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | RI | RIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | RI | RISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | RI | null |
 | SC | SCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | SC | null |
 | SD | SDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | SD | SDUTUTUTVTVTVTVTWIWIWVWYWYWYWY |
 | SD | null |
 | UT | UTUTVTVTVTVTWIWIWVWYWYWYWY |
 | UT | UTVTVTVTVTWIWIWVWYWYWYWY |
 | UT | null |
 | VT | VTVTVTWIWIWVWYWYWYWY |
 | VT | VTVTWIWIWVWYWYWYWY |
 | VT | VTWIWIWVWYWYWYWY |
 | VT | null |
 | WI | WIWVWYWYWYWY |
 | WI | null |
 | WV | null |
 | WY | WYWYWY |
 | WY | WYWY |
 | WY | WY |
 | WY | null |
 +--+---+
 78 rows selected (0.307 seconds)
 {code}
 Results returned by Postgres.
 {code}
 postgres=# select col2,lead(col2) over (partition by col2 order by col0) 
 lead_col0 from tbl_alldata;
  col2 | lead_col0 
 --+---
  AK   | 
  AZ   | AZ
  AZ   | 
  CA   | 
  CO   | CO
  CO   | CO
  CO   | 
  DE   | 
  GA   | GA
  GA   | GA
  GA   | GA
  GA   | 
  HI   | 
  IA   | IA
  IA   | IA
  IA   | IA
  IA   | 
  IN   | IN
  IN   | IN
  IN   | IN
  IN   | 
  KS   | 
  LA   | 
  MA   | MA
  MA   | MA
  MA   | 
  MD   | MD
  MD   | 
  ME   | ME
  ME   | 
  MI   | 
  MN   | MN
  MN   | MN
  MN   | MN
  MN   | MN
  MN   | MN
  MN   | 
  MO   | MO
  MO   | MO
  MO   | 
  NC   | 
  ND   | ND
  ND   | 
  NE   | 
  NH   | NH
  NH   | NH
  NH   | 
  NY   | NY
  NY   | NY
  NY   | 
  OH   | OH
  OH   | 
  OR   | OR
  OR   | 
  PA   | 
  RI   | RI
  RI   | RI
  RI   | RI
  RI   | 
  SC   | SC
  SC   | 
  SD   | SD
  SD   | SD
  SD   | 
  UT   | UT
  UT   | UT
  UT   | 
  VT   | VT
  VT   | VT
  VT   | VT
  VT   | 
  WI   | WI
  WI   | 
  WV   | 
  WY   | WY
  

[jira] [Created] (DRILL-3608) add support for FIRST_VALUE and LAST_VALUE

2015-08-05 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-3608:
---

 Summary: add support for FIRST_VALUE and LAST_VALUE
 Key: DRILL-3608
 URL: https://issues.apache.org/jira/browse/DRILL-3608
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Deneche A. Hakim
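
For context, FIRST_VALUE and LAST_VALUE follow the standard window-function syntax;
a sketch of the kind of query this sub-task would enable (table name borrowed from the
related window-function JIRAs, columns are only illustrative):
{code}
SELECT col2,
       FIRST_VALUE(col2) OVER (PARTITION BY col2 ORDER BY col0) AS first_val,
       LAST_VALUE(col2)  OVER (PARTITION BY col2 ORDER BY col0) AS last_val
FROM `fewRowsAllData.parquet`;
{code}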






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3609) Support non default arguments for LEAD and LAG window functions

2015-08-05 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-3609:
---

 Summary: Support non default arguments for LEAD and LAG window 
functions
 Key: DRILL-3609
 URL: https://issues.apache.org/jira/browse/DRILL-3609
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim
 Fix For: Future


The current implementation of LEAD and LAG only supports the default 
arguments: offset = 1, default value = NULL. 
Extend the implementation to support non-default arguments.
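
A sketch of the non-default-argument form this improvement would allow (standard SQL
window-function syntax; the table and literal values are only illustrative):
{code}
SELECT col2,
       LEAD(col2, 2, 'none') OVER (PARTITION BY col2 ORDER BY col0) AS lead_2,
       LAG(col2, 3, 'none')  OVER (PARTITION BY col2 ORDER BY col0) AS lag_3
FROM `fewRowsAllData.parquet`;
{code}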



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3604) LEAD(varchar-column) returns IOB Exception

2015-08-05 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-3604.
-
Resolution: Fixed

Fixed in private branch

 LEAD(varchar-column) returns IOB Exception 
 -

 Key: DRILL-3604
 URL: https://issues.apache.org/jira/browse/DRILL-3604
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.2.0
 Environment: private branch: 
 https://github.com/adeneche/incubator-drill/tree/new-window-funcs
Reporter: Khurram Faraaz
Assignee: Deneche A. Hakim
  Labels: window_function
 Fix For: 1.2.0


 A query that uses LEAD(varchar-column) returns an IOB Exception on the developer's 
 private branch. 
 {code}
 0: jdbc:drill:schema=dfs.tmp> select lead(col3) over (partition by col2 order 
 by col0) lead_col0 from `fewRowsAllData.parquet`;
 java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
 IndexOutOfBoundsException: index: 31668, length: 2444 (expected: range(0, 
 32768))
 Fragment 0:0
 [Error Id: a52df546-f567-4e07-aa68-96a149e413da on centos-04.qa.lab:31010]
   at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
   at 
 sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
   at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
   at sqlline.SqlLine.print(SqlLine.java:1583)
   at sqlline.Commands.execute(Commands.java:852)
   at sqlline.Commands.sql(Commands.java:751)
   at sqlline.SqlLine.dispatch(SqlLine.java:738)
   at sqlline.SqlLine.begin(SqlLine.java:612)
   at sqlline.SqlLine.start(SqlLine.java:366)
   at sqlline.SqlLine.main(SqlLine.java:259)
 {code}
 Stack trace from /var/log/drill/sqlline.out
 {code}
 2015-08-04 21:56:32,515 [Client-1] INFO  
 o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#24] Query failed:
 org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
 IndexOutOfBoundsException: index: 31668, length: 2444 (expected: range(0, 
 32768))
 Fragment 0:0
 [Error Id: 94e70408-66b6-426d-8f57-8a782ba974c0 on centos-04.qa.lab:31010]
 at 
 org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
 at 
 org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:111) 
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
 at 
 org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
 at 
 org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
 at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) 
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
 at 
 org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) 
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
 at 
 org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) 
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
  [netty-handler-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
 at 
 

Re: Review Request 37116: DRILL-3567: Wrong result in a query with multiple window functions and different over clauses

2015-08-05 Thread Sean Hsuan-Yi Chu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37116/
---

(Updated Aug. 5, 2015, 4:09 p.m.)


Review request for drill and Aman Sinha.


Changes
---

addressed comments


Bugs: DRILL-3567
https://issues.apache.org/jira/browse/DRILL-3567


Repository: drill-git


Description
---

Support multiple window definitions; Bump DrillCalcite version number to 
1.1.0-drill-r16
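
As an illustration, a query of the kind this change is meant to handle -- multiple
window functions with different OVER clauses in one statement (table and column names
are hypothetical):

    SELECT SUM(col1) OVER (PARTITION BY col2 ORDER BY col0) AS running_sum,
           AVG(col1) OVER (PARTITION BY col3) AS grp_avg
    FROM `fewRowsAllData.parquet`;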


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java
 17689ad 
  exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 
222c7b7 
  pom.xml 3ec79e1 

Diff: https://reviews.apache.org/r/37116/diff/


Testing
---

Unit, TPCH, functional


Thanks,

Sean Hsuan-Yi Chu



[jira] [Created] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions

2015-08-05 Thread Andries Engelbrecht (JIRA)
Andries Engelbrecht created DRILL-3610:
--

 Summary: TimestampAdd/Diff (SQL_TSI_) functions
 Key: DRILL-3610
 URL: https://issues.apache.org/jira/browse/DRILL-3610
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Reporter: Andries Engelbrecht
Assignee: Mehant Baid


Add TimestampAdd and TimestampDiff (SQL_TSI) functions for year, quarter, 
month, week, day, hour, minute, second.

Examples
SELECT CAST(TIMESTAMPADD(SQL_TSI_QUARTER,1,Date('2013-03-31')) AS SQL_DATE) AS 
`column_quarter`
FROM `table_in`
HAVING (COUNT(1) > 0)

SELECT `table_in`.`datetime` AS `column1`,
  `table`.`Key` AS `column_Key`,
  TIMESTAMPDIFF(SQL_TSI_MINUTE,to_timestamp('2004-07-04', 
'yyyy-MM-dd'),`table_in`.`datetime`) AS `sum_datediff_minute`
FROM `calcs`




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 37027: DRILL-3557: Reading empty CSV file fails with SYSTEM ERROR

2015-08-05 Thread Parth Chandra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37027/#review94263
---

Ship it!


Ship It!

- Parth Chandra


On Aug. 5, 2015, 3:15 p.m., Sean Hsuan-Yi Chu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37027/
 ---
 
 (Updated Aug. 5, 2015, 3:15 p.m.)
 
 
 Review request for drill and Parth Chandra.
 
 
 Bugs: DRILL-3557
 https://issues.apache.org/jira/browse/DRILL-3557
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Ensure empty CSV's path can be added
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java
  e233dda 
   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 
 e8af325 
   exec/java-exec/src/test/resources/store/text/directoryWithEmpyCSV/empty.csv 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/37027/diff/
 
 
 Testing
 ---
 
 Unit tests, functional, tpch
 
 
 Thanks,
 
 Sean Hsuan-Yi Chu
 




Re: Review Request 37116: DRILL-3567: Wrong result in a query with multiple window functions and different over clauses

2015-08-05 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37116/#review94267
---

Ship it!


Ship It!

- Aman Sinha


On Aug. 5, 2015, 4:09 p.m., Sean Hsuan-Yi Chu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37116/
 ---
 
 (Updated Aug. 5, 2015, 4:09 p.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Bugs: DRILL-3567
 https://issues.apache.org/jira/browse/DRILL-3567
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 Support multiple window definitions; Bump DrillCalcite version number to 
 1.1.0-drill-r16
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java
  17689ad 
   exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 
 222c7b7 
   pom.xml 3ec79e1 
 
 Diff: https://reviews.apache.org/r/37116/diff/
 
 
 Testing
 ---
 
 Unit, TPCH, functional
 
 
 Thanks,
 
 Sean Hsuan-Yi Chu
 




Re: anyone seen these errors on master ?

2015-08-05 Thread Jinfeng Ni
In that case, we probably need to do a binary search to figure out which recent
patch is causing this problem.

On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche adene...@maprtech.com
wrote:

 Just got those errors on master too

 On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche adene...@maprtech.com
 
 wrote:

  I'm seeing those errors intermittently when building my private branch, I
  don't believe I made any change that would have caused them. Anyone seen
  them too ?
 
 
 testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 2.043 sec  <<< ERROR!
  java.lang.Exception: Unexpected exception,
  expected<org.apache.drill.exec.exception.OversizedAllocationException>
  but was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
  at
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
  at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139)
  at
 
 org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125)
 
 
 
 testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 0.436 sec  <<< ERROR!
  java.lang.Exception: Unexpected exception,
  expected<org.apache.drill.exec.exception.OversizedAllocationException>
  but was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
  at
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
  at
 
 org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187)
  at
 
 org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177)
  at
 
 org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85)
 
 
 
 testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 0.788 sec  <<< ERROR!
  java.lang.Exception: Unexpected exception,
  expected<org.apache.drill.exec.exception.OversizedAllocationException>
  but was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
  at
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
  at
 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:372)
  at
 
 org.apache.drill.exec.record.vector.TestValueVector.testVariableVectorReallocation(TestValueVector.java:142)
 
 
  Thanks
  --
 
  Abdelhakim Deneche
 
  Software Engineer
 
http://www.mapr.com/
 
 
  Now Available - 

Re: [DISCUSS] Publishing advanced/functional tests

2015-08-05 Thread Ramana I N
@Jacques, Ted

in the mean time, we risk patches being merged that have less than complete
 testing.


While I agree with the premise of getting the tests out as soon as possible
it does not help us achieve anything except transparency. Your statement
that getting the tests out will increase quality is dependent on someone
actually being able to run the tests once they have access to it.

Maybe we should focus on making a jenkins job to run the tests publicly.
With that in place we can exclude the TPC* datasets as well as the yelp
data sets from the framework and avoid licensing issues.

Regards
Ramana


On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish abhishek.gir...@gmail.com
wrote:

 We not only re-distribute external data-sets as-is, but also include
 variants of those (text -> parquet, json, ...). So the challenge here is
 not simply disabling automatic downloads via the framework and pointing users
 to manually download the files before running the framework, but also about
 how we will handle tests which require variants of the data sets. It simply
 isn't practical for users of the framework to (1) download data-gen manually,
 (2) use specific seed / options before generating data, (3) convert them to
 parquet, etc., (4) move them to specific locations inside their copy of the
 framework.

 Something we'll need to know is how other projects are handling bench-mark
 & other external datasets.

 -Abhishek

 On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli 
 challapallira...@gmail.com wrote:

  Thanks for your inputs.
 
   One issue with just publishing the tests in their current state is that
   the framework re-distributes tpch, tpcds, yelp data sets without requiring
   the users to accept their relevant licenses. A good number of tests use
   these data sets. Any thoughts on how to handle this?
 
  - Rahul
 
  On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   +1.  Get it out there.
  
  
  
   On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau jacq...@dremio.com
   wrote:
  
Hey Rahul,
   
My suggestion would be to the lower bar--do the absolute bare minimum
  to
get the tests out there.  For example, simply remove proprietary
information and then get it on a public github (whether your personal
github or a corporate one).  From there, people can help by
 submitting
   pull
requests to improve the infrastructure and harness.  Making things
  easier
is something that can be done over time.  For example, we've had
 offers
from a couple different Linux Admins to help on something.  I'm sure
  that
they could help with a number of the items you've identified.  In the
   mean
time, we risk patches being merged that have less than complete
  testing.
   
   
--
Jacques Nadeau
CTO and Co-Founder, Dremio
   
On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli 
challapallira...@gmail.com wrote:
   
 Jacques,

 I am breaking down steps 1, 2 & 3 into sub-tasks so we can
   add/prioritize
 these tasks

 Task breakdown (columns: Item #, Task, Sub-Task, Comments, Priority):

 1. Publish the tests
 - Remove Proprietary Data & Queries (priority 0)
 - Redact Proprietary Data/Queries
 - Move tests into drill repo -- This requires some refactoring to the
   framework code since the test framework uses a 2-level directory structure
 - Organize the tests using a label based approach -- This involves code
   changes and moving a lot of files. When doing a one time push it might be
   better to do this before publishing the tests?
 - Each suite should be independent -- Some suites wrongly assume that the
   data is present. They should be identified and fixed
 - Cleanup hardcoded dependencies during data generation -- Some data-gen
   scripts have hard-coded references
 - Cleanup downloads -- The same dataset is being downloaded multiple times
   by different suites
 - Licenses for downloads -- The framework downloads some files
   automatically. These files are publicly available. However, before
   downloading them users need to agree to certain terms. By using the
   framework users might be skipping this step. We should look into this

 2. Setup a cluster infrastructure to run the pre-commit tests

 3. Local debugging of tests
 - Add an optional maven target for running tests on a local machine --
   Tests can launch an embedded drillbit or they can connect to a running
   drillbit through zookeeper
 - Running suites which require additional setup (hive, hbase etc) should be
   made optional

 4. Documentation
 - Running Tests (options available and also listing the assumed defaults)
 - Explaining how tests are organized
 - Process for adding a new suite



 On Fri, Jul 24, 

Re: [DISCUSS] Drop table support

2015-08-05 Thread Mehant Baid
I agree, it is definitely restrictive. We can lift the restriction for 
being able to drop a table (when security is off) only if the Drill user 
owns it. I think the check for homogenous files should give us enough 
confidence that we are not deleting a non Drill directory.
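
For reference, the statement under discussion would presumably look something like
this (the workspace and table name are hypothetical):

    DROP TABLE dfs.tmp.`events_parquet`;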


Thanks
Mehant

On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:

Ted, thats fair point on the recovery part.

Regarding the other point by Mehant (copied below) ,there is an implication
that user can drop only Drill managed tables (i.e created as Drill user)
when security is not enabled. I think this check is too restrictive (also
unintuitive). Drill doesn't have the concept of external/managed tables and
a user (impersonated user if security is enabled or Drillbit service user
if no security is enabled) should be able to drop the table if they have
permissions to do so. The above design proposes a check to verify if the
files that need to be deleted are readable by Drill and I believe is a good
validation to have.

/The above check is in the case when security is not enabled. Meaning we
are executing as the Drill user. If we are running as the Drill user (which
might be root or a super user) its likely that this user has permissions to
delete most files and checking for permissions might not suffice. So when
security isn't enabled the proposal is to delete only those files that are
owned (created) by the Drill user./


On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote:


On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 
nrentachint...@maprtech.com wrote:


Also will there any mechanism to recover once you accidentally drop?


yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots.

Seriously, recovery of data due to user error is a platform thing.  How can
we recover from turning off the cluster?  From removing a disk on an Oracle
node?

I don't think that this is Drill's business.





[DISCUSS] Publishing advanced/functional tests

2015-08-05 Thread Abhishek Girish
Ramana,

I think the issue with licenses is mostly resolved. It was discussed that
for TPC-*, since we shall not be redistributing the data-gen software, but
distributing a randomized variant of the data generated by it, we should be
okay to include it part of our framework. For other datasets, we shall
either provide their copy of license with our framework, or simply provide
a link for users to download data before they execute.

For now we should focus on having the framework out with minimal cleanup.
In near future we can work on setting up infrastructure and enhancing the
framework itself.

-Abhishek

On Wed, Aug 5, 2015 at 10:46 AM, Ramana I N inram...@gmail.com
wrote:

 @Jacques, Ted

 in the mean time, we risk patches being merged that have less than complete
  testing.


 While I agree with the premise of getting the tests out as soon as possible
 it does not help us achieve anything except transparency. Your statement
 that getting the tests out will increase quality is dependent on someone
 actually being able to run the tests once they have access to it.

 Maybe we should focus on making a jenkins job to run the tests publicly.
 With that in place we can exclude the TPC* datasets as well as the yelp
 data sets from the framework and avoid licensing issues.

 Regards
 Ramana


 On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish 
 abhishek.gir...@gmail.com
 wrote:

  We not only re-distribute external data-sets as-is, but also include
  variants of those (text -> parquet, json, ...). So the challenge here is
  not simply disabling automatic downloads via the framework and pointing users
  to manually download the files before running the framework, but also about
  how we will handle tests which require variants of the data sets. It simply
  isn't practical for users of the framework to (1) download data-gen manually,
  (2) use specific seed / options before generating data, (3) convert them to
  parquet, etc., (4) move them to specific locations inside their copy of the
  framework.

  Something we'll need to know is how other projects are handling bench-mark
  & other external datasets.
 
  -Abhishek
 
  On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli 
  challapallira...@gmail.com
   wrote:
 
   Thanks for your inputs.
  
    One issue with just publishing the tests in their current state is that
    the framework re-distributes tpch, tpcds, yelp data sets without requiring
    the users to accept their relevant licenses. A good number of tests use
    these data sets. Any thoughts on how to handle this?
  
   - Rahul
  
   On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning ted.dunn...@gmail.com
   wrote:
  
+1.  Get it out there.
   
   
   
On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau jacq...@dremio.com
wrote:
   
 Hey Rahul,

 My suggestion would be to the lower bar--do the absolute bare
 minimum
   to
 get the tests out there.  For example, simply remove proprietary
 information and then get it on a public github (whether your
 personal
 github or a corporate one).  From there, people can help by
  submitting
pull
 requests to improve the infrastructure and harness.  Making things
   easier
 is something that can be done over time.  For example, we've had
  offers
 from a couple different Linux Admins to help on something.  I'm
 sure
   that
 they could help with a number of the items you've identified.  In
 the
mean
 time, we risk patches being merged that have less than complete
   testing.


 --
 Jacques Nadeau
 CTO and Co-Founder, Dremio

 On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli 
 challapallira...@gmail.com
  wrote:

  Jacques,
 
   I am breaking down steps 1, 2 & 3 into sub-tasks so we can
add/prioritize
  these tasks
 
   Task breakdown (columns: Item #, Task, Sub-Task, Comments, Priority):

   1. Publish the tests
   - Remove Proprietary Data & Queries (priority 0)
   - Redact Proprietary Data/Queries
   - Move tests into drill repo -- This requires some refactoring to the
     framework code since the test framework uses a 2-level directory structure
   - Organize the tests using a label based approach -- This involves code
     changes and moving a lot of files. When doing a one time push it might be
     better to do this before publishing the tests?
   - Each suite should be independent -- Some suites wrongly assume that the
     data is present. They should be identified and fixed
   - Cleanup hardcoded dependencies during data generation -- Some
 

Re: [DISCUSS] Drop table support

2015-08-05 Thread Ramana I N
The homogenous check- Will it be just checking for types are homogenous or
if they are actually types that can be read by drill?
Also, is there a good way to determine if a file can be read by drill? And
will there be a perf hit if there are large number of files?

Regards
Ramana


On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com wrote:

 I agree, it is definitely restrictive. We can lift the restriction for
 being able to drop a table (when security is off) only if the Drill user
 owns it. I think the check for homogenous files should give us enough
 confidence that we are not deleting a non Drill directory.

 Thanks
 Mehant


 On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:

 Ted, thats fair point on the recovery part.

 Regarding the other point by Mehant (copied below) ,there is an
 implication
 that user can drop only Drill managed tables (i.e created as Drill user)
 when security is not enabled. I think this check is too restrictive (also
 unintuitive). Drill doesn't have the concept of external/managed tables
 and
 a user (impersonated user if security is enabled or Drillbit service user
 if no security is enabled) should be able to drop the table if they have
 permissions to do so. The above design proposes a check to verify if the
 files that need to be deleted are readable by Drill and I believe is a
 good
 validation to have.

 /The above check is in the case when security is not enabled. Meaning we
 are executing as the Drill user. If we are running as the Drill user
 (which
 might be root or a super user) its likely that this user has permissions
 to
 delete most files and checking for permissions might not suffice. So when
 security isn't enabled the proposal is to delete only those files that are
 owned (created) by the Drill user./


 On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:

 On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 
 nrentachint...@maprtech.com wrote:

 Also will there any mechanism to recover once you accidentally drop?

 yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots.

 Seriously, recovery of data due to user error is a platform thing.  How
 can
 we recover from turning off the cluster?  From removing a disk on an
 Oracle
 node?

 I don't think that this is Drill's business.





Re: [DISCUSS] Drop table support

2015-08-05 Thread Ramana I N
Sorry,

Did not realize you had covered that as part of the original discussion.
Looks like a sound mechanism.

Regards
Ramana


On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote:

 The homogenous check- Will it be just checking for types are homogenous or
 if they are actually types that can be read by drill?
 Also, is there a good way to determine if a file can be read by drill? And
 will there be a perf hit if there are large number of files?

 Regards
 Ramana


 On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com
 wrote:

 I agree, it is definitely restrictive. We can lift the restriction for
 being able to drop a table (when security is off) only if the Drill user
 owns it. I think the check for homogenous files should give us enough
 confidence that we are not deleting a non Drill directory.

 Thanks
 Mehant


 On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:

 Ted, thats fair point on the recovery part.

 Regarding the other point by Mehant (copied below) ,there is an
 implication
 that user can drop only Drill managed tables (i.e created as Drill user)
 when security is not enabled. I think this check is too restrictive (also
 unintuitive). Drill doesn't have the concept of external/managed tables
 and
 a user (impersonated user if security is enabled or Drillbit service user
 if no security is enabled) should be able to drop the table if they have
 permissions to do so. The above design proposes a check to verify if the
 files that need to be deleted are readable by Drill and I believe is a
 good
 validation to have.

 /The above check is in the case when security is not enabled. Meaning we
 are executing as the Drill user. If we are running as the Drill user
 (which
 might be root or a super user) its likely that this user has permissions
 to
 delete most files and checking for permissions might not suffice. So when
 security isn't enabled the proposal is to delete only those files that
 are
 owned (created) by the Drill user./


 On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:

 On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 
 nrentachint...@maprtech.com wrote:

 Also will there any mechanism to recover once you accidentally drop?

 yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots
 .

 Seriously, recovery of data due to user error is a platform thing.  How
 can
 we recover from turning off the cluster?  From removing a disk on an
 Oracle
 node?

 I don't think that this is Drill's business.






Re: [DISCUSS] Drop table support

2015-08-05 Thread Ted Dunning
Is any check really necessary?

Can't we just say that for data sources that are file-like that drop is a
rough synonym for rm? If you have permission to remove files and
directories, you can do it.  If you don't, it will fail, possibly half
done. I have never seen a bug filed against rm to add more elaborate
semantics, so why is it so necessary for Drill to have elaborate semantics
here?



On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote:

 The homogenous check- Will it be just checking for types are homogenous or
 if they are actually types that can be read by drill?
 Also, is there a good way to determine if a file can be read by drill? And
 will there be a perf hit if there are large number of files?

 Regards
 Ramana


 On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com
 wrote:

  I agree, it is definitely restrictive. We can lift the restriction for
  being able to drop a table (when security is off) only if the Drill user
  owns it. I think the check for homogenous files should give us enough
  confidence that we are not deleting a non Drill directory.
 
  Thanks
  Mehant
 
 
  On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:
 
  Ted, thats fair point on the recovery part.
 
  Regarding the other point by Mehant (copied below) ,there is an
  implication
  that user can drop only Drill managed tables (i.e created as Drill user)
  when security is not enabled. I think this check is too restrictive
 (also
  unintuitive). Drill doesn't have the concept of external/managed tables
  and
  a user (impersonated user if security is enabled or Drillbit service
 user
  if no security is enabled) should be able to drop the table if they have
  permissions to do so. The above design proposes a check to verify if the
  files that need to be deleted are readable by Drill and I believe is a
  good
  validation to have.
 
  /The above check is in the case when security is not enabled. Meaning we
  are executing as the Drill user. If we are running as the Drill user
  (which
  might be root or a super user) its likely that this user has permissions
  to
  delete most files and checking for permissions might not suffice. So
 when
  security isn't enabled the proposal is to delete only those files that
 are
  owned (created) by the Drill user./
 
 
  On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
  On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 
  nrentachint...@maprtech.com wrote:
 
  Also will there any mechanism to recover once you accidentally drop?
 
  yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots
 .
 
  Seriously, recovery of data due to user error is a platform thing.  How
  can
  we recover from turning off the cluster?  From removing a disk on an
  Oracle
  node?
 
  I don't think that this is Drill's business.
 
 
 



Re: [DISCUSS] Drop table support

2015-08-05 Thread Mehant Baid
What you are suggesting makes sense in the case when security is 
enabled. So when Drill is accessing the file system it will impersonate 
the user who issued the command and drop will happen if the user has 
sufficient permissions.


However when security isn't enabled, Drill will be accessing the file 
system as the Drill user itself which is most likely to be a super user 
who has permissions to delete most files. To prevent any catastrophic 
drops checking for homogenous file formats makes sure that at least the 
directory being dropped is something that can be read by Drill. This 
will prevent any accidental drops (like dropping the home directory etc, 
because its likely to have file formats that cannot be read by Drill). 
This will not prevent against malicious behavior (for handling this 
security should be enabled).


Thanks
Mehant
On 8/5/15 11:43 AM, Ted Dunning wrote:

Is any check really necessary?

Can't we just say that for data sources that are file-like that drop is a
rough synonym for rm? If you have permission to remove files and
directories, you can do it.  If you don't, it will fail, possibly half
done. I have never seen a bug filed against rm to add more elaborate
semantics, so why is it so necessary for Drill to have elaborate semantics
here?



On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote:


The homogenous check- Will it be just checking for types are homogenous or
if they are actually types that can be read by drill?
Also, is there a good way to determine if a file can be read by drill? And
will there be a perf hit if there are large number of files?

Regards
Ramana


On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com
wrote:


I agree, it is definitely restrictive. We can lift the restriction for
being able to drop a table (when security is off) only if the Drill user
owns it. I think the check for homogenous files should give us enough
confidence that we are not deleting a non Drill directory.

Thanks
Mehant


On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:


Ted, thats fair point on the recovery part.

Regarding the other point by Mehant (copied below) ,there is an
implication
that user can drop only Drill managed tables (i.e created as Drill user)
when security is not enabled. I think this check is too restrictive

(also

unintuitive). Drill doesn't have the concept of external/managed tables
and
a user (impersonated user if security is enabled or Drillbit service

user

if no security is enabled) should be able to drop the table if they have
permissions to do so. The above design proposes a check to verify if the
files that need to be deleted are readable by Drill and I believe is a
good
validation to have.

/The above check is in the case when security is not enabled. Meaning we
are executing as the Drill user. If we are running as the Drill user
(which
might be root or a super user) its likely that this user has permissions
to
delete most files and checking for permissions might not suffice. So

when

security isn't enabled the proposal is to delete only those files that

are

owned (created) by the Drill user./


On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
wrote:

On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 

nrentachint...@maprtech.com wrote:

Also will there any mechanism to recover once you accidentally drop?

yes.  Snapshots https://www.mapr.com/resources/videos/mapr-snapshots

.

Seriously, recovery of data due to user error is a platform thing.  How
can
we recover from turning off the cluster?  From removing a disk on an
Oracle
node?

I don't think that this is Drill's business.






Re: [DISCUSS] Insert into Table support

2015-08-05 Thread Chris Westin
Re #7 in the original post Select table syntax can specify constant values
for one or more columns:
I would have assumed the select list can have any expressions that can be
evaluated on a row from the source; that includes columns, expressions on
columns, or constants. It's probably not your intent, but the stated form
implies that all I get are column values and constants. Which is it?
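
For concreteness, the two readings differ on whether something like the following is
allowed (tables and columns are hypothetical) -- an expression on a source column plus
a constant, rather than only bare columns and constants:

    INSERT INTO T1 (col1, col2, col3)
    SELECT col4, col5 * 2, 'fixed' FROM T2;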

On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid baid.meh...@gmail.com wrote:

 I wanted to start a conversation around supporting the Insert into Table
 feature. As of 1.2 we initially want to support inserting into a table with
 Parquet files. Support for Json, CSV and other sources will follow as
 future enhancements.

 Aman, Jinfeng, Neeraja and I had an initial discussion about this and
 Neeraja provided a good summary of our discussion (pasted below) also
 stating some of the requirements for this feature.

  A ) Support Insert into a non-partitioned table
 -

 Ex: INSERT INTO T1 [col1, col2, col3]  SELECT col4, col5, col6 from T2
 (Source table: T2, Target table T1)
 Requirements:

 1. Target table column list specification is optional for Insert statement
 2. When specified, the column list in the Insert statement should
contain all the columns present in the target table (i.e No support
for partial insert)
 3. The column names specified for the source table do not need to match
to the target table column names. Match is performed based on ordinal.
 4.   # of Source table columns specified must be same as # of target
table columns
 5. Types of specified source table columns must match to the types of
target table columns
 6. Specification of * is not allowed in the Select table syntax
 7. Select table syntax can specify constant values for one or more columns


  B ) Support insert into a partitioned table
 --

 Ex: INSERT INTO T1 col1, col2,col3  partition by col1,col2 SELECT
 col4,col,col6 from T2

  * Target column specification is required when inserting data into an
already partitioned table
  * Requirements A.3-A.7 above apply for insert into partitioned tables
as well
  * A partition by clause along with one or more columns is required
  * All the columns specified in partition by clause must exist in the
target column list
  * Partition by columns specified do not need to match to the list of
columns that the original table partitioned with (i.e if the
original table is partitioned with col1, col2,  new data during
insert can be partitioned by col3 or just with col1 or col2..)


 Couple of open questions from the design perspective are

 1. How do we perform validation. Validation of data types, number of
 columns being inserted etc. In addition to validation we need to make sure
 that when we insert into an existing tables we insert data with the
 existing column names (select column list can have different names). This
 poses problems around needing to know the metadata at planning time, two
 approaches that have been floating around are
 * DotDrill files: We can store metadata, partitioning columns and
 other useful information here and we can perform validation during planning
 time. However the challenges with introducing DotDrill files include
  - consistency between metadata and the actual data (Nothing
 preventing users to copy files directly).
  - security around DotDrill files (can be dealt in the same
 way we perform security checks for drill tables in hdfs)
  - interface to change the DotDrill file, in the case we need
 to add a column to the table or add a new partition etc.

 * Explicit Syntax/ No metadata approach: Another approach is to
 avoid DotDrill files and use explicit syntax to glean as much information
 as possible from the SQL statement itself. Some of the challenges with this
 approach are
  - Gathering metadata information: Since we have no idea what
 the existing schema is we would need to perform a mini scan to learn the
 schema at planning time to be able to perform some validation. The problem
 with this approach is how do we determine how many files we need to read in
 order to learn the schema? If we use a sample set and not all the files
 have the same schema,
 we could have non-deterministic results based on the
 sample of files read. Also reading all the files and merging the schema
 seems like an expensive cost to pay.
  - From the user's perspective, while inserting into a
 partitioned table, user will have to specify the partitioning columns again
 in the Insert statement, despite having specified the partition columns in
 the CTAS.

 2. What is a reasonable assumption for a Drill table in terms of changing
 schema. Having the same exact schema for all files in a table is too rigid
 an assumption at this point?

 One thing to remember with the DotDrill file is also the repercussions on
 Drop table, 

Re: [DISCUSS] Drop table support

2015-08-05 Thread Ted Dunning
On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com wrote:

 To prevent any catastrophic drops checking for homogenous file formats
 makes sure that at least the directory being dropped is something that can
 be read by Drill.


Or we could just disable drop unless permissions can be enforced.


Re: anyone seen these errors on master ?

2015-08-05 Thread Hanifi Gunes
Did you tighten your memory settings? How many forks are you running with?
I bet you are truly running out of memory while executing this particular
test case.

-H+

On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com
wrote:

 b2bbd99 committed on July 6th introduced the test.

  On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote:
 
  In that case, we probably need to do a binary search to figure out which
 recent
  patch is causing this problem.
 
  On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche 
 adene...@maprtech.com
  wrote:
 
  Just got those errors on master too
 
  On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche 
 adene...@maprtech.com
 
  wrote:
 
  I'm seeing those errors intermittently when building my private
 branch, I
  don't believe I made any change that would have caused them. Anyone
 seen
  them too ?
 
 
 
 testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
   Time elapsed: 2.043 sec  <<< ERROR!
   java.lang.Exception: Unexpected exception,
   expected<org.apache.drill.exec.exception.OversizedAllocationException>
   but was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>
   at java.nio.Bits.reserveMemory(Bits.java:658)
   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 
  io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
  at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139)
  at
 
 
 org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125)
 
 
 
 
 testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 0.436 sec   ERROR!
  java.lang.Exception: Unexpected exception,
  expectedorg.apache.drill.exec.exception.OversizedAllocationException
  but
  wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100)
  at
 
 
 org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116)
  at
 
 
 org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187)
  at
 
 
 org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177)
  at
 
 
 org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85)
 
 
 
 
 testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector)
  Time elapsed: 0.788 sec   ERROR!
  java.lang.Exception: Unexpected exception,
  expectedorg.apache.drill.exec.exception.OversizedAllocationException
  but
  wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
  at
 
 
 io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69)
  at
 
 
 io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
  at
 
 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130)
  at
 
 
 io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171)
  at
 
 
 

Re: [DISCUSS] Drop table support

2015-08-05 Thread Neeraja Rentachintala
I think enabling drop only when security is enabled is too restrictive.

On Wed, Aug 5, 2015 at 12:46 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com
 wrote:

  To prevent any catastrophic drops checking for homogenous file formats
  makes sure that at least the directory being dropped is something that
 can
  be read by Drill.


 Or we could just disable drop unless permissions can be enforced.



Re: anyone seen these errors on master ?

2015-08-05 Thread Jason Altekruse
I also saw this failure running the tests on my Linux VM; the only changed
setting is the -PlargeTests flag, and memory and fork count are at their defaults.
All of the tests pass on my Mac with default settings, no flags.

On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote:

 Did you tighten your memory settings? How many forks are you running with?
 I bet you are truly running out of memory while executing this particular
 test case.

 -H+

 On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com
 wrote:

  b2bbd99 committed on July 6th introduced the test.
 
   On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote:
  
   In that case,  we probably need do binary search to figure out which
  recent
   patch is causing this problem.
  
   On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche 
  adene...@maprtech.com
   wrote:
  
   Just got those errors on master too
  
   On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche 
  adene...@maprtech.com
  
   wrote:
  
   I'm seeing those errors intermittently when building my private
  branch, I
   don't believe I made any change that would have caused them. Anyone
  seen
   them too ?
  
  
  
 

Re: [DISCUSS] Drop table support

2015-08-05 Thread Neeraja Rentachintala
Another question/comment.

Does Drill need to manage concurrency for DROP TABLE, i.e. how do you deal
with users trying to read the data while somebody is dropping it? Does it
need to implement some kind of locking?

I have some thoughts on that but would like to know what others think - Drill is
not (yet) a transactional system but rather an interactive query layer on a
variety of stores. The two most common use cases I can think of in this
context are: a user doing analytics/exploration who, as part of it, creates
some intermediate tables, inserts data into them and drops the tables; or BI
tools generating these intermediate tables for processing queries. Neither of
these has the concurrency issue.
Additionally, given that the data is externally managed, there could always
be other processes adding and deleting files, and Drill doesn't even have
control over them.
Overall, I think the first phase of the DROP implementation might be OK not to
have these locking/concurrency checks.

Thoughts?

-Neeraja





On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com wrote:

 What you are suggesting makes sense in the case when security is enabled.
 So when Drill is accessing the file system it will impersonate the user who
 issued the command and drop will happen if the user has sufficient
 permissions.

 However when security isn't enabled, Drill will be accessing the file
 system as the Drill user itself which is most likely to be a super user who
 has permissions to delete most files. To prevent any catastrophic drops
 checking for homogenous file formats makes sure that at least the directory
 being dropped is something that can be read by Drill. This will prevent any
 accidental drops (like dropping the home directory etc, because its likely
 to have file formats that cannot be read by Drill). This will not prevent
 against malicious behavior (for handling this security should be enabled).

 Thanks
 Mehant

 On 8/5/15 11:43 AM, Ted Dunning wrote:

 Is any check really necessary?

 Can't we just say that for data sources that are file-like that drop is a
 rough synonym for rm? If you have permission to remove files and
 directories, you can do it.  If you don't, it will fail, possibly half
 done. I have never seen a bug filed against rm to add more elaborate
 semantics, so why is it so necessary for Drill to have elaborate semantics
 here?



 On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote:

 The homogeneous check - will it just be checking that the types are homogeneous, or
 that they are actually types that can be read by Drill?
 Also, is there a good way to determine whether a file can be read by Drill? And
 will there be a perf hit if there are a large number of files?

 Regards
 Ramana


 On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com
 wrote:

 I agree, it is definitely restrictive. We can lift the restriction for
 being able to drop a table (when security is off) only if the Drill user
 owns it. I think the check for homogenous files should give us enough
 confidence that we are not deleting a non Drill directory.

 Thanks
 Mehant


 On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:

 Ted, that's a fair point on the recovery part.

 Regarding the other point by Mehant (copied below), there is an implication
 that a user can drop only Drill-managed tables (i.e. created as the Drill user)
 when security is not enabled. I think this check is too restrictive (also
 unintuitive). Drill doesn't have the concept of external/managed tables, and
 a user (the impersonated user if security is enabled, or the Drillbit service
 user if no security is enabled) should be able to drop the table if they have
 permissions to do so. The above design proposes a check to verify that the
 files that need to be deleted are readable by Drill, and I believe it is a
 good validation to have.

 "The above check is in the case when security is not enabled. Meaning we
 are executing as the Drill user. If we are running as the Drill user (which
 might be root or a super user) it's likely that this user has permissions to
 delete most files and checking for permissions might not suffice. So when
 security isn't enabled the proposal is to delete only those files that are
 owned (created) by the Drill user."


 On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:

 On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala 

 nrentachint...@maprtech.com wrote:

 Also will there any mechanism to recover once you accidentally drop?

 yes.  Snapshots 
 https://www.mapr.com/resources/videos/mapr-snapshots

 .

 Seriously, recovery of data due to user error is a platform thing.  How
 can
 we recover from turning off the cluster?  From removing a disk on an
 Oracle
 node?

 I don't think that this is Drill's business.
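
For illustration, a hedged sketch of the "homogeneous, Drill-readable" guard Mehant
describes above: refuse to drop a directory unless every regular file under it has an
extension Drill is known to read. The extension whitelist and the recursive walk are
assumptions made for the sketch, not the actual proposal or existing Drill code.

import java.io.IOException;
import java.nio.file.*;
import java.util.Set;
import java.util.stream.Stream;

public class DropSafetyCheck {

  // Hypothetical whitelist of formats this install can read.
  private static final Set<String> READABLE = Set.of("parquet", "json", "csv", "tsv", "psv");

  // True only if every regular file under `dir` looks like something Drill can read.
  static boolean isHomogeneousDrillTable(Path dir) throws IOException {
    try (Stream<Path> files = Files.walk(dir)) {
      return files.filter(Files::isRegularFile)
                  .allMatch(DropSafetyCheck::hasReadableExtension);
    }
  }

  private static boolean hasReadableExtension(Path file) {
    String name = file.getFileName().toString();
    int dot = name.lastIndexOf('.');
    return dot > 0 && READABLE.contains(name.substring(dot + 1).toLowerCase());
  }
}

This also makes Ramana's perf question concrete: the check is a full directory walk,
so its cost grows linearly with the number of files being dropped.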






Re: anyone seen these errors on master ?

2015-08-05 Thread Abdel Hakim Deneche
I didn't make any change, I'm running 2 forks (the default). I got those
errors 3 times now, 2 on a linux VM and 1 on a linux physical node

On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote:

 Did you tighten your memory settings? How many forks are you running with?
 I bet you are truly running out of memory while executing this particular
 test case.

 -H+

 On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com
 wrote:

  b2bbd99 committed on July 6th introduced the test.
 
   On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote:
  
   In that case,  we probably need do binary search to figure out which
  recent
   patch is causing this problem.
  
   On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche 
  adene...@maprtech.com
   wrote:
  
   Just got those errors on master too
  
   On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche 
  adene...@maprtech.com
  
   wrote:
  
   I'm seeing those errors intermittently when building my private
  branch, I
   don't believe I made any change that would have caused them. Anyone
  seen
   them too ?
  
  
  
 

Re: anyone seen these errors on master ?

2015-08-05 Thread Hanifi Gunes
I don't seem to be able to re-prod this. Let me look at this and update you
all.

On Thu, Aug 6, 2015 at 12:03 AM, Abdel Hakim Deneche adene...@maprtech.com
wrote:

 I didn't make any change, I'm running 2 forks (the default). I got those
 errors 3 times now, 2 on a linux VM and 1 on a linux physical node

 On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote:

  Did you tighten your memory settings? How many forks are you running
 with?
  I bet you are truly running out of memory while executing this particular
  test case.
 
  -H+
 
  On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com
  wrote:
 
   b2bbd99 committed on July 6th introduced the test.
  
On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com
 wrote:
   
In that case,  we probably need do binary search to figure out which
   recent
patch is causing this problem.
   
On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche 
   adene...@maprtech.com
wrote:
   
Just got those errors on master too
   
On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche 
   adene...@maprtech.com
   
wrote:
   
I'm seeing those errors intermittently when building my private
   branch, I
don't believe I made any change that would have caused them. Anyone
   seen
them too ?
   
   
   
  
 

Re: anyone seen these errors on master ?

2015-08-05 Thread Chris Westin
Given that the difference is just

 java.lang.Exception: Unexpected exception,
 expected<org.apache.drill.exec.exception.OversizedAllocationException> but
 was<org.apache.drill.exec.memory.OutOfMemoryRuntimeException>

the question of what constitutes an "oversized" allocation comes to mind.
Is this test fragile relative to being run in different environments?
I haven't seen the test, so how is the determination that something is
oversized made? It seems like that criterion sometimes fails, and we get an
OOM because whatever the request is, it is still very large.
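
To make that concrete, a minimal hedged sketch of a reallocation loop of the kind the
test appears to exercise (the cap value and class name are made up; this is not the
actual TestValueVector code):

import java.nio.ByteBuffer;

public class ReallocSketch {

  // Hypothetical per-vector cap playing the role of the "oversized" limit.
  static final long MAX_ALLOCATION = Integer.MAX_VALUE;

  public static void main(String[] args) {
    long size = 1 << 20;                                  // start at 1 MiB
    ByteBuffer buf = ByteBuffer.allocateDirect((int) size);
    while (true) {
      size *= 2;
      if (size > MAX_ALLOCATION) {
        // what the test expects: the request is rejected as oversized
        throw new IllegalStateException("Oversized allocation: " + size);
      }
      // what the failing runs actually hit: direct memory is exhausted before the
      // cap is ever reached, so an OutOfMemoryError surfaces instead
      buf = ByteBuffer.allocateDirect((int) size);
    }
  }
}

Whether the cap check or the OOM fires first depends on -XX:MaxDirectMemorySize of the
forked test JVM (and on how promptly old direct buffers are reclaimed), which would
explain the same test passing on one machine and failing on another.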


On Wed, Aug 5, 2015 at 2:26 PM, Hanifi Gunes hgu...@maprtech.com wrote:

 I don't seem to be able to re-prod this. Let me look at this and update you
 all.

 On Thu, Aug 6, 2015 at 12:03 AM, Abdel Hakim Deneche 
 adene...@maprtech.com
 wrote:

  I didn't make any change, I'm running 2 forks (the default). I got those
  errors 3 times now, 2 on a linux VM and 1 on a linux physical node
 
  On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com
 wrote:
 
   Did you tighten your memory settings? How many forks are you running
  with?
   I bet you are truly running out of memory while executing this
 particular
   test case.
  
   -H+
  
   On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com
   wrote:
  
b2bbd99 committed on July 6th introduced the test.
   
 On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com
  wrote:

 In that case,  we probably need do binary search to figure out
 which
recent
 patch is causing this problem.

 On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche 
adene...@maprtech.com
 wrote:

 Just got those errors on master too

 On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche 
adene...@maprtech.com

 wrote:

 I'm seeing those errors intermittently when building my private
branch, I
 don't believe I made any change that would have caused them.
 Anyone
seen
 them too ?



   
  
 

Re: [DISCUSS] Insert into Table support

2015-08-05 Thread Aman Sinha
@Jacques..
Since the topic of metadata caching is closely related, the main issue is
how the metadata file is maintained in the presence of either concurrent
INSERTs or concurrent REFRESH METADATA operations. One could maintain
multiple versions of the metadata file or keep the version information
inside a single metadata file. Is that what you were thinking when you
mentioned Lucene's index versioning? (I am not familiar with Lucene's
versioning.)

Aman
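
One hedged way to keep a single metadata file consistent for readers, sketched with
plain java.nio (the file name here is only an example, and this alone does not resolve
two concurrent REFRESH operations racing each other):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class MetadataCacheWriterSketch {

  // Write the new metadata to a temp file, then atomically rename it over the visible
  // name, so concurrent readers see either the old or the new file, never a torn one.
  // (On POSIX file systems the rename replaces the existing target.)
  static void rewriteMetadata(Path tableDir, String serializedMetadata) throws IOException {
    Path tmp = Files.createTempFile(tableDir, "metadata", ".tmp");
    Files.write(tmp, serializedMetadata.getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, tableDir.resolve(".drill.metadata_cache"),
        StandardCopyOption.ATOMIC_MOVE);
  }
}

A distributed file system would need an equivalent atomic rename, and avoiding lost
updates between concurrent writers still needs either a lock or explicit version files.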

On Wed, Aug 5, 2015 at 12:02 PM, Chris Westin chriswesti...@gmail.com
wrote:

 Re #7 in the original post Select table syntax can specify constant values
 for one or more columns:
 I would have assumed the select list can have any expressions that can be
 evaluated on a row from the source; that includes columns, expressions on
 columns, or constants. It's probably not your intent, but the stated form
 implies that all I get are column values and constants. Which is it?

 On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid baid.meh...@gmail.com
 wrote:

  I wanted to start a conversation around supporting the Insert into
 Table
  feature. As of 1.2 we initially want to support inserting into a table
 with
  Parquet files. Support for Json, CSV and other sources will follow as
  future enhancements.
 
  Aman, Jinfeng, Neeraja and I had an initial discussion about this and
  Neeraja provided a good summary of our discussion (pasted below) also
  stating some of the requirements for this feature.
 
   A ) Support Insert into a non-partitioned table
  -
 
  Ex: INSERT INTO T1 [col1, col2, col3]  SELECT col4, col5, col6 from T2
  (Source table: T2, Target table T1)
  Requirements:
 
  1. Target table column list specification is optional for Insert
 statement
  2. When specified, the column list in the Insert statement should
 contain all the columns present in the target table (i.e No support
 for partial insert)
  3. The column names specified for the source table do not need to match
 to the target table column names. Match is performed based on ordinal.
  4.   # of Source table columns specified must be same as # of target
 table columns
  5. Types of specified source table columns must match to the types of
 target table columns
  6. Specification of * is not allowed in the Select table syntax
  7. Select table syntax can specify constant values for one or more
 columns
 
 
   B ) Support insert into a partitioned table
  --
 
  Ex: INSERT INTO T1 col1, col2,col3  partition by col1,col2 SELECT
  col4,col,col6 from T2
 
   * Target column specification is required when inserting data into an
 already partitioned table
   * Requirements A.3-A.7 above apply for insert into partitioned tables
 as well
   * A partition by clause along with one or more columns is required
   * All the columns specified in partition by clause must exist in the
 target column list
   * Partition by columns specified do not need to match to the list of
 columns that the original table partitioned with (i.e if the
 original table is partitioned with col1, col2,  new data during
 insert can be partitioned by col3 or just with col1 or col2..)
 
 
  Couple of open questions from the design perspective are
 
  1. How do we perform validation. Validation of data types, number of
  columns being inserted etc. In addition to validation we need to make
 sure
  that when we insert into an existing tables we insert data with the
  existing column names (select column list can have different names). This
  poses problems around needing to know the metadata at planning time, two
  approaches that have been floating around are
  * DotDrill files: We can store metadata, partitioning columns and
  other useful information here and we can perform validation during
 planning
  time. However the challenges with introducing DotDrill files include
   - consistency between metadata and the actual data (Nothing
  preventing users to copy files directly).
   - security around DotDrill files (can be dealt in the same
  way we perform security checks for drill tables in hdfs)
   - interface to change the DotDrill file, in the case we need
  to add a column to the table or add a new partition etc.
 
  * Explicit Syntax/ No metadata approach: Another approach is to
  avoid DotDrill files and use explicit syntax to glean as much information
  as possible from the SQL statement itself. Some of the challenges with
 this
  approach are
   - Gathering metadata information: Since we have no idea what
  the existing schema is we would need to perform a mini scan to learn
 the
  schema at planning time to be able to perform some validation. The
 problem
  with this approach is how do we determine how many files we need to read
 in
  order to learn the schema? If we use a sample set and not all the files
  have the same schema,
  we could have 

[jira] [Created] (DRILL-3611) Drill/client unstable in connection-closed state

2015-08-05 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3611:
-

 Summary: Drill/client unstable in connection-closed state
 Key: DRILL-3611
 URL: https://issues.apache.org/jira/browse/DRILL-3611
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


When Drill and/or a client get into the state in which the client reports that 
the connection is closed, the error messages are not stable.  

In the following (a series of empty queries executed about half a second 
apart), notice how sometimes the exception is a "CONNECTION ERROR: ... closed 
unexpectedly" exception and sometimes it is a "SYSTEM ERROR: 
ChannelClosedException" exception:


{noformat}
0: jdbc:drill: 
0: jdbc:drill: ;
Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 
(user client) closed unexpectedly.


[Error Id: 0848c18e-64e9-41e2-90d9-3a0ffaebc14e ] (state=,code=0)
0: jdbc:drill: ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: b465b0e7-55a2-4ef6-ad0e-01258468f4e7 ] (state=,code=0)
0: jdbc:drill: ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: 0b50a10c-42eb-47b6-bc3d-9a42afe4cd28 ] (state=,code=0)
0: jdbc:drill: ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: 9cd1fd96-0aed-4d06-b0ae-d48ddc70b91e ] (state=,code=0)
0: jdbc:drill: ;
Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 
(user client) closed unexpectedly.


[Error Id: 222a5358-6b2e-49e1-a1ec-931cacbbdbd1 ] (state=,code=0)
0: jdbc:drill: ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: fc589b70-dd10-4484-963a-21bc88147a0d ] (state=,code=0)
0: jdbc:drill: ;
Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 
(user client) closed unexpectedly.


[Error Id: 19965e75-9f2e-4a73-b1d8-29d61e6ea31a ] (state=,code=0)
0: jdbc:drill: 
0: jdbc:drill: 
0: jdbc:drill: 
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 37151: DRILL-3579: Fix issues in reading Hive tables with partition value __HIVE_DEFAULT_PARTITION__

2015-08-05 Thread Venki Korukanti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37151/
---

Review request for drill and Mehant Baid.


Repository: drill-git


Description
---

Please see DRILL-3579 for details.


Diffs
-

  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
 d323db9 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/HivePushPartitionFilterIntoScan.java
 90b0c5f 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDataTypeUtility.java
 84d8790 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveFieldConverter.java
 a59d37b 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java
 1a66ad9 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java
 22552b7 
  
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 PRE-CREATION 
  
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java
 0ea9d53 
  
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java
 21d4f7b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 af67282 

Diff: https://reviews.apache.org/r/37151/diff/


Testing
---

Added unittests.


Thanks,

Venki Korukanti



Re: [DISCUSS] Insert into Table support

2015-08-05 Thread Adam Gilmore
I thought I'd add my two cents based on my effort with Parquet pushdown
filtering.

It sounds like merging schemas is going to be pretty daunting, considering
the work planned around embedded types and considering Parquet doesn't
support those.

Furthermore, metadata sounds like it's going to be fairly specific to each
storage type.  In the Parquet example, it's very beneficial to have
statistics for each column in the file so that you can easily filter out
files that clearly won't match a filter (which is what I did in the
pushdown implementation).  This is the challenge with the current metadata
implementation (it doesn't include that information), so you end up
planning batches out to many Drillbits, often with many of those batches
entirely skipped due to mismatching filters.

I completely agree with Jacques's point about not changing the
outcome of the query if the dot drill/metadata files are not present - it
simply makes the query more expensive without them.
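
A hedged illustration of the statistics-driven pruning described above: if the planner
has per-file min/max for a column, a predicate like col = 42 can rule a file out without
ever scheduling a read for it. ColumnStats and the file map are stand-ins, not the real
Parquet footer or metadata-cache structures.

import java.util.*;

public class StatsPruningSketch {

  static final class ColumnStats {
    final long min, max;
    ColumnStats(long min, long max) { this.min = min; this.max = max; }
    // A file might contain the value only if it falls inside the [min, max] range.
    boolean mightContain(long value) { return value >= min && value <= max; }
  }

  // Keep only the files whose range could possibly satisfy col = value.
  static List<String> pruneForEquality(Map<String, ColumnStats> statsByFile, long value) {
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, ColumnStats> e : statsByFile.entrySet()) {
      if (e.getValue().mightContain(value)) {
        kept.add(e.getKey());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, ColumnStats> stats = new LinkedHashMap<>();
    stats.put("part-0.parquet", new ColumnStats(0, 10));
    stats.put("part-1.parquet", new ColumnStats(40, 50));
    System.out.println(pruneForEquality(stats, 42));   // only part-1.parquet survives
  }
}

The same range test is what would let whole batches be skipped at planning time instead
of being handed to Drillbits and filtered away at run time.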

On Thu, Aug 6, 2015 at 8:01 AM, Aman Sinha asi...@maprtech.com wrote:

 @Jacques..
 Since the topic of  metadata caching is closely related, the main issue is
 how is the metadata file maintained in the presence of either concurrent
 INSERTs or concurrent REFRESH METADATA operations ?  One could maintain
 multiple versions of the metadata file or keep the version information
 inside a single metadata file.  Is that what you were thinking when you
 mention Lucene's index versioning ? (I am not familiar with lucene's
 versioning).

 Aman

 On Wed, Aug 5, 2015 at 12:02 PM, Chris Westin chriswesti...@gmail.com
 wrote:

  Re #7 in the original post Select table syntax can specify constant
 values
  for one or more columns:
  I would have assumed the select list can have any expressions that can be
  evaluated on a row from the source; that includes columns, expressions on
  columns, or constants. It's probably not your intent, but the stated form
  implies that all I get are column values and constants. Which is it?
 
  On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid baid.meh...@gmail.com
  wrote:
 
   I wanted to start a conversation around supporting the Insert into
  Table
   feature. As of 1.2 we initially want to support inserting into a table
  with
   Parquet files. Support for Json, CSV and other sources will follow as
   future enhancements.
  
   Aman, Jinfeng, Neeraja and I had an initial discussion about this and
   Neeraja provided a good summary of our discussion (pasted below) also
   stating some of the requirements for this feature.
  
A ) Support Insert into a non-partitioned table
   -
  
   Ex: INSERT INTO T1 [col1, col2, col3]  SELECT col4, col5, col6 from T2
   (Source table: T2, Target table T1)
   Requirements:
  
   1. Target table column list specification is optional for Insert
  statement
   2. When specified, the column list in the Insert statement should
  contain all the columns present in the target table (i.e No support
  for partial insert)
   3. The column names specified for the source table do not need to match
  to the target table column names. Match is performed based on
 ordinal.
   4.   # of Source table columns specified must be same as # of target
  table columns
   5. Types of specified source table columns must match to the types of
  target table columns
   6. Specification of * is not allowed in the Select table syntax
   7. Select table syntax can specify constant values for one or more
  columns
  
  
B ) Support insert into a partitioned table
   --
  
   Ex: INSERT INTO T1 col1, col2,col3  partition by col1,col2 SELECT
   col4,col,col6 from T2
  
* Target column specification is required when inserting data into an
  already partitioned table
* Requirements A.3-A.7 above apply for insert into partitioned tables
  as well
* A partition by clause along with one or more columns is required
* All the columns specified in partition by clause must exist in the
  target column list
* Partition by columns specified do not need to match to the list of
  columns that the original table partitioned with (i.e if the
  original table is partitioned with col1, col2,  new data during
  insert can be partitioned by col3 or just with col1 or col2..)
  
  
   Couple of open questions from the design perspective are
  
   1. How do we perform validation. Validation of data types, number of
   columns being inserted etc. In addition to validation we need to make
  sure
   that when we insert into an existing tables we insert data with the
   existing column names (select column list can have different names).
 This
   poses problems around needing to know the metadata at planning time,
 two
   approaches that have been floating around are
   * DotDrill files: We can store metadata, partitioning columns
 and
   other useful information here and we can perform validation during