Re: Review Request 37027: DRILL-3557: Reading empty CSV file fails with SYSTEM ERROR
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37027/ --- (Updated Aug. 5, 2015, 3:15 p.m.) Review request for drill and Parth Chandra. Changes --- Addressed comments Bugs: DRILL-3557 https://issues.apache.org/jira/browse/DRILL-3557 Repository: drill-git Description --- Ensure empty CSV's path can be added Diffs (updated) - exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java e233dda exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java e8af325 exec/java-exec/src/test/resources/store/text/directoryWithEmpyCSV/empty.csv PRE-CREATION Diff: https://reviews.apache.org/r/37027/diff/ Testing --- Unit tests, functional, tpch Thanks, Sean Hsuan-Yi Chu
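For reference, the failing scenario here is a query over a directory that contains an empty CSV file. A minimal sketch of that kind of query (the dfs workspace and path are hypothetical, chosen only to mirror the test resource added in this patch):

{code}
-- Before this fix, selecting from a directory holding an empty CSV could
-- fail with a SYSTEM ERROR; with the fix the empty file contributes no rows.
SELECT * FROM dfs.`/tmp/directoryWithEmpyCSV`;
{code}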
[jira] [Created] (DRILL-3607) small typo in configuring-resources-for-a-shared-drillbit page
Deneche A. Hakim created DRILL-3607: --- Summary: small typo in configuring-resources-for-a-shared-drillbit page Key: DRILL-3607 URL: https://issues.apache.org/jira/browse/DRILL-3607 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Deneche A. Hakim Assignee: Bridget Bevens Priority: Minor In the documentation for [Configuring resources for a shared drillbit|https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/] there is a small typo in the {{planner.width.max_per_node}} section. In the first line of this section we can read: {quote} Configure the *planner.width.max.per.node* to achieve {quote} but it actually should be {quote} Configure the *planner.width.max_per_node* to achieve {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
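The corrected option name can be checked directly from a Drill session; a minimal sketch (the value 4 is an arbitrary illustration, not a recommended setting):

{code}
-- Set the per-node parallelism option using the correct (underscore) name
ALTER SESSION SET `planner.width.max_per_node` = 4;

-- Confirm the current value
SELECT * FROM sys.options WHERE name = 'planner.width.max_per_node';
{code}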
[jira] [Resolved] (DRILL-3606) Wrong results - Lead(char-column) without PARTITION BY clause
[ https://issues.apache.org/jira/browse/DRILL-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim resolved DRILL-3606. - Resolution: Fixed Fixed in private branch Wrong results - Lead(char-column) without PARTITION BY clause - Key: DRILL-3606 URL: https://issues.apache.org/jira/browse/DRILL-3606 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.2.0 Environment: private-branch-with-new-window-functions Reporter: Khurram Faraaz Assignee: Deneche A. Hakim Labels: window_function Fix For: 1.2.0 Window function query that does not use partition by clause in window definition and uses LEAD function returns wrong results, on developer's private branch. This issue may be related to DRILL-3605 Results returned by Drill {code} 0: jdbc:drill:schema=dfs.tmp select lead(col2) over (order by col0) lead_col0 from `fewRowsAllData.parquet`; +---+ | lead_col0 | +---+ | NHIN | | INCACO | | CACOSCSD | | COSCSDWYLA | | SCSDWYLAKSCO | | SDWYLAKSCONYNY | | WYLAKSCONYNYSDGA | | LAKSCONYNYSDGAMOIN | | KSCONYNYSDGAMOINMNIA | | CONYNYSDGAMOINMNIAGAMN | | NYNYSDGAMOINMNIAGAMNMNMI | | NYSDGAMOINMNIAGAMNMNMIRISD | | SDGAMOINMNIAGAMNMNMIRISDINWI | | GAMOINMNIAGAMNMNMIRISDINWIMAIA | | MOINMNIAGAMNMNMIRISDINWIMAIANDMA | | INMNIAGAMNMNMIRISDINWIMAIANDMARIME | | MNIAGAMNMNMIRISDINWIMAIANDMARIMEMNCO | | IAGAMNMNMIRISDINWIMAIANDMARIMEMNCOOHMO | | GAMNMNMIRISDINWIMAIANDMARIMEMNCOOHMOGAVT | | MNMNMIRISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNH | | MNMIRISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIOR | | MIRISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZ | | RISDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMD | | SDINWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMA | | INWIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUT | | WIMAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWY | | MAIANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWY | | IANDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAK | | NDMARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPA | | MARIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGA | | RIMEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVT | | MEMNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTIN | | MNCOOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWV | | COOHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMN | | OHMOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVT | | MOGAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUT | | GAVTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVT | | VTNDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISC | | NDNHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | NHRIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | RIORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | ORNCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | NCAZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | AZORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | ORMDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | MDHIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | HIMANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | MANYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | NYUTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | UTDEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | 
DEWYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | WYOHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | OHWYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | WYNHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | NHAKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | AKMDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | MDPAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | PAMNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | MNGAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | GAMOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | MOVTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | VTUTINWYWVIAMNAZVTIAUTWIVTRISCME | | UTINWYWVIAMNAZVTIAUTWIVTRISCME | | INWYWVIAMNAZVTIAUTWIVTRISCME | | WYWVIAMNAZVTIAUTWIVTRISCME | | WVIAMNAZVTIAUTWIVTRISCME | | IAMNAZVTIAUTWIVTRISCME | | MNAZVTIAUTWIVTRISCME | | AZVTIAUTWIVTRISCME | | VTIAUTWIVTRISCME | | IAUTWIVTRISCME | | UTWIVTRISCME | | WIVTRISCME | | VTRISCME | | RISCME | | SCME | | ME | | null | +---+ 78 rows selected (0.301 seconds) {code} Results returned by Postgres {code} postgres=# select lead(col2) over (order by col0) lead_col0 from tbl_alldata;
[jira] [Resolved] (DRILL-3605) Wrong results - Lead(char-column)
[ https://issues.apache.org/jira/browse/DRILL-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim resolved DRILL-3605. - Resolution: Fixed Fixed in private branch Wrong results - Lead(char-column) -- Key: DRILL-3605 URL: https://issues.apache.org/jira/browse/DRILL-3605 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.2.0 Environment: private-branch-with-new-window-functions Reporter: Khurram Faraaz Assignee: Deneche A. Hakim Labels: window_function Fix For: 1.2.0 Attachments: fewRowsAllData.parquet col2 is of type char(2) in the parquet file. Results returned by Drill {code} 0: jdbc:drill:schema=dfs.tmp select col2, lead(col2) over (partition by col2 order by col0) lead_col0 from `fewRowsAllData.parquet`; +--+---+ | col2 | lead_col0 | +--+---+ | AK | null | | AZ | AZCACO | | AZ | null | | CA | null | | CO | COCODEGAGAGA | | CO | CODEGAGAGAGAHI | | CO | null | | DE | null | | GA | GAGAGAHIIAIAIAIAININ | | GA | GAGAHIIAIAIAIAININININ | | GA | GAHIIAIAIAIAININININKSLA | | GA | null | | HI | null | | IA | IAIAIAININININKSLAMAMAMAMDMDME | | IA | IAIAININININKSLAMAMAMAMDMDMEMEMI | | IA | IAININININKSLAMAMAMAMDMDMEMEMIMNMN | | IA | null | | IN | INININKSLAMAMAMAMDMDMEMEMIMNMNMNMNMNMN | | IN | ININKSLAMAMAMAMDMDMEMEMIMNMNMNMNMNMNMOMO | | IN | INKSLAMAMAMAMDMDMEMEMIMNMNMNMNMNMNMOMOMONC | | IN | null | | KS | null | | LA | null | | MA | MAMAMDMDMEMEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNY | | MA | MAMDMDMEMEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOH | | MA | null | | MD | MDMEMEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPA | | MD | null | | ME | MEMIMNMNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRI | | ME | null | | MI | null | | MN | MNMNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUT | | MN | MNMNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUT | | MN | MNMNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVT | | MN | MNMNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVT | | MN | MNMOMOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWI | | MN | null | | MO | MOMONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWY | | MO | MONCNDNDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | MO | null | | NC | null | | ND | NDNENHNHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | ND | null | | NE | null | | NH | NHNHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | NH | NHNYNYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | NH | null | | NY | NYNYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | NY | NYOHOHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | NY | null | | OH | OHORORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | OH | null | | OR | ORPARIRIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | OR | null | | PA | null | | RI | RIRIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | RI | RIRISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | RI | RISCSCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | RI | null | | SC | SCSDSDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | SC | null | | SD | SDSDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | SD | SDUTUTUTVTVTVTVTWIWIWVWYWYWYWY | | SD | null | | UT | UTUTVTVTVTVTWIWIWVWYWYWYWY | | UT | UTVTVTVTVTWIWIWVWYWYWYWY | | UT | null | | VT | VTVTVTWIWIWVWYWYWYWY | | VT | VTVTWIWIWVWYWYWYWY | | VT | VTWIWIWVWYWYWYWY | | VT | null | | WI | WIWVWYWYWYWY | | WI | null | | WV | null | | WY | WYWYWY | | WY | WYWY | | WY | WY | | WY | null 
| +--+---+ 78 rows selected (0.307 seconds) {code} Results returned by Postgres. {code} postgres=# select col2,lead(col2) over (partition by col2 order by col0) lead_col0 from tbl_alldata; col2 | lead_col0 --+--- AK | AZ | AZ AZ | CA | CO | CO CO | CO CO | DE | GA | GA GA | GA GA | GA GA | HI | IA | IA IA | IA IA | IA IA | IN | IN IN | IN IN | IN IN | KS | LA | MA | MA MA | MA MA | MD | MD MD | ME | ME ME | MI | MN | MN MN | MN MN | MN MN | MN MN | MN MN | MO | MO MO | MO MO | NC | ND | ND ND | NE | NH | NH NH | NH NH | NY | NY NY | NY NY | OH | OH OH | OR | OR OR | PA | RI | RI RI | RI RI | RI RI | SC | SC SC | SD | SD SD | SD SD | UT | UT UT | UT UT | VT | VT VT | VT VT | VT VT | WI | WI WI | WV | WY | WY
[jira] [Created] (DRILL-3608) add support for FIRST_VALUE and LAST_VALUE
Deneche A. Hakim created DRILL-3608: --- Summary: add support for FIRST_VALUE and LAST_VALUE Key: DRILL-3608 URL: https://issues.apache.org/jira/browse/DRILL-3608 Project: Apache Drill Issue Type: Sub-task Reporter: Deneche A. Hakim -- This message was sent by Atlassian JIRA (v6.3.4#6332)
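For illustration, the standard SQL window syntax these functions would follow once supported; a sketch only, with the table and column names borrowed from the related LEAD/LAG issues rather than from this ticket:

{code}
SELECT col2,
       FIRST_VALUE(col2) OVER (PARTITION BY col2 ORDER BY col0) AS first_col2,
       LAST_VALUE(col2)  OVER (PARTITION BY col2 ORDER BY col0) AS last_col2
FROM `fewRowsAllData.parquet`;
{code}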
[jira] [Created] (DRILL-3609) Support non default arguments for LEAD and LAG window functions
Deneche A. Hakim created DRILL-3609: --- Summary: Support non default arguments for LEAD and LAG window functions Key: DRILL-3609 URL: https://issues.apache.org/jira/browse/DRILL-3609 Project: Apache Drill Issue Type: Improvement Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Fix For: Future The current implementation of LEAD and LAG only supports the default arguments: offset = 1, default value = NULL. Extend the implementation to support non-default arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
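For illustration, the non-default argument forms being requested, following the standard LEAD/LAG signatures (column, offset, default value); the table and column names are borrowed from the related issues for this sketch:

{code}
-- LEAD two rows ahead, returning 'NA' instead of NULL near the end of the window;
-- LAG three rows back with the default NULL fill.
SELECT col2,
       LEAD(col2, 2, 'NA') OVER (ORDER BY col0) AS lead2_col2,
       LAG(col2, 3)        OVER (ORDER BY col0) AS lag3_col2
FROM `fewRowsAllData.parquet`;
{code}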
[jira] [Resolved] (DRILL-3604) LEAD(varchar-column) returns IOB Exception
[ https://issues.apache.org/jira/browse/DRILL-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim resolved DRILL-3604. - Resolution: Fixed Fixed in private branch LEAD(varchar-column) returns IOB Exception - Key: DRILL-3604 URL: https://issues.apache.org/jira/browse/DRILL-3604 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.2.0 Environment: private branch: https://github.com/adeneche/incubator-drill/tree/new-window-funcs Reporter: Khurram Faraaz Assignee: Deneche A. Hakim Labels: window_function Fix For: 1.2.0 Query that uses LEAD(varchar-column) returns IOB Exception on developers private branch. {code} 0: jdbc:drill:schema=dfs.tmp select lead(col3) over (partition by col2 order by col0) lead_col0 from `fewRowsAllData.parquet`; java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 31668, length: 2444 (expected: range(0, 32768)) Fragment 0:0 [Error Id: a52df546-f567-4e07-aa68-96a149e413da on centos-04.qa.lab:31010] at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) at sqlline.SqlLine.print(SqlLine.java:1583) at sqlline.Commands.execute(Commands.java:852) at sqlline.Commands.sql(Commands.java:751) at sqlline.SqlLine.dispatch(SqlLine.java:738) at sqlline.SqlLine.begin(SqlLine.java:612) at sqlline.SqlLine.start(SqlLine.java:366) at sqlline.SqlLine.main(SqlLine.java:259) {code} Stack trace from /var/log/drill/sqlline.out {code} 2015-08-04 21:56:32,515 [Client-1] INFO o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#24] Query failed: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index: 31668, length: 2444 (expected: range(0, 32768)) Fragment 0:0 [Error Id: 94e70408-66b6-426d-8f57-8a782ba974c0 on centos-04.qa.lab:31010] at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:111) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final] at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final] at
Re: Review Request 37116: DRILL-3567: Wrong result in a query with multiple window functions and different over clauses
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37116/ --- (Updated Aug. 5, 2015, 4:09 p.m.) Review request for drill and Aman Sinha. Changes --- addressed comments Bugs: DRILL-3567 https://issues.apache.org/jira/browse/DRILL-3567 Repository: drill-git Description --- Support multiple window definitions; Bump DrillCalcite version number to 1.1.0-drill-r16 Diffs (updated) - exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java 17689ad exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 222c7b7 pom.xml 3ec79e1 Diff: https://reviews.apache.org/r/37116/diff/ Testing --- Unit, TPCH, functional Thanks, Sean Hsuan-Yi Chu
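For context, the shape of query this change targets is a single statement carrying several window functions with different OVER clauses; a minimal sketch with placeholder table and column names:

{code}
SELECT SUM(col1)    OVER (PARTITION BY col2)               AS sum_by_col2,
       COUNT(*)     OVER (PARTITION BY col3 ORDER BY col0) AS cnt_by_col3,
       ROW_NUMBER() OVER (ORDER BY col0)                   AS rn
FROM `some_table`;
{code}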
[jira] [Created] (DRILL-3610) TimestampAdd/Diff (SQL_TSI_) functions
Andries Engelbrecht created DRILL-3610: -- Summary: TimestampAdd/Diff (SQL_TSI_) functions Key: DRILL-3610 URL: https://issues.apache.org/jira/browse/DRILL-3610 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Reporter: Andries Engelbrecht Assignee: Mehant Baid Add TimestampAdd and TimestampDiff (SQL_TSI) functions for year, quarter, month, week, day, hour, minute, second. Examples SELECT CAST(TIMESTAMPADD(SQL_TSI_QUARTER,1,Date('2013-03-31'), SQL_DATE) AS `column_quarter` FROM `table_in` HAVING (COUNT(1) > 0) SELECT `table_in`.`datetime` AS `column1`, `table`.`Key` AS `column_Key`, TIMESTAMPDIFF(SQL_TSI_MINUTE,to_timestamp('2004-07-04', 'yyyy-MM-dd'),`table_in`.`datetime`) AS `sum_datediff_minute` FROM `calcs` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 37027: DRILL-3557: Reading empty CSV file fails with SYSTEM ERROR
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37027/#review94263 --- Ship it! Ship It! - Parth Chandra On Aug. 5, 2015, 3:15 p.m., Sean Hsuan-Yi Chu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37027/ --- (Updated Aug. 5, 2015, 3:15 p.m.) Review request for drill and Parth Chandra. Bugs: DRILL-3557 https://issues.apache.org/jira/browse/DRILL-3557 Repository: drill-git Description --- Ensure empty CSV's path can be added Diffs - exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java e233dda exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java e8af325 exec/java-exec/src/test/resources/store/text/directoryWithEmpyCSV/empty.csv PRE-CREATION Diff: https://reviews.apache.org/r/37027/diff/ Testing --- Unit tests, functional, tpch Thanks, Sean Hsuan-Yi Chu
Re: Review Request 37116: DRILL-3567: Wrong result in a query with multiple window functions and different over clauses
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37116/#review94267 --- Ship it! Ship It! - Aman Sinha On Aug. 5, 2015, 4:09 p.m., Sean Hsuan-Yi Chu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37116/ --- (Updated Aug. 5, 2015, 4:09 p.m.) Review request for drill and Aman Sinha. Bugs: DRILL-3567 https://issues.apache.org/jira/browse/DRILL-3567 Repository: drill-git Description --- Support multiple window definitions; Bump DrillCalcite version number to 1.1.0-drill-r16 Diffs - exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java 17689ad exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 222c7b7 pom.xml 3ec79e1 Diff: https://reviews.apache.org/r/37116/diff/ Testing --- Unit, TPCH, functional Thanks, Sean Hsuan-Yi Chu
Re: anyone seen these errors on master ?
In that case, we probably need do binary search to figure out which recent patch is causing this problem. On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: Just got those errors on master too On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I'm seeing those errors intermittently when building my private branch, I don't believe I made any change that would have caused them. Anyone seen them too ? testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 2.043 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139) at org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125) testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.436 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187) at org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177) at org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85) testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.788 sec ERROR! 
java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:372) at org.apache.drill.exec.record.vector.TestValueVector.testVariableVectorReallocation(TestValueVector.java:142) Thanks -- Abdelhakim Deneche Software Engineer http://www.mapr.com/ Now Available -
Re: [DISCUSS] Publishing advanced/functional tests
@Jacques, Ted in the mean time, we risk patches being merged that have less than complete testing. While I agree with the premise of getting the tests out as soon as possible it does not help us achieve anything except transparency. Your statement that getting the tests out will increase quality is dependent on someone actually being able to run the tests once they have access to it. Maybe we should focus on making a jenkins job to run the tests publicly. With that in place we can exclude the TPC* datasets as well as the yelp data sets from the framework and avoid licensing issues. Regards Ramana On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish abhishek.gir...@gmail.com wrote: We not only re-distribute external data-sets as-is, but also include variants for those (text - parquet, json, ...). So the challenge here is not simply disabling automatic downloads via the framework, and point users to manually download the files before running the framework, but also about how we will handle tests which require variants of the data sets. It simply isn't practical to users of the framework to (1) download data-gen manually (2) use specific seed / options before generating data, (3) convert them to parquet, etc.. (4) move them to specific locations inside their copy of the framework. Something we'll need to know is how other projects are handling bench-mark other external datasets. -Abhishek On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli challapallira...@gmail.com wrote: Thanks for your inputs. Once issue with just publishing the tests in their current state is that, the framework re-distributes tpch, tpcds, yelp data sets without requiring the users to accept their relevant licenses. A good number of tests uses these data sets. Any thoughts on how to handle this? - Rahul On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning ted.dunn...@gmail.com wrote: +1. Get it out there. On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau jacq...@dremio.com wrote: Hey Rahul, My suggestion would be to the lower bar--do the absolute bare minimum to get the tests out there. For example, simply remove proprietary information and then get it on a public github (whether your personal github or a corporate one). From there, people can help by submitting pull requests to improve the infrastructure and harness. Making things easier is something that can be done over time. For example, we've had offers from a couple different Linux Admins to help on something. I'm sure that they could help with a number of the items you've identified. In the mean time, we risk patches being merged that have less than complete testing. -- Jacques Nadeau CTO and Co-Founder, Dremio On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli challapallira...@gmail.com wrote: Jacques, I am breaking down steps 1,2 3 into sub-tasks so we can add/prioritize these tasks Item #TaskSub-TaskCommentsPriority1*Publish the tests* Remove Proprietary Data Queries 0 Redact Propriety Data/Queries Move tests into drill repo This requires some refactoring to the framework code since the test framework uses a 2-level directory structure Organize the tests using a label based approach This involves code changes and moving a lot of files. When doing a one time push it might be better to do this before publishing the tests? Each suite should be independentSome suites wrongly assume that the data is present. 
They should be identified and fixed Cleanup hardcoded dependencies during data generationSome data-gen scripts have hard-coded references Cleanup downloadsThe same dataset is being downloaded multiple times by different suites Licenses for downloadsThe framework downloads some files automatically. These files are publicly available. However before downloading them users need to agree to certain terms. By using the framework users might be skipping this step. We should look into this 2*Setup a cluster infrastructure to run the pre-commit tests* 3*Local debugging of tests* Add an optional maven target for running tests on a local machine Tests can launch an embedded drillbit or they can connect to a running drillbit through zookeeper Running suites which require additional setup (hive, hbase etc) should be made optional 4*Documentation* Running Tests (options available and also listing the asumed defaults) Explaining how tests are organized Process for adding a new suite On Fri, Jul 24,
Re: [DISCUSS] Drop table support
I agree, it is definitely restrictive. We can lift the restriction for being able to drop a table (when security is off) only if the Drill user owns it. I think the check for homogenous files should give us enough confidence that we are not deleting a non Drill directory. Thanks Mehant On 8/4/15 10:00 PM, Neeraja Rentachintala wrote: Ted, thats fair point on the recovery part. Regarding the other point by Mehant (copied below) ,there is an implication that user can drop only Drill managed tables (i.e created as Drill user) when security is not enabled. I think this check is too restrictive (also unintuitive). Drill doesn't have the concept of external/managed tables and a user (impersonated user if security is enabled or Drillbit service user if no security is enabled) should be able to drop the table if they have permissions to do so. The above design proposes a check to verify if the files that need to be deleted are readable by Drill and I believe is a good validation to have. /The above check is in the case when security is not enabled. Meaning we are executing as the Drill user. If we are running as the Drill user (which might be root or a super user) its likely that this user has permissions to delete most files and checking for permissions might not suffice. So when security isn't enabled the proposal is to delete only those files that are owned (created) by the Drill user./ On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala nrentachint...@maprtech.com wrote: Also will there any mechanism to recover once you accidentally drop? yes. Snapshots https://www.mapr.com/resources/videos/mapr-snapshots. Seriously, recovery of data due to user error is a platform thing. How can we recover from turning off the cluster? From removing a disk on an Oracle node? I don't think that this is Drill's business.
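For readers following the thread, the statement being designed is along these lines; the exact syntax was still under discussion at this point, and the workspace and table names below are hypothetical:

{code}
-- Proposed: remove the files backing a file-system-based table
DROP TABLE dfs.tmp.`my_ctas_table`;
{code}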
[DISCUSS] Publishing advanced/functional tests
Ramana, I think the issue with licenses is mostly resolved. It was discussed that for TPC-*, since we shall not be redistributing the data-gen software, but distributing a randomized variant of the data generated by it, we should be okay to include it part of our framework. For other datasets, we shall either provide their copy of license with our framework, or simply provide a link for users to download data before they execute. For now we should focus on having the framework out with minimal cleanup. In near future we can work on setting up infrastructure and enhancing the framework itself. -Abhishek On Wed, Aug 5, 2015 at 10:46 AM, Ramana I N inram...@gmail.com wrote: @Jacques, Ted in the mean time, we risk patches being merged that have less than complete testing. While I agree with the premise of getting the tests out as soon as possible it does not help us achieve anything except transparency. Your statement that getting the tests out will increase quality is dependent on someone actually being able to run the tests once they have access to it. Maybe we should focus on making a jenkins job to run the tests publicly. With that in place we can exclude the TPC* datasets as well as the yelp data sets from the framework and avoid licensing issues. Regards Ramana On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish abhishek.gir...@gmail.com wrote: We not only re-distribute external data-sets as-is, but also include variants for those (text - parquet, json, ...). So the challenge here is not simply disabling automatic downloads via the framework, and point users to manually download the files before running the framework, but also about how we will handle tests which require variants of the data sets. It simply isn't practical to users of the framework to (1) download data-gen manually (2) use specific seed / options before generating data, (3) convert them to parquet, etc.. (4) move them to specific locations inside their copy of the framework. Something we'll need to know is how other projects are handling bench-mark other external datasets. -Abhishek On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli challapallira...@gmail.com wrote: Thanks for your inputs. Once issue with just publishing the tests in their current state is that, the framework re-distributes tpch, tpcds, yelp data sets without requiring the users to accept their relevant licenses. A good number of tests uses these data sets. Any thoughts on how to handle this? - Rahul On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning ted.dunn...@gmail.com wrote: +1. Get it out there. On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau jacq...@dremio.com wrote: Hey Rahul, My suggestion would be to the lower bar--do the absolute bare minimum to get the tests out there. For example, simply remove proprietary information and then get it on a public github (whether your personal github or a corporate one). From there, people can help by submitting pull requests to improve the infrastructure and harness. Making things easier is something that can be done over time. For example, we've had offers from a couple different Linux Admins to help on something. I'm sure that they could help with a number of the items you've identified.
In the mean time, we risk patches being merged that have less than complete testing. -- Jacques Nadeau CTO and Co-Founder, Dremio On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli challapallira...@gmail.com wrote: Jacques, I am breaking down steps 1,2 3 into sub-tasks so we can add/prioritize these tasks Item #TaskSub-TaskCommentsPriority1*Publish the tests* Remove Proprietary Data Queries 0 Redact Propriety Data/Queries Move tests into drill repo This requires some refactoring to the framework code since the test framework uses a 2-level directory structure Organize the tests using a label based approach This involves code changes and moving a lot of files. When doing a one time push it might be better to do this before publishing the tests? Each suite should be independentSome suites wrongly assume that the data is present. They should be identified and fixed Cleanup hardcoded dependencies during data generationSome
Re: [DISCUSS] Drop table support
The homogenous check- Will it be just checking for types are homogenous or if they are actually types that can be read by drill? Also, is there a good way to determine if a file can be read by drill? And will there be a perf hit if there are large number of files? Regards Ramana On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com wrote: I agree, it is definitely restrictive. We can lift the restriction for being able to drop a table (when security is off) only if the Drill user owns it. I think the check for homogenous files should give us enough confidence that we are not deleting a non Drill directory. Thanks Mehant On 8/4/15 10:00 PM, Neeraja Rentachintala wrote: Ted, thats fair point on the recovery part. Regarding the other point by Mehant (copied below) ,there is an implication that user can drop only Drill managed tables (i.e created as Drill user) when security is not enabled. I think this check is too restrictive (also unintuitive). Drill doesn't have the concept of external/managed tables and a user (impersonated user if security is enabled or Drillbit service user if no security is enabled) should be able to drop the table if they have permissions to do so. The above design proposes a check to verify if the files that need to be deleted are readable by Drill and I believe is a good validation to have. /The above check is in the case when security is not enabled. Meaning we are executing as the Drill user. If we are running as the Drill user (which might be root or a super user) its likely that this user has permissions to delete most files and checking for permissions might not suffice. So when security isn't enabled the proposal is to delete only those files that are owned (created) by the Drill user./ On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala nrentachint...@maprtech.com wrote: Also will there any mechanism to recover once you accidentally drop? yes. Snapshots https://www.mapr.com/resources/videos/mapr-snapshots. Seriously, recovery of data due to user error is a platform thing. How can we recover from turning off the cluster? From removing a disk on an Oracle node? I don't think that this is Drill's business.
Re: [DISCUSS] Drop table support
Sorry, Did not realize you had covered that as part of the original discussion. Looks like a sound mechanism. Regards Ramana On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote: The homogenous check- Will it be just checking for types are homogenous or if they are actually types that can be read by drill? Also, is there a good way to determine if a file can be read by drill? And will there be a perf hit if there are large number of files? Regards Ramana On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com wrote: I agree, it is definitely restrictive. We can lift the restriction for being able to drop a table (when security is off) only if the Drill user owns it. I think the check for homogenous files should give us enough confidence that we are not deleting a non Drill directory. Thanks Mehant On 8/4/15 10:00 PM, Neeraja Rentachintala wrote: Ted, thats fair point on the recovery part. Regarding the other point by Mehant (copied below) ,there is an implication that user can drop only Drill managed tables (i.e created as Drill user) when security is not enabled. I think this check is too restrictive (also unintuitive). Drill doesn't have the concept of external/managed tables and a user (impersonated user if security is enabled or Drillbit service user if no security is enabled) should be able to drop the table if they have permissions to do so. The above design proposes a check to verify if the files that need to be deleted are readable by Drill and I believe is a good validation to have. /The above check is in the case when security is not enabled. Meaning we are executing as the Drill user. If we are running as the Drill user (which might be root or a super user) its likely that this user has permissions to delete most files and checking for permissions might not suffice. So when security isn't enabled the proposal is to delete only those files that are owned (created) by the Drill user./ On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala nrentachint...@maprtech.com wrote: Also will there any mechanism to recover once you accidentally drop? yes. Snapshots https://www.mapr.com/resources/videos/mapr-snapshots . Seriously, recovery of data due to user error is a platform thing. How can we recover from turning off the cluster? From removing a disk on an Oracle node? I don't think that this is Drill's business.
Re: [DISCUSS] Drop table support
Is any check really necessary? Can't we just say that for data sources that are file-like that drop is a rough synonym for rm? If you have permission to remove files and directories, you can do it. If you don't, it will fail, possibly half done. I have never seen a bug filed against rm to add more elaborate semantics, so why is it so necessary for Drill to have elaborate semantics here? On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote: The homogenous check- Will it be just checking for types are homogenous or if they are actually types that can be read by drill? Also, is there a good way to determine if a file can be read by drill? And will there be a perf hit if there are large number of files? Regards Ramana On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com wrote: I agree, it is definitely restrictive. We can lift the restriction for being able to drop a table (when security is off) only if the Drill user owns it. I think the check for homogenous files should give us enough confidence that we are not deleting a non Drill directory. Thanks Mehant On 8/4/15 10:00 PM, Neeraja Rentachintala wrote: Ted, thats fair point on the recovery part. Regarding the other point by Mehant (copied below) ,there is an implication that user can drop only Drill managed tables (i.e created as Drill user) when security is not enabled. I think this check is too restrictive (also unintuitive). Drill doesn't have the concept of external/managed tables and a user (impersonated user if security is enabled or Drillbit service user if no security is enabled) should be able to drop the table if they have permissions to do so. The above design proposes a check to verify if the files that need to be deleted are readable by Drill and I believe is a good validation to have. /The above check is in the case when security is not enabled. Meaning we are executing as the Drill user. If we are running as the Drill user (which might be root or a super user) its likely that this user has permissions to delete most files and checking for permissions might not suffice. So when security isn't enabled the proposal is to delete only those files that are owned (created) by the Drill user./ On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala nrentachint...@maprtech.com wrote: Also will there any mechanism to recover once you accidentally drop? yes. Snapshots https://www.mapr.com/resources/videos/mapr-snapshots . Seriously, recovery of data due to user error is a platform thing. How can we recover from turning off the cluster? From removing a disk on an Oracle node? I don't think that this is Drill's business.
Re: [DISCUSS] Drop table support
What you are suggesting makes sense in the case when security is enabled. So when Drill is accessing the file system it will impersonate the user who issued the command and drop will happen if the user has sufficient permissions. However when security isn't enabled, Drill will be accessing the file system as the Drill user itself which is most likely to be a super user who has permissions to delete most files. To prevent any catastrophic drops checking for homogenous file formats makes sure that at least the directory being dropped is something that can be read by Drill. This will prevent any accidental drops (like dropping the home directory etc, because its likely to have file formats that cannot be read by Drill). This will not prevent against malicious behavior (for handling this security should be enabled). Thanks Mehant On 8/5/15 11:43 AM, Ted Dunning wrote: Is any check really necessary? Can't we just say that for data sources that are file-like that drop is a rough synonym for rm? If you have permission to remove files and directories, you can do it. If you don't, it will fail, possibly half done. I have never seen a bug filed against rm to add more elaborate semantics, so why is it so necessary for Drill to have elaborate semantics here? On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote: The homogenous check- Will it be just checking for types are homogenous or if they are actually types that can be read by drill? Also, is there a good way to determine if a file can be read by drill? And will there be a perf hit if there are large number of files? Regards Ramana On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com wrote: I agree, it is definitely restrictive. We can lift the restriction for being able to drop a table (when security is off) only if the Drill user owns it. I think the check for homogenous files should give us enough confidence that we are not deleting a non Drill directory. Thanks Mehant On 8/4/15 10:00 PM, Neeraja Rentachintala wrote: Ted, thats fair point on the recovery part. Regarding the other point by Mehant (copied below) ,there is an implication that user can drop only Drill managed tables (i.e created as Drill user) when security is not enabled. I think this check is too restrictive (also unintuitive). Drill doesn't have the concept of external/managed tables and a user (impersonated user if security is enabled or Drillbit service user if no security is enabled) should be able to drop the table if they have permissions to do so. The above design proposes a check to verify if the files that need to be deleted are readable by Drill and I believe is a good validation to have. /The above check is in the case when security is not enabled. Meaning we are executing as the Drill user. If we are running as the Drill user (which might be root or a super user) its likely that this user has permissions to delete most files and checking for permissions might not suffice. So when security isn't enabled the proposal is to delete only those files that are owned (created) by the Drill user./ On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala nrentachint...@maprtech.com wrote: Also will there any mechanism to recover once you accidentally drop? yes. Snapshots https://www.mapr.com/resources/videos/mapr-snapshots . Seriously, recovery of data due to user error is a platform thing. How can we recover from turning off the cluster? 
From removing a disk on an Oracle node? I don't think that this is Drill's business.
Re: [DISCUSS] Insert into Table support
Re #7 in the original post Select table syntax can specify constant values for one or more columns: I would have assumed the select list can have any expressions that can be evaluated on a row from the source; that includes columns, expressions on columns, or constants. It's probably not your intent, but the stated form implies that all I get are column values and constants. Which is it? On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid baid.meh...@gmail.com wrote: I wanted to start a conversation around supporting the Insert into Table feature. As of 1.2 we initially want to support inserting into a table with Parquet files. Support for Json, CSV and other sources will follow as future enhancements. Aman, Jinfeng, Neeraja and I had an initial discussion about this and Neeraja provided a good summary of our discussion (pasted below) also stating some of the requirements for this feature. A ) Support Insert into a non-partitioned table - Ex: INSERT INTO T1 [col1, col2, col3] SELECT col4, col5, col6 from T2 (Source table: T2, Target table T1) Requirements: 1. Target table column list specification is optional for Insert statement 2. When specified, the column list in the Insert statement should contain all the columns present in the target table (i.e No support for partial insert) 3. The column names specified for the source table do not need to match to the target table column names. Match is performed based on ordinal. 4. # of Source table columns specified must be same as # of target table columns 5. Types of specified source table columns must match to the types of target table columns 6. Specification of * is not allowed in the Select table syntax 7. Select table syntax can specify constant values for one or more columns B ) Support insert into a partitioned table -- Ex: INSERT INTO T1 col1, col2,col3 partition by col1,col2 SELECT col4,col,col6 from T2 * Target column specification is required when inserting data into an already partitioned table * Requirements A.3-A.7 above apply for insert into partitioned tables as well * A partition by clause along with one or more columns is required * All the columns specified in partition by clause must exist in the target column list * Partition by columns specified do not need to match to the list of columns that the original table partitioned with (i.e if the original table is partitioned with col1, col2, new data during insert can be partitioned by col3 or just with col1 or col2..) Couple of open questions from the design perspective are 1. How do we perform validation. Validation of data types, number of columns being inserted etc. In addition to validation we need to make sure that when we insert into an existing tables we insert data with the existing column names (select column list can have different names). This poses problems around needing to know the metadata at planning time, two approaches that have been floating around are * DotDrill files: We can store metadata, partitioning columns and other useful information here and we can perform validation during planning time. However the challenges with introducing DotDrill files include - consistency between metadata and the actual data (Nothing preventing users to copy files directly). - security around DotDrill files (can be dealt in the same way we perform security checks for drill tables in hdfs) - interface to change the DotDrill file, in the case we need to add a column to the table or add a new partition etc. 
* Explicit Syntax/ No metadata approach: Another approach is to avoid DotDrill files and use explicit syntax to glean as much information as possible from the SQL statement itself. Some of the challenges with this approach are - Gathering metadata information: Since we have no idea what the existing schema is we would need to perform a mini scan to learn the schema at planning time to be able to perform some validation. The problem with this approach is how do we determine how many files we need to read in order to learn the schema? If we use a sample set and not all the files have the same schema, we could have non-deterministic results based on the sample of files read. Also reading all the files and merging the schema seems like an expensive cost to pay. - From the user's perspective, while inserting into a partitioned table, user will have to specify the partitioning columns again in the Insert statement, despite having specified the partition columns in the CTAS. 2. What is a reasonable assumption for a Drill table in terms of changing schema. Having the same exact schema for all files in a table is too rigid an assumption at this point? One thing to remember with DotDrill file is to also the repercussions on Drop table,
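To make requirement A concrete, and to illustrate the question raised at the top of this thread about expressions and constants in the select list, here is a sketch of the proposed statement form; T1 and T2 are the hypothetical tables from the original post, and the feature itself was not implemented at the time:

{code}
-- The column list names the full target schema; source columns are matched by ordinal.
-- The select list mixes a plain column, an expression on a column, and a constant.
INSERT INTO T1 (col1, col2, col3)
SELECT col4,
       col5 * 10,
       'fixed_value'
FROM T2;
{code}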
Re: [DISCUSS] Drop table support
On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com wrote: To prevent any catastrophic drops checking for homogenous file formats makes sure that at least the directory being dropped is something that can be read by Drill. Or we could just disable drop unless permissions can be enforced.
Re: anyone seen these errors on master ?
Did you tighten your memory settings? How many forks are you running with? I bet you are truly running out of memory while executing this particular test case. -H+ On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com wrote: b2bbd99 committed on July 6th introduced the test. On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote: In that case, we probably need do binary search to figure out which recent patch is causing this problem. On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: Just got those errors on master too On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I'm seeing those errors intermittently when building my private branch, I don't believe I made any change that would have caused them. Anyone seen them too ? testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 2.043 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139) at org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125) testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.436 sec ERROR! 
java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187) at org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177) at org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85) testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.788 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at
Re: [DISCUSS] Drop table support
I think enabling drop only when security is enabled is too restrictive. On Wed, Aug 5, 2015 at 12:46 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com wrote: To prevent any catastrophic drops checking for homogenous file formats makes sure that at least the directory being dropped is something that can be read by Drill. Or we could just disable drop unless permissions can be enforced.
Re: anyone seen these errors on master ?
I also saw this failure running the tests on my linux vm, the only changed setting is the -PlargeTests flag, memory and fork count are defaults. All of the tests pass on my mac with default settings, no flags. On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote: Did you tighten your memory settings? How many forks are you running with? I bet you are truly running out of memory while executing this particular test case. -H+ On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com wrote: b2bbd99 committed on July 6th introduced the test. On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote: In that case, we probably need do binary search to figure out which recent patch is causing this problem. On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: Just got those errors on master too On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I'm seeing those errors intermittently when building my private branch, I don't believe I made any change that would have caused them. Anyone seen them too ? testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 2.043 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139) at org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125) testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.436 sec ERROR! 
java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187) at org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177) at org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85) testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.788 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at
Re: [DISCUSS] Drop table support
Another question/comment. Does Drill need to manage concurrency for DROP TABLE, i.e., how do we deal with users trying to read the data while somebody is dropping it? Does it need to implement some kind of locking? I have some thoughts on that but would like to know what others think - Drill is not (yet) a transactional system but rather an interactive query layer on a variety of stores. The two most common use cases I can think of in this context are a user doing analytics/exploration who creates some intermediate tables, inserts data into them and drops the tables, and BI tools generating these intermediate tables for processing queries. Neither of these has the concurrency issue. Additionally, given that the data is externally managed, there could always be other processes adding and deleting files, and Drill doesn't even have control over them. Overall, I think the first phase of the DROP implementation might be OK not to have these locking/concurrency checks. Thoughts? -Neeraja On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid baid.meh...@gmail.com wrote: What you are suggesting makes sense in the case when security is enabled. So when Drill is accessing the file system it will impersonate the user who issued the command, and the drop will happen if the user has sufficient permissions. However, when security isn't enabled, Drill will be accessing the file system as the Drill user itself, which is most likely a super user who has permissions to delete most files. To prevent any catastrophic drops, checking for homogeneous file formats makes sure that at least the directory being dropped is something that can be read by Drill. This will prevent accidental drops (like dropping the home directory, etc., because it is likely to contain file formats that cannot be read by Drill). This will not protect against malicious behavior (to handle that, security should be enabled). Thanks Mehant On 8/5/15 11:43 AM, Ted Dunning wrote: Is any check really necessary? Can't we just say that for data sources that are file-like, drop is a rough synonym for rm? If you have permission to remove files and directories, you can do it. If you don't, it will fail, possibly half done. I have never seen a bug filed against rm to add more elaborate semantics, so why is it so necessary for Drill to have elaborate semantics here? On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N inram...@gmail.com wrote: The homogeneous check - will it just check whether the file types are homogeneous, or whether they are actually types that can be read by Drill? Also, is there a good way to determine if a file can be read by Drill? And will there be a perf hit if there is a large number of files? Regards Ramana On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid baid.meh...@gmail.com wrote: I agree, it is definitely restrictive. We can lift the restriction on being able to drop a table (when security is off) only if the Drill user owns it. I think the check for homogeneous files should give us enough confidence that we are not deleting a non-Drill directory. Thanks Mehant On 8/4/15 10:00 PM, Neeraja Rentachintala wrote: Ted, that's a fair point on the recovery part. Regarding the other point by Mehant (copied below), there is an implication that a user can drop only Drill-managed tables (i.e., created as the Drill user) when security is not enabled. I think this check is too restrictive (also unintuitive).
Drill doesn't have the concept of external/managed tables, and a user (the impersonated user if security is enabled, or the Drillbit service user if it is not) should be able to drop the table if they have permissions to do so. The above design proposes a check to verify that the files that need to be deleted are readable by Drill, and I believe it is a good validation to have. /The above check is for the case when security is not enabled, meaning we are executing as the Drill user. If we are running as the Drill user (which might be root or a super user) it's likely that this user has permissions to delete most files, and checking for permissions might not suffice. So when security isn't enabled the proposal is to delete only those files that are owned (created) by the Drill user./ On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala nrentachint...@maprtech.com wrote: Also, will there be any mechanism to recover once you accidentally drop? Yes. Snapshots https://www.mapr.com/resources/videos/mapr-snapshots . Seriously, recovery of data due to user error is a platform thing. How can we recover from turning off the cluster? From removing a disk on an Oracle node? I don't think that this is Drill's business.
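For illustration, the homogeneous-format safety check discussed in this thread could look roughly like the sketch below. This is only a sketch under assumptions, not the actual Drill implementation: the class name, the extension list, and the recursion policy are all made up here.

{code}
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only (assumed names and extension list, not Drill code): before
// dropping a directory, refuse unless every file in it looks like a format
// Drill can read, so an accidental DROP against an arbitrary directory fails.
public class DropTableCheckSketch {
  private static final Set<String> READABLE_EXTENSIONS =
      new HashSet<>(Arrays.asList("parquet", "json", "csv", "tsv", "psv"));

  static boolean isDroppable(File dir) {
    File[] files = dir.listFiles();
    if (files == null) {
      return false;                        // not a directory, or not listable
    }
    for (File f : files) {
      if (f.isDirectory()) {
        if (!isDroppable(f)) {
          return false;                    // recurse into subdirectories
        }
      } else {
        String name = f.getName();
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase();
        if (!READABLE_EXTENSIONS.contains(ext)) {
          return false;                    // unknown format: refuse the drop
        }
      }
    }
    return true;
  }
}
{code}

Ramana's performance question applies directly to a check like this: it has to list every file under the directory before anything is deleted.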
Re: anyone seen these errors on master ?
I didn't make any change, I'm running 2 forks (the default). I got those errors 3 times now, 2 on a linux VM and 1 on a linux physical node On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote: Did you tighten your memory settings? How many forks are you running with? I bet you are truly running out of memory while executing this particular test case. -H+ On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com wrote: b2bbd99 committed on July 6th introduced the test. On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote: In that case, we probably need do binary search to figure out which recent patch is causing this problem. On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: Just got those errors on master too On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I'm seeing those errors intermittently when building my private branch, I don't believe I made any change that would have caused them. Anyone seen them too ? testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 2.043 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139) at org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125) testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.436 sec ERROR! 
java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187) at org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177) at org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85) testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.788 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at
Re: anyone seen these errors on master ?
I don't seem to be able to re-prod this. Let me look at this and update you all. On Thu, Aug 6, 2015 at 12:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I didn't make any change, I'm running 2 forks (the default). I got those errors 3 times now, 2 on a linux VM and 1 on a linux physical node On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote: Did you tighten your memory settings? How many forks are you running with? I bet you are truly running out of memory while executing this particular test case. -H+ On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com wrote: b2bbd99 committed on July 6th introduced the test. On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote: In that case, we probably need do binary search to figure out which recent patch is causing this problem. On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: Just got those errors on master too On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I'm seeing those errors intermittently when building my private branch, I don't believe I made any change that would have caused them. Anyone seen them too ? testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 2.043 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139) at org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125) testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.436 sec ERROR! 
java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.UInt4Vector.allocateBytes(UInt4Vector.java:187) at org.apache.drill.exec.vector.UInt4Vector.allocateNew(UInt4Vector.java:177) at org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:85) testVariableVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.788 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at
Re: anyone seen these errors on master ?
Given that the difference is just java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException The question of what constitutes an oversized allocation? comes to mind. Is this test fragile relative to being run in different environments? I haven't seen the test so how is the determination that something is oversized made? It seems like that criterion sometimes fails, and we get an OOM because whatever the request is is still very large. On Wed, Aug 5, 2015 at 2:26 PM, Hanifi Gunes hgu...@maprtech.com wrote: I don't seem to be able to re-prod this. Let me look at this and update you all. On Thu, Aug 6, 2015 at 12:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I didn't make any change, I'm running 2 forks (the default). I got those errors 3 times now, 2 on a linux VM and 1 on a linux physical node On Wed, Aug 5, 2015 at 1:03 PM, Hanifi Gunes hgu...@maprtech.com wrote: Did you tighten your memory settings? How many forks are you running with? I bet you are truly running out of memory while executing this particular test case. -H+ On Wed, Aug 5, 2015 at 8:56 PM, Sudheesh Katkam skat...@maprtech.com wrote: b2bbd99 committed on July 6th introduced the test. On Aug 5, 2015, at 10:21 AM, Jinfeng Ni jinfengn...@gmail.com wrote: In that case, we probably need do binary search to figure out which recent patch is causing this problem. On Wed, Aug 5, 2015 at 10:03 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: Just got those errors on master too On Wed, Aug 5, 2015 at 9:07 AM, Abdel Hakim Deneche adene...@maprtech.com wrote: I'm seeing those errors intermittently when building my private branch, I don't believe I made any change that would have caused them. Anyone seen them too ? testBitVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 2.043 sec ERROR! java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:100) at org.apache.drill.exec.memory.TopLevelAllocator.buffer(TopLevelAllocator.java:116) at org.apache.drill.exec.vector.BitVector.reAlloc(BitVector.java:139) at org.apache.drill.exec.record.vector.TestValueVector.testBitVectorReallocation(TestValueVector.java:125) testFixedVectorReallocation(org.apache.drill.exec.record.vector.TestValueVector) Time elapsed: 0.436 sec ERROR! 
java.lang.Exception: Unexpected exception, expectedorg.apache.drill.exec.exception.OversizedAllocationException but wasorg.apache.drill.exec.memory.OutOfMemoryRuntimeException at java.nio.Bits.reserveMemory(Bits.java:658) at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.init(UnpooledUnsafeDirectByteBuf.java:69) at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:130) at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:171) at
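To make Ted's question concrete: the pattern under discussion is a test that keeps growing an allocation and expects a "too large" error from the allocator's own size limit. A minimal, self-contained sketch of that pattern follows. It is not the Drill test, and the limit and flag values are assumptions; it only shows how the same loop can instead die with an out-of-memory error when the JVM's direct-memory budget is smaller than the logical limit, which matches the intermittent failures quoted above.

{code}
import java.nio.ByteBuffer;

// Sketch only (not the Drill test): a reallocation loop that is supposed to
// fail with a "size limit exceeded" error can instead fail with
// OutOfMemoryError if direct memory runs out before the logical limit is hit.
// Try running with -XX:MaxDirectMemorySize=256m to force the OOM path.
public class ReallocSketch {
  private static final long LOGICAL_LIMIT = 1L << 30;   // hypothetical 1 GiB per-buffer cap

  public static void main(String[] args) {
    long size = 1024;
    ByteBuffer buf = ByteBuffer.allocateDirect((int) size);
    while (true) {
      size *= 2;
      if (size > LOGICAL_LIMIT) {
        // the outcome the test expects (OversizedAllocationException in Drill)
        throw new IllegalStateException("oversized allocation: " + size + " bytes");
      }
      // with a small direct-memory budget this throws OutOfMemoryError first
      buf = ByteBuffer.allocateDirect((int) size);
    }
  }
}
{code}

If the real test's limit is close to, or above, what the JVM is allowed to allocate off-heap in the test environment, the allocator can legitimately run out of memory before the oversize check ever fires, which would explain why the failure shows up only on some machines.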
Re: [DISCUSS] Insert into Table support
@Jacques.. Since the topic of metadata caching is closely related, the main issue is how is the metadata file maintained in the presence of either concurrent INSERTs or concurrent REFRESH METADATA operations ? One could maintain multiple versions of the metadata file or keep the version information inside a single metadata file. Is that what you were thinking when you mention Lucene's index versioning ? (I am not familiar with lucene's versioning). Aman On Wed, Aug 5, 2015 at 12:02 PM, Chris Westin chriswesti...@gmail.com wrote: Re #7 in the original post Select table syntax can specify constant values for one or more columns: I would have assumed the select list can have any expressions that can be evaluated on a row from the source; that includes columns, expressions on columns, or constants. It's probably not your intent, but the stated form implies that all I get are column values and constants. Which is it? On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid baid.meh...@gmail.com wrote: I wanted to start a conversation around supporting the Insert into Table feature. As of 1.2 we initially want to support inserting into a table with Parquet files. Support for Json, CSV and other sources will follow as future enhancements. Aman, Jinfeng, Neeraja and I had an initial discussion about this and Neeraja provided a good summary of our discussion (pasted below) also stating some of the requirements for this feature. A ) Support Insert into a non-partitioned table - Ex: INSERT INTO T1 [col1, col2, col3] SELECT col4, col5, col6 from T2 (Source table: T2, Target table T1) Requirements: 1. Target table column list specification is optional for Insert statement 2. When specified, the column list in the Insert statement should contain all the columns present in the target table (i.e No support for partial insert) 3. The column names specified for the source table do not need to match to the target table column names. Match is performed based on ordinal. 4. # of Source table columns specified must be same as # of target table columns 5. Types of specified source table columns must match to the types of target table columns 6. Specification of * is not allowed in the Select table syntax 7. Select table syntax can specify constant values for one or more columns B ) Support insert into a partitioned table -- Ex: INSERT INTO T1 col1, col2,col3 partition by col1,col2 SELECT col4,col,col6 from T2 * Target column specification is required when inserting data into an already partitioned table * Requirements A.3-A.7 above apply for insert into partitioned tables as well * A partition by clause along with one or more columns is required * All the columns specified in partition by clause must exist in the target column list * Partition by columns specified do not need to match to the list of columns that the original table partitioned with (i.e if the original table is partitioned with col1, col2, new data during insert can be partitioned by col3 or just with col1 or col2..) Couple of open questions from the design perspective are 1. How do we perform validation. Validation of data types, number of columns being inserted etc. In addition to validation we need to make sure that when we insert into an existing tables we insert data with the existing column names (select column list can have different names). 
This poses problems around needing to know the metadata at planning time, two approaches that have been floating around are * DotDrill files: We can store metadata, partitioning columns and other useful information here and we can perform validation during planning time. However the challenges with introducing DotDrill files include - consistency between metadata and the actual data (Nothing preventing users to copy files directly). - security around DotDrill files (can be dealt in the same way we perform security checks for drill tables in hdfs) - interface to change the DotDrill file, in the case we need to add a column to the table or add a new partition etc. * Explicit Syntax/ No metadata approach: Another approach is to avoid DotDrill files and use explicit syntax to glean as much information as possible from the SQL statement itself. Some of the challenges with this approach are - Gathering metadata information: Since we have no idea what the existing schema is we would need to perform a mini scan to learn the schema at planning time to be able to perform some validation. The problem with this approach is how do we determine how many files we need to read in order to learn the schema? If we use a sample set and not all the files have the same schema, we could have
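One way to picture the versioned-metadata idea Aman raises (multiple versions of the cache file, with readers always seeing a complete one) is sketched below. The file-naming scheme and helper names are hypothetical, not Drill's actual metadata cache format; the point is only that writers publish a new version with an atomic rename and readers pick the highest complete version, so a concurrent INSERT or REFRESH METADATA never exposes a half-written file.

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch only (hypothetical file names, not Drill's cache format): versioned
// metadata files are published with an atomic rename; readers use the highest
// version present, so they never observe a partially written file.
public class MetadataVersionsSketch {

  static Path latest(Path tableDir) throws IOException {
    try (Stream<Path> files = Files.list(tableDir)) {
      return files
          .filter(p -> p.getFileName().toString().matches("metadata\\.\\d+\\.json"))
          .max(Comparator.comparingLong(MetadataVersionsSketch::versionOf))
          .orElse(null);                                  // no metadata published yet
    }
  }

  static long versionOf(Path p) {
    String name = p.getFileName().toString();
    return Long.parseLong(name.substring("metadata.".length(), name.length() - ".json".length()));
  }

  static void publish(Path tableDir, String metadataJson) throws IOException {
    Path current = latest(tableDir);
    long next = (current == null) ? 1 : versionOf(current) + 1;
    Path tmp = tableDir.resolve("metadata." + next + ".json.tmp");
    Files.write(tmp, metadataJson.getBytes(StandardCharsets.UTF_8));
    // the atomic rename makes the new version visible in a single step
    Files.move(tmp, tableDir.resolve("metadata." + next + ".json"),
        StandardCopyOption.ATOMIC_MOVE);
  }
}
{code}

Two concurrent writers can still race to the same version number with a scheme like this; a real implementation would need a lock file or a retry on rename failure, which is roughly the problem Lucene's commit generations address for index segments.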
[jira] [Created] (DRILL-3611) Drill/client unstable in connection-closed state
Daniel Barclay (Drill) created DRILL-3611: - Summary: Drill/client unstable in connection-closed state Key: DRILL-3611 URL: https://issues.apache.org/jira/browse/DRILL-3611 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) When Drill and/or a client get into the state in which the client reports that the connection is closed, the error messages are not stable. In the following (a series of empty queries executed about a half a second apart), notice how sometimes the exception is a CONNECTION ERROR: ... closed unexpectedly exception and sometimes it is a SYSTEM ERROR: ChannelClosedException exception: {noformat} 0: jdbc:drill: 0: jdbc:drill: ; Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 (user client) closed unexpectedly. [Error Id: 0848c18e-64e9-41e2-90d9-3a0ffaebc14e ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: b465b0e7-55a2-4ef6-ad0e-01258468f4e7 ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: 0b50a10c-42eb-47b6-bc3d-9a42afe4cd28 ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: 9cd1fd96-0aed-4d06-b0ae-d48ddc70b91e ] (state=,code=0) 0: jdbc:drill: ; Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 (user client) closed unexpectedly. [Error Id: 222a5358-6b2e-49e1-a1ec-931cacbbdbd1 ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: fc589b70-dd10-4484-963a-21bc88147a0d ] (state=,code=0) 0: jdbc:drill: ; Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 (user client) closed unexpectedly. [Error Id: 19965e75-9f2e-4a73-b1d8-29d61e6ea31a ] (state=,code=0) 0: jdbc:drill: 0: jdbc:drill: 0: jdbc:drill: {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 37151: DRILL-3579: Fix issues in reading Hive tables with partition value __HIVE_DEFAULT_PARTITION__
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37151/ --- Review request for drill and Mehant Baid. Repository: drill-git Description --- Please see DRILL-3579 for details. Diffs - contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java d323db9 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/HivePushPartitionFilterIntoScan.java 90b0c5f contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDataTypeUtility.java 84d8790 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveFieldConverter.java a59d37b contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveRecordReader.java 1a66ad9 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 22552b7 contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java PRE-CREATION contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java 0ea9d53 contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 21d4f7b exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java af67282 Diff: https://reviews.apache.org/r/37151/diff/ Testing --- Added unittests. Thanks, Venki Korukanti
Re: [DISCUSS] Insert into Table support
I thought I'd add my two cents based on my effort with Parquet pushdown filtering. It sounds like merging schemas is going to be pretty daunting, considering the work planned around embedded types and considering Parquet doesn't support those. Furthermore, metadata sounds like it's going to be fairly specific to each storage type. In the Parquet example, it's very beneficial in having statistics for each column in the file so that you can easily filter out files that clearly won't match a filter (which is what I did in the pushdown implementation). This is the challenge with the current metadata implementation (it doesn't include that information), so you end up planning batches out to many Drillbits, often with a lot of those batches entirely skipped due to mismatching filters. I completely agree with Jacques's point with regards to not changing the outcome of the query if the dot drill/metadata files are not present - it just simply makes the query more expensive without them. On Thu, Aug 6, 2015 at 8:01 AM, Aman Sinha asi...@maprtech.com wrote: @Jacques.. Since the topic of metadata caching is closely related, the main issue is how is the metadata file maintained in the presence of either concurrent INSERTs or concurrent REFRESH METADATA operations ? One could maintain multiple versions of the metadata file or keep the version information inside a single metadata file. Is that what you were thinking when you mention Lucene's index versioning ? (I am not familiar with lucene's versioning). Aman On Wed, Aug 5, 2015 at 12:02 PM, Chris Westin chriswesti...@gmail.com wrote: Re #7 in the original post Select table syntax can specify constant values for one or more columns: I would have assumed the select list can have any expressions that can be evaluated on a row from the source; that includes columns, expressions on columns, or constants. It's probably not your intent, but the stated form implies that all I get are column values and constants. Which is it? On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid baid.meh...@gmail.com wrote: I wanted to start a conversation around supporting the Insert into Table feature. As of 1.2 we initially want to support inserting into a table with Parquet files. Support for Json, CSV and other sources will follow as future enhancements. Aman, Jinfeng, Neeraja and I had an initial discussion about this and Neeraja provided a good summary of our discussion (pasted below) also stating some of the requirements for this feature. A ) Support Insert into a non-partitioned table - Ex: INSERT INTO T1 [col1, col2, col3] SELECT col4, col5, col6 from T2 (Source table: T2, Target table T1) Requirements: 1. Target table column list specification is optional for Insert statement 2. When specified, the column list in the Insert statement should contain all the columns present in the target table (i.e No support for partial insert) 3. The column names specified for the source table do not need to match to the target table column names. Match is performed based on ordinal. 4. # of Source table columns specified must be same as # of target table columns 5. Types of specified source table columns must match to the types of target table columns 6. Specification of * is not allowed in the Select table syntax 7. 
Select table syntax can specify constant values for one or more columns B ) Support insert into a partitioned table -- Ex: INSERT INTO T1 col1, col2,col3 partition by col1,col2 SELECT col4,col,col6 from T2 * Target column specification is required when inserting data into an already partitioned table * Requirements A.3-A.7 above apply for insert into partitioned tables as well * A partition by clause along with one or more columns is required * All the columns specified in partition by clause must exist in the target column list * Partition by columns specified do not need to match to the list of columns that the original table partitioned with (i.e if the original table is partitioned with col1, col2, new data during insert can be partitioned by col3 or just with col1 or col2..) Couple of open questions from the design perspective are 1. How do we perform validation. Validation of data types, number of columns being inserted etc. In addition to validation we need to make sure that when we insert into an existing tables we insert data with the existing column names (select column list can have different names). This poses problems around needing to know the metadata at planning time, two approaches that have been floating around are * DotDrill files: We can store metadata, partitioning columns and other useful information here and we can perform validation during
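The per-column statistics pruning Jason describes reduces to a small amount of logic once min/max values are available from the Parquet footers. The sketch below uses a made-up ColumnStats stand-in rather than the real Parquet metadata classes; it only illustrates the decision of whether a file can possibly satisfy a filter.

{code}
// Sketch only: min/max pruning with a hypothetical ColumnStats stand-in for
// the per-column statistics a Parquet file footer carries. A file whose
// [min, max] range cannot satisfy the filter is skipped without being read.
public class FilePruningSketch {

  static final class ColumnStats {
    final long min;
    final long max;
    ColumnStats(long min, long max) { this.min = min; this.max = max; }
  }

  /** True if the file might contain rows where the column is greater than the threshold. */
  static boolean mightMatchGreaterThan(ColumnStats stats, long threshold) {
    return stats.max > threshold;
  }

  public static void main(String[] args) {
    ColumnStats fileA = new ColumnStats(0, 50);      // e.g. footer stats from one file
    ColumnStats fileB = new ColumnStats(100, 500);   // e.g. footer stats from another
    long threshold = 75;                             // filter: col > 75
    System.out.println("scan file A? " + mightMatchGreaterThan(fileA, threshold)); // false: pruned
    System.out.println("scan file B? " + mightMatchGreaterThan(fileB, threshold)); // true: must scan
  }
}
{code}

This is also the piece Jason notes is missing from the current metadata cache: without per-column min/max in the cached footer information, the planner hands every file to some Drillbit and lets the scan discover that many of them contain nothing matching the filter.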