[jira] [Updated] (DRILL-5258) Allow "extended" mock tables access from SQL queries
[ https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Khatua updated DRILL-5258:
--------------------------------
    Affects Version/s: (was: 1.10) 1.10.0

> Allow "extended" mock tables access from SQL queries
>
>                 Key: DRILL-5258
>                 URL: https://issues.apache.org/jira/browse/DRILL-5258
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new,
> simplified version of the mock data generator. This approach is very
> convenient, but is inherently limited. For example, the limited syntax
> available in SQL does not encode much information about columns, such as
> repeat count, data generator, and so on. The simple SQL approach also does
> not allow generating multiple groups of data.
> However, all these features are present in the original mock data source via
> a special JSON configuration file. Previously, only physical plans could
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing
> mock data generator for SQL never uses JSON files, a simple rule is that if
> the table name ends in ".json" then it is a specification; otherwise the
> information is encoded in the table and column names.
> The format of the data generation syntax is documented in the mock data
> source classes.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
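The proposed dispatch rule is simple enough to sketch. The class and method names below are illustrative, not Drill's actual API: a table name ending in ".json" selects the extended JSON specification, anything else falls back to the encoded-name generator.

```java
// Hypothetical sketch of the table-name rule proposed in DRILL-5258.
// MockTableSelector is an illustrative name, not a Drill class.
public class MockTableSelector {

  // A ".json" suffix means the table name refers to an extended
  // JSON specification file; otherwise the column/table names
  // themselves encode the generation parameters.
  public static boolean isExtendedSpec(String tableName) {
    return tableName.endsWith(".json");
  }
}
```

Under this rule, `mock`.`example/mock-options.json` routes to the extended generator while a name like `employees_10K` keeps the simplified SQL behavior.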
[jira] [Updated] (DRILL-5258) Allow "extended" mock tables access from SQL queries
[ https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Khatua updated DRILL-5258:
--------------------------------
    Fix Version/s: (was: 1.10) 1.10.0

> Allow "extended" mock tables access from SQL queries
>
>                 Key: DRILL-5258
>                 URL: https://issues.apache.org/jira/browse/DRILL-5258
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.10.0
[jira] [Updated] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Khatua updated DRILL-5287:
--------------------------------
    Fix Version/s: (was: 1.10) 1.10.0

> Provide option to skip updates of ephemeral state changes in Zookeeper
>
>                 Key: DRILL-5287
>                 URL: https://issues.apache.org/jira/browse/DRILL-5287
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>             Fix For: 1.10.0
>
> We put transient profiles in ZooKeeper and update state as the query
> progresses and changes states. It is observed that this adds a latency of
> ~45 msec for each update in the query execution path. This gets even worse
> when a high number of concurrent queries is in progress. At concurrency=100,
> the average query response time even for short queries is 8 sec vs 0.2 sec
> with these updates disabled. For short-lived queries in a high-throughput
> scenario, there is no value in updating state changes in ZooKeeper. We need
> an option to disable these updates for short-running operational queries.
[jira] [Updated] (DRILL-462) Periodic connection failure with Drillbit
[ https://issues.apache.org/jira/browse/DRILL-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

prasann...@trinitymobility.com updated DRILL-462:
-------------------------------------------------

I am currently using Apache Drill 1.9.0 and am facing the same issue. My Java application uses Drill for queries, but it periodically loses its connection to the drillbit service after a few query executions. Running jps shows that the drillbit service is still running. I cannot find the cause; am I missing anything in the flow? I am getting logs such as:

ERROR 2017-02-28 10:12 (http-nio-8098-exec-5) org.trinity.social.dao.SocialDataDaoImpl StatementCallback; uncategorized SQLException for SQL [SELECT source, sentiment, COUNT(*) AS data_count FROM dfs.tmp.social_data_nlp WHERE created_time > DATE_SUB(CURRENT_DATE, 1) AND keyword_search = true AND ( searched_keyword = 'fire' OR searched_keyword = 'acciden' OR user_name = 'aajtak') GROUP BY source, sentiment]; SQL state [null]; error code [0]; SYSTEM ERROR: RetriesExhaustedException: Can't get the locations [Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]; nested exception is java.sql.SQLException: SYSTEM ERROR: RetriesExhaustedException: Can't get the locations [Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]
org.springframework.jdbc.UncategorizedSQLException: StatementCallback; uncategorized SQLException for SQL [SELECT source, sentiment, COUNT(*) AS data_count FROM dfs.tmp.social_data_nlp WHERE created_time > DATE_SUB(CURRENT_DATE, 1) AND keyword_search = true AND ( searched_keyword = 'fire' OR searched_keyword = 'acciden' OR user_name = 'aajtak') GROUP BY source, sentiment]; SQL state [null]; error code [0]; SYSTEM ERROR: RetriesExhaustedException: Can't get the locations [Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]; nested exception is java.sql.SQLException: SYSTEM ERROR: RetriesExhaustedException: Can't get the locations [Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:84) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:419) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:474) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:484) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.queryForList(JdbcTemplate.java:510) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.trinity.social.dao.SocialDataDaoImpl.getAllDataByKeywordCount(SocialDataDaoImpl.java:98) [classes/:?]
    at org.trinity.social.service.SpatialDataFilterServiceImpl.getDataByKeywords(SpatialDataFilterServiceImpl.java:48) [classes/:?]
    at org.trinity.social.controller.SocialDataController.getKeywordData(SocialDataController.java:26) [classes/:?]
    at sun.reflect.GeneratedMethodAccessor106.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:220) [spring-web-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:134) [spring-web-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:116) [spring-webmvc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827) [spring-webmvc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738) [spring-webmvc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.hand
[jira] [Commented] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.
[ https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887206#comment-15887206 ]

ASF GitHub Bot commented on DRILL-5290:
---------------------------------------

Github user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/757

    +1 Please squash commits, and open a ticket for the enhancement you mentioned.

> Provide an option to build operator table once for built-in static functions
> and reuse it across queries.
>
>                 Key: DRILL-5290
>                 URL: https://issues.apache.org/jira/browse/DRILL-5290
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>              Labels: doc-impacting
>             Fix For: 1.10.0
>
> Currently, DrillOperatorTable, which contains standard SQL operators and
> functions as well as Drill User Defined Functions (UDFs) (built-in and
> dynamic), gets built for each query as part of creating QueryContext. This is
> an expensive operation (~30 msec to build) that allocates ~2M on the heap for
> each query. For high-throughput, low-latency operational queries, we quickly
> run out of heap memory, causing JVM hangs. Instead, build the operator table
> once during startup for static built-in functions and save it in
> DrillbitContext, so we can reuse it across queries.
> Provide a system/session option to not use dynamic UDFs so we can use the
> operator table saved in DrillbitContext and avoid building it each time.
> *Please note: this change adds a new option, exec.udf.use_dynamic, which
> needs to be documented.*
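The build-once-and-reuse pattern described above can be sketched as follows. This is an illustrative model, not Drill's actual DrillbitContext code; the class and method names, and the use of a string as a stand-in for the operator table, are assumptions.

```java
// Hypothetical sketch of DRILL-5290's caching scheme: an expensive
// table is built once at first use and shared across queries, unless
// dynamic UDFs are enabled, in which case it is rebuilt per query.
public class OperatorTableCache {

  private String cachedTable;  // stands in for the shared operator table
  private int buildCount;      // counts expensive builds, for illustration

  // Stands in for the expensive (~30 msec, ~2M heap) construction.
  private String buildTable() {
    buildCount++;
    return "static-operator-table";
  }

  public String getTable(boolean useDynamicUdfs) {
    if (useDynamicUdfs) {
      // Dynamic UDFs may change between queries, so rebuild each time.
      return buildTable();
    }
    if (cachedTable == null) {
      cachedTable = buildTable();  // built once, then reused
    }
    return cachedTable;
  }

  public int builds() {
    return buildCount;
  }
}
```

With the option off, repeated queries hit the cached table and only one build ever occurs; turning the option on restores the per-query rebuild.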
[jira] [Updated] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below
[ https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-5293:
--------------------------------
    Reviewer: Chunhui Shi

Assigned Reviewer to [~cshi]

> Poor performance of Hash Table due to same hash value as distribution below
>
>                 Key: DRILL-5293
>                 URL: https://issues.apache.org/jira/browse/DRILL-5293
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>    Affects Versions: 1.8.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>
> The computation of the hash value is basically the same whether for the Hash
> Table (used by the Hash Agg and the Hash Join) or for the distribution of
> rows at the exchange. As a result, a specific Hash Table (in a parallel minor
> fragment) gets only the rows "filtered" by the partition below ("upstream"),
> so the pattern of this filtering leads to non-uniform usage of the hash
> buckets in the table.
> Here is a simplified example: an exchange partitions into TWO minor
> fragments, each running a Hash Agg. The partition sends rows with EVEN hash
> values to the first and rows with ODD hash values to the second. Now the
> first recomputes the _same_ hash value for its Hash Table -- and only the
> even buckets get used!! (Or with a partition into EIGHT -- possibly only one
> eighth of the buckets would be used!!) This would lead to longer hash chains
> and thus _poor performance_!
> A possible solution -- add a distribution function distFunc (only for
> partitioning) that takes the hash value and "scrambles" it so that the
> entropy in all the bits affects the low bits of the output. This function
> should be applied (in HashPrelUtil) over the generated code that produces the
> hash value, like:
>    distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64M rows) and a parallelism of 8
> (planner.width.max_per_node = 8); minor fragments 0 and 4 used only 1/8 of
> their buckets, the others used 1/4 of their buckets. Maybe the reason for
> this variance is that distribution uses "hash32AsDouble" while the hash agg
> uses "hash32".
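One well-known candidate for the "scrambling" function proposed above is a hash finalizer such as MurmurHash3's fmix32 step, which mixes entropy from all bits of the input into the low bits. Whether Drill would adopt this particular finalizer is an assumption here; the ticket only asks for some distFunc with this property.

```java
// Illustrative distFunc: the MurmurHash3 fmix32 finalizer applied over
// an existing 32-bit hash. The constants are MurmurHash3's published
// mixing constants; the function is a bijection on 32-bit ints, so no
// hash values collide after scrambling.
public class DistFunc {

  public static int distFunc(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }
}
```

In the scenario from the description, the partition would route on `distFunc(hash32(...))` while the Hash Agg keeps using the raw `hash32(...)`, so even a stream of all-even hash values spreads across both even and odd buckets downstream.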
[jira] [Commented] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887108#comment-15887108 ]

ASF GitHub Bot commented on DRILL-5287:
---------------------------------------

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/758#discussion_r103364711

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java ---
    @@ -280,8 +281,15 @@ public void interrupted(final InterruptedException ex) {
         }
       }

    -  QueryState updateEphemeralState(final QueryState queryState) {
    -    switch (queryState) {
    +  void updateEphemeralState(final QueryState queryState) {
    +    // If query is already in zk transient store, ignore the transient state update option.
    +    // Else, they will not be removed from transient store upon completion.
    +    if (transientProfiles.get(stringQueryId) == null &&
    --- End diff --

    Why not just check the option? `transientProfiles.get(stringQueryId)` is quite expensive itself ([contacts ZooKeeper and deserializes data](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZkEphemeralStore.java#L61)).

> Provide option to skip updates of ephemeral state changes in Zookeeper
>
>                 Key: DRILL-5287
>                 URL: https://issues.apache.org/jira/browse/DRILL-5287
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>             Fix For: 1.10
>
> We put transient profiles in ZooKeeper and update state as the query
> progresses and changes states. It is observed that this adds a latency of
> ~45 msec for each update in the query execution path. This gets even worse
> when a high number of concurrent queries is in progress. At concurrency=100,
> the average query response time even for short queries is 8 sec vs 0.2 sec
> with these updates disabled. For short-lived queries in a high-throughput
> scenario, there is no value in updating state changes in ZooKeeper. We need
> an option to disable these updates for short-running operational queries.
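The guard the reviewer is suggesting can be sketched as follows. This is an illustrative model, not the QueryManager code under review: the class, the boolean option, and the write counter are assumptions, and the counter merely stands in for the ~45 msec ZooKeeper round trip.

```java
// Hypothetical sketch of DRILL-5287's skip option, checking the cheap
// session option first rather than an expensive ZooKeeper lookup.
public class EphemeralStateUpdater {

  private final boolean skipIntermediateUpdates;  // stands in for the session option
  private int zkWrites;                           // counts simulated ZooKeeper updates

  public EphemeralStateUpdater(boolean skipIntermediateUpdates) {
    this.skipIntermediateUpdates = skipIntermediateUpdates;
  }

  public void updateState(String state, boolean isTerminal) {
    // Terminal states must always be written so the transient entry
    // is finalized/removed; only intermediate updates may be skipped.
    if (skipIntermediateUpdates && !isTerminal) {
      return;
    }
    zkWrites++;  // stands in for the ~45 msec ZooKeeper round trip
  }

  public int writes() {
    return zkWrites;
  }
}
```

The point of checking the option first is that it costs a field read, whereas `transientProfiles.get(...)` contacts ZooKeeper and deserializes data on every call.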
[jira] [Commented] (DRILL-5221) cancel message is delayed until queryid or data is received
[ https://issues.apache.org/jira/browse/DRILL-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887012#comment-15887012 ]

ASF GitHub Bot commented on DRILL-5221:
---------------------------------------

Github user vkorukanti commented on the issue:

    https://github.com/apache/drill/pull/733

    LGTM, +1. We may end up sending CANCEL twice to the server, but the server already has state management, so this should be fine.

> cancel message is delayed until queryid or data is received
>
>                 Key: DRILL-5221
>                 URL: https://issues.apache.org/jira/browse/DRILL-5221
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Client - C++
>    Affects Versions: 1.9.0
>            Reporter: Laurent Goujon
>            Assignee: Laurent Goujon
>
> When the user calls the cancel method of the C++ client, the client waits for
> a message from the server before replying with a cancellation message.
> For queries that take a long time to return batch results, this means
> cancellation won't take effect until the next batch is received, instead of
> cancelling the query right away (assuming the query id has already been
> received, which is generally the case).
> It seems this was foreseen by [~vkorukanti] in his initial patch
> (https://github.com/vkorukanti/drill/commit/e0ef6349aac48de5828b6d725c2cf013905d18eb)
> but was omitted when I backported it post metadata changes.
[jira] [Commented] (DRILL-5167) C++ connector does not set escape string for metadata search pattern
[ https://issues.apache.org/jira/browse/DRILL-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886964#comment-15886964 ]

ASF GitHub Bot commented on DRILL-5167:
---------------------------------------

Github user vkorukanti commented on the issue:

    https://github.com/apache/drill/pull/712

    +1

> C++ connector does not set escape string for metadata search pattern
>
>                 Key: DRILL-5167
>                 URL: https://issues.apache.org/jira/browse/DRILL-5167
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Laurent Goujon
>            Assignee: Laurent Goujon
>            Priority: Minor
>
> The C++ connector does not set the escape string for search patterns when
> doing metadata operations (getCatalogs/getSchema/getTables/getColumns). It is
> assumed to be '\\' as returned by DrillMetadata::getSearchEscapeString(), but
> because this is not sent over the wire, the server will actually consider
> that there is no escape character, and might return different results, or no
> results, compared to what was requested.
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886821#comment-15886821 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/761

    Fixed code review comments. Rebased on latest master.

> Roll-up of final fixes for managed sort
>
>                 Key: DRILL-5284
>                 URL: https://issues.apache.org/jira/browse/DRILL-5284
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
> The managed external sort was introduced in DRILL-5080. Since that time,
> extensive testing has identified a number of minor fixes and improvements.
> Given the long PR cycles, it is not practical to spend a week or two to do a
> PR for each fix individually. This ticket represents a roll-up of a
> combination of a number of fixes. Small fixes are listed here; larger items
> appear as sub-tasks.
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886810#comment-15886810 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103332364

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
         // Set too large and the ratio between memory and input data sizes becomes
         // small. Set too small and disk seek times dominate performance.

    -    spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
    -    spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
    +    preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
    +
    +    // In low memory, use no more than 1/4 of memory for each spill batch. Ensures we
    +    // can merge.
    +
    +    preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
    --- End diff --

    In low memory conditions, restrict the spill batch size to 1/4 of memory. Why?

    * We need to accumulate at least 2 such batches to do a merge. (Now at 1/2 of memory.)
    * We need to create an output batch from the two inputs (3/4 of memory).
    * Need overhead for other direct memory uses. (Remaining 1/4 of memory.)

    Sadly, memory management in Drill is not very precise: batch sizes can't be predicted with any accuracy. Trying to use, say, 1/3 of memory for the spill batch would seem more logical (two batches into the merge, one out), but the allocator issues a fatal error if we guess wrong by even one byte. So, we are forced to be conservative. If we had better control, or a more forgiving allocator, we could make different choices.

    Also, why try to sort GBs of data in 20 MB? Yet, this is the test case that had to be solved, and that this particular fix enables. I'm open to suggestions for better solutions; this is a very tricky area...
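The sizing rule discussed in the diff above can be isolated into a small pure function. This is a sketch of the rule as described, not the actual ExternalSortBatch code; the minimum value is illustrative.

```java
// Sketch of the spill-batch sizing rule from the diff: cap the
// configured size at 1/4 of the memory limit (two inputs to a merge,
// one output batch, plus overhead), but keep it above a floor so tiny
// batches don't cause thrashing. MIN_SPILL_BATCH_SIZE is illustrative.
public class SpillSizing {

  static final long MIN_SPILL_BATCH_SIZE = 256 * 1024;  // illustrative floor

  public static long spillBatchSize(long configured, long memoryLimit) {
    // In low memory, use no more than 1/4 of memory per spill batch.
    long size = Math.min(configured, memoryLimit / 4);
    // But never drop below the minimum, even in very low memory.
    return Math.max(size, MIN_SPILL_BATCH_SIZE);
  }
}
```

Note that, as in the diff, the floor is applied last, so in extremely low memory the minimum wins over the 1/4 cap.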
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886819#comment-15886819 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103330903

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java ---
    @@ -357,9 +393,13 @@ public SpillSet(FragmentContext context, PhysicalOperator popConfig) {
         } else {
           fileManager = new HadoopFileManager(spillFs);
         }
    -    FragmentHandle handle = context.getHandle();
    -    spillDirName = String.format("%s_major%s_minor%s_op%s", QueryIdHelper.getQueryId(handle.getQueryId()),
    -        handle.getMajorFragmentId(), handle.getMinorFragmentId(), popConfig.getOperatorId());
    +    spillDirName = String.format(
    --- End diff --

    Either format is fine. Go ahead and overwrite this change with your preferred format when you commit your work.
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886816#comment-15886816 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103332807

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -765,12 +838,12 @@ private void processBatch() {
           spillFromMemory();
         }

    -    // Sanity check. We should now be above the spill point.
    +    // Sanity check. We should now be below the buffer memory maximum.
         long startMem = allocator.getAllocatedMemory();
    -    if (memoryLimit - startMem < spillPoint) {
    -      logger.error( "ERROR: Failed to spill below the spill point. Spill point = {}, free memory = {}",
    -        spillPoint, memoryLimit - startMem);
    +    if (startMem > bufferMemoryPool) {
    +      logger.error( "ERROR: Failed to spill above buffer limit. Buffer pool = {}, memory = {}",
    +        bufferMemoryPool, startMem);
    --- End diff --

    We could. But, at this point, it is a potential problem, not a real one. Maybe the input has no rows; we won't overflow memory. Maybe just one or two rows and we'll be fine. This warning says, "if we continue as we are now, and we have large amounts of data, we'll run off the rails." It helps explain any later OOM error that occurs.
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886811#comment-15886811 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103332576

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
         // Set too large and the ratio between memory and input data sizes becomes
         // small. Set too small and disk seek times dominate performance.

    -    spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
    -    spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
    +    preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
    +
    +    // In low memory, use no more than 1/4 of memory for each spill batch. Ensures we
    +    // can merge.
    +
    +    preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
    +
    +    // But, the spill batch should be above some minimum size to prevent complete
    +    // thrashing.
    +
    +    preferredSpillBatchSize = Math.max(preferredSpillBatchSize, MIN_SPILL_BATCH_SIZE);
    --- End diff --

    Done later when we tally up memory needs and compare them to available memory. We issue an error log message if memory overflow is likely. This way, when a query fails, we can look at the log to see that we knew it would fail due to low memory limits.
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886812#comment-15886812 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r10813

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
         // spill batches of either 64K records, or as many records as fit into the
         // amount of memory dedicated to each spill batch, whichever is less.

    -    spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
    +    spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2);
    --- End diff --

    Yes. Another wonderful Drill artifact. Suppose we have 1023 bytes of data. We will allocate a vector of 1024 bytes in size. Suppose we have 1025 bytes of data. (Just 0.2% more.) We allocate a vector of 2048 bytes.

    Now, we could be more conservative and assume that, on average, each vector will be 3/4 full, so we should use a factor of 1.5 for the calcs. We can file a JIRA and experiment with this change as a future enhancement. It would also help if the allocator didn't kill the query if we allocate even one extra byte. But, since math errors are fatal, we are super-conservative for now.
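The power-of-two allocation behavior described in the comment above is easy to model. This is a sketch of the rounding rule, not Drill's allocator; the function name is illustrative.

```java
// Sketch of power-of-two vector allocation: the allocated size is the
// smallest power of two that holds the data. 1023 bytes fit in a
// 1024-byte vector, but 1025 bytes (0.2% more) force a 2048-byte one.
// On average a vector ends up about 3/4 full, which motivates the
// conservative divide-by-2 in the row-count estimate above.
public class VectorAlloc {

  public static long allocatedSize(long dataBytes) {
    long size = 1;
    while (size < dataBytes) {
      size <<= 1;  // double until the data fits
    }
    return size;
  }
}
```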
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886815#comment-15886815 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103335045

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -1231,52 +1308,44 @@ private boolean consolidateBatches() {
        * This method spills only half the accumulated batches
        * minimizing unnecessary disk writes. The exact count must lie between
        * the minimum and maximum spill counts.
    -   */
    +   */
       private void spillFromMemory() {

         // Determine the number of batches to spill to create a spill file
         // of the desired size. The actual file size might be a bit larger
         // or smaller than the target, which is expected.

    -    long estSize = 0;
         int spillCount = 0;
    +    long spillSize = 0;
         for (InputBatch batch : bufferedBatches) {
    -      estSize += batch.getDataSize();
    -      if (estSize > spillFileSize) {
    -        break; }
    +      long batchSize = batch.getDataSize();
    +      spillSize += batchSize;
           spillCount++;
    +      if (spillSize + batchSize / 2 > spillFileSize) {
    +        break; }
         }

    -    // Should not happen, but just to be sure...
    +    // Must always spill at least 2, even if this creates an over-size
    +    // spill file.

    -    if (spillCount == 0) {
    -      return; }
    +    spillCount = Math.max(spillCount, 2);

         // Do the actual spill.

    -    logger.trace("Starting spill from memory. Memory = {}, Buffered batch count = {}, Spill batch count = {}",
    -        allocator.getAllocatedMemory(), bufferedBatches.size(), spillCount);
         mergeAndSpill(bufferedBatches, spillCount);
       }

       private void mergeAndSpill(LinkedList source, int count) {
    -    if (count == 0) {
    -      return; }
         spilledRuns.add(doMergeAndSpill(source, count));
       }

       private BatchGroup.SpilledRun doMergeAndSpill(LinkedList batchGroups, int spillCount) {
         List batchesToSpill = Lists.newArrayList();
         spillCount = Math.min(batchGroups.size(), spillCount);
         assert spillCount > 0 : "Spill count to mergeAndSpill must not be zero";
    -    long spillSize = 0;
         for (int i = 0; i < spillCount; i++) {
    -      @SuppressWarnings("resource")
    -      BatchGroup batch = batchGroups.pollFirst();
    -      assert batch != null : "Encountered a null batch during merge and spill operation";
    -      batchesToSpill.add(batch);
    -      spillSize += batch.getDataSize();
    +      batchesToSpill.add(batchGroups.pollFirst());
    --- End diff --

    This won't happen for first-generation spills due to the check in `isSpillNeeded`. But, it could happen when spilling for merges, so I fixed the code.
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886814#comment-15886814 ] ASF GitHub Bot commented on DRILL-5284: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/761#discussion_r103331438 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java --- @@ -219,7 +220,18 @@ private BatchSchema schema; + /** + * Incoming batches buffered in memory prior to spilling + * or an in-memory merge. + */ + private LinkedList bufferedBatches = Lists.newLinkedList(); + + /** + * Spilled runs consisting of a large number of spilled + * in-memory batches. --- End diff -- Fixed. > Roll-up of final fixes for managed sort > --- > > Key: DRILL-5284 > URL: https://issues.apache.org/jira/browse/DRILL-5284 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.10.0 > > > The managed external sort was introduced in DRILL-5080. Since that time, > extensive testing has identified a number of minor fixes and improvements. > Given the long PR cycles, it is not practical to spend a week or two to do a > PR for each fix individually. This ticket represents a roll-up of a > combination of a number of fixes. Small fixes are listed here, larger items > appear as sub-tasks. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886818#comment-15886818 ] ASF GitHub Bot commented on DRILL-5284: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/761#discussion_r103334294 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java --- @@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) { // spill batches of either 64K records, or as many records as fit into the // amount of memory dedicated to each spill batch, whichever is less. -spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth); +spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2); spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE); +// Compute the actual spill batch size which may be larger or smaller +// than the preferred size depending on the row width. Double the estimated +// memory needs to allow for power-of-two rounding. + +targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2; + // Determine the number of records per batch per merge step. The goal is to // merge batches of either 64K records, or as many records as fit into the // amount of memory dedicated to each merge batch, whichever is less. -targetMergeBatchSize = preferredMergeBatchSize; -mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize / estimatedRowWidth); +mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize / estimatedRowWidth / 2); mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE); +targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2; // Determine the minimum memory needed for spilling. Spilling is done just // before accepting a batch, so we must spill if we don't have room for a // (worst case) input batch. 
To spill, we need room for the output batch created // by merging the batches already in memory. Double this to allow for power-of-two // memory allocations. -spillPoint = estimatedInputBatchSize + 2 * spillBatchSize; +long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize; // The merge memory pool assumes we can spill all input batches. To make // progress, we must have at least two merge batches (same size as an output // batch) and one output batch. Again, double to allow for power-of-two // allocation and add one for a margin of error. -int minMergeBatches = 2 * 3 + 1; -long minMergeMemory = minMergeBatches * targetMergeBatchSize; +long minMergeMemory = Math.round((2 * targetSpillBatchSize + targetMergeBatchSize) * 1.05); // If we are in a low-memory condition, then we might not have room for the // default output batch size. In that case, pick a smaller size. -long minMemory = Math.max(spillPoint, minMergeMemory); -if (minMemory > memoryLimit) { - - // Figure out the minimum output batch size based on memory, but can't be - // any smaller than the defined minimum. - - targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit / minMergeBatches); +if (minMergeMemory > memoryLimit) { - // Regardless of anything else, the batch must hold at least one - // complete row. + // Figure out the minimum output batch size based on memory, + // must hold at least one complete row. - targetMergeBatchSize = Math.max(estimatedRowWidth, targetMergeBatchSize); - spillPoint = estimatedInputBatchSize + 2 * spillBatchSize; - minMergeMemory = minMergeBatches * targetMergeBatchSize; + long mergeAllowance = Math.round((memoryLimit - 2 * targetSpillBatchSize) * 0.95); + targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance / 2); + mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth / 2); --- End diff -- Good catch! Fixed. 
> Roll-up of final fixes for managed sort > --- > > Key: DRILL-5284 > URL: https://issues.apache.org/jira/browse/DRILL-5284 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.10.0 > > > The managed external sort was introduced in DRILL-5080. Since that time, > extensive testing has identified a number of minor fixes and improvements. >
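The memory budget visible in the diff above can be summarized as two formulas: the spill point (room for one worst-case input batch plus two spill batches) and the minimum merge memory (two spill batches plus one merge batch, with a 5% margin). A sketch with made-up example sizes, variable names following the diff:

```java
// Sketch of the sort memory-budget formulas from the diff above.
// The batch sizes passed in below are arbitrary example values.
public class SortBudget {

  // Spilling happens just before accepting a batch, so we must have room
  // for a worst-case input batch plus two spill batches (doubled to
  // allow for power-of-two rounding).
  static long spillPoint(long estimatedInputBatchSize, long targetSpillBatchSize) {
    return estimatedInputBatchSize + 2 * targetSpillBatchSize;
  }

  // To make progress merging we need two spill batches plus one merge
  // batch, with a 5% margin of error.
  static long minMergeMemory(long targetSpillBatchSize, long targetMergeBatchSize) {
    return Math.round((2 * targetSpillBatchSize + targetMergeBatchSize) * 1.05);
  }

  public static void main(String[] args) {
    long input = 8L << 20;  // 8 MB, example only
    long spill = 4L << 20;  // 4 MB, example only
    long merge = 16L << 20; // 16 MB, example only
    System.out.println("spillPoint = " + spillPoint(input, spill));
    System.out.println("minMergeMemory = " + minMergeMemory(spill, merge));
  }
}
```

When `minMergeMemory` exceeds the memory limit, the code in the diff shrinks the merge batch size instead, never below one complete row.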
[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886813#comment-15886813 ] ASF GitHub Bot commented on DRILL-5284: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/761#discussion_r10406 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java --- @@ -934,6 +1005,14 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) { long origInputBatchSize = estimatedInputBatchSize; estimatedInputBatchSize = Math.max(estimatedInputBatchSize, actualBatchSize); +// The row width may end up as zero if all fields are nulls or some +// other unusual situation. In this case, assume a width of 10 just +// to avoid lots of special case code. + +if (estimatedRowWidth == 0) { + estimatedRowWidth = 10; --- End diff -- This is a very peculiar case that came up in testing. It seems that we can have a row with one column and that one column is always null. Imagine a Parquet file that has 1 million Varchars, all of which are null. In every batch, the row width will be 0. Since we often divide by the row width, bad things happen. So, here, we arbitrarily say that if the row is abnormally small, just assume 10 bytes to avoid the need for a bunch of special case calcs. (The calcs are already too complex.) If there are 1000 columns, all of which are null, we would write 1000 "bit" (really byte) vectors, so each row would be 1000 bytes wide. But, in such a case, the batch analyzer should have come up with a number other than 0 for the row width. > Roll-up of final fixes for managed sort > --- > > Key: DRILL-5284 > URL: https://issues.apache.org/jira/browse/DRILL-5284 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.10.0 > > > The managed external sort was introduced in DRILL-5080. 
Since that time, > extensive testing has identified a number of minor fixes and improvements. > Given the long PR cycles, it is not practical to spend a week or two to do a > PR for each fix individually. This ticket represents a roll-up of a > combination of a number of fixes. Small fixes are listed here, larger items > appear as sub-tasks. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-5226) External Sort encountered an error while spilling to disk
[ https://issues.apache.org/jira/browse/DRILL-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Challapalli updated DRILL-5226: - Attachment: scenario3.log profile_scenario3.sys.drill One more scenario : {code} ALTER SESSION SET `exec.sort.disable_managed` = false; alter session set `planner.width.max_per_node` = 1; alter session set `planner.memory.max_query_memory_per_node` = 104857600; select col11 from (select * from dfs.`/drill/testdata/identical1` order by col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 desc) d where d.col11 < 10; {code} The logfile (scenario3.log) and profile(profile_scenario3.sys.drill) are attached > External Sort encountered an error while spilling to disk > - > > Key: DRILL-5226 > URL: https://issues.apache.org/jira/browse/DRILL-5226 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli > Attachments: 277578d5-8bea-27db-0da1-cec0f53a13df.sys.drill, > profile_scenario3.sys.drill, scenario3.log > > > Environment : > {code} > git.commit.id.abbrev=2af709f > DRILL_MAX_DIRECT_MEMORY="32G" > DRILL_MAX_HEAP="4G" > Nodes in Mapr Cluster : 1 > Data Size : ~ 0.35 GB > No of Columns : 1 > Width of column : 256 chars > {code} > The below query fails before spilling to disk due to wrong estimates of the > record batch size. > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_node` = 1; > +---+--+ > | ok | summary| > +---+--+ > | true | planner.width.max_per_node updated. | > +---+--+ > 1 row selected (1.11 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 62914560; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. 
| > +---++ > 1 row selected (0.362 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.disable_exchanges` = true; > +---+-+ > | ok | summary | > +---+-+ > | true | planner.disable_exchanges updated. | > +---+-+ > 1 row selected (0.277 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select * from (select * from > dfs.`/drill/testdata/resource-manager/250wide-small.tbl` order by > columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf'; > Error: RESOURCE ERROR: External Sort encountered an error while spilling to > disk > Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory > limit. Current allocation: 62736000 > Fragment 0:0 > [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > Exception from the logs > {code} > 2017-01-26 15:33:09,307 [277578d5-8bea-27db-0da1-cec0f53a13df:frag:0:0] INFO > o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: External Sort > encountered an error while spilling to disk (Unable to allocate buffer of > size 1048576 (rounded from 618889) due to memory limit. Current allocation: > 62736000) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External > Sort encountered an error while spilling to disk > Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory > limit. 
Current allocation: 62736000 > [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:603) > [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:411) > [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) >
[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886670#comment-15886670 ] ASF GitHub Bot commented on DRILL-5114: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/762 Can't achieve different log levels for different appenders. To favor Lilith, cannot reduce the level for console. So, need to come up with an alternative solution. Closing again until that is sorted out. > Rationalize use of Logback logging in unit tests > > > Key: DRILL-5114 > URL: https://issues.apache.org/jira/browse/DRILL-5114 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > Drill uses Logback as its logger. The logger is used in several tests to display > some test output. Test output is sent to stdout, rather than a log file. > Since Drill also uses Logback, that same configuration sends much Drill > logging output to stdout as well, cluttering test output. > Logback requires that one Logback config file (either logback.xml or > logback-test.xml) exist on the class path. Tests store the config file in the > src/test/resources folder of each sub-project. > These files set the default logging level to debug. While this setting is > fine when working with individual tests, the output is overwhelming for bulk > test runs. > The first requested change is to set the default logging level to error. > The existing config files are usually called "logback.xml." Change the name > of test files to "logback-test.xml" to make clear that they are, in fact, > test configs. > The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full > version of Drill's production config file. Replace this with a config > suitable for testing (that is, the same as other modules.) 
> The java-exec project includes a production-like config file in its non-test > sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it > is not needed. (Instead, rely on the one shipped in the distribution > subsystem, which is the one copied to the Drill distribution.) > Since Logback complains bitterly (via many log messages) when it cannot find > a configuration file (and each sub-module must have its own test > configuration), add missing logging configuration files: > * exec/memory/base/src/test/resources/logback-test.xml > * logical/src/test/resources/logback-test.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346)
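The requested per-module test configuration (default level error, console appender) would look roughly like the following. This is a sketch of the pattern the ticket describes, not the exact file from the PR:

```text
<!-- Sketch of a minimal logback-test.xml for a test module; the actual
     file in the PR may differ. Default level is ERROR so bulk test runs
     stay quiet; raise it per-logger when debugging an individual test. -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="error">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>
```

Because Logback looks for logback-test.xml on the class path before logback.xml, the test config naturally overrides the production one.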
[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886668#comment-15886668 ] ASF GitHub Bot commented on DRILL-5114: --- Github user paul-rogers closed the pull request at: https://github.com/apache/drill/pull/762 > Rationalize use of Logback logging in unit tests > > > Key: DRILL-5114 > URL: https://issues.apache.org/jira/browse/DRILL-5114 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > Drill uses Logback as its logger. The logger is used in several tests to display > some test output. Test output is sent to stdout, rather than a log file. > Since Drill also uses Logback, that same configuration sends much Drill > logging output to stdout as well, cluttering test output. > Logback requires that one Logback config file (either logback.xml or > logback-test.xml) exist on the class path. Tests store the config file in the > src/test/resources folder of each sub-project. > These files set the default logging level to debug. While this setting is > fine when working with individual tests, the output is overwhelming for bulk > test runs. > The first requested change is to set the default logging level to error. > The existing config files are usually called "logback.xml." Change the name > of test files to "logback-test.xml" to make clear that they are, in fact, > test configs. > The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full > version of Drill's production config file. Replace this with a config > suitable for testing (that is, the same as other modules.) > The java-exec project includes a production-like config file in its non-test > sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it > is not needed. (Instead, rely on the one shipped in the distribution > subsystem, which is the one copied to the Drill distribution.) 
> Since Logback complains bitterly (via many log messages) when it cannot find > a configuration file (and each sub-module must have its own test > configuration), add missing logging configuration files: > * exec/memory/base/src/test/resources/logback-test.xml > * logical/src/test/resources/logback-test.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests
[ https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886630#comment-15886630 ] ASF GitHub Bot commented on DRILL-5114: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/762#discussion_r103325603 --- Diff: exec/java-exec/src/test/resources/logback-test.xml --- @@ -32,12 +32,13 @@ - + + --- End diff -- Why do we want to put all the log messages in STDOUT? > Rationalize use of Logback logging in unit tests > > > Key: DRILL-5114 > URL: https://issues.apache.org/jira/browse/DRILL-5114 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > Drill uses Logback as its logger. The logger is used in several tests to display > some test output. Test output is sent to stdout, rather than a log file. > Since Drill also uses Logback, that same configuration sends much Drill > logging output to stdout as well, cluttering test output. > Logback requires that one Logback config file (either logback.xml or > logback-test.xml) exist on the class path. Tests store the config file in the > src/test/resources folder of each sub-project. > These files set the default logging level to debug. While this setting is > fine when working with individual tests, the output is overwhelming for bulk > test runs. > The first requested change is to set the default logging level to error. > The existing config files are usually called "logback.xml." Change the name > of test files to "logback-test.xml" to make clear that they are, in fact, > test configs. > The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full > version of Drill's production config file. Replace this with a config > suitable for testing (that is, the same as other modules.) 
> The java-exec project includes a production-like config file in its non-test > sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it > is not needed. (Instead, rely on the one shipped in the distribution > subsystem, which is the one copied to the Drill distribution.) > Since Logback complains bitterly (via many log messages) when it cannot find > a configuration file (and each sub-module must have its own test > configuration), add missing logging configuration files: > * exec/memory/base/src/test/resources/logback-test.xml > * logical/src/test/resources/logback-test.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5208) Finding path to java executable should be deterministic
[ https://issues.apache.org/jira/browse/DRILL-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886619#comment-15886619 ] ASF GitHub Bot commented on DRILL-5208: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/763 Addressed code review comments and rebased onto latest master. > Finding path to java executable should be deterministic > --- > > Key: DRILL-5208 > URL: https://issues.apache.org/jira/browse/DRILL-5208 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.10.0 >Reporter: Krystal >Assignee: Paul Rogers >Priority: Minor > > Command to find JAVA in drill-config.sh is not deterministic. > drill-config.sh uses the following command to find JAVA: > JAVA=`find -L "$JAVA_HOME" -name $JAVA_BIN -type f | head -n 1` > On one of my nodes the following command returned 2 entries: > find -L $JAVA_HOME -name java -type f > /usr/local/java/jdk1.7.0_67/jre/bin/java > /usr/local/java/jdk1.7.0_67/bin/java > On another node, the same command returned entries in different order: > find -L $JAVA_HOME -name java -type f > /usr/local/java/jdk1.7.0_67/bin/java > /usr/local/java/jdk1.7.0_67/jre/bin/java > The complete command picks the first one returned which may not be the same > on each node: > find -L $JAVA_HOME -name java -type f | head -n 1 > /usr/local/java/jdk1.7.0_67/jre/bin/java > If JAVA_HOME is found, we should just append "bin/java" to the path: > JAVA=$JAVA_HOME/bin/java -- This message was sent by Atlassian JIRA (v6.3.15#6346)
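The fix the ticket suggests can be sketched in shell. This is illustrative, not the exact drill-config.sh patch: prefer `$JAVA_HOME/bin/java` directly, and only fall back to the order-dependent `find` when JAVA_HOME is unusable.

```shell
# Sketch of the deterministic lookup proposed in DRILL-5208; not the
# exact drill-config.sh change.
resolve_java() {
  local java_bin=java
  if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/$java_bin" ]; then
    # Deterministic: same answer on every node.
    echo "$JAVA_HOME/bin/$java_bin"
  else
    # Old behavior, kept only as a fallback: the result depends on
    # filesystem traversal order, so it may pick jre/bin/java on one
    # node and bin/java on another -- the bug being reported.
    find -L "${JAVA_HOME:-/usr}" -name "$java_bin" -type f 2>/dev/null | head -n 1
  fi
}
```

Usage would then be simply `JAVA=$(resolve_java)`.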
[jira] [Commented] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.
[ https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886333#comment-15886333 ] ASF GitHub Bot commented on DRILL-5290: --- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/757#discussion_r103285068 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -413,4 +413,8 @@ String DYNAMIC_UDF_SUPPORT_ENABLED = "exec.udf.enable_dynamic_support"; BooleanValidator DYNAMIC_UDF_SUPPORT_ENABLED_VALIDATOR = new BooleanValidator(DYNAMIC_UDF_SUPPORT_ENABLED, true, true); + + String USE_DYNAMIC_UDFS = "exec.udf.use_dynamic"; --- End diff -- ok, we need to use readWriteLocks if we update the table each time a function gets added/removed. That is unnecessary overhead and will cause contention with concurrency. One option is to split the table into two, one for built-in functions (which can be accessed without locks) and the other for dynamic functions. That will be a bigger change and, like I mentioned before, is not considered worth the effort. > Provide an option to build operator table once for built-in static functions > and reuse it across queries. > - > > Key: DRILL-5290 > URL: https://issues.apache.org/jira/browse/DRILL-5290 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy > Labels: doc-impacting > Fix For: 1.10.0 > > > Currently, DrillOperatorTable which contains standard SQL operators and > functions and Drill User Defined Functions (UDFs) (built-in and dynamic) gets > built for each query as part of creating QueryContext. This is an expensive > operation (~30 msec to build) and allocates ~2M on heap for each query. For > high throughput, low latency operational queries, we quickly run out of heap > memory, causing JVM hangs. 
Build operator table once during startup for > static built-in functions and save in DrillbitContext, so we can reuse it > across queries. > Provide a system/session option to not use dynamic UDFs so we can use the > operator table saved in DrillbitContext and avoid building each time. > *Please note, changes are adding new option exec.udf.use_dynamic which needs > to be documented.* -- This message was sent by Atlassian JIRA (v6.3.15#6346)
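The pattern the ticket proposes — build the static table once at startup, share it lock-free across queries, and pay the per-query build cost only when dynamic UDFs are enabled — can be sketched generically. All class and method names below are hypothetical illustrations, not Drill's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the caching pattern discussed in DRILL-5290.
// None of these names are Drill's real classes; the table is modeled as a
// simple name -> origin map for illustration.
public class OperatorTableCache {

  // Built once at startup; never mutated afterwards, so per-query reads
  // need no locks (the contention concern raised in the review comment).
  private final Map<String, String> staticTable;

  public OperatorTableCache() {
    Map<String, String> t = new HashMap<>();
    t.put("concat", "built-in");  // stand-ins for built-in functions
    t.put("substr", "built-in");
    this.staticTable = t;
  }

  // Per query: reuse the shared table unless dynamic UDFs are in play,
  // in which case rebuild -- the expensive path the new option avoids.
  public Map<String, String> tableForQuery(boolean useDynamicUdfs) {
    if (!useDynamicUdfs) {
      return staticTable;  // shared instance, near-zero cost per query
    }
    Map<String, String> perQuery = new HashMap<>(staticTable);
    perQuery.put("my_udf", "dynamic");  // hypothetical dynamic function
    return perQuery;
  }
}
```

Returning the same immutable instance on the fast path is what eliminates the ~30 msec build and ~2M heap allocation per query that the ticket measured.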
[jira] [Commented] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886231#comment-15886231 ] ASF GitHub Bot commented on DRILL-5287: --- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/758#discussion_r103271146 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -413,4 +413,8 @@ String DYNAMIC_UDF_SUPPORT_ENABLED = "exec.udf.enable_dynamic_support"; BooleanValidator DYNAMIC_UDF_SUPPORT_ENABLED_VALIDATOR = new BooleanValidator(DYNAMIC_UDF_SUPPORT_ENABLED, true, true); + + String ZK_QUERY_STATE_UPDATE_KEY = "drill.exec.zk.query.state.update"; --- End diff -- I changed it to QUERY_TRANSIENT_STATE_UPDATE_KEY and exec.query.progress.update. Please review the new diffs. > Provide option to skip updates of ephemeral state changes in Zookeeper > -- > > Key: DRILL-5287 > URL: https://issues.apache.org/jira/browse/DRILL-5287 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy > Fix For: 1.10 > > > We put transient profiles in zookeeper and update state as query progresses > and changes states. It is observed that this adds latency of ~45msec for each > update in the query execution path. This gets even worse when a high number of > concurrent queries are in progress. For concurrency=100, the average query > response time even for short queries is 8 sec vs 0.2 sec with these updates > disabled. For short-lived queries in a high-throughput scenario, it is of no > value to update state changes in zookeeper. We need an option to disable > these updates for short running operational queries. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (DRILL-3161) Drill JDBC driver not visible/auto-registered via Service Provider Mechanism
[ https://issues.apache.org/jira/browse/DRILL-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laurent Goujon closed DRILL-3161. - Resolution: Duplicate > Drill JDBC driver not visible/auto-registered via Service Provider Mechanism > > > Key: DRILL-3161 > URL: https://issues.apache.org/jira/browse/DRILL-3161 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Reporter: Daniel Barclay > Fix For: Future > > > Drill's JDBC driver is not automatically made visible to JDBC's DriverManager > and auto-registered, because it does not use Java's Service Provider > Mechanism as specified by JDBC 4.0. > This usually means that instead of just having to put the Drill JDBC driver > Jar file on the class path and use a Drill JDBC URL (one starting with > "{{jdbc:drill:}}"), users also have to configure their tools or code with the > name of the Drill driver class. > > The Drill JDBC driver's Jar file should contain a > {{META-INF/services/java.sql.Driver}} file that contains a line consisting of > the fully qualified name of the Drill JDBC driver class > ({{org.apache.drill.jdbc.Driver}}). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
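Concretely, the service descriptor the report asks for is a one-line text file inside the driver jar. With it present, `DriverManager` discovers and registers the driver automatically, so users no longer need to name the driver class in tool configuration or call `Class.forName` before connecting:

```text
# File inside the driver jar at: META-INF/services/java.sql.Driver
org.apache.drill.jdbc.Driver
```

This is the JDBC 4.0 service provider mechanism: `DriverManager` uses `ServiceLoader` to scan the class path for `java.sql.Driver` descriptors at first use, so putting the jar on the class path and using a `jdbc:drill:` URL is enough.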
[jira] [Commented] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers
[ https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886213#comment-15886213 ] ASF GitHub Bot commented on DRILL-3510: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/520#discussion_r103268387 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -695,6 +698,33 @@ public void runQuery(QueryType type, List planFragments, UserResul } /** + * Get server properties that represent the list of server session options. + * + * @return server properties for the server session options. + */ + public ServerProperties getOptions() throws RpcException { --- End diff -- Sorry, it took me longer than a week but PR #764 contains API change for server metadata support with C++ client/JDBC driver support. If approved, it should make things way easier for you as the only change you would need is to update the server metadata to get the quoting information from the session. > Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL > identifiers > -- > > Key: DRILL-3510 > URL: https://issues.apache.org/jira/browse/DRILL-3510 > Project: Apache Drill > Issue Type: Improvement > Components: SQL Parser >Reporter: Jinfeng Ni >Assignee: Vitalii Diravka > Labels: doc-impacting > Fix For: 1.10.0 > > Attachments: DRILL-3510.patch, DRILL-3510.patch > > > Currently Drill's SQL parser uses backtick as identifier quotes, the same as > what MySQL does. However, this is different from ANSI SQL specification, > where double quote is used as identifier quotes. > MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. > Drill should follow the same way, so that Drill users do not have to rewrite > their existing queries, if their queries use double quotes. > {code} > SET sql_mode='ANSI_QUOTES'; > {code} > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-4994) Prepared statement stopped working between 1.8.0 client and < 1.8.0 server
[ https://issues.apache.org/jira/browse/DRILL-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886204#comment-15886204 ] Laurent Goujon commented on DRILL-4994: --- Part of this pull request: https://github.com/apache/drill/pull/613 > Prepared statement stopped working between 1.8.0 client and < 1.8.0 server > -- > > Key: DRILL-4994 > URL: https://issues.apache.org/jira/browse/DRILL-4994 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon > > Older servers (pre-1.8.0) don't support the prepared statement rpc method, > but the JDBC client doesn't check if it is available or not. The end result > is that the statement is stuck as the server is not responding back. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DRILL-4994) Prepared statement stopped working between 1.8.0 client and < 1.8.0 server
[ https://issues.apache.org/jira/browse/DRILL-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laurent Goujon reassigned DRILL-4994: - Assignee: Laurent Goujon > Prepared statement stopped working between 1.8.0 client and < 1.8.0 server > -- > > Key: DRILL-4994 > URL: https://issues.apache.org/jira/browse/DRILL-4994 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon > > Older servers (pre-1.8.0) don't support the prepared statement RPC method, but the JDBC client doesn't check whether it is available. The end result is that the statement hangs because the server never responds. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
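The fix described for DRILL-4994 amounts to gating server-side prepared statements on the negotiated server version, with 1.8.0 as the cutoff mentioned in the issue. A minimal sketch of that decision (hypothetical code; the real check lives in the JDBC driver and uses Drill's own version classes):

```java
// Decides whether a client may use the server-side prepared statement RPC,
// which (per DRILL-4994) is unavailable on pre-1.8.0 servers. Hypothetical
// sketch; the actual driver performs an equivalent check against the
// server version reported at connection time.
public class PreparedStatementSupport {
    private static final int MIN_MAJOR = 1;
    private static final int MIN_MINOR = 8;

    public static boolean supportsServerPreparedStatements(int major, int minor) {
        // true for 1.8.x and later, false for anything older
        return major > MIN_MAJOR || (major == MIN_MAJOR && minor >= MIN_MINOR);
    }
}
```

A client connected to a 1.7.x server would then take the client-side code path (limited to executing queries, without metadata), instead of issuing an RPC the server never answers.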
[jira] [Commented] (DRILL-5301) Add server metadata API
[ https://issues.apache.org/jira/browse/DRILL-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886202#comment-15886202 ] ASF GitHub Bot commented on DRILL-5301: --- GitHub user laurentgo opened a pull request: https://github.com/apache/drill/pull/764 DRILL-5301: Server metadata API Add a Server metadata API to the User protocol, to query server support of various SQL features. Add support to the client (DrillClient) to query this information. Add support to the JDBC driver to query this information if the server supports the new API, or fall back to the previous behaviour (relying on Avatica defaults) otherwise. Add support for the Server metadata API to the C++ client. If the API is not supported by the server, fall back to the previous hard-coded values. You can merge this pull request into a Git repository by running: $ git pull https://github.com/laurentgo/drill laurent/server-meta Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/764.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #764 commit 48bf728c88b8244c0fc51ae8856d0f786bd9e986 Author: Laurent Goujon Date: 2016-11-04T20:31:19Z Refactor DrillCursor Refactor DrillCursor to be more self-contained. commit 6583d69df3b972270e146e53ab2ddcf9c4aff93c Author: Laurent Goujon Date: 2016-11-04T20:32:44Z DRILL-4730: Update JDBC DatabaseMetaData implementation to use new Metadata APIs Update JDBC driver to use Metadata APIs instead of executing SQL queries commit 17ce38a44d098e744620a28b25c93fd352e7c76d Author: Laurent Goujon Date: 2016-11-05T00:36:42Z DRILL-4994: Add back JDBC prepared statement for older servers When connected to an older Drill server, the JDBC client always attempted to use server-side prepared statements with no fallback.
With this change, the client checks the server version and falls back to the previous client-side prepared statement (which is still limited to executing queries and does not provide metadata). commit 5048bb650bf3a42e9f7920727e19e33ae59f0188 Author: Laurent Goujon Date: 2017-02-24T23:41:07Z DRILL-5301: Server metadata API Add a Server metadata API to the User protocol, to query server support of various SQL features. Add support to the client (DrillClient) to query this information. Add support to the JDBC driver to query this information if the server supports the new API, or fall back to the previous behaviour (relying on Avatica defaults) otherwise. commit d912267efad379e3730f800bc7b3af57bee2aa06 Author: Laurent Goujon Date: 2017-02-26T18:23:59Z DRILL-5301: Add C++ client support for Server metadata API Add support for the Server metadata API to the C++ client. If the API is not supported by the server, fall back to the previous hard-coded values. Update the querySubmitter example program to query the information. > Add server metadata API > --- > > Key: DRILL-5301 > URL: https://issues.apache.org/jira/browse/DRILL-5301 > Project: Apache Drill > Issue Type: Improvement > Components: Server, Client - C++, Client - Java, Client - JDBC, > Client - ODBC >Reporter: Laurent Goujon >Assignee: Laurent Goujon > > JDBC and ODBC clients expose lots of metadata regarding the server version and support of various parts of the SQL standard. > Currently the returned information is hardcoded in both clients/drivers, which means the information returned reflects support as of the client version, not the server version. > Instead, a new method should be provided to the clients to query the actual server support. Support on the client or the server should be optional (for example, a client should not use this API if the server doesn't support it, and should fall back to default values instead). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DRILL-5301) Add server metadata API
[ https://issues.apache.org/jira/browse/DRILL-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laurent Goujon reassigned DRILL-5301: - Assignee: Laurent Goujon > Add server metadata API > --- > > Key: DRILL-5301 > URL: https://issues.apache.org/jira/browse/DRILL-5301 > Project: Apache Drill > Issue Type: Improvement > Components: Server, Client - C++, Client - Java, Client - JDBC, > Client - ODBC >Reporter: Laurent Goujon >Assignee: Laurent Goujon > > JDBC and ODBC clients expose lots of metadata regarding the server version and support of various parts of the SQL standard. > Currently the returned information is hardcoded in both clients/drivers, which means the information returned reflects support as of the client version, not the server version. > Instead, a new method should be provided to the clients to query the actual server support. Support on the client or the server should be optional (for example, a client should not use this API if the server doesn't support it, and should fall back to default values instead). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5301) Add server metadata API
Laurent Goujon created DRILL-5301: - Summary: Add server metadata API Key: DRILL-5301 URL: https://issues.apache.org/jira/browse/DRILL-5301 Project: Apache Drill Issue Type: Improvement Components: Server, Client - C++, Client - Java, Client - JDBC, Client - ODBC Reporter: Laurent Goujon JDBC and ODBC clients expose lots of metadata regarding the server version and support of various parts of the SQL standard. Currently the returned information is hardcoded in both clients/drivers, which means the information returned reflects support as of the client version, not the server version. Instead, a new method should be provided to the clients to query the actual server support. Support on the client or the server should be optional (for example, a client should not use this API if the server doesn't support it, and should fall back to default values instead). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
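The optional-support behaviour requested in DRILL-5301 is the usual pattern of querying the server when it advertises a capability and otherwise using a client-side default. A hypothetical sketch of that pattern follows; the names are invented for illustration and do not mirror Drill's actual classes:

```java
import java.util.function.Supplier;

// Sketch of the fallback pattern requested in this ticket: use the
// server-reported value when the server supports the metadata API,
// otherwise fall back to a hard-coded client default. All names here
// are hypothetical, not Drill internals.
public class ServerMetaFallback {
    public static String identifierQuote(boolean serverSupportsMeta,
                                         Supplier<String> serverValue,
                                         String clientDefault) {
        // Only ask the server when it advertises the capability
        return serverSupportsMeta ? serverValue.get() : clientDefault;
    }
}
```

This is why the ticket stresses that support should be optional on both sides: an old client simply never calls the new RPC, and a new client talking to an old server takes the default branch.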
[jira] [Closed] (DRILL-4385) Support metadata and prepare operations on User RPC layer
[ https://issues.apache.org/jira/browse/DRILL-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laurent Goujon closed DRILL-4385. - Resolution: Duplicate Assignee: Venki Korukanti > Support metadata and prepare operations on User RPC layer > - > > Key: DRILL-4385 > URL: https://issues.apache.org/jira/browse/DRILL-4385 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Right now we don't support prepare, and metadata operations are done through code and queries in the JDBC and ODBC drivers. This is an umbrella task to implement metadata and prepare operations directly at the RPC layer. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (DRILL-4419) JDBC driver should move to using the new metadata methods provided by DRILL-4385
[ https://issues.apache.org/jira/browse/DRILL-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laurent Goujon closed DRILL-4419. - Resolution: Fixed Assignee: Laurent Goujon > JDBC driver should move to using the new metadata methods provided by > DRILL-4385 > > > Key: DRILL-4419 > URL: https://issues.apache.org/jira/browse/DRILL-4419 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Jacques Nadeau >Assignee: Laurent Goujon > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-4385) Support metadata and prepare operations on User RPC layer
[ https://issues.apache.org/jira/browse/DRILL-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886173#comment-15886173 ] Laurent Goujon commented on DRILL-4385: --- I believe this was done as part of DRILL-4728 and DRILL-4729 > Support metadata and prepare operations on User RPC layer > - > > Key: DRILL-4385 > URL: https://issues.apache.org/jira/browse/DRILL-4385 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Jacques Nadeau > > Right now we don't support prepare, and metadata operations are done through code and queries in the JDBC and ODBC drivers. This is an umbrella task to implement metadata and prepare operations directly at the RPC layer. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files
[ https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886026#comment-15886026 ] Zelaine Fong edited comment on DRILL-5300 at 2/27/17 4:06 PM: -- Based on these lines in your stack trace: {code} ... 5 common frames omitted 2017-02-27 04:32:57,867 [drill-executor-453] ERROR o.a.d.exec.server.BootStrapContext - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception. java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402) ~[drill-java-exec-1.9.0.jar:1.9.0] {code} The memory leak appears to be DRILL-5160. The missing snappy dependency is DRILL-5157. If you pick up the fix for DRILL-5157, that will avoid the dependency problem you're hitting. was (Author: zfong): Based on these lines in your stack trace: ... 5 common frames omitted 2017-02-27 04:32:57,867 [drill-executor-453] ERROR o.a.d.exec.server.BootStrapContext - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception. java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402) ~[drill-java-exec-1.9.0.jar:1.9.0] The memory leak appears to be DRILL-5160. The missing snappy dependency is DRILL-5157. If you pick up the fix for DRILL-5157, that will avoid the dependency problem you're hitting. 
> SYSTEM ERROR: IllegalStateException: Memory was leaked by query while > querying parquet files > > > Key: DRILL-5300 > URL: https://issues.apache.org/jira/browse/DRILL-5300 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.9.0 > Environment: OS: Linux >Reporter: Muhammad Gelbana > Attachments: both_queries_logs.zip > > > Running the following query against parquet files (I modified some values for > privacy reasons) > {code:title=Query causing the long logs|borderStyle=solid} > SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, > AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, > AL11.NAME FROM > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL` > AL1, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL` > AL2, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` > AL3, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS` > AL4, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS` > AL5, > dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` > AL8, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` > AL11, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` > AL12, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` > AL13, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` > AL14, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` > AL15, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` > AL16, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` > AL17, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` > AL18, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` > AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND > AL15.___ID = AL14.___ID AND AL14.X__ID = > AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND > AL17.___ID = AL16.___ID AND AL16.X__ID = > AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND > 
AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = > AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND > AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND > AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = > AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND > AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', > 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') > AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, > AL2.XXX__CODE,
[jira] [Commented] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files
[ https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886026#comment-15886026 ] Zelaine Fong commented on DRILL-5300: - Based on these lines in your stack trace: ... 5 common frames omitted 2017-02-27 04:32:57,867 [drill-executor-453] ERROR o.a.d.exec.server.BootStrapContext - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception. java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402) ~[drill-java-exec-1.9.0.jar:1.9.0] The memory leak appears to be DRILL-5160. The missing snappy dependency is DRILL-5157. If you pick up the fix for DRILL-5157, that will avoid the dependency problem you're hitting. > SYSTEM ERROR: IllegalStateException: Memory was leaked by query while > querying parquet files > > > Key: DRILL-5300 > URL: https://issues.apache.org/jira/browse/DRILL-5300 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.9.0 > Environment: OS: Linux >Reporter: Muhammad Gelbana > Attachments: both_queries_logs.zip > > > Running the following query against parquet files (I modified some values for > privacy reasons) > {code:title=Query causing the long logs|borderStyle=solid} > SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, > AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, > AL11.NAME FROM > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL` > AL1, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL` > AL2, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` > AL3, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS` > AL4, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS` > AL5, > dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` > AL8, > 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` > AL11, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` > AL12, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` > AL13, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` > AL14, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` > AL15, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` > AL16, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` > AL17, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` > AL18, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` > AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND > AL15.___ID = AL14.___ID AND AL14.X__ID = > AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND > AL17.___ID = AL16.___ID AND AL16.X__ID = > AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND > AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = > AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND > AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND > AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = > AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND > AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', > 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') > AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, > AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, > AL11.NAME > {code} > {code:title=Query causing the short logs|borderStyle=solid} > SELECT AL11.NAME > FROM > dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` > LIMIT 10 > {code} > This issue may be a duplicate for [this > one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one > based on [this > suggestion|https://issues.apache.org/jira/browse/DRILL-4398?focusedCommentId=15884846=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15884846]. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files
[ https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muhammad Gelbana updated DRILL-5300: Attachment: both_queries_logs.zip > SYSTEM ERROR: IllegalStateException: Memory was leaked by query while > querying parquet files > > > Key: DRILL-5300 > URL: https://issues.apache.org/jira/browse/DRILL-5300 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.9.0 > Environment: OS: Linux >Reporter: Muhammad Gelbana > Attachments: both_queries_logs.zip > > > Running the following query against parquet files (I modified some values for > privacy reasons) > {code:title=Query causing the long logs|borderStyle=solid} > SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, > AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, > AL11.NAME FROM > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL` > AL1, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL` > AL2, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` > AL3, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS` > AL4, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS` > AL5, > dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` > AL8, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` > AL11, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` > AL12, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` > AL13, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` > AL14, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` > AL15, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` > AL16, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` > AL17, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` > AL18, > dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` > AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND > AL15.___ID = 
AL14.___ID AND AL14.X__ID = > AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND > AL17.___ID = AL16.___ID AND AL16.X__ID = > AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND > AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = > AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND > AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND > AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = > AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND > AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', > 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') > AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, > AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, > AL11.NAME > {code} > {code:title=Query causing the short logs|borderStyle=solid} > SELECT AL11.NAME > FROM > dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` > LIMIT 10 > {code} > This issue may be a duplicate for [this > one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one > based on [this > suggestion|https://issues.apache.org/jira/browse/DRILL-4398?focusedCommentId=15884846=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15884846]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files
Muhammad Gelbana created DRILL-5300: --- Summary: SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files Key: DRILL-5300 URL: https://issues.apache.org/jira/browse/DRILL-5300 Project: Apache Drill Issue Type: Bug Affects Versions: 1.9.0 Environment: OS: Linux Reporter: Muhammad Gelbana Attachments: both_queries_logs.zip Running the following query against parquet files (I modified some values for privacy reasons) {code:title=Query causing the long logs|borderStyle=solid} SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, AL11.NAME FROM dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL` AL1, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL` AL2, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` AL3, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS` AL4, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS` AL5, dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` AL8, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` AL11, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` AL12, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` AL13, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` AL14, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` AL15, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL` AL16, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL` AL17, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS` AL18, dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S` AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND AL15.___ID = AL14.___ID AND AL14.X__ID = AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND AL17.___ID = AL16.___ID AND AL16.X__ID = AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, AL11.NAME {code} {code:title=Query causing the short logs|borderStyle=solid} SELECT AL11.NAME FROM dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` LIMIT 10 {code} This issue may be a duplicate for [this one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one based on [this suggestion|https://issues.apache.org/jira/browse/DRILL-4398?focusedCommentId=15884846=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15884846]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Issue Comment Deleted] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata
[ https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Senthilkumar updated DRILL-5298: Comment: was deleted (was: The issue has still not been fixed. The table is not created in the event of an empty dataset from a Hive table.) > CTAS with 0 records from a SELECT query should create the table with metadata > - > > Key: DRILL-5298 > URL: https://issues.apache.org/jira/browse/DRILL-5298 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization, SQL Parser >Affects Versions: 1.9.0 > Environment: MapR 5.2 >Reporter: Senthilkumar > Fix For: 1.9.0 > > > Hello team, > I create a table in Drill using CTAS as > CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0 > It runs successfully. > But the table is not created, as there are 0 records returned from the SELECT query. > CTAS should still go ahead and create the table with the column metadata. > When BI tools fire up multi-pass queries, with CTAS in the first query, the subsequent queries fail because of a missing table. > In databases like SQL Server and Postgres, CTAS will create the table even if the SELECT doesn't return any rows. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata
[ https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Senthilkumar closed DRILL-5298. --- Resolution: Duplicate > CTAS with 0 records from a SELECT query should create the table with metadata > - > > Key: DRILL-5298 > URL: https://issues.apache.org/jira/browse/DRILL-5298 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization, SQL Parser >Affects Versions: 1.9.0 > Environment: MapR 5.2 >Reporter: Senthilkumar > Fix For: 1.9.0 > > > Hello team, > I create a table in Drill using CTAS as > CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0 > It runs successfully. > But the table is not created, as there are 0 records returned from the SELECT query. > CTAS should still go ahead and create the table with the column metadata. > When BI tools fire up multi-pass queries, with CTAS in the first query, the subsequent queries fail because of a missing table. > In databases like SQL Server and Postgres, CTAS will create the table even if the SELECT doesn't return any rows. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata
[ https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Senthilkumar reopened DRILL-5298: - The issue has still not been fixed. The table is not created in the event of an empty dataset from a Hive table. > CTAS with 0 records from a SELECT query should create the table with metadata > - > > Key: DRILL-5298 > URL: https://issues.apache.org/jira/browse/DRILL-5298 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization >Affects Versions: 1.9.0 > Environment: MapR 5.2 >Reporter: Senthilkumar > Fix For: 1.9.0 > > > Hello team, > I create a table in Drill using CTAS as > CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0 > It runs successfully. > But the table is not created, as there are 0 records returned from the SELECT query. > CTAS should still go ahead and create the table with the column metadata. > When BI tools fire up multi-pass queries, with CTAS in the first query, the subsequent queries fail because of a missing table. > In databases like SQL Server and Postgres, CTAS will create the table even if the SELECT doesn't return any rows. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata
[ https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885375#comment-15885375 ] Senthilkumar commented on DRILL-5298: - This issue still exists in 1.9. If you run the statement CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0 and then try SELECT * FROM CTAS_TEST, it will still fail with a NO TABLE exception. > CTAS with 0 records from a SELECT query should create the table with metadata > - > > Key: DRILL-5298 > URL: https://issues.apache.org/jira/browse/DRILL-5298 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization >Affects Versions: 1.9.0 > Environment: MapR 5.2 >Reporter: Senthilkumar > Fix For: 1.9.0 > > > Hello team, > I create a table in Drill using CTAS as > CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0 > It runs successfully. > But the table is not created, as there are 0 records returned from the SELECT query. > CTAS should still go ahead and create the table with the column metadata. > When BI tools fire up multi-pass queries, with CTAS in the first query, the subsequent queries fail because of a missing table. > In databases like SQL Server and Postgres, CTAS will create the table even if the SELECT doesn't return any rows. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata
[ https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885370#comment-15885370 ] Khurram Faraaz commented on DRILL-5298: --- The last comment in DRILL-4517 says that Drill will no longer produce empty parquet files. > CTAS with 0 records from a SELECT query should create the table with metadata > - > > Key: DRILL-5298 > URL: https://issues.apache.org/jira/browse/DRILL-5298 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization >Affects Versions: 1.9.0 > Environment: MapR 5.2 >Reporter: Senthilkumar > Fix For: 1.9.0 > > > Hello team, > I create a table in Drill using CTAS as > CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0 > It runs successfully. > But the table is not created, as there are 0 records returned from the SELECT query. > CTAS should still go ahead and create the table with the column metadata. > When BI tools fire up multi-pass queries, with CTAS in the first query, the subsequent queries fail because of a missing table. > In databases like SQL Server and Postgres, CTAS will create the table even if the SELECT doesn't return any rows. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata
[ https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885359#comment-15885359 ] Senthilkumar commented on DRILL-5298: - Khurram, I wanted to know if somebody is working on it already. > CTAS with 0 records from a SELECT query should create the table with metadata > - > > Key: DRILL-5298 > URL: https://issues.apache.org/jira/browse/DRILL-5298 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization >Affects Versions: 1.9.0 > Environment: MapR 5.2 >Reporter: Senthilkumar > Fix For: 1.9.0 > > > Hello team, > I create a table in Drill using CTAS as > CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0 > It runs successfully. > But the table is not created, as there are 0 records returned from the SELECT query. > CTAS should still go ahead and create the table with the column metadata. > When BI tools fire up multi-pass queries, with CTAS in the first query, the subsequent queries fail because of a missing table. > In databases like SQL Server and Postgres, CTAS will create the table even if the SELECT doesn't return any rows. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
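The behaviour requested in DRILL-5298 — materializing the table's column metadata even when the SELECT returns zero rows — can be sketched abstractly as follows. This is a hypothetical model, not Drill's actual Parquet writer; the classes and field names are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the requested CTAS semantics: the table artifact (modeled here
// as a map holding its schema and rows) is created unconditionally, so a
// zero-row SELECT still yields a queryable table with column metadata.
// Hypothetical model, not Drill internals.
public class CtasSketch {
    public static Map<String, Object> createTableAs(List<String> columns,
                                                    List<List<Object>> rows) {
        Map<String, Object> table = new LinkedHashMap<>();
        table.put("schema", columns);   // always written, even for 0 rows
        table.put("rows", rows);        // may be empty
        return table;
    }
}
```

Under these semantics, a follow-up SELECT against the created table would see the column list and return an empty result set rather than failing with a missing-table error, which is what BI tools issuing multi-pass queries expect.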