[jira] [Created] (DRILL-4663) FileSystem properties Config block from filesystem plugin are not being applied for file writers
Jason Altekruse created DRILL-4663: -- Summary: FileSystem properties Config block from filesystem plugin are not being applied for file writers Key: DRILL-4663 URL: https://issues.apache.org/jira/browse/DRILL-4663 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse Currently all of the record writers create their own empty filesystem configuration upon initialization. They do not apply the custom configurations that are included in the plugin configuration, which prevents users from setting custom properties on the write path. If possible, this configuration should be shared with the readers. If there is a need to isolate this from the configuration used for the readers, we should still add the configurations from the storage plugin config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
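The fix described above amounts to folding the key/value pairs from the storage plugin config into the configuration each record writer hands to the filesystem, instead of starting from an empty configuration. A minimal sketch, with Hadoop's Configuration approximated by a plain Map so the idea stands alone; the method and class names here are illustrative, not Drill's API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WriterConfigSketch {
    // Merge plugin-level properties over the base defaults; plugin
    // entries win when the same key appears in both.
    public static Map<String, String> applyPluginConfig(
            Map<String, String> base, Map<String, String> pluginProps) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        if (pluginProps != null) {
            merged.putAll(pluginProps);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("fs.defaultFS", "file:///");
        Map<String, String> plugin = new LinkedHashMap<>();
        plugin.put("fs.s3a.connection.maximum", "100");
        System.out.println(applyPluginConfig(base, plugin));
    }
}
```

Sharing this merged configuration between readers and writers, as the issue suggests, avoids the two paths drifting apart.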
[jira] [Resolved] (DRILL-4445) Remove extra code to work around mixture of arrays and Lists used in Logical and Physical query plan nodes
[ https://issues.apache.org/jira/browse/DRILL-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4445. Resolution: Fixed Fixed in d24205d4e795a1aab54b64708dde1e7deeca668b > Remove extra code to work around mixture of arrays and Lists used in Logical > and Physical query plan nodes > -- > > Key: DRILL-4445 > URL: https://issues.apache.org/jira/browse/DRILL-4445 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jason Altekruse >Assignee: Jason Altekruse > > The physical plan node classes for all of the operators currently use a mix > of arrays and Lists to refer to lists of incoming operators, expressions, and > other operator properties. This has led to the introduction of several > utility methods for translating between the two representations, examples can > be seen in common/logical/data/Abstractbuilder. > This isn't a major problem, but the new operator test framework uses these > classes as a primary interface for setting up the tests. It seemed worthwhile > to just refactor the classes to be consistent so that the tests would all be > similar. There are a few changes to execution code, but they are all just > trivial changes to use the list based interfaces (length vs size(), set() > instead of arr[i] = foo, etc.) as Jackson just transparently handles both > types the same (which is why this hasn't really been a problem). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4437) Implement framework for testing operators in isolation
[ https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4437. Resolution: Fixed Fixed in d93a3633815ed1c7efd6660eae62b7351a2c9739 > Implement framework for testing operators in isolation > -- > > Key: DRILL-4437 > URL: https://issues.apache.org/jira/browse/DRILL-4437 > Project: Apache Drill > Issue Type: Test > Components: Tools, Build & Test >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.7.0 > > > Most of the tests written for Drill are end-to-end. We spin up a full > instance of the server, submit one or more SQL queries and check the results. > While integration tests like this are useful for ensuring that all features > are guaranteed to not break end-user functionality, overuse of this approach > has caused a number of pain points. > Overall the tests end up running a lot of the exact same code, parsing and > planning many similar queries. > Creating consistent reproductions of issues, especially edge cases found in > clustered environments, can be extremely difficult. Even the simpler case of > testing whether operators can handle a particular series of > incoming batches of records has required hacks like generating large enough > files so that the scanners happen to break them up into separate batches. > These tests are brittle as they make assumptions about how the scanners will > work in the future. As an example of how this could break, we might do a perf > evaluation and find that we should be producing larger batches in some cases. > Existing tests that try to test multiple batches by producing a few > more records than the current threshold for batch size would not be testing > the same code paths. > We need to make more parts of the system testable without initializing the > entire Drill server, as well as making the different internal settings and > state of the server configurable for tests. 
> This is a first effort to enable testing the physical operators in Drill by > mocking the components of the system necessary to enable operators to > initialize and execute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)
Jason Altekruse created DRILL-4551: -- Summary: Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate) Key: DRILL-4551 URL: https://issues.apache.org/jira/browse/DRILL-4551 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Assignee: Jason Altekruse Several of these functions do not appear to be standard SQL functions, but they are available in several other popular databases like SQL Server, Oracle and Postgres. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4482. Resolution: Fixed Assignee: Jason Altekruse (was: Stefán Baxter) Fixed in 64ab0a8ec9d98bf96f4d69274dddc180b8efe263 > Avro no longer selects data correctly from a sub-structure > -- > > Key: DRILL-4482 > URL: https://issues.apache.org/jira/browse/DRILL-4482 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Avro >Affects Versions: 1.6.0 >Reporter: Stefán Baxter >Assignee: Jason Altekruse >Priority: Blocker > Fix For: 1.6.0 > > > Parquet: > 0: jdbc:drill:zk=local> select s.client_ip.ip from > dfs.asa.`/processed/<>/transactions` as s limit 1; > ++ > | EXPR$0 | > ++ > | 87.55.171.210 | > ++ > 1 row selected (1.184 seconds) > Avro: > 0: jdbc:drill:zk=local> select s.client_ip.ip from > dfs.asa.`/streaming/<>/transactions` as s limit 1; > +-+ > | EXPR$0 | > +-+ > | null| > +-+ > 1 row selected (0.29 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4492) TestMergeJoinWithSchemaChanges depends on the order in which files in a directory are read to pass, should be refactored
Jason Altekruse created DRILL-4492: -- Summary: TestMergeJoinWithSchemaChanges depends on the order in which files in a directory are read to pass, should be refactored Key: DRILL-4492 URL: https://issues.apache.org/jira/browse/DRILL-4492 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse Assignee: amit hadke I was running unit tests and saw a failure that seemed unrelated to the changes I was making. The test runs fine in isolation both from IntelliJ and the maven command line (with -Dtest=TestMergeJoinWithSchemaChanges in the java-exec module). I'm not sure what about this particular test run changed the order in which the files were read, but we cannot rely on any particular system to read the files in a given order. The test should be updated to remove this assumption. This is the error I received on one run of the full unit tests: {code} testMissingAndNewColumns(TestMergeJoinWithSchemaChanges.java:265) Caused by: org.apache.drill.common.exceptions.UserRemoteException: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas Fragment 0:0 [Error Id: bf84bffb-f643-493b-9ed5-720eb18d55f2 on 10.1.10.225:31010] (org.apache.drill.exec.exception.SchemaChangeException) Sort currently only supports a single schema. 
org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder.build():146 org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():442 org.apache.drill.exec.record.AbstractRecordBatch.next():162 org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94 org.apache.drill.exec.record.AbstractRecordBatch.next():162 org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.RecordIterator.nextBatch():97 org.apache.drill.exec.record.RecordIterator.next():183 org.apache.drill.exec.record.RecordIterator.prepare():167 org.apache.drill.exec.physical.impl.join.JoinStatus.prepare():87 org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext():162 org.apache.drill.exec.record.AbstractRecordBatch.next():162 org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 org.apache.drill.exec.record.AbstractRecordBatch.next():162 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
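The refactoring asked for above boils down to not trusting the filesystem's enumeration order. A minimal sketch, not the actual test code: a test that needs a stable view of a directory should sort the listing itself, since the order `DirectoryStream` yields entries is unspecified and varies by platform and run.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeterministicListing {
    // List a directory's file names in a guaranteed (lexicographic) order.
    public static List<String> sortedFileNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drill-4492");
        Files.createFile(dir.resolve("right.json"));
        Files.createFile(dir.resolve("left.json"));
        // Sorted, so the result is the same on any filesystem.
        System.out.println(sortedFileNames(dir)); // [left.json, right.json]
    }
}
```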
[jira] [Resolved] (DRILL-4332) tests in TestFrameworkTest fail in Java 8
[ https://issues.apache.org/jira/browse/DRILL-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4332. Resolution: Fixed Fix Version/s: (was: Future) 1.6.0 Fixed in 447b093cd2b05bfeae001844a7e3573935e84389 > tests in TestFrameworkTest fail in Java 8 > - > > Key: DRILL-4332 > URL: https://issues.apache.org/jira/browse/DRILL-4332 > Project: Apache Drill > Issue Type: Sub-task > Components: Tools, Build & Test >Affects Versions: 1.5.0 >Reporter: Deneche A. Hakim >Assignee: Laurent Goujon > Fix For: 1.6.0 > > > The following unit tests fail in Java 8: > {noformat} > TestFrameworkTest.testRepeatedColumnMatching > TestFrameworkTest.testCSVVerificationOfOrder_checkFailure > {noformat} > The tests expect the query to fail with a specific error message. The message > generated by DrillTestWrapper.compareMergedVectors assumes a specific order > in a map keySet (which we shouldn't rely on). In Java 8 the order seems to have changed, > which causes a slightly different error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
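A sketch of the fix direction (illustrative, not the actual DrillTestWrapper code): when an error message enumerates the keys of a map, iterate a sorted view so the message is byte-for-byte identical on every JVM instead of depending on `HashMap` iteration order, which is unspecified and differs between Java 7 and Java 8.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class StableMessage {
    // Build a diagnostic string over map keys in sorted (stable) order.
    public static String describeColumns(Map<String, Integer> rowCounts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(rowCounts).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append("; ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rownum", 5);
        counts.put("EXPR$0", 5);
        // TreeMap iteration is lexicographic, regardless of JVM version.
        System.out.println(describeColumns(counts)); // EXPR$0=5; rownum=5;
    }
}
```

With a stable message, a test can assert against the exact expected text without breaking on a JVM upgrade.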
[jira] [Resolved] (DRILL-4486) Expression serializer incorrectly serializes escaped characters
[ https://issues.apache.org/jira/browse/DRILL-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4486. Resolution: Fixed Fix Version/s: 1.6.0 Fixed in 80316f3f8bef866720f99e609fe758ec8e0c4612 > Expression serializer incorrectly serializes escaped characters > --- > > Key: DRILL-4486 > URL: https://issues.apache.org/jira/browse/DRILL-4486 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.6.0 > > > The Drill expression parser requires backslashes to be escaped, but > ExpressionStringBuilder is not properly escaping them. This causes problems, > especially in the case of regex expressions run with parallel execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
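An illustrative round-trip of the escaping rule at issue (the method names are hypothetical, not ExpressionStringBuilder's API): if the expression parser requires backslashes to be escaped, the serializer must double them on the way out, or a regex such as `\d+` is silently corrupted after one serialize/parse cycle through a distributed plan.

```java
public class EscapeSketch {
    // Serializer side: double every backslash.
    public static String serialize(String literal) {
        return literal.replace("\\", "\\\\");
    }

    // Parser side: collapse escaped backslashes back to one.
    public static String parse(String escaped) {
        return escaped.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String regex = "\\d+"; // the three characters \ d +
        String onTheWire = serialize(regex);
        System.out.println(parse(onTheWire).equals(regex)); // true
    }
}
```

Without the `serialize` step, `parse` on the receiving fragment would halve the backslashes, which is exactly the parallel-execution failure the issue describes.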
[jira] [Resolved] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291
[ https://issues.apache.org/jira/browse/DRILL-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4375. Resolution: Fixed Fix Version/s: 1.6.0 Fixed in 1f29914fc5c7d1e36651ac28167804c4012501fe > Fix the maven release profile, broken by jdbc jar size enforcer added in > DRILL-4291 > --- > > Key: DRILL-4375 > URL: https://issues.apache.org/jira/browse/DRILL-4375 > Project: Apache Drill > Issue Type: Bug >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4471) Add unit test for the Drill Web UI
Jason Altekruse created DRILL-4471: -- Summary: Add unit test for the Drill Web UI Key: DRILL-4471 URL: https://issues.apache.org/jira/browse/DRILL-4471 Project: Apache Drill Issue Type: Test Reporter: Jason Altekruse Assignee: Jason Altekruse While the Web UI isn't being very actively developed, changes to the Drill build or internal parts of the server have broken parts of the Web UI a few times. As the Web UI is a primary interface for viewing cluster information, cancelling queries, configuring storage and other tasks, we really should add automated tests for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4451) Improve operator unit tests to allow for direct inspection of the sequence of result batches
Jason Altekruse created DRILL-4451: -- Summary: Improve operator unit tests to allow for direct inspection of the sequence of result batches Key: DRILL-4451 URL: https://issues.apache.org/jira/browse/DRILL-4451 Project: Apache Drill Issue Type: Test Components: Tools, Build & Test Reporter: Jason Altekruse Assignee: Jason Altekruse The first version of the operator test framework allows for comparison of the result set with a baseline, but does not give a way to specify the expected batch boundaries. All of the batches are combined together before they are compared (sharing code with the existing test infrastructure for complete SQL queries). The framework should also include a way to directly inspect SV2 and SV4 batches that are produced by operators like filter and sort. These structures are used to store a view into the incoming data (an SV2 is a bitmask for everything that matched the filter and an SV4 is used to represent cross-batch pointers to reflect the sorted order of a series of batches without rewriting them). Currently the test just follows the pointers to iterate over the values as they would appear after a rewrite of the data (by the SelectionVectorRemover operator). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
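A minimal sketch of the selection-vector indirection described above (illustrative only, not Drill's SelectionVector2/SelectionVector4 classes): a filter records the positions of the matching rows in a small index array instead of copying the rows, and a later step, analogous to the SelectionVectorRemover, materializes the filtered data by following those indices.

```java
public class SelectionVectorSketch {
    // Build the "selection vector": indices of values above the threshold.
    public static int[] filterIndices(int[] batch, int threshold) {
        int n = 0;
        for (int v : batch) {
            if (v > threshold) n++;
        }
        int[] sv = new int[n];
        int j = 0;
        for (int i = 0; i < batch.length; i++) {
            if (batch[i] > threshold) sv[j++] = i;
        }
        return sv;
    }

    // Rewrite the data in selected order, like the SelectionVectorRemover.
    public static int[] materialize(int[] batch, int[] sv) {
        int[] out = new int[sv.length];
        for (int i = 0; i < sv.length; i++) {
            out[i] = batch[sv[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] batch = {5, 1, 7, 3};
        int[] sv = filterIndices(batch, 4);    // indices {0, 2}
        int[] result = materialize(batch, sv); // values {5, 7}
        System.out.println(java.util.Arrays.toString(result)); // [5, 7]
    }
}
```

Inspecting `sv` directly, rather than only the materialized output, is exactly the capability the improved test framework would add.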
[jira] [Created] (DRILL-4450) Improve operator unit tests to allow for setting custom options on a test
Jason Altekruse created DRILL-4450: -- Summary: Improve operator unit tests to allow for setting custom options on a test Key: DRILL-4450 URL: https://issues.apache.org/jira/browse/DRILL-4450 Project: Apache Drill Issue Type: Test Reporter: Jason Altekruse Assignee: Jason Altekruse The initial work done on the operator test framework included mocking of the system/session options just complete enough to get the first ~10 operators to execute a single query. These values are currently shared across all tests. To test all code paths we will need a way to set options from individual tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4448) Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for its construction, should also allow for use of the corresponding Calcite Enums
Jason Altekruse created DRILL-4448: -- Summary: Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for its construction, should also allow for use of the corresponding Calcite Enums Key: DRILL-4448 URL: https://issues.apache.org/jira/browse/DRILL-4448 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4439) Improve new unit operator tests to handle operators that expect RawBatchBuffers off of the wire, such as the UnorderedReceiver and MergingReceiver
Jason Altekruse created DRILL-4439: -- Summary: Improve new unit operator tests to handle operators that expect RawBatchBuffers off of the wire, such as the UnorderedReceiver and MergingReceiver Key: DRILL-4439 URL: https://issues.apache.org/jira/browse/DRILL-4439 Project: Apache Drill Issue Type: Test Reporter: Jason Altekruse Assignee: Jason Altekruse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4437) Implement framework for testing operators in isolation
Jason Altekruse created DRILL-4437: -- Summary: Implement framework for testing operators in isolation Key: DRILL-4437 URL: https://issues.apache.org/jira/browse/DRILL-4437 Project: Apache Drill Issue Type: Test Components: Tools, Build & Test Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.6.0 Most of the tests written for Drill are end-to-end. We spin up a full instance of the server, submit one or more SQL queries and check the results. While integration tests like this are useful for ensuring that all features are guaranteed to not break end-user functionality, overuse of this approach has caused a number of pain points. Overall the tests end up running a lot of the exact same code, parsing and planning many similar queries. Creating consistent reproductions of issues, especially edge cases found in clustered environments, can be extremely difficult. Even the simpler case of testing whether operators can handle a particular series of incoming batches of records has required hacks like generating large enough files so that the scanners happen to break them up into separate batches. These tests are brittle as they make assumptions about how the scanners will work in the future. As an example of how this could break, we might do a perf evaluation and find that we should be producing larger batches in some cases. Existing tests that try to test multiple batches by producing a few more records than the current threshold for batch size would not be testing the same code paths. We need to make more parts of the system testable without initializing the entire Drill server, as well as making the different internal settings and state of the server configurable for tests. This is a first effort to enable testing the physical operators in Drill by mocking the components of the system necessary to enable operators to initialize and execute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4438) Fix out of memory failure identified by new operator unit tests
Jason Altekruse created DRILL-4438: -- Summary: Fix out of memory failure identified by new operator unit tests Key: DRILL-4438 URL: https://issues.apache.org/jira/browse/DRILL-4438 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3930) Remove direct references to TopLevelAllocator from unit tests
[ https://issues.apache.org/jira/browse/DRILL-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3930. Resolution: Fixed Assignee: (was: Chris Westin) Fix Version/s: 1.3.0 > Remove direct references to TopLevelAllocator from unit tests > - > > Key: DRILL-3930 > URL: https://issues.apache.org/jira/browse/DRILL-3930 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.2.0 >Reporter: Chris Westin > Fix For: 1.3.0 > > > The RootAllocatorFactory should be used throughout the code to allow us to > change allocators via configuration or other software choices. Some unit > tests still reference TopLevelAllocator directly. We also need to do a better > job of handling exceptions that can be handled by close()ing an allocator > that isn't in the proper state (remaining open child allocators, outstanding > buffers, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4394) Can’t build the custom functions for Apache Drill 1.5.0
[ https://issues.apache.org/jira/browse/DRILL-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4394. Resolution: Fixed Assignee: Jason Altekruse > Can’t build the custom functions for Apache Drill 1.5.0 > --- > > Key: DRILL-4394 > URL: https://issues.apache.org/jira/browse/DRILL-4394 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Kumiko Yada >Assignee: Jason Altekruse >Priority: Critical > > I tried to build the custom functions for Drill 1.5.0, but I got the below > error: > Failure to find org.apache.drill.exec:drill-java-exec:jar:1.5.0 in > http://repo.maven.apache.org/maven2 was cached in the local repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4435) Add YARN jar required for running Drill on cluster with Kerberos
Jason Altekruse created DRILL-4435: -- Summary: Add YARN jar required for running Drill on cluster with Kerberos Key: DRILL-4435 URL: https://issues.apache.org/jira/browse/DRILL-4435 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Assignee: Jason Altekruse As described here, Drill currently requires adding a YARN jar to the classpath to run on Kerberos. If it doesn't conflict with any jars currently included with Drill we should just include this in the distribution to make this work out of the box. http://www.dremio.com/blog/securing-sql-on-hadoop-part-2-installing-and-configuring-drill/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3229. Resolution: Fixed Fix Version/s: (was: Future) 1.4.0 > Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: 1.4.0 > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-284) Publish artifacts to maven for Drill
[ https://issues.apache.org/jira/browse/DRILL-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-284. --- Resolution: Fixed Fix Version/s: (was: Future) 1.1.0 > Publish artifacts to maven for Drill > > > Key: DRILL-284 > URL: https://issues.apache.org/jira/browse/DRILL-284 > Project: Apache Drill > Issue Type: Task >Reporter: Timothy Chen > Fix For: 1.1.0 > > > We need to publish our artifacts and version to maven so other dependencies > (Whirr, or other ones that wants maven include) can use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4426) Review storage and format plugins like parquet, JSON, Avro, Hive, etc. to ensure they fail with useful error messages including filename, column, etc.
Jason Altekruse created DRILL-4426: -- Summary: Review storage and format plugins like parquet, JSON, Avro, Hive, etc. to ensure they fail with useful error messages including filename, column, etc. Key: DRILL-4426 URL: https://issues.apache.org/jira/browse/DRILL-4426 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Assignee: Jason Altekruse A number of these issues have been fixed in the past in individual instances, but we should review any remaining cases where a failure does not produce an error message with as much useful context information as possible. The filename should always be possible to include; a column or record/line number would also be good where available. One such case with a low-level Parquet failure was reported here. http://search-hadoop.com/m/qRVAX48ao4xTDne/drill+Query+Return+Error+because+of+a+single+file=Query+Return+Error+because+of+a+single+file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config
Jason Altekruse created DRILL-4383: -- Summary: Allow passing custom configuration options to a file system through the storage plugin config Key: DRILL-4383 URL: https://issues.apache.org/jira/browse/DRILL-4383 Project: Apache Drill Issue Type: Improvement Components: Storage - Other Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.6.0 A similar feature already exists in the Hive and HBase plugins; it simply provides a key/value map for passing custom configuration options to the underlying storage system. This would be useful for the filesystem plugin to configure S3 without needing to create a core-site.xml file or restart Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4230) NullReferenceException when SELECTing from empty mongo collection
[ https://issues.apache.org/jira/browse/DRILL-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4230. Resolution: Fixed Fix Version/s: 1.5.0 Fixed in ed2f1ca8ed3c0ebac7e33494db6749851fc2c970 This was applied separately to the 1.5 release branch, so the commit there has identical content and the same commit message, but will have a different hash. > NullReferenceException when SELECTing from empty mongo collection > - > > Key: DRILL-4230 > URL: https://issues.apache.org/jira/browse/DRILL-4230 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MongoDB >Affects Versions: 1.3.0 >Reporter: Brick Shitting Bird Jr. >Assignee: Jason Altekruse > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291
Jason Altekruse created DRILL-4375: -- Summary: Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291 Key: DRILL-4375 URL: https://issues.apache.org/jira/browse/DRILL-4375 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse Assignee: Jason Altekruse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4295) Obsolete protobuf generated files under protocol/
[ https://issues.apache.org/jira/browse/DRILL-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4295. Resolution: Fixed Fixed in fbb0165def5e23b6b2f6a690d47dc5fbeb2bdbcb > Obsolete protobuf generated files under protocol/ > - > > Key: DRILL-4295 > URL: https://issues.apache.org/jira/browse/DRILL-4295 > Project: Apache Drill > Issue Type: Task > Components: Tools, Build & Test >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Trivial > Fix For: 1.6.0 > > > The following two files don't have a protobuf definition anymore, and are not > generated when running {{mvn process-sources -P proto-compile}} under > {{protocol/}}: > {noformat} > src/main/java/org/apache/drill/exec/proto/beans/RpcFailure.java > src/main/java/org/apache/drill/exec/proto/beans/ViewPointer.java > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4331) TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8
[ https://issues.apache.org/jira/browse/DRILL-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4331. Resolution: Fixed Fixed in 32da4675e8bf1358b863532daadd2769f380600f > TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8 > - > > Key: DRILL-4331 > URL: https://issues.apache.org/jira/browse/DRILL-4331 > Project: Apache Drill > Issue Type: Sub-task > Components: Tools, Build & Test >Affects Versions: 1.5.0 >Reporter: Deneche A. Hakim > Fix For: 1.6.0 > > > This test expects the following Project in the query plan: > {noformat} > Project(EXPR$0=[$1], rownum=[$0]) > {noformat} > In Java 8, for some reason the scan operator exposes the columns in reverse > order which causes the project to be different than the one expected: > {noformat} > Project(EXPR$0=[$0], rownum=[$1]) > {noformat} > The plan is still correct, so the test must be fixed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4359) EndpointAffinity missing equals method
[ https://issues.apache.org/jira/browse/DRILL-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4359. Resolution: Fixed Fix Version/s: 1.6.0 Fixed in 6b1b4d257b89e5579140e75388cd37db5563a6a8 > EndpointAffinity missing equals method > -- > > Key: DRILL-4359 > URL: https://issues.apache.org/jira/browse/DRILL-4359 > Project: Apache Drill > Issue Type: Improvement >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Trivial > Fix For: 1.6.0 > > > EndpointAffinity is a placeholder class, but has no equals method to allow > comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
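A hedged sketch of the kind of change DRILL-4359 asks for: a value-style class gains equals (and a consistent hashCode) so instances can be compared by content. The fields here, an endpoint address and an affinity weight, are assumptions standing in for the real EndpointAffinity members.

```java
import java.util.Objects;

public class AffinitySketch {
    private final String endpoint;
    private final double affinity;

    public AffinitySketch(String endpoint, double affinity) {
        this.endpoint = endpoint;
        this.affinity = affinity;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AffinitySketch)) {
            return false;
        }
        AffinitySketch other = (AffinitySketch) o;
        // Double.compare handles NaN and signed zero consistently.
        return Double.compare(affinity, other.affinity) == 0
            && Objects.equals(endpoint, other.endpoint);
    }

    @Override
    public int hashCode() {
        // hashCode must agree with equals, per the Object contract.
        return Objects.hash(endpoint, affinity);
    }
}
```

Overriding both methods together matters: an equals without a matching hashCode breaks hash-based collections.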
[jira] [Resolved] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak
[ https://issues.apache.org/jira/browse/DRILL-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4353. Resolution: Fixed Fixed in 282dfd762f1bd6628b293c68b20cdff321bd70a3 This was also merged into the 1.5 release branch, that commit has a different hash, but there were other changes that had already been merged into master that we didn't want to include in the release. > Expired sessions in web server are not cleaning up resources, leading to > resource leak > -- > > Key: DRILL-4353 > URL: https://issues.apache.org/jira/browse/DRILL-4353 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP, Web Server >Affects Versions: 1.5.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti >Priority: Blocker > Fix For: 1.5.0 > > > Currently we store the session resources (including DrillClient) in attribute > {{SessionAuthentication}} object which implements > {{HttpSessionBindingListener}}. Whenever a session is invalidated, all > attributes are removed and if an attribute class implements > {{HttpSessionBindingListener}}, listener is informed. > {{SessionAuthentication}} implementation of {{HttpSessionBindingListener}} > logs out the user which includes cleaning up the resources as well, but > {{SessionAuthentication}} relies on ServletContext stored in thread local > variable (see > [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]). > In case of thread that cleans up the expired sessions there is no > {{ServletContext}} in thread local variable, leading to not logging out the > user properly and resource leak. > Fix: Add {{HttpSessionEventListener}} to cleanup the > {{SessionAuthentication}} and resources every time a HttpSession is expired > or invalidated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4361) Allow for FileSystemPlugin subclasses to override FormatCreator
[ https://issues.apache.org/jira/browse/DRILL-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4361. Resolution: Fixed Fix Version/s: 1.6.0 Fixed in 5e57b0e3b44f46aa93bf82f366eb3a3f61990da3 > Allow for FileSystemPlugin subclasses to override FormatCreator > --- > > Key: DRILL-4361 > URL: https://issues.apache.org/jira/browse/DRILL-4361 > Project: Apache Drill > Issue Type: Improvement >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Minor > Fix For: 1.6.0 > > > FileSystemPlugin subclasses are not able to customize plugins, as > FormatCreator in created in FileSystemPlugin constructor and immediately used > to create SchemaFactory instance. > FormatCreator instantiation should be moved to a protected method so that > subclass can choose to implement it differently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4225) TestDateFunctions#testToChar fails when the locale is non-English
[ https://issues.apache.org/jira/browse/DRILL-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4225. Resolution: Fixed Fix Version/s: 1.6.0 Fixed in 4e9b82562cf0fc46e759b89857ffb85e129a178b > TestDateFunctions#testToChar fails when the locale is non-English > - > > Key: DRILL-4225 > URL: https://issues.apache.org/jira/browse/DRILL-4225 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.4.0 > Environment: Mac OS X 10.10.5 >Reporter: Akihiko Kusanagi > Fix For: 1.6.0 > > > Set the locale to ja_JP on Mac OS X: > {noformat} > $ defaults read -g AppleLocale > ja_JP > {noformat} > TestDateFunctions#testToChar fails with the following output: > {noformat} > Running org.apache.drill.exec.fn.impl.TestDateFunctions#testToChar > 2008-2-23 > 12 20 30 > 2008 2 23 12:00:00 > ... > Tests run: 6, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 14.333 sec > <<< FAILURE! - in org.apache.drill.exec.fn.impl.TestDateFunctions > testToChar(org.apache.drill.exec.fn.impl.TestDateFunctions) Time elapsed: > 2.793 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<2008-[Feb]-23> but was:<2008-[2]-23> > at > org.apache.drill.exec.fn.impl.TestDateFunctions.testCommon(TestDateFunctions.java:66) > at > org.apache.drill.exec.fn.impl.TestDateFunctions.testToChar(TestDateFunctions.java:139) > ... > Failed tests: > TestDateFunctions.testToChar:139->testCommon:66 expected:<2008-[Feb]-23> > but was:<2008-[2]-23> > {noformat} > Test queries are like this: > {noformat} > to_char((cast('2008-2-23' as date)), '-MMM-dd') > to_char(cast('12:20:30' as time), 'HH mm ss') > to_char(cast('2008-2-23 12:00:00' as timestamp), ' MMM dd HH:mm:ss') > {noformat} > This failure occurs because org.joda.time.format.DateTimeFormat interprets > the pattern 'MMM' differently depending on the locale. This will probably > occur in other OS platforms. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
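The locale sensitivity of the "MMM" pattern letter is easy to reproduce outside of Joda. The JDK's java.time formatter has the same semantics, so this self-contained sketch (pattern string is illustrative) shows why the test only passes under an English locale, and why pinning the locale on the formatter is the fix:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LocalePatternDemo {
    // "MMM" renders the month abbreviation in the formatter's locale,
    // so the expected "Feb" only appears when an English locale is used.
    static String format(Locale locale) {
        return LocalDate.of(2008, 2, 23)
                .format(DateTimeFormatter.ofPattern("yyyy-MMM-dd", locale));
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.ENGLISH));      // 2008-Feb-23
        System.out.println(format(Locale.getDefault())); // varies by machine
    }
}
```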
[jira] [Resolved] (DRILL-4032) Drill unable to parse json files with schema changes
[ https://issues.apache.org/jira/browse/DRILL-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4032. Resolution: Fixed Fix Version/s: 1.4.0 > Drill unable to parse json files with schema changes > > > Key: DRILL-4032 > URL: https://issues.apache.org/jira/browse/DRILL-4032 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types, Storage - JSON >Affects Versions: 1.3.0 >Reporter: Rahul Challapalli >Assignee: Steven Phillips >Priority: Blocker > Fix For: 1.4.0 > > > git.commit.id.abbrev=bb69f22 > {code} > select d.col2.col3 from reg1 d; > Error: DATA_READ ERROR: Error parsing JSON - index: 0, length: 4 (expected: > range(0, 0)) > File /drill/testdata/reg1/a.json > Record 2 > Fragment 0:0 > {code} > The folder reg1 contains 2 files > File 1 : a.json > {code} > {"col1": "val1","col2": null} > {"col1": "val1","col2": {"col3":"abc", "col4":"xyz"}} > {code} > File 2 : b.json > {code} > {"col1": "val1","col2": null} > {"col1": "val1","col2": null} > {code} > Exception from the log file : > {code} > [Error Id: a7e3c716-838d-4f8f-9361-3727b98f04cd ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:165) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:205) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:113) > 
[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:103) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:130) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) > [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at java.security.AccessController.doPrivileged(Native Method) > [na:1.7.0_71] > at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) > [hadoop-common-2.7.0-mapr-1506.jar:na] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) > 
[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_71] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_71] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71] > Caused by: java.lang.IndexOutOfBoundsException: index: 0, length: 4 > (expected: range(0, 0)) > at
[jira] [Resolved] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns
[ https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4048. Resolution: Fixed Fix Version/s: 1.4.0 > Parquet reader corrupts dictionary encoded binary columns > - > > Key: DRILL-4048 > URL: https://issues.apache.org/jira/browse/DRILL-4048 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.3.0 >Reporter: Rahul Challapalli >Assignee: Jason Altekruse >Priority: Blocker > Fix For: 1.4.0 > > Attachments: lineitem_dic_enc.parquet > > > git.commit.id.abbrev=04c01bd > The below query returns corrupted data (not even showing up here) for binary > columns > {code} > select * from `lineitem_dic_enc.parquet` limit 1; > +-+++---+-+--+-++---+---+-+---+++-+--+ > | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | > l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | > l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | > l_shipmode |l_comment | > +-+++---+-+--+-++---+---+-+---+++-+--+ > | 1 | 1552 | 93 | 1 | 17.0| > 24710.35 | 0.04| 0.02 | | | > 1996-03-13 | 1996-02-12| 1996-03-22 | DELIVER IN PE | T | > egular courts above the | > +-+++---+-+--+-++---+---+-+---+++-+--+ > {code} > The same query from an older build (git.commit.id.abbrev=839f8da) > {code} > select * from `lineitem_dic_enc.parquet` limit 1; > +-+++---+-+--+-++---+---+-+---+++-+--+ > | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | > l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | > l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | > l_shipmode |l_comment | > +-+++---+-+--+-++---+---+-+---+++-+--+ > | 1 | 1552 | 93 | 1 | 17.0| > 24710.35 | 0.04| 0.02 | N | O | > 1996-03-13 | 1996-02-12| 1996-03-22 | DELIVER IN PERSON | TRUCK > | egular courts above the | > +-+++---+-+--+-++---+---+-+---+++-+--+ > {code} > Below is the output of the parquet-meta command for this dataset > {code} > creator: 
parquet-mr > file schema: root > --- > l_orderkey: REQUIRED INT32 R:0 D:0 > l_partkey: REQUIRED INT32 R:0 D:0 > l_suppkey: REQUIRED INT32 R:0 D:0 > l_linenumber:REQUIRED INT32 R:0 D:0 > l_quantity: REQUIRED DOUBLE R:0 D:0 > l_extendedprice: REQUIRED DOUBLE R:0 D:0 > l_discount: REQUIRED DOUBLE R:0 D:0 > l_tax: REQUIRED DOUBLE R:0 D:0 > l_returnflag:REQUIRED BINARY O:UTF8 R:0 D:0 > l_linestatus:REQUIRED BINARY O:UTF8 R:0 D:0 > l_shipdate: REQUIRED INT32 O:DATE R:0 D:0 > l_commitdate:REQUIRED INT32 O:DATE R:0 D:0 > l_receiptdate: REQUIRED INT32 O:DATE R:0 D:0 > l_shipinstruct: REQUIRED BINARY O:UTF8 R:0 D:0 > l_shipmode: REQUIRED BINARY O:UTF8 R:0 D:0 > l_comment: REQUIRED BINARY O:UTF8 R:0 D:0 > row group 1: RC:60175 TS:3049610 >
[jira] [Resolved] (DRILL-4243) CTAS with partition by, results in Out Of Memory
[ https://issues.apache.org/jira/browse/DRILL-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4243. Resolution: Fixed Fix Version/s: 1.5.0 > CTAS with partition by, results in Out Of Memory > > > Key: DRILL-4243 > URL: https://issues.apache.org/jira/browse/DRILL-4243 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.5.0 > Environment: 4 node cluster >Reporter: Khurram Faraaz > Fix For: 1.5.0 > > > CTAS with partition by, results in Out Of Memory. It seems to be coming from > ExternalSortBatch > Details of Drill are > {noformat} > version commit_id commit_message commit_time build_email > build_time > 1.5.0-SNAPSHOTe4372f224a4b474494388356355a53808092a67a > DRILL-4242: Updates to storage-mongo03.01.2016 @ 15:31:13 PST > Unknown 04.01.2016 @ 01:02:29 PST > create table `tpch_single_partition/lineitem` partition by (l_moddate) as > select l.*, l_shipdate - extract(day from l_shipdate) + 1 l_moddate from > cp.`tpch/lineitem.parquet` l; > [1;31mError: RESOURCE ERROR: One or more nodes ran out of memory while > executing the query. > Fragment 0:0 > [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010] > (state=,code=0)[m > java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory > while executing the query. 
> Fragment 0:0 > [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73) > at > net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404) > at > net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351) > at > net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338) > at > net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69) > at > org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101) > at sqlline.Commands.execute(Commands.java:841) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:746) > at sqlline.SqlLine.runCommands(SqlLine.java:1651) > at sqlline.Commands.run(Commands.java:1304) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36) > at sqlline.SqlLine.dispatch(SqlLine.java:742) > at sqlline.SqlLine.initArgs(SqlLine.java:553) > at sqlline.SqlLine.begin(SqlLine.java:596) > at sqlline.SqlLine.start(SqlLine.java:375) > at sqlline.SqlLine.main(SqlLine.java:268) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: RESOURCE > ERROR: One or more nodes ran out of memory while executing the query. 
> Fragment 0:0 > [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) > at > org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69) > at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400) > at > org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105) > at > org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264) > at > org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142) > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298) > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > at >
[jira] [Resolved] (DRILL-4163) Support schema changes for MergeJoin operator.
[ https://issues.apache.org/jira/browse/DRILL-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4163. Resolution: Fixed Fix Version/s: 1.5.0 Fixed in cc9175c13270660ffd9ec2ddcbc70780dd72dada > Support schema changes for MergeJoin operator. > -- > > Key: DRILL-4163 > URL: https://issues.apache.org/jira/browse/DRILL-4163 > Project: Apache Drill > Issue Type: Improvement >Reporter: amit hadke >Assignee: Jason Altekruse > Fix For: 1.5.0 > > > Since the external sort operator supports schema changes, allow use of union > types in merge join to support schema changes. > For now, we assume that merge join always works on record batches from the sort > operator. Thus merging schemas and promoting to union vectors is already > taken care of by the sort operator. > Test Cases: > 1) Only one side changes schema (join on union type and primitive type) > 2) Both sides change schema on all columns. > 3) Join between numeric types and string types. > 4) Missing columns - each batch has different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4205) Simple query hit IndexOutOfBoundException
[ https://issues.apache.org/jira/browse/DRILL-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4205. Resolution: Fixed Fix Version/s: 1.5.0 > Simple query hit IndexOutOfBoundException > -- > > Key: DRILL-4205 > URL: https://issues.apache.org/jira/browse/DRILL-4205 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.4.0 >Reporter: Dechang Gu >Assignee: Dechang Gu > Fix For: 1.5.0 > > > The following query failed due to IOB: > 0: jdbc:drill:schema=wf_pigprq100> select * from > `store_sales/part-m-00073.parquet`; > Error: SYSTEM ERROR: IndexOutOfBoundsException: srcIndex: 1048587 > Fragment 0:0 > [Error Id: ad8d2bc0-259f-483c-9024-93865963541e on ucs-node4.perf.lab:31010] > (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet > record reader. > Message: > Hadoop path: /tpcdsPigParq/SF100/store_sales/part-m-00073.parquet > Total records read: 135280 > Mock records read: 0 > Records to read: 1424 > Row group index: 0 > Records in row group: 3775712 > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message pig_schema { > optional int64 ss_sold_date_sk; > optional int64 ss_sold_time_sk; > optional int64 ss_item_sk; > optional int64 ss_customer_sk; > optional int64 ss_cdemo_sk; > optional int64 ss_hdemo_sk; > optional int64 ss_addr_sk; > optional int64 ss_store_sk; > optional int64 ss_promo_sk; > optional int64 ss_ticket_number; > optional int64 ss_quantity; > optional double ss_wholesale_cost; > optional double ss_list_price; > optional double ss_sales_price; > optional double ss_ext_discount_amt; > optional double ss_ext_sales_price; > optional double ss_ext_wholesale_cost; > optional double ss_ext_list_price; > optional double ss_ext_tax; > optional double ss_coupon_amt; > optional double ss_net_paid; > optional double ss_net_paid_inc_tax; > optional double ss_net_profit; > } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4192) Dir0 and Dir1 from drill-1.4 are messed up
[ https://issues.apache.org/jira/browse/DRILL-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4192. Resolution: Fixed Fix Version/s: 1.5.0 > Dir0 and Dir1 from drill-1.4 are messed up > -- > > Key: DRILL-4192 > URL: https://issues.apache.org/jira/browse/DRILL-4192 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.4.0 >Reporter: Krystal >Assignee: Aman Sinha >Priority: Blocker > Fix For: 1.5.0 > > > I have the following directories: > /drill/testdata/temp1/abc/dt=2014-12-30/lineitem.parquet > /drill/testdata/temp1/abc/dt=2014-12-31/lineitem.parquet > The following queries returned incorrect data. > select dir0,dir1 from dfs.`/drill/testdata/temp1` limit 2; > ++---+ > | dir0 | dir1 | > ++---+ > | dt=2014-12-30 | null | > | dt=2014-12-30 | null | > ++---+ > select dir0 from dfs.`/drill/testdata/temp1` limit 2; > ++ > | dir0 | > ++ > | dt=2014-12-31 | > | dt=2014-12-31 | > ++ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4336) Fix weird interaction between maven-release, maven-enforcer and RAT plugins
Jason Altekruse created DRILL-4336: -- Summary: Fix weird interaction between maven-release, maven-enforcer and RAT plugins Key: DRILL-4336 URL: https://issues.apache.org/jira/browse/DRILL-4336 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse While trying to make the 1.5.0 release I ran into a bizarre failure from RAT complaining about a file it should have been ignoring according to the plugin configuration. Disabling the newly added maven-enforcer plugin "fixed" the issue, but we need to keep this in the build to make sure new dependencies don't creep into the JDBC driver that is supposed to be as small as possible. For the sake of the release the jdbc-all jar's size was checked manually. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
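The jar-size check that had to be done manually for this release can be automated with the enforcer plugin's built-in requireFilesSize rule. A hedged sketch of what such a pom.xml fragment could look like (the artifact path and the size threshold below are illustrative assumptions, not the project's actual values):

```xml
<!-- Sketch only: size cap and jar path are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-jdbc-all-size</id>
      <phase>verify</phase>
      <goals><goal>enforce</goal></goals>
      <configuration>
        <rules>
          <requireFilesSize>
            <!-- Fail the build if the shaded driver grows past the cap. -->
            <maxsize>20000000</maxsize>
            <files>
              <file>${project.build.directory}/drill-jdbc-all-${project.version}.jar</file>
            </files>
          </requireFilesSize>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```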
[jira] [Created] (DRILL-4322) User exception created upon failed DROP TABLE eats the underlying exception
Jason Altekruse created DRILL-4322: -- Summary: User exception created upon failed DROP TABLE eats the underlying exception Key: DRILL-4322 URL: https://issues.apache.org/jira/browse/DRILL-4322 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse Assignee: Jason Altekruse Reported in this thread on the list: http://mail-archives.apache.org/mod_mbox/drill-user/201601.mbox/%3CCAMpYv7Cd%2BRuj5L5RAOOe4CoVNxjU6HOSuH2m0XTcyzjzuKiadw%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4322) User exception created upon failed DROP TABLE eats the underlying exception
[ https://issues.apache.org/jira/browse/DRILL-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4322. Resolution: Fixed Fix Version/s: 1.5.0 Fixed in 1b51850f31c02f0a7fa77f0258a83081a9d9e826 > User exception created upon failed DROP TABLE eats the underlying exception > --- > > Key: DRILL-4322 > URL: https://issues.apache.org/jira/browse/DRILL-4322 > Project: Apache Drill > Issue Type: Bug >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.5.0 > > > Reported in this thread on the list: > http://mail-archives.apache.org/mod_mbox/drill-user/201601.mbox/%3CCAMpYv7Cd%2BRuj5L5RAOOe4CoVNxjU6HOSuH2m0XTcyzjzuKiadw%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
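The bug class here is generic Java exception handling: building a new exception from only the message of the original discards the cause chain and its stack trace. A self-contained illustration of the anti-pattern versus the fix (plain RuntimeException stands in for Drill's UserException builder, whose exact API is not shown here):

```java
public class CauseDemo {
    static RuntimeException wrapBadly(Exception e) {
        // Anti-pattern: only a message survives; the underlying
        // exception and its stack trace are lost.
        return new RuntimeException("Failed to drop table");
    }

    static RuntimeException wrapWell(Exception e) {
        // Fix: chain the cause so the original diagnostics survive.
        return new RuntimeException("Failed to drop table", e);
    }

    public static void main(String[] args) {
        Exception root = new IllegalStateException("not a drill table");
        System.out.println(wrapBadly(root).getCause()); // null: cause eaten
        System.out.println(wrapWell(root).getCause());  // the root cause
    }
}
```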
[jira] [Created] (DRILL-4259) Add new functional tests to ensure that failures can be detected independent of the testing environment
Jason Altekruse created DRILL-4259: -- Summary: Add new functional tests to ensure that failures can be detected independent of the testing environment Key: DRILL-4259 URL: https://issues.apache.org/jira/browse/DRILL-4259 Project: Apache Drill Issue Type: Test Reporter: Jason Altekruse In DRILL-4243 an out of memory issue was fixed after a change to the memory allocator made memory limits more strict. While the regression tests had been run by the team at Dremio prior to merging the patch, running the tests on a cluster with more cores changed the memory limits on the queries and caused several tests to fail. While changes of this magnitude are not going to be common, we should have a test suite that reliably fails independent of the environment it is run in (assuming that there are sufficient resources for the tests to run). It would be good to at least try to reproduce this failure on a few different setups (cores, nodes in cluster) by adjusting available configuration options and adding tests with those different configurations so that the tests will fail in different environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4206) Move all_text_mode and read_numbers_as_double options to the JSON format plugin and out of system/session
Jason Altekruse created DRILL-4206: -- Summary: Move all_text_mode and read_numbers_as_double options to the JSON format plugin and out of system/session Key: DRILL-4206 URL: https://issues.apache.org/jira/browse/DRILL-4206 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
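As a sketch of where those options would live after the move, a file-system storage plugin's JSON configuration could carry them on the format entry instead of in system/session scope. The property spellings below simply mirror the session option names and are assumptions, not the final configuration keys:

```json
{
  "type": "file",
  "connection": "file:///",
  "formats": {
    "json": {
      "type": "json",
      "allTextMode": true,
      "readNumbersAsDouble": true
    }
  }
}
```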
[jira] [Created] (DRILL-4179) Update UDF documentation now that classpath scanning is more strict
Jason Altekruse created DRILL-4179: -- Summary: Update UDF documentation now that classpath scanning is more strict Key: DRILL-4179 URL: https://issues.apache.org/jira/browse/DRILL-4179 Project: Apache Drill Issue Type: Improvement Components: Documentation Reporter: Jason Altekruse A few issues have come up with users whose UDFs could be found with 1.0-1.2 but fail to be loaded with 1.3. The changes made in 1.3 to speed up finding all UDFs on the classpath made the setup a little more strict. Some discussions on the topic: DRILL-4178 http://search-hadoop.com/m/qRVAXvthcn1xIHUm/+add+your+package+to+drill.classpath.scanning=Re+UDFs+and+1+3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
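The stricter scanning means a UDF jar must declare its own package for the scanner to find it. The discussions linked above point at the drill.classpath.scanning.packages setting; a minimal sketch of the config fragment a UDF jar would ship (the package name is a placeholder):

```
# drill-module.conf packaged inside the UDF jar;
# "com.example.udfs" is a placeholder for your function package.
drill.classpath.scanning.packages += "com.example.udfs"
```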
[jira] [Created] (DRILL-4110) Avro tests are not verifying their results
Jason Altekruse created DRILL-4110: -- Summary: Avro tests are not verifying their results Key: DRILL-4110 URL: https://issues.apache.org/jira/browse/DRILL-4110 Project: Apache Drill Issue Type: Bug Affects Versions: 1.3.0 Reporter: Jason Altekruse Priority: Critical A lot of tests have been written for the Avro format plugin that generate a variety of different files with various schema properties. These tests are currently just verifying that the files can be read without throwing any exceptions, but the results coming out of Drill are not being verified. Some of these tests were fixed as a part of DRILL-4056; the rest still need to be refactored to add baseline verification checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4028) Merge Drill parquet modifications back into the mainline project
Jason Altekruse created DRILL-4028: -- Summary: Merge Drill parquet modifications back into the mainline project Key: DRILL-4028 URL: https://issues.apache.org/jira/browse/DRILL-4028 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.3.0 Drill has been maintaining a fork of Parquet for over a year. The changes need to make it back into the main repository so we don't have to bother merging in all of the new changes from the master repository into the fork. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3943) CannotPlanException caused by ExpressionReductionRule
[ https://issues.apache.org/jira/browse/DRILL-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3943. Resolution: Fixed Fixed in 826144d89391dbadfc7fec84e633359c602bcd5a > CannotPlanException caused by ExpressionReductionRule > - > > Key: DRILL-3943 > URL: https://issues.apache.org/jira/browse/DRILL-3943 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Sean Hsuan-Yi Chu >Assignee: Jason Altekruse > Fix For: 1.3.0 > > > For a query such as: > {code} > SELECT count(DISTINCT employee_id) as col1, > count((to_number(date_diff(now(), cast(birth_date AS date)),''))) as col2 > FROM cp.`employee.json` > {code} > a CannotPlanException will be thrown because ExpressionReductionRule does not > properly simplify the call "now()". > This issue is similar to DRILL-2808, but that one focuses on the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3939) Drill fails to parse valid JSON object
[ https://issues.apache.org/jira/browse/DRILL-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3939. Resolution: Duplicate > Drill fails to parse valid JSON object > -- > > Key: DRILL-3939 > URL: https://issues.apache.org/jira/browse/DRILL-3939 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.0.0, 1.1.0, 1.2.0 > Environment: Redhat Linux 6.7 Java 1.7 , 1.8 >Reporter: Jaroslaw Sosnicki > > The following valid JSON object queried from DRILL using various clients: > --- t.json start--- > { > "l1": { > "f1": "text1", > "f2": { > "command": "list", > "StorageArray": [ > { > "array1": "Array1", > "Pool": { > "myPool": "PoolName" > } > }, > { > "array2": "Arrays2", > "Pool": [ > { > "myPool": "PoolName1" > }, > { > "myPool": "PoolName2" > } > ] > } > ] > } > } > } > --- t.json end --- > Generates the following error: > > ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: SELECT > * FROM `dfs`.`hdvm`.`./t.json` LIMIT 100 > [30027]Query execution error. Details:[ > DATA_READ ERROR: You tried to write a Map type when you are using a > ValueWriter of type SingleMapWriter. > File /mapr/demo.mapr.com/data/hcs/hdvm/t.json > Record 1 > Line 16 > Column 39 > Field Pool > Line 16 > Column 39 > Field Pool > Fragment 0:0 > [Error Id: 13e7a786-1135-410f-a4f0-877eab9222d6 on maprdemo:31010] > ] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3899) SplitUpComplexExpressions rule should be enhanced to avoid planning unnecessary copies of data
Jason Altekruse created DRILL-3899: -- Summary: SplitUpComplexExpressions rule should be enhanced to avoid planning unnecessary copies of data Key: DRILL-3899 URL: https://issues.apache.org/jira/browse/DRILL-3899 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse A small enhancement was made as part of DRILL-3876 to remove an unnecessary copy in a simple flatten case. This was easy to implement, but did not cover all of the possible cases where the rewrite rule is currently planning inefficient operations. This issue is tracking the more complete fix to handle all of the more complex cases optimally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3783) Incorrect results : COUNT() over results returned by UNION ALL
[ https://issues.apache.org/jira/browse/DRILL-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3783. Resolution: Not A Problem > Incorrect results : COUNT() over results returned by UNION ALL > > > Key: DRILL-3783 > URL: https://issues.apache.org/jira/browse/DRILL-3783 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.2.0 > Environment: 4 node cluster on CentOS >Reporter: Khurram Faraaz >Assignee: Sean Hsuan-Yi Chu >Priority: Critical > Fix For: 1.2.0 > > > Count over results returned union all query, returns incorrect results. The > below query returned an Exception (please se DRILL-2637) that JIRA was marked > as fixed, however the query returns incorrect results. > {code} > 0: jdbc:drill:schema=dfs.tmp> select count(c1) from (select cast(columns[0] > as int) c1 from `testWindow.csv`) union all (select cast(columns[0] as int) > c2 from `testWindow.csv`); > +-+ > | EXPR$0 | > +-+ > | 11 | > | 100 | > | 10 | > | 2 | > | 50 | > | 55 | > | 67 | > | 113 | > | 119 | > | 89 | > | 57 | > | 61 | > +-+ > 12 rows selected (0.753 seconds) > {code} > Results returned by the query on LHS and RHS of Union all operator are > {code} > 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from > `testWindow.csv`; > +--+ > | c1 | > +--+ > | 100 | > | 10 | > | 2| > | 50 | > | 55 | > | 67 | > | 113 | > | 119 | > | 89 | > | 57 | > | 61 | > +--+ > 11 rows selected (0.197 seconds) > 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c2 from > `testWindow.csv`; > +--+ > | c2 | > +--+ > | 100 | > | 10 | > | 2| > | 50 | > | 55 | > | 67 | > | 113 | > | 119 | > | 89 | > | 57 | > | 61 | > +--+ > 11 rows selected (0.173 seconds) > {code} > Note that enclosing the queries within correct parentheses returns correct > results. We do not want to return incorrect results to user when the > parentheses are missing. 
> {code} > 0: jdbc:drill:schema=dfs.tmp> select count(c1) from ((select cast(columns[0] > as int) c1 from `testWindow.csv`) union all (select cast(columns[0] as int) > c2 from `testWindow.csv`)); > +-+ > | EXPR$0 | > +-+ > | 22 | > +-+ > 1 row selected (0.234 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3669) fix missing direct dependency
[ https://issues.apache.org/jira/browse/DRILL-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3669. Resolution: Fixed Fixed in 4b8e85ad6fb40554e6752144f09bdfb474d62d9b > fix missing direct dependency > - > > Key: DRILL-3669 > URL: https://issues.apache.org/jira/browse/DRILL-3669 > Project: Apache Drill > Issue Type: Bug >Reporter: Julien Le Dem >Assignee: Jason Altekruse > Attachments: DRILL-3669.1.patch.txt, DRILL-3669.2.patch.txt > > > This prevents generating a compiling project with mvn eclipse:eclipse > pull request here: > https://github.com/apache/drill/pull/121/files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3122) Changing a session option to default value results in status as changed
[ https://issues.apache.org/jira/browse/DRILL-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3122. Resolution: Fixed Assignee: Sudheesh Katkam (was: Jason Altekruse) Changing a session option to default value results in status as changed --- Key: DRILL-3122 URL: https://issues.apache.org/jira/browse/DRILL-3122 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.0.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sudheesh Katkam Fix For: 1.2.0 Attachments: DRILL-3122.1.patch.txt Setting the session option for hash join to true (which is the default) causes the following query to show the option as changed, which can mislead users who rely on the status field to see whether an option differs from its default, especially in the case of a boolean value.
{code}
0: jdbc:drill:zk=10.10.100.171:5181> select * from sys.options where name like '%hash%';
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| name                                      | kind     | type    | status   | num_val     | string_val  | bool_val  | float_val  |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
| exec.max_hash_table_size                  | LONG     | SYSTEM  | DEFAULT  | 1073741824  | null        | null      | null       |
| exec.min_hash_table_size                  | LONG     | SYSTEM  | DEFAULT  | 65536       | null        | null      | null       |
| planner.enable_hash_single_key            | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | true      | null       |
| planner.enable_hashagg                    | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | true      | null       |
| planner.enable_hashjoin                   | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | true      | null       |
| planner.enable_hashjoin_swap              | BOOLEAN  | SYSTEM  | DEFAULT  | null        | null        | true      | null       |
| planner.join.hash_join_swap_margin_factor | DOUBLE   | SYSTEM  | DEFAULT  | null        | null        | null      | 10.0       |
| planner.memory.hash_agg_table_factor      | DOUBLE   | SYSTEM  | DEFAULT  | null        | null        | null      | 1.1        |
| planner.memory.hash_join_table_factor     | DOUBLE   | SYSTEM  | DEFAULT  | null        | null        | null      | 1.1        |
+-------------------------------------------+----------+---------+----------+-------------+-------------+-----------+------------+
9 rows selected (0.191 seconds)
0: jdbc:drill:zk=10.10.100.171:5181> alter session set `planner.enable_hashjoin`=true;
+-------+-----------------------------------+
| ok    | summary                           |
+-------+-----------------------------------+
| true  | planner.enable_hashjoin updated.  |
+-------+-----------------------------------+
1 row selected (0.083 seconds)
0: jdbc:drill:zk=10.10.100.171:5181> select * from sys.options where name like '%hash%';
+-------------------------------------------+----------+----------+----------+-------------+-------------+-----------+------------+
| name                                      | kind     | type     | status   | num_val     | string_val  | bool_val  | float_val  |
+-------------------------------------------+----------+----------+----------+-------------+-------------+-----------+------------+
| exec.max_hash_table_size                  | LONG     | SYSTEM   | DEFAULT  | 1073741824  | null        | null      | null       |
| exec.min_hash_table_size                  | LONG     | SYSTEM   | DEFAULT  | 65536       | null        | null      | null       |
| planner.enable_hash_single_key            | BOOLEAN  | SYSTEM   | DEFAULT  | null        | null        | true      | null       |
| planner.enable_hashagg                    | BOOLEAN  | SYSTEM   | DEFAULT  | null        | null        | true      | null       |
| planner.enable_hashjoin                   | BOOLEAN  | SYSTEM   | DEFAULT  | null        | null        | true      | null       |
| planner.enable_hashjoin                   | BOOLEAN  | SESSION  | CHANGED  | null        | null        | true      | null       |
| planner.enable_hashjoin_swap              | BOOLEAN  | SYSTEM   | DEFAULT  | null        | null        | true      | null       |
| planner.join.hash_join_swap_margin_factor | DOUBLE   | SYSTEM   | DEFAULT  | null        | null        | null      | 10.0       |
| planner.memory.hash_agg_table_factor      | DOUBLE   | SYSTEM   |
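The fix amounts to comparing the value being assigned against the option's default before marking the row CHANGED. A minimal Python sketch of that comparison (illustrative names only, not Drill's actual OptionManager API):

```python
def option_status(new_value, default_value):
    """Return the status a sys.options row should report.

    Hypothetical helper modeling the desired behavior from DRILL-3122:
    assigning an option its default value should leave it reported as
    DEFAULT rather than flagging it as CHANGED.
    """
    return "DEFAULT" if new_value == default_value else "CHANGED"

# alter session set `planner.enable_hashjoin` = true (default is true)
print(option_status(True, True))   # DEFAULT
print(option_status(False, True))  # CHANGED
```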
[jira] [Resolved] (DRILL-3341) Move OperatorWrapper list and FragmentWrapper list creation to ProfileWrapper ctor
[ https://issues.apache.org/jira/browse/DRILL-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3341. Resolution: Fixed Assignee: Sudheesh Katkam (was: Jason Altekruse) Move OperatorWrapper list and FragmentWrapper list creation to ProfileWrapper ctor -- Key: DRILL-3341 URL: https://issues.apache.org/jira/browse/DRILL-3341 Project: Apache Drill Issue Type: Improvement Reporter: Sudheesh Katkam Assignee: Sudheesh Katkam Priority: Minor Fix For: 1.2.0 Attachments: DRILL-3341.1.patch.txt, DRILL-3341.2.patch.txt
+ avoid re-computation in some cases
+ consistent comparator names
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3521) [umbrella] Review switch statements throughout codebase to add default cases where there are none
Jason Altekruse created DRILL-3521: -- Summary: [umbrella] Review switch statements throughout codebase to add default cases where there are none Key: DRILL-3521 URL: https://issues.apache.org/jira/browse/DRILL-3521 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Reporter: Jason Altekruse Assignee: Jason Altekruse There are a number of places in the code that are missing default branches on case statements. One particular instance I noticed is in OptionValue.createOption, which returns null if passed an unexpected type. This and a few other places in the code could be made a little nicer to work with if we just provided the standard behavior of throwing an exception. One additional note: in a number of places where we do have defaults, the exception thrown is an UnsupportedOperationException with no message. DRILL-2680 discusses this problem, so it might be handled over there, but as we fill in the switch defaults we should try to avoid introducing the new problem of exceptions lacking descriptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
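The standard behavior the issue asks for is a default branch that fails loudly with a descriptive message instead of silently returning null. A Python stand-in sketching that pattern (OptionValue.createOption itself is Java; names and kinds here are illustrative):

```python
def create_option(kind):
    """Create a default value for an option kind, failing loudly otherwise.

    Illustrative stand-in for a switch with a proper default branch: the
    unexpected-type path raises with a message naming the bad input
    rather than returning None or throwing a message-less exception.
    """
    factories = {
        "LONG": lambda: 0,
        "BOOLEAN": lambda: False,
        "DOUBLE": lambda: 0.0,
        "STRING": lambda: "",
    }
    if kind not in factories:
        # The descriptive message is the whole point of the default case.
        raise ValueError(f"Unsupported option kind: {kind!r}")
    return factories[kind]()
```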
[jira] [Resolved] (DRILL-3483) Clarify CommonConstants' constants.
[ https://issues.apache.org/jira/browse/DRILL-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3483. Resolution: Fixed Fixed in 9b351c945b5f10d27cf07b9b5c1a435a029614b7 Clarify CommonConstants' constants. --- Key: DRILL-3483 URL: https://issues.apache.org/jira/browse/DRILL-3483 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse Fix For: 1.2.0 Document, rename, and otherwise clean up CommonConstants' constants. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3484) Error using functions with no parameters when `drill.exec.functions.cast_empty_string_to_null` is set to true
Jason Altekruse created DRILL-3484: -- Summary: Error using functions with no parameters when `drill.exec.functions.cast_empty_string_to_null` is set to true Key: DRILL-3484 URL: https://issues.apache.org/jira/browse/DRILL-3484 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.1.0, 1.0.0 Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.2.0 There is an issue with function materialization when the function has no parameters and this option is set. This causes a low-level IOOB exception to be thrown. 0: jdbc:drill:zk=local> select *, random() from sys.drillbits; Error: SYSTEM ERROR: IndexOutOfBoundsException: index (0) must be less than size (0) Fragment 0:0 [Error Id: 5853c1da-ea8d-41c3-812c-2fdde799803b on 10.250.50.33:31010] (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1094) Using events to parse JSON
[ https://issues.apache.org/jira/browse/DRILL-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1094. Resolution: Fixed Fix Version/s: (was: Future) 1.0.0 The issues described here were fixed a few releases ago; closing this to get it out of the list of future bugs. Using events to parse JSON -- Key: DRILL-1094 URL: https://issues.apache.org/jira/browse/DRILL-1094 Project: Apache Drill Issue Type: Improvement Components: Storage - JSON Affects Versions: Future Reporter: Sudheesh Katkam Assignee: Neeraja Fix For: 1.0.0
+ Define events to parse JSON.
+ Add project pushdown and flatten.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2745) Query returns IOB Exception when JSON data with empty arrays is input to flatten function
[ https://issues.apache.org/jira/browse/DRILL-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2745. Resolution: Not A Problem Query returns IOB Exception when JSON data with empty arrays is input to flatten function - Key: DRILL-2745 URL: https://issues.apache.org/jira/browse/DRILL-2745 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.9.0 Environment: | 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: Exit early from HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53 EDT Reporter: Khurram Faraaz Assignee: Khurram Faraaz Fix For: 1.2.0 An IOB exception is returned when a JSON file that has many empty arrays and arrays with different types of data is passed to the flatten function. Tested on a 4-node cluster on CentOS.
{code}
0: jdbc:drill:> select flatten(outkey) from `nestedJArry.json`;
Query failed: RemoteRpcException: Failure while running fragment., index: 176, length: 4 (expected: range(0, 176)) [ 2627cf84-9dfb-4077-8531-9955ecdbdec7 on centos-02.qa.lab:31010 ] [ 2627cf84-9dfb-4077-8531-9955ecdbdec7 on centos-02.qa.lab:31010 ]
Error: exception while executing query: Failure while executing query.
(state=,code=0)
0: jdbc:drill:> select outkey from `nestedJArry.json`;
+------------+
| outkey |
+------------+
| [[100,1000,200,99,1,0,-1,10],[a,b,c,d,e,p,o,f,m,q,d,s,v],[2012-04-01,1998-02-20,2011-08-05,1992-01-01],[10:30:29.123,12:29:21.999],[sdfklgjsdlkjfghlsidhfgopiuesrtoipuertoiurtyoiurotuiydkfjlbn,bfn;waokefpqowertoipuwergklnjdfbpdsiofgoigiuewqrqiugkjehgjksdhbvkjshdfkjsdfbnlkfbkljrghljrelkhbdlkfjbgkdfjbgkndfbnkldfgklbhjdflkghjlnkoiurty984756897345609782-3458745uiyoheirluht7895e6y],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[null],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[test string,hello world!,just do it!,houston we have a problem],[1,2,3,4,5,6,7,8,9,0]] |
+------------+
1 row selected (0.088 seconds)
Stack trace from drillbit.log
2015-04-09 23:54:41,965 [2ad8eebd-adb6-6f7e-469e-4bb8ca276984:frag:0:0] WARN o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing fragment
java.lang.IndexOutOfBoundsException: index: 176, length: 4 (expected: range(0, 176))
at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:187) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
at io.netty.buffer.DrillBuf.chk(DrillBuf.java:209) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
at io.netty.buffer.DrillBuf.setInt(DrillBuf.java:513) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
at org.apache.drill.exec.vector.UInt4Vector$Mutator.set(UInt4Vector.java:363) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.vector.RepeatedVarCharVector.splitAndTransferTo(RepeatedVarCharVector.java:173) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.vector.RepeatedVarCharVector$TransferImpl.splitAndTransfer(RepeatedVarCharVector.java:200) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.test.generated.FlattenerGen1107.flattenRecords(FlattenTemplate.java:106) ~[na:na]
at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.doWork(FlattenRecordBatch.java:156) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext(FlattenRecordBatch.java:122) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89)
[jira] [Resolved] (DRILL-1754) Flatten nested within another flatten fails
[ https://issues.apache.org/jira/browse/DRILL-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1754. Resolution: Duplicate Fix Version/s: (was: 1.2.0) Flatten nested within another flatten fails --- Key: DRILL-1754 URL: https://issues.apache.org/jira/browse/DRILL-1754 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Jason Altekruse Assignee: Jason Altekruse A query that tries to flatten a repeated list, and then flatten the resulting simple lists, fails in execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1770) Flatten on top of a subquery which applies flatten over kvgen results in a ClassCastException
[ https://issues.apache.org/jira/browse/DRILL-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1770. Resolution: Fixed Fix Version/s: (was: 1.2.0) 0.8.0 This was actually merged a while ago; it just wasn't updated here. Fixed in 09aa34b68c97a20412e9917d2ab6bf182477beb4 Flatten on top of a subquery which applies flatten over kvgen results in a ClassCastException -- Key: DRILL-1770 URL: https://issues.apache.org/jira/browse/DRILL-1770 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Rahul Challapalli Assignee: Jason Altekruse Priority: Minor Fix For: 0.8.0 Attachments: DRILL_1770.patch git.commit.id.abbrev=108d29f Dataset: {code} {"map": {"rm": [ {"rptd": [{ "a": "foo"}]}]}} {code} Query: {code} select flatten(sub.fk.`value`) from (select flatten(kvgen(map)) fk from `nested.json`) sub; Query failed: Failure while running fragment., org.apache.drill.exec.vector.NullableIntVector cannot be cast to org.apache.drill.exec.vector.RepeatedVector {code} Let me know if you need more information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1753) Flatten fails on a repeated map, where the maps being flattened contain repeated lists
[ https://issues.apache.org/jira/browse/DRILL-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1753. Resolution: Fixed Fix Version/s: (was: 1.2.0) 1.1.0 Flatten fails on a repeated map, where the maps being flattened contain repeated lists -- Key: DRILL-1753 URL: https://issues.apache.org/jira/browse/DRILL-1753 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators, Functions - Drill Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.1.0 Attachments: error.log We currently fail to flatten the following data; the issue is the repeated list nested inside of the map, which is not being copied properly during the flatten operation. { r_map_3 : [ { d : [ [1021, 1022], [1]] }, { d : [ [1010] ] } ] } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
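The failing case can be modeled in miniature: flatten should emit one row per element of the repeated map, with each row carrying an intact copy of the nested repeated list. A pure-Python sketch of the expected semantics (a toy model, not Drill's vector-based implementation):

```python
def flatten(records, column):
    """Emit one output row per element of records[i][column].

    Toy model of FLATTEN for DRILL-1753: the bug was that the repeated
    list nested inside each map ({"d": [[...], [...]]}) was not copied
    correctly; here each output row simply keeps its nested list intact.
    """
    out = []
    for rec in records:
        for elem in rec.get(column, []):
            row = {k: v for k, v in rec.items() if k != column}
            row[column] = elem
            out.append(row)
    return out

data = [{"r_map_3": [{"d": [[1021, 1022], [1]]}, {"d": [[1010]]}]}]
# Expect two rows, one per map, each with its repeated list preserved.
for row in flatten(data, "r_map_3"):
    print(row)
```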
[jira] [Resolved] (DRILL-2105) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2105. Resolution: Fixed Fix Version/s: (was: 1.2.0) 0.8.0 It looks like this might have been reported with the wrong stacktrace, but Andries said he hasn't seen this issue, so I'm closing it. Query fails when using flatten on JSON data where some documents have an empty array Key: DRILL-2105 URL: https://issues.apache.org/jira/browse/DRILL-2105 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 0.7.0 Environment: MFS with JSON Reporter: Andries Engelbrecht Assignee: Jason Altekruse Fix For: 0.8.0 Drill query fails when using flatten on an array where some records contain an empty array, especially with larger data sets where the number of JSON documents is greater than 100k. Using twitter data as a sample.
select flatten (entities.hashtags) from dfs.foo.`file.json`;
Empty array:
entities: {
  trends: [],
  symbols: [],
  urls: [ { expanded_url: "http://on.nfl.com/1BkThQF", indices: [ 118, 140 ], display_url: "on.nfl.com/1BkThQF", url: "http://t.co/Unr5KFy6hG" } ],
  hashtags: [],
  user_mentions: [ { id: 19362299, name: "NFL Network", indices: [ 3, 14 ], screen_name: "nflnetwork", id_str: "19362299" } ]
},
Array with content:
entities: {
  trends: [],
  symbols: [],
  urls: [],
  hashtags: [ { text: "djpreps", indices: [ 47, 55 ] }, { text: "MSPreps", indices: [ 56, 64 ] } ],
  user_mentions: []
},
Log output
2015-01-27 02:26:13,478 [2b3908b9-cf08-3fd5-3bd8-ebb6bb5b70f1:foreman] INFO o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for Parquet metadata file.
java.io.IOException: Open failed for file: /data/twitter/nfl/2015, error: Invalid argument (22)
at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:191) ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:776) ~[maprfs-4.0.1.28318-mapr.jar:4.0.1.28318-mapr]
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) ~[hadoop-common-2.4.1-mapr-1408.jar:na]
at org.apache.drill.exec.store.dfs.shim.fallback.FallbackFileSystem.open(FallbackFileSystem.java:94) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:138) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.store.dfs.BasicFormatMatcher.isReadable(BasicFormatMatcher.java:107) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:232) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:212) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:141) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.create(WorkspaceSchemaFactory.java:58) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:273) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
at net.hydromatic.optiq.jdbc.SimpleOptiqSchema.getTable(SimpleOptiqSchema.java:75) [optiq-core-0.9-drill-r12.jar:na]
at net.hydromatic.optiq.prepare.OptiqCatalogReader.getTableFrom(OptiqCatalogReader.java:87) [optiq-core-0.9-drill-r12.jar:na]
at net.hydromatic.optiq.prepare.OptiqCatalogReader.getTable(OptiqCatalogReader.java:70) [optiq-core-0.9-drill-r12.jar:na]
[jira] [Resolved] (DRILL-3370) FLATTEN error with a where clause
[ https://issues.apache.org/jira/browse/DRILL-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3370. Resolution: Fixed Fixed in a915085e8a8b4255ff659086d047cc5dd874a5bf FLATTEN error with a where clause - Key: DRILL-3370 URL: https://issues.apache.org/jira/browse/DRILL-3370 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.0.0 Reporter: Joanlynn LIN Assignee: Jason Altekruse Fix For: 1.1.0 Attachments: DRILL-3370.patch, jsonarray.150.json I've got a JSON file which contains 150 JSON strings, all like this: {arr: [94]} {arr: [39]} {arr: [180]} I was trying to Flatten() the arrays and filter the values using such an SQL query: select flatten(arr) as a from dfs.`/data/test/jsonarray.150.json` where a > 100; However, it returned no result. Then I modified my expression like this: select a from (select flatten(arr) as a from dfs.`/data/test/jsonarray.150.json`) where a > 100; It then failed: Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [flatten(BIGINT-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: 1d71bf0e-48da-43f8-8b36-6a513120d7e0 on slave2:31010] (state=,code=0) After a lot of attempts, I finally got it to work: select a from (select flatten(arr) as a from dfs.`/data/test/jsonarray.150.json` limit 1000) where a > 100; See, I just added a limit 1000 to this query, and I am wondering if this is a bug in Drill? Looking forward to your attention and help. Many thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
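The intended evaluation order can be sketched outside Drill: flatten first, then apply the WHERE predicate to the resulting scalar elements. A small Python model of the query `select a from (select flatten(arr) as a from t) where a > 100` (hypothetical helper names, not Drill's planner):

```python
def flatten_then_filter(records, column, pred):
    """Apply FLATTEN first, then a WHERE predicate on each element.

    Models the evaluation order the query intends: the filter sees
    scalar elements, never the repeated list itself (filtering the list
    directly is what produced the missing flatten(BIGINT-REPEATED)
    implementation error in the report).
    """
    flat = [elem for rec in records for elem in rec[column]]
    return [v for v in flat if pred(v)]

rows = [{"arr": [94]}, {"arr": [39]}, {"arr": [180]}]
print(flatten_then_filter(rows, "arr", lambda a: a > 100))  # [180]
```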
[jira] [Resolved] (DRILL-1616) Add support for count() on maps and arrays
[ https://issues.apache.org/jira/browse/DRILL-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1616. Resolution: Duplicate Add support for count() on maps and arrays -- Key: DRILL-1616 URL: https://issues.apache.org/jira/browse/DRILL-1616 Project: Apache Drill Issue Type: Improvement Components: Storage - JSON Reporter: Abhishek Girish Assignee: Jason Altekruse Priority: Minor Count(field) throws an error on fields which are objects or arrays, and the errors are not clean: they do not indicate an error in usage. Also, count on objects/arrays should be supported.
select * from `abc.json`;
| field_1 | field_2 | field_3 | field_4 | field_5 |
| [1] | null | {inner_3:[]} | {inner_1:[],inner_3:{}} | [] |
| [5] | 2 | {inner_1:2,inner_3:[]} | {inner_1:[1,2,3],inner_2:3,inner_3:{inner_object_field_1:2}} | [{inner_list:[1,null,6],inner_ |
| [5,10,15] | A wild string appears! | {inner_1:5,inner_2:3,inner_3:[{},{inner_object_field_1:10}]} | {inner_1:[4,5,6],inner_2:3,inner_3:{}} | [{ |
3 rows selected (0.081 seconds)
select count(field_1) from `abc.json`;
Query failed: Failure while running fragment., Schema is currently null. You must call buildSchema(SelectionVectorMode) before this container can return a schema. [ b6f021f9-213e-475e-83f4-a6facf6fd76d on abhi7.qa.lab:31010 ]
Error: exception while executing query: Failure while executing query. (state=,code=0)
The error is seen on fields 1, 3, 4, and 5. The issue is not seen when an array index is specified:
select count(field_1[0]) from `abc.json`;
| EXPR$0 |
| 3 |
1 row selected (0.152 seconds)
Or when the element in the object is specified:
select count(t.field_3.inner_3) from `textmode.json` as t;
| EXPR$0 |
| 3 |
1 row selected (0.155 seconds)
LOG:
2014-10-30 13:28:20,286 [a90cc246-e60b-452b-ba96-7f79709f5ffa:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error bc438332-0828-4a86-8063-9dc8c5a703d9: Failure while running fragment.
java.lang.NullPointerException: Schema is currently null. You must call buildSchema(SelectionVectorMode) before this container can return a schema.
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) ~[guava-14.0.1.jar:na]
at org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:273) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
at org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:116) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.getSchema(IteratorValidatorBatchIterator.java:75) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.buildSchema(ScreenCreator.java:100) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:103) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:249) [drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3323) Flatten planning rule creates unneeded copy of the list being flattened, causes execution/allocation issues with large lists
Jason Altekruse created DRILL-3323: -- Summary: Flatten planning rule creates unneeded copy of the list being flattened, causes execution/allocation issues with large lists Key: DRILL-3323 URL: https://issues.apache.org/jira/browse/DRILL-3323 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.1.0 The planning rule for flatten was written not only to handle the flatten operator, but also to address some shortcomings in expression evaluation involving complex types. The rule currently plans inefficiently to try to cover some of these more advanced cases, but there is no thorough test coverage to even demonstrate the benefits. We should disable a particular behavior of copying complex data an extra time when it is not needed, because it is causing flatten queries to fail with allocation issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3263) Read smallint and tinyint data from hive as integer until these types are well supported throughout Drill
[ https://issues.apache.org/jira/browse/DRILL-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-3263. Resolution: Fixed Fixed in 437706f750b0ec50b60582ea2c47e7017e2718e3 Read smallint and tinyint data from hive as integer until these types are well supported throughout Drill - Key: DRILL-3263 URL: https://issues.apache.org/jira/browse/DRILL-3263 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types, Storage - Hive Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3209) [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists
Jason Altekruse created DRILL-3209: -- Summary: [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists Key: DRILL-3209 URL: https://issues.apache.org/jira/browse/DRILL-3209 Project: Apache Drill Issue Type: Improvement Components: Query Planning Optimization, Storage - Hive Reporter: Jason Altekruse Assignee: Jason Altekruse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3116) Headers do not resize in enhanced sqlline that correctly resizes columns to nicely format data
Jason Altekruse created DRILL-3116: -- Summary: Headers do not resize in enhanced sqlline that correctly resizes columns to nicely format data Key: DRILL-3116 URL: https://issues.apache.org/jira/browse/DRILL-3116 Project: Apache Drill Issue Type: Bug Components: Client - CLI Reporter: Jason Altekruse Assignee: Jason Altekruse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3092) Memory leak when an allocation fails near the creation of a RecordBatchData object
Jason Altekruse created DRILL-3092: -- Summary: Memory leak when an allocation fails near the creation of a RecordBatchData object Key: DRILL-3092 URL: https://issues.apache.org/jira/browse/DRILL-3092 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse Assignee: Jason Altekruse A number of locations in the code need try/finally blocks around the code that interacts with the buffers stored in a RecordBatchData object. Runtime exceptions (like running out of memory) in these code blocks can cause buffers to leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
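The try/finally pattern the issue calls for can be sketched in a few lines: a runtime failure (such as running out of memory) during batch construction must still release any buffers acquired so far. A Python sketch under assumed names (Drill's real DrillBuf/RecordBatchData classes are Java):

```python
class Buffer:
    """Stand-in for a buffer that must be released exactly once."""
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


def build_batch(buffers, do_work):
    """Run do_work over acquired buffers, releasing them if it fails.

    Sketch of the try/finally guard DRILL-3092 asks for around
    RecordBatchData construction: on success, ownership transfers to the
    result; on any exception, every buffer is released so nothing leaks.
    """
    ok = False
    try:
        result = do_work(buffers)
        ok = True
        return result
    finally:
        if not ok:
            for buf in buffers:
                buf.release()
```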
[jira] [Resolved] (DRILL-2960) Default hive storage plugin missing from fresh drill install
[ https://issues.apache.org/jira/browse/DRILL-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2960. Resolution: Fixed Fixed in d4f9bf2e994969c863b2b90b58e90139d242b106 Default hive storage plugin missing from fresh drill install Key: DRILL-2960 URL: https://issues.apache.org/jira/browse/DRILL-2960 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Affects Versions: 0.9.0 Reporter: Krystal Assignee: Jason Altekruse Fix For: 1.0.0 Attachments: 2960.patch Installed drill on a fresh node. The default storage plugin for hive is missing from the webUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1460) JsonReader fails reading files with decimal numbers and integers in the same field
[ https://issues.apache.org/jira/browse/DRILL-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1460. Resolution: Fixed JsonReader fails reading files with decimal numbers and integers in the same field -- Key: DRILL-1460 URL: https://issues.apache.org/jira/browse/DRILL-1460 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 0.6.0, 0.7.0 Reporter: Bhallamudi Venkata Siva Kamesh Assignee: Jason Altekruse Priority: Critical Fix For: 1.0.0 Attachments: DRILL-1460.1.patch.txt, DRILL-1460.2.patch.txt Used the following dataset: http://thecodebarbarian.wordpress.com/2014/03/28/plugging-usda-nutrition-data-into-mongodb Executed the following query {noformat}select t.nutrients from dfs.usda.`usda.json` t limit 1;{noformat} and it failed with the following exception
{noformat}
2014-09-27 17:48:39,421 [b9dfbb9b-29a9-425d-801c-2e418533525f:frag:0:0] ERROR o.a.d.e.p.i.ScreenCreator$ScreenRoot - Error 0568d90a-d7df-4a5d-87e9-8b9f718dffa4: Screen received stop request sent.
java.lang.IllegalArgumentException: You tried to write a BigInt type when you are using a ValueWriter of type NullableFloat8WriterImpl.
at org.apache.drill.exec.vector.complex.impl.AbstractFieldWriter.fail(AbstractFieldWriter.java:513) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.impl.AbstractFieldWriter.write(AbstractFieldWriter.java:145) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.impl.NullableFloat8WriterImpl.write(NullableFloat8WriterImpl.java:88) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.writeData(JsonReader.java:257) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.writeData(JsonReader.java:310) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.writeData(JsonReader.java:204) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.write(JsonReader.java:134) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReaderWithState.write(JsonReaderWithState.java:65) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.store.easy.json.JSONRecordReader2.next(JSONRecordReader2.java:111) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
{noformat}
Executed {noformat}select t.nutrients[0].units from dfs.usda.`usda.json` t limit 1;{noformat} and it failed with the following exception
{noformat}
2014-09-27 17:50:04,394 [9ee8a529-17fd-492f-9cba-2d1f5842eae1:frag:0:0] ERROR o.a.d.e.p.i.ScreenCreator$ScreenRoot - Error c4c6bffd-b62b-4878-af1e-58db64453307: Screen received stop request sent.
java.lang.IllegalArgumentException: You tried to write a BigInt type when you are using a ValueWriter of type NullableFloat8WriterImpl.
at org.apache.drill.exec.vector.complex.impl.AbstractFieldWriter.fail(AbstractFieldWriter.java:513) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.impl.AbstractFieldWriter.write(AbstractFieldWriter.java:145) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.impl.NullableFloat8WriterImpl.write(NullableFloat8WriterImpl.java:88) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.writeData(JsonReader.java:257) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.writeData(JsonReader.java:310) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.writeData(JsonReader.java:204) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
at org.apache.drill.exec.vector.complex.fn.JsonReader.write(JsonReader.java:134)
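The failure mode is that the first value seen picks the writer type, and a later value of the other numeric kind fails at that writer. One way to model the reader-side behavior is to promote the whole column to float once any decimal appears. This is only a sketch of that idea in Python, not Drill's actual JsonReader fix:

```python
import json


def read_numeric_column(lines, field):
    """Read a JSON field that mixes integers and decimals.

    Hypothetical illustration for DRILL-1460: if any value in the field
    is a decimal, the whole column is promoted to float so an integer
    arriving at a float writer no longer fails.
    """
    values = [json.loads(line)[field] for line in lines]
    if any(isinstance(v, float) for v in values):
        return [float(v) for v in values]
    return values


rows = ['{"amount": 1.5}', '{"amount": 2}']
print(read_numeric_column(rows, "amount"))  # [1.5, 2.0]
```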
[jira] [Resolved] (DRILL-2772) Display status of query when viewing the query's profile page
[ https://issues.apache.org/jira/browse/DRILL-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2772. Resolution: Fixed Fixed in fd337efbcaabb15ec0c5f3336848f4347e42cf27 Display status of query when viewing the query's profile page - Key: DRILL-2772 URL: https://issues.apache.org/jira/browse/DRILL-2772 Project: Apache Drill Issue Type: Improvement Components: Client - HTTP Affects Versions: 0.8.0 Environment: RHEL 6.4 Reporter: Kunal Khatua Assignee: Jason Altekruse Fix For: 1.2.0 Attachments: DRILL-2772.1.patch.txt When viewing the profile of a query that has run/executed a while ago, it would be helpful to see whether the query was marked as completed, cancelled or failed. The summary on the http://hostname:8047/profiles page shows the status but none of the profile pages show this information. Since the summary is limited to the last 100 queries, having the status would help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2569) Minor fragmentId in Profile UI gets truncated to the last 2 digits
[ https://issues.apache.org/jira/browse/DRILL-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2569. Resolution: Fixed Fix Version/s: (was: 1.2.0) 1.0.0 Minor fragmentId in Profile UI gets truncated to the last 2 digits -- Key: DRILL-2569 URL: https://issues.apache.org/jira/browse/DRILL-2569 Project: Apache Drill Issue Type: Bug Components: Client - HTTP Affects Versions: 0.9.0 Reporter: Krystal Assignee: Jason Altekruse Fix For: 1.0.0 Attachments: DRILL-2569.1.patch.txt git.commit.id.abbrev=8493713 When the profile json contains a minorFragmentId greater than 99, the UI displays only the last 2 digits. For example, if minorFragmentId=100, it is displayed as 00. Here is a snippet of such data from the profile UI:

04-xx-03 - PARQUET_ROW_GROUP_SCAN
Minor Fragment  Setup  Process  Wait   Max Batches  Max Records  Peak Mem
04-98-03        0.000  3.807    1.795  0            0            15MB
04-99-03        0.000  3.247    3.111  0            0            24MB
04-00-03        0.000  3.163    2.545  0            0            20MB
04-01-03        0.000  3.272    2.278  0            0            15MB
04-02-03        0.000  3.496    2.004  0            0            15MB

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
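The truncation pattern above (100 shown as 00) is what you get when a display template pads the minor fragment id to a fixed two-digit width and then keeps only the last two characters. A minimal Python sketch of that failure mode and a non-truncating fix — hypothetical, since the actual profile-UI template code is not shown in this report, and both helper names are ours:

```python
def format_fragment_id_buggy(major: int, minor: int, op: int) -> str:
    # Hypothetical reproduction: pad the minor fragment id to two digits,
    # then keep only the last two characters -- three-digit ids lose their
    # leading digit (100 -> "00"), matching the reported symptom.
    return f"{major:02d}-{f'{minor:02d}'[-2:]}-{op:02d}"

def format_fragment_id_fixed(major: int, minor: int, op: int) -> str:
    # A %02d-style pad sets a *minimum* width only, so ids above 99
    # render in full instead of being truncated.
    return f"{major:02d}-{minor:02d}-{op:02d}"
```

With these definitions, `format_fragment_id_buggy(4, 100, 3)` yields `04-00-03` while the fixed variant yields `04-100-03`.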
[jira] [Resolved] (DRILL-2508) Add new column to sys.options table that exposes whether or not the current system value is the default
[ https://issues.apache.org/jira/browse/DRILL-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2508. Resolution: Fixed Fixed in d12bee05a8f6e974c70d5d2a94176b176d7dba5b Add new column to sys.options table that exposes whether or not the current system value is the default --- Key: DRILL-2508 URL: https://issues.apache.org/jira/browse/DRILL-2508 Project: Apache Drill Issue Type: Improvement Components: Storage - Other Reporter: Victoria Markman Assignee: Jason Altekruse Fix For: 1.0.0 Attachments: DRILL-2508.1.patch.txt, DRILL-2508.2.patch.txt Need to be able to see the system parameters that I changed. There is already an enhancement to reset them to default values: DRILL-1065. I don't necessarily want to do that; I just want to see only the things that I changed: default value vs. my change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1545) Json files can only be read when they have a .json extension
[ https://issues.apache.org/jira/browse/DRILL-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1545. Resolution: Fixed Json files can only be read when they have a .json extension Key: DRILL-1545 URL: https://issues.apache.org/jira/browse/DRILL-1545 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical Fix For: 1.0.0 Attachments: DRILL-1545.2.patch.txt, DRILL-1545.3.patch.txt It seems that Drill can only discover json data if the file extension is .json. We have tried to add the file extension .log as type json in the Storage Plugin (and validated the JSON), but without success. It would be great if somebody could share an example config or has an idea. Storage Plugin Configuration:

{
  "type": "file",
  "enabled": true,
  "connection": "maprfs:///",
  "workspaces": {
    "root": { "location": "/", "writable": false, "storageformat": null },
    "tmp": { "location": "/tmp", "writable": true, "storageformat": "csv" }
  },
  "formats": {
    "log": { "type": "json", "extensions": [ "log" ] },
    "psv": { "type": "text", "extensions": [ "tbl" ], "delimiter": "|" },
    "csv": { "type": "text", "extensions": [ "csv" ], "delimiter": "," },
    "tsv": { "type": "text", "extensions": [ "tsv" ], "delimiter": "\t" },
    "parquet": { "type": "parquet" },
    "json": { "type": "json" }
  }
}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2228) Projecting '*' returns all nulls when we have flatten in a filter and order by
[ https://issues.apache.org/jira/browse/DRILL-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2228. Resolution: Later Flatten has been disabled in the order by clause by DRILL-2181. We will look at how best to re-enable this past 1.0 Projecting '*' returns all nulls when we have flatten in a filter and order by -- Key: DRILL-2228 URL: https://issues.apache.org/jira/browse/DRILL-2228 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Rahul Challapalli Assignee: Jason Altekruse Priority: Critical Fix For: 1.0.0 git.commit.id.abbrev=3d863b5 The below query currently returns all nulls:

{code}
0: jdbc:drill:schema=dfs_eea> select * from `data.json` where 2 in (select flatten(lst_lst[1]) from `data.json`) order by flatten(lst_lst[1]);
+--------+
|   *    |
+--------+
| null   |
| null   |
| null   |
| null   |
| null   |
| null   |
| null   |
| null   |
| null   |
| null   |
+--------+
{code}

There seems to be another issue here, since the number of records returned also does not look right. I will raise a separate JIRA for that. The issue goes away if we do an order by without the flatten. The below query works:

{code}
select * from `data.json` where 2 in (select flatten(lst_lst[1]) from `data.json`) order by uid;
{code}

Attached the data files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2232) Flatten functionality not well defined when we use flatten in an order by without projecting it
[ https://issues.apache.org/jira/browse/DRILL-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2232. Resolution: Later Flatten has been disabled in the order by clause by DRILL-2181. We will look at how best to re-enable this past 1.0 Flatten functionality not well defined when we use flatten in an order by without projecting it --- Key: DRILL-2232 URL: https://issues.apache.org/jira/browse/DRILL-2232 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Rahul Challapalli Assignee: Jason Altekruse Priority: Critical Fix For: 1.0.0 git.commit.id.abbrev=3d863b5 Data Set:

{code}
{ "id" : 1, "lst" : [1,2,3,4] }
{code}

The below query returns 4 rows instead of 1. The expected behavior in this case is not properly documented.

{code}
select id from `data.json` where 2 in (select flatten(lst) from `data.json`) order by flatten(lst);
+------+
|  id  |
+------+
| 1    |
| 1    |
| 1    |
| 1    |
+------+
{code}

The below query projects the flatten:

{code}
0: jdbc:drill:schema=dfs_eea> select id, flatten(lst) from `temp.json` where 2 in (select flatten(lst) from `temp.json`) order by flatten(lst);
+------+---------+
|  id  | EXPR$1  |
+------+---------+
| 1    | 1       |
| 1    | 2       |
| 1    | 3       |
| 1    | 4       |
+------+---------+
{code}

We can agree on one of 3 possibilities when flatten is not projected:
1. Irrespective of whether flatten is in the select list or not, we still return more records based on the flatten in the order by
2. Flatten in the order by clause does not change the number of records we return
3. Using flatten in an order by (or probably group by) is not supported

Whatever we agree on, we should document it more clearly. Let me know your thoughts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2208) Error message must be updated when query contains operations on a flattened column
[ https://issues.apache.org/jira/browse/DRILL-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2208. Resolution: Duplicate Fix Version/s: (was: 1.0.0) Error message must be updated when query contains operations on a flattened column -- Key: DRILL-2208 URL: https://issues.apache.org/jira/browse/DRILL-2208 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Jason Altekruse Priority: Minor Attachments: drillbit_flatten.log Currently i observe that if there is a flatten/kvgen operation applied on a column, no further operations can be performed on the said column unless it is wrapped inside a nested query. Consider a simple flatten/kvgen operation on a complex JSON file : select flatten(kvgen(f.`people`)) as p from `factbook/world.json` f limit 1; ++ | p | ++ | {key:languages,value:{text:Mandarin Chinese 12.44%, Spanish 4.85%, English 4.83%, Arabic 3.25%, Hindi 2.68%, Bengali 2.66%, Portuguese 2.62%, Russian 2.12%, Japanese 1.8%, Standard German 1.33%, Javanese 1.25% (2009 est.),note_1:percents are for \first language\ speakers only; the six UN languages - Arabic, Chinese (Mandarin), English, French, Russian, and Spanish (Castilian) - are the mother tongue or second language of about half of the world's population, and are the official languages in more than half the states in the world; some 150 to 200 languages have more than a million speakers,note_2:all told, there are an estimated 7,100 languages spoken in the world; aproximately 80% of these languages are spoken by less than 100,000 people; about 50 languages are spoken by only 1 person; communities that are isolated from each other in mountainous regions often develop multiple languages; Papua New Guinea, for example, boasts about 836 separate languages,note_3:approximately 2,300 languages are spoken in Asia, 2,150, in Africa, 1,311 in the Pacific, 1,060 in the Americas, 
and 280 in Europe}} | | {key:religions,value:{text:Christian 33.39% (of which Roman Catholic 16.85%, Protestant 6.15%, Orthodox 3.96%, Anglican 1.26%), Muslim 22.74%, Hindu 13.8%, Buddhist 6.77%, Sikh 0.35%, Jewish 0.22%, Baha'i 0.11%, other religions 10.95%, non-religious 9.66%, atheists 2.01% (2010 est.)}} | | {key:population,value:{text:7,095,217,980 (July 2013 est.),top_ten_most_populous_countries_in_millions:China 1,349.59; India 1,220.80; United States 316.67; Indonesia 251.16; Brazil 201.01; Pakistan 193.24; Nigeria 174.51; Bangladesh 163.65; Russia 142.50; Japan 127.25}} | | {key:age_structure,value:{0_14_years:26% (male 953,496,513/female 890,372,474),15_24_years:16.8% (male 614,574,389/female 579,810,490),25_54_years:40.6% (male 1,454,831,900/female 1,426,721,773),55_64_years:8.4% (male 291,435,881/female 305,185,398),65_years_and_over:8.2% (male 257,035,416/female 321,753,746) (2013 est.)}} | | {key:dependency_ratios,value:{total_dependency_ratio:52 %,youth_dependency_ratio:39.9 %,elderly_dependency_ratio:12.1 %,potential_support_ratio:8.3 (2013)}} | ++ *Adding a WHERE clause with conditions on this column fails:* select flatten(kvgen(f.`people`)) as p from `factbook/world.json` f where f.p.`key` = 'languages'; Query failed: RemoteRpcException: Failure while running fragment., languages [ 686bcd40-c23b-448c-93d8-b98a3b092657 on abhi5.qa.lab:31010 ] [ 686bcd40-c23b-448c-93d8-b98a3b092657 on abhi5.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) Logs indicate a NumberFormat Exception in the above case. *And query fails to parse in the below case* select flatten(kvgen(f.`people`)).`value` as p from `factbook/world.json` f limit 5; Query failed: ParseException: Encountered . at line 1, column 34. Was expecting one of: FROM ... , ... AS ... OVER ... Error: exception while executing query: Failure while executing query. 
(state=,code=0)

Rewriting using an inner query succeeds:

select g.p.`value`.`note_3` from (select flatten(kvgen(f.`people`)) as p from `factbook/world.json` f) g where g.p.`key`='languages';
+------------+
|   EXPR$0   |
+------------+
| approximately 2,300 languages are spoken in Asia, 2,150, in Africa, 1,311 in the Pacific, 1,060 in the Americas, and 280 in Europe |
+------------+

*In both the failure cases the error message needs to be updated to indicate that the operation is not supported. The current error message and logs are not clear for an end user.* -- This message was sent by Atlassian JIRA
[jira] [Resolved] (DRILL-2264) Incorrect data when we use aggregate functions with flatten
[ https://issues.apache.org/jira/browse/DRILL-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2264. Resolution: Later Flatten has been disabled in the group by clause by DRILL-2181. We will look at how best to re-enable this past 1.0 Incorrect data when we use aggregate functions with flatten --- Key: DRILL-2264 URL: https://issues.apache.org/jira/browse/DRILL-2264 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Rahul Challapalli Assignee: Jason Altekruse Priority: Critical Fix For: 1.0.0 git.commit.id.abbrev=6676f2d Data Set:

{code}
{ "uid": 1, "lst_lst" : [[1,2],[3,4]] }
{ "uid": 2, "lst_lst" : [[1,2],[3,4]] }
{code}

The below query returns incorrect results:

{code}
select uid, MAX( flatten(lst_lst[1]) + flatten(lst_lst[0])) from `temp.json` group by uid, flatten(lst_lst[1]), flatten(lst_lst[0]);
+------+---------+
| uid  | EXPR$1  |
+------+---------+
| 1    | 6       |
| 1    | 6       |
| 1    | 6       |
| 1    | 6       |
| 2    | 6       |
| 2    | 6       |
| 2    | 6       |
| 2    | 6       |
+------+---------+
{code}

However, if we use a sub query, drill returns the right data:

{code}
select uid, MAX(l1+l2) from (select uid, flatten(lst_lst[1]) l1, flatten(lst_lst[0]) l2 from `temp.json`) sub group by uid, l1, l2;
+------+---------+
| uid  | EXPR$1  |
+------+---------+
| 1    | 4       |
| 1    | 5       |
| 1    | 5       |
| 1    | 6       |
| 2    | 4       |
| 2    | 5       |
| 2    | 5       |
| 2    | 6       |
+------+---------+
{code}

Also using a single flatten yields proper results:

{code}
select uid, MAX(flatten(lst_lst[0])) from `temp.json` group by uid, flatten(lst_lst[0]);
+------+---------+
| uid  | EXPR$1  |
+------+---------+
| 1    | 1       |
| 1    | 2       |
| 2    | 1       |
| 2    | 2       |
+------+---------+
{code}

Marked it as critical since we return incorrect data. Let me know if you have any other questions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2938) Refactor
Jason Altekruse created DRILL-2938: -- Summary: Refactor Key: DRILL-2938 URL: https://issues.apache.org/jira/browse/DRILL-2938 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1502) Can't connect to mongo when requiring auth
[ https://issues.apache.org/jira/browse/DRILL-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-1502. Resolution: Fixed Fixed in f5b0f4928d9c8c47c145a179c52ba3933d85c0b4 Can't connect to mongo when requiring auth -- Key: DRILL-1502 URL: https://issues.apache.org/jira/browse/DRILL-1502 Project: Apache Drill Issue Type: Bug Components: Storage - MongoDB Affects Versions: 0.6.0 Reporter: Robert Malko Assignee: Jason Altekruse Priority: Minor Fix For: 1.0.0 It doesn't appear that the latest 0.6.0 version allows you to connect to a mongo database requiring auth. The usual mongo db connection string of mongodb://user:pass@host:port/ is not honored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2908) Support reading the Parquet int 96 type
Jason Altekruse created DRILL-2908: -- Summary: Support reading the Parquet int 96 type Key: DRILL-2908 URL: https://issues.apache.org/jira/browse/DRILL-2908 Project: Apache Drill Issue Type: Improvement Components: Storage - Parquet Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 1.0.0 While Drill does not currently have an int96 type, it is supported by the parquet format and we should be able to read files that contain columns of this type. For now we will read the data into a varbinary and users will have to use existing convert_from functions or write their own to interpret the type of data actually stored. One example is the Impala timestamp format which is encoded in an int96 column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
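For context on what such a convert function has to do: in the Impala/parquet-mr convention, an int96 timestamp packs 8 little-endian bytes of nanoseconds-within-day followed by 4 little-endian bytes of a Julian day number. A hedged Python sketch of decoding the raw varbinary bytes — the function name and structure are ours, not Drill's, and it assumes the Impala layout described above:

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch, 1970-01-01 (parquet-mr/Impala convention).
JULIAN_DAY_OF_EPOCH = 2440588

def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte Impala-style int96 timestamp to a UTC datetime.

    Layout: 8 little-endian bytes of nanoseconds-of-day, then
    4 little-endian bytes of Julian day number.
    """
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    # datetime only carries microsecond precision, so sub-microsecond
    # nanoseconds are floored away here.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        days=julian_day - JULIAN_DAY_OF_EPOCH,
        microseconds=nanos_of_day // 1000,
    )
```

For example, 8 zero bytes followed by the little-endian encoding of Julian day 2440588 decode to 1970-01-01 00:00:00 UTC.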
[jira] [Created] (DRILL-2913) Directory explorer UDFs causing warnings from failed janino compilation
Jason Altekruse created DRILL-2913: -- Summary: Directory explorer UDFs causing warnings from failed janino compilation Key: DRILL-2913 URL: https://issues.apache.org/jira/browse/DRILL-2913 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Reporter: Jason Altekruse Assignee: Jason Altekruse The functions added in DRILL-2173 never need to be compiled using janino, because they are never used during the regular java code generation based evaluation; they are only useful if they can be folded at planning time to allow pruning partitions dynamically. As they are registered in Drill, they currently cause warnings because they use generics, which janino does not support. They need to be modified to remove the generics or forced to use the JDK compiler upon registration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2754) Allocation bug in splitAndTransfer method causing some flatten queries to fail
[ https://issues.apache.org/jira/browse/DRILL-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2754. Resolution: Fixed Fix Version/s: (was: 0.9.0) 1.0.0 Allocation bug in splitAndTransfer method causing some flatten queries to fail -- Key: DRILL-2754 URL: https://issues.apache.org/jira/browse/DRILL-2754 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types, Execution - Relational Operators Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical Fix For: 1.0.0 Attachments: DRILL-2754.patch Data for reproduction:

{code}
{ "config": [ [], [ "a string" ] ] }
{code}

{code}
select flatten(config) as flat from cp.`/store/json/null_list.json`
{code}

This was carved out of a larger use case that was failing, so in the course of coming up with a minimal reproduction I fixed the bug. I will be posting a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2842) Parquet files with large file metadata sometimes fail to read in the FooterGather
Jason Altekruse created DRILL-2842: -- Summary: Parquet files with large file metadata sometimes fail to read in the FooterGather Key: DRILL-2842 URL: https://issues.apache.org/jira/browse/DRILL-2842 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2616) strings loaded incorrectly from parquet files
[ https://issues.apache.org/jira/browse/DRILL-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2616. Resolution: Duplicate strings loaded incorrectly from parquet files - Key: DRILL-2616 URL: https://issues.apache.org/jira/browse/DRILL-2616 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 0.7.0 Reporter: Jack Crawford Assignee: Jason Altekruse Priority: Critical Labels: parquet When loading string columns from parquet data sources, some rows have their string values replaced with the value from other rows. Example parquet for which the problem occurs: https://drive.google.com/file/d/0B2JGBdceNMxdeFlJcW1FUElOdXc/view?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2162) Multiple flattens on a list within a list results in violating the incoming batch size limit
[ https://issues.apache.org/jira/browse/DRILL-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2162. Resolution: Fixed Multiple flattens on a list within a list results in violating the incoming batch size limit Key: DRILL-2162 URL: https://issues.apache.org/jira/browse/DRILL-2162 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Rahul Challapalli Assignee: Jason Altekruse Fix For: 0.9.0 Attachments: data.json, drill-2162.patch git.commit.id.abbrev=3e33880 I attached the data set with 2 records. The below query succeeds on top of the attached data set. However, when I copied over the same data set 5 times, the same query failed:

{code}
select uid, flatten(d.lst_lst[1]) lst1, flatten(d.lst_lst[0]) lst0, flatten(d.lst_lst) lst from `data.json` d;
Query failed: RemoteRpcException: Failure while running fragment., Incoming batch of org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch has size 102375, which is beyond the limit of 65536 [ ef16dd95-40e2-4b66-ba30-8650ddb99812 on qa-node190.qa.lab:31010 ] [ ef16dd95-40e2-4b66-ba30-8650ddb99812 on qa-node190.qa.lab:31010 ]
{code}

Error from the logs:

{code}
java.lang.IllegalStateException: Incoming batch of org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch has size 102375, which is beyond the limit of 65536
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:129) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext(FlattenRecordBatch.java:122) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
{code}
[jira] [Created] (DRILL-2712) Issue with PruneScanRule and filter expressions that evaluate to false
Jason Altekruse created DRILL-2712: -- Summary: Issue with PruneScanRule and filter expressions that evaluate to false Key: DRILL-2712 URL: https://issues.apache.org/jira/browse/DRILL-2712 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jason Altekruse Assignee: Aman Sinha Priority: Critical Testing out the recently committed changes to allow for partition querying in UDFs (DRILL-2173) I ran into a query that was not able to plan. Oddly it was not throwing a typical error we would see out of calcite, it seemed to be just spinning indefinitely. I was able to create a simple reproduction that removed the new UDF use, it seems to be related to using a false filter along with a directory filter. I fixed one issue while I was creating the repro (I found the error message in the logs, with the patch attached here that issue goes away but I see a different exception after letting it run for a long time) These issues might be completely unrelated, I just do not currently have a separate reproduction for the issue I fixed that can actually complete successfully for the sake of writing a test for it. Disabling constant folding is not required, but it happened both with it and without it, so it simplified debugging for the time being. The fix for this issue should probably be tested with constant folding turned off and with the default behavior of it turned on. 
The test is:

{code}
@Test
public void testFailingPrune() throws Exception {
  test("alter session set `planner.enable_constant_folding` = false");
  test("explain plan for select * from ("
      + String.format("select dir0, dir1, o_custkey, o_orderdate from dfs_test.`%s/multilevel/parquet` where dir0=1994 and dir1='Q1'", TEST_RES_PATH)
      + ") t where 1=0;");
}
{code}

The issue I saw in the logs after it ran for a while:

{code}
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillMergeFilterRule, args [rel#1669:FilterRel.NONE.ANY([]).[](child=rel#23:Subset#2.NONE.ANY([]).[],condition=AND(=($0, 1994), =($1, 'Q1'), =($0, 1994), =($1, 'Q1'), =($0, 1994), =($1, 'Q1'), ... [the pair =($0, 1994), =($1, 'Q1') repeats for hundreds of terms until the message is truncated]
{code}
[jira] [Resolved] (DRILL-2226) Create test utilities for checking plans for patterns
[ https://issues.apache.org/jira/browse/DRILL-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2226. Resolution: Fixed Fix Version/s: (was: 1.0.0) 0.8.0 Fixed in ed397862eb9584572aa0fcb684dfc9554b00cf60 Create test utilities for checking plans for patterns - Key: DRILL-2226 URL: https://issues.apache.org/jira/browse/DRILL-2226 Project: Apache Drill Issue Type: Improvement Components: Tools, Build Test Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 0.8.0 Attachments: DRILL-2226.patch Regex matching for calcite text format plans, includes expected and excluded pattern matching. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
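The idea behind such a utility is simple enough to sketch: run `explain`, capture the text plan, and check it against inclusion and exclusion regexes. A minimal Python sketch of the pattern-checking core — the function name is ours, and Drill's actual test helper is Java code in its test framework, not this:

```python
import re

def check_plan(plan: str, expected: list[str], excluded: list[str]) -> None:
    """Assert that every expected regex matches somewhere in the text plan
    and that no excluded regex matches anywhere in it."""
    for pattern in expected:
        if not re.search(pattern, plan):
            raise AssertionError(f"expected pattern not found in plan: {pattern}")
    for pattern in excluded:
        if re.search(pattern, plan):
            raise AssertionError(f"excluded pattern found in plan: {pattern}")
```

A test would pass the `explain plan for ...` output as `plan`, for example requiring a `Scan` operator while forbidding a `Join`.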
[jira] [Resolved] (DRILL-2143) Remove RecordBatch from setup method of DrillFunc interface
[ https://issues.apache.org/jira/browse/DRILL-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2143. Resolution: Fixed Resolved in bff7b9ef5a9f345908aca160a97b98f6ab187708 and 1c5decc17cf38cbf4a4119d7ca19653cb19e1b53 Remove RecordBatch from setup method of DrillFunc interface --- Key: DRILL-2143 URL: https://issues.apache.org/jira/browse/DRILL-2143 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Reporter: Jason Altekruse Assignee: Jason Altekruse Fix For: 0.8.0 Attachments: DRILL-2143-part1-feb-27.patch, DRILL-2143-part1-feb-6.patch, DRILL-2143-part1-mar-3.patch, DRILL-2143-part2-15-mar-15.patch, DRILL-2143-part2-feb-27.patch, DRILL-2143-part2-feb-6.patch, DRILL-2143-part2-mar-3.patch, DRILL-2143-remove-record-batch-from-udfs.patch Drill UDFs currently are exposed to too much system state by receiving a reference to a RecordBatch in their setup method. This is not necessary as all of the schema change triggered operator functionality is handled outside of UDFs (the UDFS themselves are actually required to define a specific type they take as input, except in the case of complex types (maps and lists)). The only remaining artifact left from this interface is the date/time functions that ask for the query start time or current timezone. This can be provided to functions using a new injectable type, as DrillBufs are provided to functions currently. For more info read here: http://mail-archives.apache.org/mod_mbox/drill-dev/201501.mbox/%3ccampyv7ac_-9u4irz+5fxoenzbojctovjronn0qri4bqzf53...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2406) Fix expression interpreter to allow executing expressions at planning time
[ https://issues.apache.org/jira/browse/DRILL-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2406. Resolution: Fixed Resolved in 0aa8b19d624d173da51de36aa164f3435d3366a4 and 3f93454f014196a4da198ce012b605b70081fde0 Fix expression interpreter to allow executing expressions at planning time -- Key: DRILL-2406 URL: https://issues.apache.org/jira/browse/DRILL-2406 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical Fix For: 0.8.0 Attachments: DRILL-2406-part1-15-mar-15.patch, DRILL-2406-part1-planning-time-expression-evaulutation.patch, DRILL-2406-part2-15-mar-15.patch, DRILL-2406-part2-planning-time-expression-evaulutation.diff, DRILL-2406-part2-v2-planning-time-expression-evaulutation.patch, DRILL-2406-part2-v3-planning-time-expression-evaulutation.diff The expression interpreter currently available in Drill cannot be used at planning time, as it does not have a means to connect to the direct memory allocator stored at the DrillbitContext level. To implement new rules based on evaluating expressions on constants or small datasets, such as partition information, this limitation must be addressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2448) Remove outdated code to ignore type resolution with varchar vs varbinary now that implicit casting subsumes it
Jason Altekruse created DRILL-2448: -- Summary: Remove outdated code to ignore type resolution with varchar vs varbinary now that implicit casting subsumes it Key: DRILL-2448 URL: https://issues.apache.org/jira/browse/DRILL-2448 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 0.7.0 Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical Fix For: 0.8.0 Function resolution included a small condition to allow varchar and varbinary functions to be resolved for either incoming type. While it is valid to implicitly cast between these two, this early workaround creates a technically invalid expression tree that happens to work with the current code generation system. This, however, does create an issue for the interpreted expression evaluator. Removing the code simply causes an implicit cast to be added during materialization; this works for both generated-code expression evaluation and the interpreter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2406) Fix expression interpreter to allow executing expressions at planning time
Jason Altekruse created DRILL-2406: -- Summary: Fix expression interpreter to allow executing expressions at planning time Key: DRILL-2406 URL: https://issues.apache.org/jira/browse/DRILL-2406 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical Fix For: 0.8.0 The expression interpreter currently available in Drill cannot be used at planning time, as it does not have a means to connect to the direct memory allocator stored at the DrillbitContext level. To implement new rules based on evaluating expressions on constants or small datasets, such as partition information, this limitation must be addressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2391) NPE during cleanup in parquet record writer when query fails during execution on CTAS
[ https://issues.apache.org/jira/browse/DRILL-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2391. Resolution: Duplicate NPE during cleanup in parquet record writer when query fails during execution on CTAS - Key: DRILL-2391 URL: https://issues.apache.org/jira/browse/DRILL-2391 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Steven Phillips Attachments: query.sql, t5.csv Query below fails during execution due to the user error: {code} 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . case when columns[0] = '' then cast(null as varchar(255)) else cast(columns[0] as varchar(255)) end, . . . . . . . . . . . . case when columns[1] = '' then cast(null as integer) else cast(columns[1] as integer) end, . . . . . . . . . . . . case when columns[2] = '' then cast(null as bigint) else cast(columns[2] as bigint) end, . . . . . . . . . . . . case when columns[3] = '' then cast(null as float) else cast(columns[3] as float) end, . . . . . . . . . . . . case when columns[4] = '' then cast(null as double) else cast(columns[4] as double) end, . . . . . . . . . . . . case when columns[5] = '' then cast(null as date) else cast(columns[6] as date) end, . . . . . . . . . . . . case when columns[6] = '' then cast(null as time) else cast(columns[7] as time) end, . . . . . . . . . . . . case when columns[7] = '' then cast(null as timestamp) else cast(columns[8] as timestamp) end, . . . . . . . . . . . . case when columns[8] = '' then cast(null as boolean) else cast(columns[9] as boolean) end, . . . . . . . . . . . . case when columns[9] = '' then cast(null as decimal(8,2)) else cast(columns[9] as decimal(8,2)) end, . . . . . . . . . . . . case when columns[10] = '' then cast(null as decimal(18,4)) else cast(columns[10] as decimal(18,4)) end, . . . . . . . . . . . . 
case when columns[11] = '' then cast(null as decimal(28,4)) else cast(columns[11] as decimal(28,4)) end, . . . . . . . . . . . . case when columns[12] = '' then cast(null as decimal(38,6)) else cast(columns[12] as decimal(38,6)) end . . . . . . . . . . . . from `t5.csv`; Query failed: RemoteRpcException: Failure while running fragment., Value 0 for monthOfYear must be in the range [1,12] [ 5a56453c-304d-430a-b4b2-fbc48c9c2766 on atsqa4-133.qa.lab:31010 ] [ 5a56453c-304d-430a-b4b2-fbc48c9c2766 on atsqa4-133.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) {code} If I run the same query in CTAS, I get NPE during cleanup in parquet writer. {code} 2015-03-05 22:31:05,212 [2b0726d7-4127-2a83-8c83-2376b767d800:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error 50633e23-7e6f-48b8-82ec-a395c5c596e4: Failure while running fragment. java.lang.NullPointerException: null at org.apache.drill.exec.store.parquet.ParquetRecordWriter.cleanup(ParquetRecordWriter.java:318) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.WriterRecordBatch.cleanup(WriterRecordBatch.java:187) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.cleanup(IteratorValidatorBatchIterator.java:148) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.cleanup(AbstractSingleRecordBatch.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.cleanup(IteratorValidatorBatchIterator.java:148) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.internalStop(ScreenCreator.java:178) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:101) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
[jira] [Created] (DRILL-2395) Improve interpreted expression evaluation to only call the setup method once per batch
Jason Altekruse created DRILL-2395: -- Summary: Improve interpreted expression evaluation to only call the setup method once per batch Key: DRILL-2395 URL: https://issues.apache.org/jira/browse/DRILL-2395 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Reporter: Jason Altekruse Assignee: Daniel Barclay (Drill) Priority: Minor This enhancement request came out of the review for a patch for DRILL-2060. Copied below is the discussion from there: Jinfeng: Do you have a plan to move the setup() call into places such that setup() will be called once for each VectorAccessible input? In the code compile + evaluation model, doSetup() will be called per batch, instead of per row. Jason: I have started working on a fix for this. It's a little complicated with setting constant inputs before the setup method is called. I'm trying to figure out the best way to share code with the rest of the input passing used in the EvalVisitor. Would you be okay with this being opened as an enhancement request to be merged a little later? Considering the current use of the interpreter this won't have an impact on any actual queries today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
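The requested behavior can be sketched in plain Java (hypothetical names, not Drill's interpreter classes): setup() is invoked once when a new batch arrives and eval() once per row, mirroring the generated-code model where doSetup() runs per batch.

```java
import java.util.List;

// Illustrative sketch of per-batch setup in an interpreted evaluator.
// All names here (Func, CountingFunc, evaluate) are hypothetical.
public class PerBatchSetupSketch {

    interface Func {
        void setup();       // expensive one-time work (e.g. compiling a regex)
        int eval(int row);  // cheap per-row work
    }

    static final class CountingFunc implements Func {
        int setupCalls = 0;
        public void setup() { setupCalls++; }
        public int eval(int row) { return row * 2; }
    }

    // Evaluate every row of every batch, invoking setup() once per batch only.
    static int evaluate(Func f, List<int[]> batches) {
        int sum = 0;
        for (int[] batch : batches) {
            f.setup();                // once per incoming batch, not per row
            for (int row : batch) {
                sum += f.eval(row);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        CountingFunc f = new CountingFunc();
        int sum = evaluate(f, List.of(new int[]{1, 2}, new int[]{3}));
        System.out.println("sum=" + sum + ", setup calls=" + f.setupCalls);
    }
}
```

The complication Jason mentions, constant inputs that must be bound before setup() runs, would sit just before the per-batch setup() call in a real interpreter.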
[jira] [Created] (DRILL-2360) Add appropriate constant flag on UDF inputs that are used in the setup method
Jason Altekruse created DRILL-2360: -- Summary: Add appropriate constant flag on UDF inputs that are used in the setup method Key: DRILL-2360 URL: https://issues.apache.org/jira/browse/DRILL-2360 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Reporter: Jason Altekruse Assignee: Daniel Barclay (Drill) A number of UDFs do not include the appropriate flag to declare them as constants when they should be. Any input used in the setup method should include a constant flag in the @Param annotation for it (this allows us to detect and throw a helpful error message if they are used incorrectly). A correct example can be found in StringFunctions.RegexpReplace; an incorrect example can be found in GFloat8ToChar (available after running the freemarker generation that is a part of the default mvn install build). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
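The kind of check described here can be sketched with plain Java reflection. This is not Drill's real annotation machinery; the @Param annotation and the helper below are hypothetical stand-ins that show how a constant flag on a setup-time input can be verified, so a clear error can be produced when it is missing.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Illustrative sketch: a @Param annotation with a constant flag, plus a reflective
// check over a function class. All names here are hypothetical.
public class ConstantParamSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Param {
        boolean constant() default false;
    }

    // Correct pattern: the regex pattern is consumed in setup(), so it is flagged constant.
    static final class RegexpReplaceLike {
        @Param(constant = true) String pattern;
        @Param String input;
    }

    // Returns true only if every named setup input carries constant = true.
    static boolean setupInputsAreConstant(Class<?> udf, String... setupInputs) {
        for (String name : setupInputs) {
            try {
                Field f = udf.getDeclaredField(name);
                Param p = f.getAnnotation(Param.class);
                if (p == null || !p.constant()) {
                    return false; // a real engine would throw a helpful error here
                }
            } catch (NoSuchFieldException e) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(setupInputsAreConstant(RegexpReplaceLike.class, "pattern"));
    }
}
```

Running such a check at function-registration time is one way to surface the missing flag early instead of at query time.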
[jira] [Resolved] (DRILL-2031) IndexOutOfBoundException when reading a wide parquet table with boolean columns
[ https://issues.apache.org/jira/browse/DRILL-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2031. Resolution: Fixed Fixed in 1d2ed349699a326165c721257937905e3043418c IndexOutOfBoundException when reading a wide parquet table with boolean columns --- Key: DRILL-2031 URL: https://issues.apache.org/jira/browse/DRILL-2031 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Jason Altekruse Fix For: 0.8.0 Attachments: DRILL-2031-Parquet-bit-reader-fix-v2.patch, DRILL-2031-Parquet-bit-reader-fix.patch, wide1.sql I created a wide table with 128 Lineitem columns plus 6 additional boolean columns for a total of 134 columns via a CTAS script (see attached SQL). The source data is from TPCH scale factor 1 (smaller scale factor may not reproduce the problem). The creation of the table was Ok. Reading from the table gives an IOBE. See stack below. It seems to occur for the boolean columns. 
{code} 0: jdbc:drill:zk=local select * from wide1 where 1=0; java.lang.IndexOutOfBoundsException: srcIndex: 97792 io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:255) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final] io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:378) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final] io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:25) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final] io.netty.buffer.DrillBuf.setBytes(DrillBuf.java:645) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:4.0.24.Final] io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:850) ~[netty-buffer-4.0.24.Final.jar:4.0.24.Final] org.apache.drill.exec.store.parquet.columnreaders.BitReader.readField(BitReader.java:54) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readValues(ColumnReader.java:120) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPageData(ColumnReader.java:169) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:146) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:107) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:367) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:413) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:158) 
~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2226) Create test utilities for checking plans for patterns
Jason Altekruse created DRILL-2226: -- Summary: Create test utilities for checking plans for patterns Key: DRILL-2226 URL: https://issues.apache.org/jira/browse/DRILL-2226 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2218) Constant folding rule not being used in plan where the constant expression is in the select list
Jason Altekruse created DRILL-2218: -- Summary: Constant folding rule not being used in plan where the constant expression is in the select list Key: DRILL-2218 URL: https://issues.apache.org/jira/browse/DRILL-2218 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Priority: Minor This test method and rule are not currently in the master branch, but they do appear in the patch posted for constant expression folding during planning, DRILL-2060. Once it is merged, the test TestConstantFolding.testConstExprFolding_InSelect(), which is currently ignored, will fail. The issue is that even though the constant folding rule for project is firing, and I have traced it to see that a replacement project with a literal is created, it is not being selected in the final plan. This seems rather odd, as there is a comment in the last line of the onMatch() method of the rule that says the following. This does not appear to be having the desired effect; we may need to file a bug in Calcite. {code} // New plan is absolutely better than old plan. call.getPlanner().setImportance(project, 0.0); {code} Here is the query from the test; I expect the sum to be folded during planning by the newly enabled project constant folding rule. {code} select columns[0], 3+5 from cp.`test_input.csv` {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2219) Concurrent modification exception in TopLevelAllocator if a child allocator is added during loop in close()
Jason Altekruse created DRILL-2219: -- Summary: Concurrent modification exception in TopLevelAllocator if a child allocator is added during loop in close() Key: DRILL-2219 URL: https://issues.apache.org/jira/browse/DRILL-2219 Project: Apache Drill Issue Type: Bug Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
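The failure mode can be reproduced with a minimal plain-Java sketch (hypothetical names, not TopLevelAllocator's actual fields): mutating the child collection while close() iterates it throws ConcurrentModificationException, while iterating a snapshot copy does not.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Minimal reproduction of the hazard and one common fix. The class and field
// names are hypothetical; only the iteration pattern is the point.
public class AllocatorCloseSketch {

    private final List<String> children = new ArrayList<>();

    void addChild(String c) { children.add(c); }

    // Unsafe: mutating `children` mid-iteration fails fast with a CME.
    boolean closeUnsafe() {
        try {
            for (String c : children) {
                if (c.equals("a")) addChild("late"); // simulate a child registered during close()
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Safe: iterate a snapshot, so late additions cannot break the loop.
    boolean closeSafe() {
        for (String c : new ArrayList<>(children)) {
            if (c.equals("a")) addChild("late");
        }
        return true;
    }

    public static void main(String[] args) {
        AllocatorCloseSketch a = new AllocatorCloseSketch();
        a.addChild("a");
        System.out.println("unsafe close survived: " + a.closeUnsafe()); // false: CME thrown
        AllocatorCloseSketch b = new AllocatorCloseSketch();
        b.addChild("a");
        System.out.println("safe close survived: " + b.closeSafe());     // true
    }
}
```

An allocator could also synchronize child registration and close() on the same lock; the snapshot approach shown above simply keeps close() free of locking.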
[jira] [Created] (DRILL-2173) Enable querying partition information without reading all data
Jason Altekruse created DRILL-2173: -- Summary: Enable querying partition information without reading all data Key: DRILL-2173 URL: https://issues.apache.org/jira/browse/DRILL-2173 Project: Apache Drill Issue Type: New Feature Components: Query Planning Optimization Affects Versions: 0.7.0 Reporter: Jason Altekruse Assignee: Jason Altekruse When reading a series of files in nested directories, Drill currently adds columns representing the directory structure that was traversed to reach the file currently being read. These columns are stored as varchar under the names dir0, dir1, ... As these are just regular columns, Drill allows arbitrary queries against this data, in terms of aggregates, filters, sorts, etc. To allow optimizing reads, basic partition pruning has already been added to prune in the case of an expression like dir0 = 2015 or a simple in list, which is converted during planning to a series of ORs of equals expressions. If users want to query the directory information dynamically, and not include specific directory names in the query, this will prompt a full table scan and filter operation on the dir columns. This enhancement is to allow more complex queries to be run against directory metadata, only scanning the matching directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
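The intended optimization can be sketched in plain Java (hypothetical paths and helper names, not Drill's planner code): with a filter on dir0, only the matching directories are selected up front, instead of scanning every file and filtering the dir0 column afterwards.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative sketch of directory-based partition pruning. The path layout
// (/table/<dir0>/<dir1>/file) and all names here are hypothetical.
public class PartitionPruneSketch {

    // Extract the first partition directory from a path like /table/2015/q1/b.parquet.
    static String dir0(String path) {
        return path.split("/")[2];
    }

    // Keep only the files whose dir0 value passes the filter, without opening any file.
    static List<String> prune(List<String> files, Predicate<String> dir0Filter) {
        return files.stream()
                .filter(f -> dir0Filter.test(dir0(f)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "/table/2014/q1/a.parquet",
                "/table/2015/q1/b.parquet",
                "/table/2015/q2/c.parquet");
        // dir0 = '2015' keeps two of the three files, pruned purely from path metadata.
        System.out.println(prune(files, d -> d.equals("2015")));
    }
}
```

The enhancement in the ticket generalizes this from simple equality and IN-list filters to arbitrary expressions over the dir columns, evaluated against directory metadata alone.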