[jira] [Resolved] (DRILL-4010) In HBase reader, create child vectors for referenced HBase columns to avoid spurious schema changes
[ https://issues.apache.org/jira/browse/DRILL-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) resolved DRILL-4010.
-------------------------------------------
    Resolution: Fixed

Resolved as part of DRILL-2288 patch.

> In HBase reader, create child vectors for referenced HBase columns to avoid
> spurious schema changes
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-4010
>                 URL: https://issues.apache.org/jira/browse/DRILL-4010
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types, Storage - HBase
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
>
> {{HBaseRecordReader}} needs to create child vectors for all
> referenced/requested columns.
>
> Currently, if a fragment reads only HBase rows that don't have a particular
> referenced column (within a given column family), downstream code adds a
> dummy column of type {{NullableIntVector}} (as a child in the {{MapVector}}
> for the containing HBase column family).
>
> If any other fragment reads an HBase row that _does_ contain the referenced
> column, that fragment's reader will create a child
> {{NullableVarBinaryVector}} for the referenced column.
>
> When the data from those two fragments comes together, Drill detects a schema
> change, even though logically there isn't really any schema change.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (DRILL-3659) UnionAllRecordBatch infers wrongly from next() IterOutcome values
[ https://issues.apache.org/jira/browse/DRILL-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) resolved DRILL-3659.
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: (was: 1.5.0)

Resolved as part of DRILL-2288 patch.

> UnionAllRecordBatch infers wrongly from next() IterOutcome values
> -----------------------------------------------------------------
>
>                 Key: DRILL-3659
>                 URL: https://issues.apache.org/jira/browse/DRILL-3659
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
>
> When UnionAllRecordBatch uses IterOutcome values returned from the next()
> method of upstream batches, it seems to be using those values wrongly (making
> incorrect inferences about what they mean).
>
> In particular, some switch statements seem to check for NONE vs.
> OK_NEW_SCHEMA in order to determine whether there are any rows (instead of
> explicitly checking the number of rows). However, OK_NEW_SCHEMA can be
> returned even when there are zero rows.
>
> The apparent latent bug in the union code blocks the fix for DRILL-2288
> (having ScanBatch return OK_NEW_SCHEMA for a zero-rows case in which it was
> wrongly (per the IterOutcome protocol) returning NONE without first returning
> OK_NEW_SCHEMA).
>
> For details of IterOutcome values, see the Javadoc documentation of
> RecordBatch.IterOutcome (after DRILL-3641 is merged; until then, see
> https://github.com/apache/drill/pull/113).
>
> For an environment/code state that exposes the UnionAllRecordBatch problems,
> see https://github.com/dsbos/incubator-drill/tree/bugs/WORK_2288_etc, which
> includes:
> - a test that exposes the DRILL-2288 problem;
> - an enhanced IteratorValidatorBatchIterator, which now detects IterOutcome
>   value sequence violations; and
> - a fixed (though not-yet-cleaned) version of ScanBatch that fixes the
>   DRILL-2288 problem and thereby exposes the UnionAllRecordBatch problem
>   (several test methods in each of TestUnionAll and TestUnionDistinct fail).
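The zero-row OK_NEW_SCHEMA pitfall described in the issue can be sketched in plain Java. This is a hypothetical model (the IterOutcome enum and Batch class below are stand-ins, not Drill's actual RecordBatch machinery): the question "are there any rows?" must be answered from the record count, never inferred from the IterOutcome value, because OK_NEW_SCHEMA can accompany a zero-row batch.

```java
import java.util.List;

public class IterOutcomeSketch {
  // Stand-ins for Drill's RecordBatch.IterOutcome values (simplified subset).
  enum IterOutcome { OK, OK_NEW_SCHEMA, NONE }

  static final class Batch {
    final IterOutcome outcome;
    final int recordCount;
    Batch(IterOutcome outcome, int recordCount) {
      this.outcome = outcome;
      this.recordCount = recordCount;
    }
  }

  /** Counts rows explicitly; an OK_NEW_SCHEMA batch with zero records adds nothing. */
  static int totalRows(List<Batch> upstream) {
    int rows = 0;
    for (Batch b : upstream) {
      if (b.outcome == IterOutcome.NONE) {
        break;               // end of data, regardless of earlier schema changes
      }
      rows += b.recordCount; // correct: count the records, don't infer from outcome
    }
    return rows;
  }

  public static void main(String[] args) {
    // A schema-only (zero-row) first batch followed by real data.
    List<Batch> batches = List.of(
        new Batch(IterOutcome.OK_NEW_SCHEMA, 0),
        new Batch(IterOutcome.OK, 3),
        new Batch(IterOutcome.NONE, 0));
    System.out.println(totalRows(batches)); // prints 3
  }
}
```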
[jira] [Resolved] (DRILL-3641) Document RecordBatch.IterOutcome (enumerators and possible sequences)
[ https://issues.apache.org/jira/browse/DRILL-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) resolved DRILL-3641.
-------------------------------------------
    Resolution: Fixed

Resolved as part of DRILL-2288 patch.

> Document RecordBatch.IterOutcome (enumerators and possible sequences)
> ---------------------------------------------------------------------
>
>                 Key: DRILL-3641
>                 URL: https://issues.apache.org/jira/browse/DRILL-3641
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
[jira] [Resolved] (DRILL-3955) Possible bug in creation of Drill columns for HBase column families
[ https://issues.apache.org/jira/browse/DRILL-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) resolved DRILL-3955.
-------------------------------------------
    Resolution: Fixed

Resolved as part of DRILL-2288 patch.

> Possible bug in creation of Drill columns for HBase column families
> -------------------------------------------------------------------
>
>                 Key: DRILL-3955
>                 URL: https://issues.apache.org/jira/browse/DRILL-3955
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
>
> If all of the rows read by a given {{HBaseRecordReader}} have no HBase
> columns in a given HBase column family, {{HBaseRecordReader}} doesn't create
> a Drill column for that HBase column family.
>
> Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill
> column exists for that HBase column family, that {{setupNewSchema}} creates a
> dummy Drill column using the usual {{NullableIntVector}} type. In
> particular, it is not a map vector as {{HBaseRecordReader}} creates when it
> sees an HBase column family.
>
> Should {{HBaseRecordReader}} and/or something around setting up for reading
> HBase (including setting up that {{ProjectRecordBatch}}) make sure that all
> HBase column families are represented with map vectors so that
> {{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?
>
> The problem is that, currently, when an HBase table is read in two separate
> fragments, one fragment (seeing rows with columns in the column family) can
> get a map vector for the column family while the other (seeing only rows with
> no columns in the column family) can get the {{NullableIntVector}}.
> Downstream code that receives the two batches ends up with an unresolved
> conflict, yielding IndexOutOfBoundsExceptions as in DRILL-3954.
>
> It's not clear whether there is only one bug--that downstream code doesn't
> resolve {{NullableIntVector}} dummy fields right (DRILL-TBD)--or two--that
> the HBase reading code should set up a Drill column for every HBase column
> family (regardless of whether it has any columns in the rows that were read)
> and that downstream code doesn't resolve {{NullableIntVector}} dummy fields
> (resolution is applicable to sources other than just HBase).
[jira] [Created] (DRILL-4045) FLATTEN case in testNestedFlatten yields no rows (test didn't detect)
Daniel Barclay (Drill) created DRILL-4045:
------------------------------------------

             Summary: FLATTEN case in testNestedFlatten yields no rows (test didn't detect)
                 Key: DRILL-4045
                 URL: https://issues.apache.org/jira/browse/DRILL-4045
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
            Reporter: Daniel Barclay (Drill)


The case of using {{FLATTEN}} on nested lists appearing in
{{TestComplexTypeReader.testNestedFlatten()}} yields no rows.

Part of the problem is that in the code generated by {{FlattenRecordBatch}},
the methods are empty.

(That test method doesn't check the results, so, previous to DRILL-2288 work,
the problem was not detected. However, with DRILL-2288 fixes, the flatten
problem causes an {{IllegalArgumentException}} (logically, an assertion
exception) in RecordBatchLoader, so the test is being disabled (with @Ignore)
as part of DRILL-2288.)
[jira] [Created] (DRILL-4010) In HBase reader, create child vectors for referenced HBase columns to avoid spurious schema changes
Daniel Barclay (Drill) created DRILL-4010:
------------------------------------------

             Summary: In HBase reader, create child vectors for referenced HBase columns to avoid spurious schema changes
                 Key: DRILL-4010
                 URL: https://issues.apache.org/jira/browse/DRILL-4010
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types, Storage - HBase
            Reporter: Daniel Barclay (Drill)


{{HBaseRecordReader}} needs to create child vectors for all
referenced/requested columns.

Currently, if a fragment reads only HBase rows that don't have a particular
referenced column (within a given column family), downstream code adds a
dummy column of type {{NullableIntVector}} (as a child in the {{MapVector}}
for the containing HBase column family).

If any other fragment reads an HBase row that _does_ contain the referenced
column, that fragment's reader will create a child
{{NullableVarBinaryVector}} for the referenced column.

When the data from those two fragments comes together, Drill detects a schema
change, even though logically there isn't really any schema change.
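The idea behind the requested fix can be modeled with plain Java maps standing in for a MapVector's children (the schemaFor method and the type strings below are hypothetical, not Drill code): if every reader pre-creates an entry for each *referenced* column with its declared type, a fragment that saw no data for a column still produces the same schema as a fragment that did, so no spurious schema change is detected downstream.

```java
import java.util.Set;
import java.util.TreeMap;

public class ChildVectorSketch {
  /**
   * Builds a column-family "schema": one entry per referenced column is
   * created up front (the proposed fix), then data-bearing columns are added.
   */
  static TreeMap<String, String> schemaFor(Set<String> referenced,
                                           Set<String> seenInData) {
    TreeMap<String, String> schema = new TreeMap<>();
    // Pre-create a child entry for every referenced/requested column.
    for (String col : referenced) {
      schema.put(col, "NullableVarBinary");
    }
    // Columns actually present in the rows read get the same declared type.
    for (String col : seenInData) {
      schema.put(col, "NullableVarBinary");
    }
    return schema;
  }

  public static void main(String[] args) {
    Set<String> referenced = Set.of("f.c1");
    // Fragment A's rows contained the column; fragment B's rows did not.
    TreeMap<String, String> a = schemaFor(referenced, Set.of("f.c1"));
    TreeMap<String, String> b = schemaFor(referenced, Set.of());
    System.out.println(a.equals(b)); // prints true: identical schemas
  }
}
```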
[jira] [Created] (DRILL-4001) Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)
Daniel Barclay (Drill) created DRILL-4001:
------------------------------------------

             Summary: Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)
                 Key: DRILL-4001
                 URL: https://issues.apache.org/jira/browse/DRILL-4001
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Daniel Barclay (Drill)


In certain cases, {{MapVector.load(...)}} (called by
{{RecordBatchLoader.load(...)}}) returns with some map child vectors having a
length of zero instead of having a length matching the length of sibling
vectors and the number of records in the batch. (This caused some of the
{{IndexOutOfBoundException}} errors seen in fixing DRILL-2288.)

The condition seems to be that a child field (e.g., an HBase column in an
HBase column family) appears in an earlier batch and does not appear in a
later batch.

(The HBase column's child vector gets created (in the MapVector for the HBase
column family) during loading of the earlier batch. During loading of the
later batch, all vectors get reset to zero length, and then only vectors for
fields _appearing in the batch message being loaded_ get loaded and set to the
length of the batch--other vectors created from earlier messages/{{load}}
calls are left with a length of zero (instead of, say, being filled with nulls
to the length of their siblings and the current record batch).)

See the TODO(DRILL-) mark and workaround in {{MapVector.getObject(int)}}.
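The repair the report suggests can be sketched with lists standing in for value vectors (the load method and map-of-lists model below are illustrative, not MapVector's actual implementation): after loading a batch, any child vector known from an earlier load but absent from the current message is padded with nulls to the batch's record count instead of being left at length zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PadAbsentVectors {
  /**
   * Loads a batch "message" into a map of child "vectors". Fields known from
   * earlier loads but absent from this message are padded with nulls so every
   * vector's length matches the batch's record count.
   */
  static Map<String, List<String>> load(Map<String, List<String>> knownFromEarlier,
                                        Map<String, List<String>> message,
                                        int recordCount) {
    Map<String, List<String>> result = new HashMap<>();
    // Vectors present in the message get the message's values.
    message.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
    // Vectors created by earlier loads but absent now get nulls, not length 0.
    for (String name : knownFromEarlier.keySet()) {
      result.computeIfAbsent(name, n -> {
        List<String> nulls = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
          nulls.add(null);
        }
        return nulls;
      });
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> earlier = Map.of("f.c1", List.of("x"));
    Map<String, List<String>> msg = Map.of("f.c2", List.of("y", "z"));
    Map<String, List<String>> loaded = load(earlier, msg, 2);
    System.out.println(loaded.get("f.c1").size()); // prints 2: padded with nulls
  }
}
```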
[jira] [Created] (DRILL-4002) Result check doesn't execute in TestNewMathFunctions.runTest(...)
Daniel Barclay (Drill) created DRILL-4002:
------------------------------------------

             Summary: Result check doesn't execute in TestNewMathFunctions.runTest(...)
                 Key: DRILL-4002
                 URL: https://issues.apache.org/jira/browse/DRILL-4002
             Project: Apache Drill
          Issue Type: Bug
          Components: Tools, Build & Test
            Reporter: Daniel Barclay (Drill)


In {{TestNewMathFunctions}}, method {{runTest}}'s check of the result does not
execute.

Method {{runTest(...)}} skips the first record batch--which currently contains
the results to be checked. The loop that is right after that and that checks
any subsequent batches never executes.

Additionally, the test has no self-check assertions (e.g., that a second batch
existed) to detect that its assumptions are no longer valid.
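A sketch of the shape a fixed checking loop could take (checkedRecords and the int[] batch model are illustrative stand-ins, not TestNewMathFunctions' actual code): start checking at the first batch, and assert that something was actually checked so the test fails loudly if its assumptions stop holding.

```java
import java.util.List;

public class CheckAllBatches {
  /**
   * Iterates over every batch, starting with the first, and fails if no
   * record was ever checked (a self-check on the test's own assumptions).
   */
  static int checkedRecords(List<int[]> batches) {
    int checked = 0;
    for (int[] batch : batches) {  // do not skip the first batch
      for (int value : batch) {
        checked++;                 // a real test would verify each value here
      }
    }
    if (checked == 0) {
      throw new AssertionError("no records were checked; test assumptions broken");
    }
    return checked;
  }

  public static void main(String[] args) {
    // Two batches: the first holds the results, the second is empty.
    System.out.println(checkedRecords(List.of(new int[] {1, 2}, new int[0]))); // prints 2
  }
}
```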
[jira] [Created] (DRILL-3998) Check skipping of .clear and .release in BaseTestQuery.printResult(...) (bug?)
Daniel Barclay (Drill) created DRILL-3998:
------------------------------------------

             Summary: Check skipping of .clear and .release in BaseTestQuery.printResult(...) (bug?)
                 Key: DRILL-3998
                 URL: https://issues.apache.org/jira/browse/DRILL-3998
             Project: Apache Drill
          Issue Type: Bug
          Components: Tools, Build & Test
            Reporter: Daniel Barclay (Drill)


In {{BaseTestQuery.printResult(...)}}, if a loaded record batch has no
records, the code skips calling not only the printout method but also the
{{RecordBatchLoader.clear()}} and {{QueryDataBatch.release()}} methods.

Is that correct?

(At some point in debugging DRILL-2288, that skipping of {{clear}} and
{{release}} seemed to cause reporting of a memory leak.)
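The cleanup discipline the question points at can be sketched as follows (FakeBatch is a stand-in, not Drill's QueryDataBatch): skipping the printout for an empty batch is fine, but the release must happen on every path, which a try/finally guarantees.

```java
public class AlwaysRelease {
  // Minimal stand-in for a releasable batch resource.
  static final class FakeBatch {
    final int recordCount;
    boolean released;
    FakeBatch(int recordCount) { this.recordCount = recordCount; }
    void release() { released = true; }
  }

  static void printResult(FakeBatch batch) {
    try {
      if (batch.recordCount > 0) {
        System.out.println("records: " + batch.recordCount);
      } // zero-record batches are simply not printed ...
    } finally {
      batch.release(); // ... but are still released, avoiding the leak
    }
  }

  public static void main(String[] args) {
    FakeBatch empty = new FakeBatch(0);
    printResult(empty);
    System.out.println(empty.released); // prints true
  }
}
```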
Maven/checkstyle changes
Hey, what changed in the Maven setup regarding checkstyle?

I used to be able to run "mvn validate" to run checkstyle to find style
violations up front (e.g., before starting a long test run). However, that
doesn't seem to work any more. It looks like checkstyle is run later, but it's
not clear in what Maven phase or step.

How do we run checkstyle (without building everything) now?

Thanks,
Daniel

--
Daniel Barclay
MapR Technologies
Re: [DISCUSS] Processing non-printable characters in Drill
Khurram Faraaz wrote:
> ...
> It looks like Drill processes non-printable characters in both cases, with
> and without the new text reader (exec.storage.enable_new_text_reader)
> Should we throw an error since these are non-printable characters ?

No, I don't think so. Does there seem to be any need to reject non-printable
characters?

> ...
> Content from the csv file used in test
> 1,^A
> 2,^B
> 3,^C
> 4,^D
> 5,^E
> 6,^F
>
> 0: jdbc:drill:schema=dfs.tmp> select * from `nonPrintables.csv`;
> +-+
> | columns |
> +-+
> | ["1","\u0001"] |
> | ["2","\u0002"] |
> | ["3","\u0003"] |
> | ["4","\u0004"] |
> | ["5","\u0005"] |
> | ["6","\u0006"] |
> +-+
> 6 rows selected (0.521 seconds)
>
> 0: jdbc:drill:schema=dfs.tmp> select columns[1] from `nonPrintables.csv`;
> +-+
> | EXPR$0 |
> +-+
> ||
> ||
> ||
> ||
> ||
> ||
> +-+
> 6 rows selected (0.382 seconds)

Note what's going on there (re the difference between those two outputs):

In the first case, the strings with unprintable characters go through Drill's
conversion of a value of a complex type (e.g., VARCHAR ARRAY) to a JSON string
(in order to have a string to return through the JDBC API). That conversion
encodes string (VARCHAR) values as JSON string tokens, using JSON's escape
sequences for the unprintable characters. Finally, the resultant JSON string
(the whole string of JSON, not the JSON string token) is displayed by SQLLine
or the web UI or whatever. (And don't forget the step of your copying and
pasting into your message.)

In the second case, the core part of Drill is directly returning the character
strings from the data through the JDBC API. Then, SQLLine or the web UI or
whatever is deciding how to display those strings--including how to handle any
special, e.g., unprintable, characters. Evidently, SQLLine doesn't render
unprintable characters into some visible form. It probably just writes them to
your terminal's output stream.

Since your terminal doesn't render them especially either, the characters
still aren't visible, and when you copied to paste to compose your e-mail
message, there was nothing from those special characters to copy. (Actually,
the non-printable characters are slightly visible--note how the six lines with
visually blank values have terminating vertical-bar characters that don't line
up with the other terminating "+" or "|" characters.)

From the point of view of the core part of Drill, it's up to the client of the
JDBC API to decide how to display values, including character strings with
unprintable characters. (The JDBC API returns the Java representations (String
objects) of the VARCHAR values.)

However, from the point of view of users, SQLLine (and Drill's web UI too)
should render all values visibly, including character strings with unprintable
characters. (They should also render byte strings competently, e.g., rendering
in hex the bytes themselves rather than displaying in hex the hash code of the
Java byte array object that contains (a specific copy of) the bytes of the
byte string(!).)

Daniel

--
Daniel Barclay
MapR Technologies
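The kind of visible rendering argued for above can be sketched in a few lines (this is illustrative, not SQLLine's or Drill's actual code; renderVisibly is a hypothetical helper): map control characters to four-digit hex escapes before display, so values containing them are never invisible.

```java
public class VisibleEscapes {
  /** Replaces control characters with visible backslash-u hex escapes. */
  static String renderVisibly(String s) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isISOControl(c)) {
        // Control characters become four-digit hex escapes, e.g. 0x01.
        out.append(String.format("\\u%04x", (int) c));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // The control character 0x01 is printed as a visible escape.
    System.out.println(renderVisibly("1," + (char) 0x01));
  }
}
```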
[jira] [Created] (DRILL-3954) HBase tests use only 1 region, don't detect bug(s) in dummy-column NullableIntVector creation/resolution
Daniel Barclay (Drill) created DRILL-3954:
------------------------------------------

             Summary: HBase tests use only 1 region, don't detect bug(s) in dummy-column NullableIntVector creation/resolution
                 Key: DRILL-3954
                 URL: https://issues.apache.org/jira/browse/DRILL-3954
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - HBase
            Reporter: Daniel Barclay (Drill)


Currently, the HBase tests (e.g., {{TestHBaseFilterPushDown}}) use only one
region. That causes them to miss detecting a bug in creating and/or resolving
dummy fields ({{NullableIntVectors}} for referenced but non-existent fields)
somewhere between reading from HBase and {{ProjectRecordBatch.setupNewSchema}}
(or maybe two separate bugs).

Reproduction:

In HBaseTestsSuite, change the line:
{noformat}
UTIL.startMiniHBaseCluster(1, 1);
{noformat}
to:
{noformat}
UTIL.startMiniHBaseCluster(1, 3);
{noformat}
and change the line:
{noformat}
TestTableGenerator.generateHBaseDataset1(admin, TEST_TABLE_1, 1);
{noformat}
to:
{noformat}
TestTableGenerator.generateHBaseDataset1(admin, TEST_TABLE_1, 3);
{noformat}

Run unit test class {{TestHBaseFilterPushDown}}.

Depending on which region gets processed first (it's non-deterministic), test
methods {{testFilterPushDownOrRowKeyEqualRangePred}} and
{{testFilterPushDownMultiColumns}} get exceptions like this:
{noformat}
java.lang.IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))
    at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:189)
    at io.netty.buffer.DrillBuf.chk(DrillBuf.java:211)
    at io.netty.buffer.DrillBuf.getByte(DrillBuf.java:746)
    at org.apache.drill.exec.vector.UInt1Vector$Accessor.get(UInt1Vector.java:364)
    at org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.isSet(NullableVarBinaryVector.java:391)
    at org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.isNull(NullableVarBinaryVector.java:387)
    at org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.getObject(NullableVarBinaryVector.java:411)
    at org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.getObject(NullableVarBinaryVector.java:1)
    at org.apache.drill.exec.vector.complex.MapVector$Accessor.getObject(MapVector.java:313)
    at org.apache.drill.exec.util.VectorUtil.showVectorAccessibleContent(VectorUtil.java:166)
    at org.apache.drill.BaseTestQuery.printResult(BaseTestQuery.java:487)
    at org.apache.drill.hbase.BaseHBaseTest.printResultAndVerifyRowCount(BaseHBaseTest.java:95)
    at org.apache.drill.hbase.BaseHBaseTest.runHBaseSQLVerifyCount(BaseHBaseTest.java:91)
    at org.apache.drill.hbase.TestHBaseFilterPushDown.testFilterPushDownMultiColumns(TestHBaseFilterPushDown.java:592)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.lang.reflect.Method.invoke(Method.java:606)
{noformat}

See DRILL-TBD.
[jira] [Created] (DRILL-3955) Possible bug in creation of Drill columns for HBase column families
Daniel Barclay (Drill) created DRILL-3955:
------------------------------------------

             Summary: Possible bug in creation of Drill columns for HBase column families
                 Key: DRILL-3955
                 URL: https://issues.apache.org/jira/browse/DRILL-3955
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Daniel Barclay (Drill)


If all of the rows read by a given {{HBaseRecordReader}} have no HBase columns
in a given HBase column family, {{HBaseRecordReader}} doesn't create a Drill
column for that HBase column family.

Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill
column exists for that HBase column family, that {{setupNewSchema}} creates a
dummy Drill column using the usual {{NullableIntVector}} type. In particular,
it is not a map vector as {{HBaseRecordReader}} creates when it sees an HBase
column family.

Should {{HBaseRecordReader}} and/or something around setting up for reading
HBase (including setting up that {{ProjectRecordBatch}}) make sure that all
HBase column families are represented with map vectors so that
{{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?

The problem is that, currently, when an HBase table is read in two separate
fragments, one fragment (seeing rows with columns in the column family) can
get a map vector for the column family while the other (seeing only rows with
no columns in the column family) can get the {{NullableIntVector}}.
Downstream code that receives the two batches ends up with an unresolved
conflict, yielding IndexOutOfBoundsExceptions as in DRILL-3954.

It's not clear whether there is only one bug--that downstream code doesn't
resolve {{NullableIntVector}} dummy fields right (DRILL-TBD)--or two--that the
HBase reading code should set up a Drill column for every HBase column family
(regardless of whether it has any columns in the rows that were read) and that
downstream code doesn't resolve {{NullableIntVector}} dummy fields (resolution
is applicable to sources other than just HBase).
Re: intermittent IndexOutOfBoundsException from uninitialized null-bits vector?
I wrote:
> Is f2's type of INT:OPTIONAL correct?
>
> Why wouldn't f2's type be MAP:REQUIRED? Even if the HBase reader didn't see
> any HBase columns in HBase column family f2, doesn't it still know that f2
> is a column family and shouldn't it still set Drill's f2 column to be a map
> and not of type INT:OPTIONAL?

Okay, I see now where that part (f2's being INT:OPTIONAL) is coming from:

One of the HBase scans doesn't read any rows with HBase columns in column
family f2, so HBaseRecordReader doesn't know about column family f2 and
doesn't create any Drill column f2. Drill column f2 is referred to in the
query but doesn't exist in the above scan's fragment, so some code in
ProjectRecordBatch.setupNewSchema() creates it with type INT:OPTIONAL.

I can't tell whether:

* The HBase reader should be getting _all_ column families.
  o (Probably not--doing so could wastefully get unneeded data.)
* ProjectRecordBatch.setupNewSchema() is wrongly creating f2 with the current
  assumed type (INT:OPTIONAL) for referenced-but-absent fields.
  o (Probably not--it doesn't know about HBase or that f2 is an HBase column
    family to know to make the Drill f2 column a map.)
* Something downstream in the fragment that receives data from the two HBase
  scan fragments isn't properly resolving the INT:OPTIONAL version of f2
  (from the unknown-in-that-fragment reference to f2) with the MAP:REQUIRED
  version of f2 (from the fragment that did see f2), e.g.,
  resolving/converting f2 in the batch from the no-f2 scan to MAP:REQUIRED.
  o (Unclear how likely--it's not clear how downstream code could know when
    to take INT:OPTIONAL as a dummy type and ignore it and when to take it as
    the real type and report a schema conflict.)
* The real problem is simply that ambiguously reusing INT:OPTIONAL for
  unknown columns, instead of using something separate and dedicated, is
  fundamentally flawed, and that core problem should be fixed.
  o (Seems likely.)
  o Can we just create some sibling of our enumerator INT and use that for
    referenced-but-otherwise-unknown columns? (Does any need to also
    correspond to a Calcite enumerator prevent that approach (without a
    Calcite change)?)
  o If that doesn't work, would creating a new sibling major type work?
  o With a dedicated indication of the referenced-but-of-unknown-type case,
    code could easily tell which seeming schema differences were actually
    simply resolvable.

Daniel


Daniel Barclay wrote:

[With corrected stack trace.]

Chris,

Actually, now I am seeing a non-deterministic IOOBE like yours (with
length = 1).

Note in the call stack way below that it's coming from an isNull() method.
The isNull() method was called with an index of 0 when the top-level vector
container or whatever had one row. It looks like the subvector used to track
null values didn't get filled in right. (I can't tell yet if it also means
that the value-printing code in the HBase test is missing something about a
schema change.)

This is from an HBase query with "WHERE row_key = 'a2' or row_key between
'b5' and 'b6'". A batch of 2 rows resulting from the "between ..."
part comes first--*non-deterministically* (it was consistent for many runs, but it switched just now in a run for adding the schemas to this message)--and then the batch of one row for the "= 'a2'" part seems messed up in the second column family (f2): The first batch's schema is (note *f2*'s type): BatchSchema [fields=[`row_key`(VARBINARY:REQUIRED)[`$offsets$`(UINT4:REQUIRED)], `f`(MAP:REQUIRED)[`f`.`c1`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c1`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c2`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c2`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c3`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c3`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c4`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c4`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c5`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c5`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c6`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c6`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c8`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c8`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]]], `f2`(MAP:REQUIRED)[`f2`.`c1`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c1`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c3`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c3`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c5`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c5`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c7`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c7`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c9`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c9`(VARBINARY:OPTIONAL)[`$offsets$`(UI
intermittent IndexOutOfBoundsException from uninitialized null-bits vector?
rameworkMethod$1(ReflectiveCallable).run() line: 12
FrameworkMethod.invokeExplosively(Object, Object...) line: 44
JUnit4TestRunnerDecorator.executeTestMethod(FrameworkMethod, Object, Object...) line: 120
JUnit4TestRunnerDecorator.invokeExplosively(FrameworkMethod, Object, Object...) line: 65
MockFrameworkMethod.invokeExplosively(Invocation, Object, Object...) line: 29
GeneratedMethodAccessor133.invoke(Object, Object[]) line: not available
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
Method.invoke(Object, Object...) line: 606
MethodReflection.invokeWithCheckedThrows(Object, Method, Object...) line: 95
MockMethodBridge.callMock(Object, boolean, String, String, String, int, int, boolean, Object[]) line: 76
MockMethodBridge.invoke(Object, Method, Object[]) line: 41
(FrameworkMethod).invokeExplosively(Object, Object...) line: 44
InvokeMethod.evaluate() line: 17
RunBefores.evaluate() line: 26
RunAfters.evaluate() line: 27
TestWatcher$1.evaluate() line: 55
TestWatcher$1.evaluate() line: 55
TestWatcher$1.evaluate() line: 55
ExpectedException$ExpectedExceptionStatement.evaluate() line: 168
TestWatcher$1.evaluate() line: 55
RunRules.evaluate() line: 20
BlockJUnit4ClassRunner(ParentRunner).runLeaf(Statement, Description, RunNotifier) line: 271
BlockJUnit4ClassRunner.runChild(FrameworkMethod, RunNotifier) line: 70
BlockJUnit4ClassRunner.runChild(Object, RunNotifier) line: 50
ParentRunner$3.run() line: 238
ParentRunner$1.schedule(Runnable) line: 63
BlockJUnit4ClassRunner(ParentRunner).runChildren(RunNotifier) line: 236
ParentRunner.access$000(ParentRunner, RunNotifier) line: 53
ParentRunner$2.evaluate() line: 229
RunBefores.evaluate() line: 26
RunAfters.evaluate() line: 27
BlockJUnit4ClassRunner(ParentRunner).run(RunNotifier) line: 309
JUnit4TestClassReference(JUnit4TestReference).run(TestExecution) line: 50
TestExecution.run(ITestReference[]) line: 38
RemoteTestRunner.runTests(String[], String, TestExecution) line: 459
RemoteTestRunner.runTests(TestExecution)
line: 675
RemoteTestRunner.run() line: 382
RemoteTestRunner.main(String[]) line: 192

Daniel


Chris Westin wrote:

I seem to recall you were telling me about a new IOOB that you're seeing,
was that you? Is it this?

Execution Failures:
/root/drillAutomation/framework-master/framework/resources/Functional/aggregates/tpcds_variants/csv/aggregate26.q

Query:
select
  cast(case columns[0] when '' then 0 else columns[0] end as int) as soldd,
  cast(case columns[1] when '' then 0 else columns[1] end as bigint) as soldt,
  cast(case columns[2] when '' then 0 else columns[2] end as float) as itemsk,
  cast(case columns[3] when '' then 0 else columns[3] end as decimal(18,9)) as custsk,
  cast(case columns[4] when '' then 0 else columns[4] end as varchar(20)) as cdemo,
  columns[5] as hdemo,
  columns[6] as addrsk,
  columns[7] as storesk,
  columns[8] as promo,
  columns[9] as tickn,
  sum(case columns[10] when '' then 0 else cast(columns[10] as int) end) as quantities
from `store_sales.dat`
group by
  cast(case columns[0] when '' then 0 else columns[0] end as int),
  cast(case columns[1] when '' then 0 else columns[1] end as bigint),
  cast(case columns[2] when '' then 0 else columns[2] end as float),
  cast(case columns[3] when '' then 0 else columns[3] end as decimal(18,9)),
  cast(case columns[4] when '' then 0 else columns[4] end as varchar(20)),
  columns[5], columns[6], columns[7], columns[8], columns[9]
order by soldd desc, soldt desc, itemsk desc
limit 20

Failed with exception
java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))

Fragment 0:0

[Error Id: 2322e296-4fff-4770-a778-c20ea4d7 on atsqa6c61.qa.lab:31010]
    at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
    at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
    at oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:160)
    at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:203)
    at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))

Fragment 0:0

--
Daniel Barclay
MapR Technologies
intermittent IndexOutOfBoundsException from uninitialized null-bits vector?
, Object...) line: 606 FrameworkMethod$1.runReflectiveCall() line: 47 FrameworkMethod$1(ReflectiveCallable).run() line: 12 FrameworkMethod.invokeExplosively(Object, Object...) line: 44 JUnit4TestRunnerDecorator.executeTestMethod(FrameworkMethod, Object, Object...) line: 120 JUnit4TestRunnerDecorator.invokeExplosively(FrameworkMethod, Object, Object...) line: 65 MockFrameworkMethod.invokeExplosively(Invocation, Object, Object...) line: 29 GeneratedMethodAccessor133.invoke(Object, Object[]) line: not available DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43 Method.invoke(Object, Object...) line: 606 MethodReflection.invokeWithCheckedThrows(Object, Method, Object...) line: 95 MockMethodBridge.callMock(Object, boolean, String, String, String, int, int, boolean, Object[]) line: 76 MockMethodBridge.invoke(Object, Method, Object[]) line: 41 (FrameworkMethod).invokeExplosively(Object, Object...) line: 44 InvokeMethod.evaluate() line: 17 RunBefores.evaluate() line: 26 RunAfters.evaluate() line: 27 TestWatcher$1.evaluate() line: 55 TestWatcher$1.evaluate() line: 55 TestWatcher$1.evaluate() line: 55 ExpectedException$ExpectedExceptionStatement.evaluate() line: 168 TestWatcher$1.evaluate() line: 55 RunRules.evaluate() line: 20 BlockJUnit4ClassRunner(ParentRunner).runLeaf(Statement, Description, RunNotifier) line: 271 BlockJUnit4ClassRunner.runChild(FrameworkMethod, RunNotifier) line: 70 BlockJUnit4ClassRunner.runChild(Object, RunNotifier) line: 50 ParentRunner$3.run() line: 238 ParentRunner$1.schedule(Runnable) line: 63 BlockJUnit4ClassRunner(ParentRunner).runChildren(RunNotifier) line: 236 ParentRunner.access$000(ParentRunner, RunNotifier) line: 53 ParentRunner$2.evaluate() line: 229 RunBefores.evaluate() line: 26 RunAfters.evaluate() line: 27 BlockJUnit4ClassRunner(ParentRunner).run(RunNotifier) line: 309 JUnit4TestClassReference(JUnit4TestReference).run(TestExecution) line: 50 TestExecution.run(ITestReference[]) line: 38 RemoteTestRunner.runTests(String[], 
String, TestExecution) line: 459 RemoteTestRunner.runTests(TestExecution) line: 675 RemoteTestRunner.run() line: 382 RemoteTestRunner.main(String[]) line: 192 Daniel Chris Westin wrote: I seem to recall you were telling me about a new IOOB that you're seeing, was that you? Is it this? Execution Failures: /root/drillAutomation/framework-master/framework/resources/Functional/aggregates/tpcds_variants/csv/aggregate26.q Query: select cast(case columns[0] when '' then 0 else columns[0] end as int) as soldd, cast(case columns[1] when '' then 0 else columns[1] end as bigint) as soldt, cast(case columns[2] when '' then 0 else columns[2] end as float) as itemsk, cast(case columns[3] when '' then 0 else columns[3] end as decimal(18,9)) as custsk, cast(case columns[4] when '' then 0 else columns[4] end as varchar(20)) as cdemo, columns[5] as hdemo, columns[6] as addrsk, columns[7] as storesk, columns[8] as promo, columns[9] as tickn, sum(case columns[10] when '' then 0 else cast(columns[10] as int) end) as quantities from `store_sales.dat` group by cast(case columns[0] when '' then 0 else columns[0] end as int), cast(case columns[1] when '' then 0 else columns[1] end as bigint), cast(case columns[2] when '' then 0 else columns[2] end as float), cast(case columns[3] when '' then 0 else columns[3] end as decimal(18,9)), cast(case columns[4] when '' then 0 else columns[4] end as varchar(20)), columns[5], columns[6], columns[7], columns[8], columns[9] order by soldd desc, soldt desc, itemsk desc limit 20 Failed with exception java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0)) Fragment 0:0 [Error Id: 2322e296-4fff-4770-a778-c20ea4d7 on atsqa6c61.qa.lab:31010] at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247) at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320) at oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187) at
org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:160) at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:203) at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0)) Fragment 0:0 -- Daniel Barclay MapR Technologies
current support/non-support for empty JSON files mixed in with regular JSON files
Do we intend/expect Drill to _currently_ handle having empty (zero-byte) JSON files mixed in with regular JSON files? The in-progress fix for DRILL-2288 hits problems with some cases of having empty JSON files mixed in with non-empty JSON files. However, it's not clear whether those cases are currently supported (in which case the DRILL-2288 changes must not break that support) or are not supported yet anyway (in which case the changes needed to fix DRILL-2288 don't need to leave Drill handling such empty JSON files). Thanks, Daniel -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3902) Bad error message: core cause not included in text; maybe wrong kind
Daniel Barclay (Drill) created DRILL-3902: - Summary: Bad error message: core cause not included in text; maybe wrong kind Key: DRILL-3902 URL: https://issues.apache.org/jira/browse/DRILL-3902 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) When trying to use an empty directory as a table causes Drill to fail by hitting an IndexOutOfBoundsException, the final error message includes the text from the IndexOutOfBoundsException's getMessage()--but fails to mention IndexOutOfBoundsException itself (or equivalent information): {noformat} 0: jdbc:drill:zk=localhost:2181> SELECT * FROM `dfs`.`root`.`/tmp/empty_directory`; Error: VALIDATION ERROR: Index: 0, Size: 0 [Error Id: 66ff61ed-ea41-4af9-87c5-f91480ef1b21 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:zk=localhost:2181> {noformat} Also, since this isn't a coherent/intentional validation error but an internal error, shouldn't this be a SYSTEM ERROR message? (Does the SYSTEM ERROR case include the exception class name in the message?) Daniel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
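A message that names the exception class as well as its getMessage() text would avoid the bare "Index: 0, Size: 0" report. A minimal sketch of such a wrapper (the describe helper is hypothetical, not Drill's actual UserException API):

```java
// Hypothetical helper (not Drill's actual error-reporting code): when an
// internal exception is wrapped into a user-facing error, prepend the
// exception's class name so its getMessage() text is never reported bare.
class ErrorTextSketch {
    static String describe(Throwable t) {
        String msg = t.getMessage();
        // An exception with no message still gets named.
        return t.getClass().getSimpleName() + (msg == null ? "" : ": " + msg);
    }
}
```

With this shape the example above would read "IndexOutOfBoundsException: Index: 0, Size: 0" instead of just the index text.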
Re: ensureAtLeastOneField: obsolete? needed?
Jacques, > ... it ... should probably check if a column is already setup in the output mutation and, if so, don't re-add the ensure at least one column. Thus probably requires a new method in output mutator (isEmpty or size) Some questions on the various moving pieces: Does ensureAtLeastOneField() need to check only for the fieldWriter.integer(...) call, or does it also need to check for the earlier fieldWriter.map(...) call? (I'm not clear on what cases that loop can encounter.) Would the method go (be declared) on MapWriter (re ensureAtLeastOneField()'s fieldWriter variable) or on ComplexWriter (re its writer parameter) (or something else)? Which data should the method implementation be consulting? For example, I see that SingleMapWriter has both container and fields members, and traversing down container (at least in the current case I'm looking at) leads to fields primary, secondary, and delegate (in MapWithOrdinal). Which is the primary/best source for whether there's any column (or subcolumn?) yet? Thanks, Daniel Jacques Nadeau wrote: Its purpose is to make sure that a projection that finds no columns still produces output. Think record {a:4,b:6} If I say 'select c from t', the reader (without this functionality) returns zero columns (not supported). As such, ensure at least one adds a column. I believe it is still needed but should probably check if a column is already setup in the output mutation and, if so, don't re-add the ensure at least one column.
Thus probably requires a new method in output mutator (isEmpty or size) On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarc...@maprtech.com <mailto:dbarc...@maprtech.com>> wrote: What exactly is the purpose of ensureAtLeastOneField() in org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 · apache/drill <https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)? Is it still needed? Currently, it makes the first field of a map be a nullable-integer field if the JSON reader reads no rows. However, it does that /regardless/ of whether the first field already exists from earlier readers, causing a later reader and ScanBatch to signal a schema change when there wasn't really a schema change. This is currently causing breakage in the attempt to fix the ScanBatch/IterOutcome problems underlying DRILL-2288. Example case: - There are three JSON files. The first and last to be read have the same schema. The middle file is empty. They are all read by the same ScanBatch. - The first JSON reader sets map fields. - The second JSON reader sees no rows, so its atLeastOneWrite flag isn't set, so its ensureAtLeastOneField() thinks it needs to add a field, but forcibly sets the first field to be a nullable-int field--/regardless/ of whether a first field exists, so it changes what the first reader set it to. - Then, somewhere in and/or downstream of the third reader (with in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors (mentioning the second reader's NullableIntVector vs. the original reader's type for the first field) and/or schema-change-not-supported errors because ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really change (between the first and third JSON files).
Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or unsupported-schema-change errors, and did not cause any new errors in the java-exec unit tests. (I haven't checked other tests yet.) Also, ensureAtLeastOneField() (or next()) has a comment about making sure there's a field in order to return a record count, but next() returns the record count. Those two things make me wonder if ensureAtLeastOneField() is now obsolete. Can it be deleted now? Or is it the case that it is still needed, but it needs to check whether there are already any fields before (currently blindly) creating one? Thanks, Daniel -- Daniel Barclay MapR Technologies -- Daniel Barclay MapR Technologies
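The guarded behavior under discussion can be sketched as follows (all names are hypothetical; this models the output mutator as a simple name-to-vector-type map, not Drill's real writer interfaces):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a reader that saw zero rows adds the dummy nullable-int field
// only if no earlier reader in the same ScanBatch already created a field,
// so it never clobbers a field set up by a previous reader.
class MutatorSketch {
    private final Map<String, String> fields = new LinkedHashMap<>();

    // The proposed new query method on the output mutator.
    boolean isEmpty() { return fields.isEmpty(); }

    void addField(String name, String vectorType) { fields.put(name, vectorType); }

    void ensureAtLeastOneField() {
        if (!isEmpty()) {
            return;  // a column already exists; don't re-add / overwrite it
        }
        // Projection found no columns: add the dummy field so output has one.
        addField("field_0", "NullableIntVector");
    }

    String typeOf(String name) { return fields.get(name); }
}
```

In the three-file example above, the empty middle file's reader would hit the early return and leave the first reader's NullableVarBinaryVector (or whatever type it created) untouched.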
Re: New Slack setup for Devs and Users
Is Slack threaded as e-mail is? (If not, and you can't see messages grouped by subject (that is, if it's the case that you can't see the tree of messages for a subject without wading through irrelevant messages that occurred at intervening times), how is Slack better than using the existing e-mail lists?) Also, should we really be siphoning discussion away from the public Apache Drill mailing lists and into the private Slack environment? (Or did I miss some public way to see what's in Slack?) Daniel Jacques Nadeau wrote: Hey Guys, We've been using Slack a lot internally and have found it very useful. I setup a new slack for Drill developers and users. I've set it to automatically accept new users from @apache.org as well as @mapr.com, @maprtech.com and @dremio.com. I'm more than happy to whitelist other domains (but slack won't let me enable general domains such as gmail and yahoo). If you aren't on one of the whitelisted domains, just send me an email and I'll invite you (or add your domain as appropriate). Remember, these channels should be used for help as opposed to making design decisions. My goal is also to post a digest once a week from the channels back on to the list so that the information is publicly available. I set up two initial channels: #user and #dev. Let's see if this makes things easier for everybody. To Join, go to: https://drillers.slack.com/signup -- Jacques Nadeau CTO and Co-Founder, Dremio -- Daniel Barclay MapR Technologies
Re: ensureAtLeastOneField: obsolete? needed?
[Also adding forgotten cc to dev. list] Jacques, Jacques Nadeau wrote: Its purpose is to make sure that a projection that finds no columns still produces output. Think record {a:4,b:6} If I say 'select c from t', the reader (without this functionality) returns zero columns (not supported). As such, ensure at least one adds a column. I believe it is still needed In what sense is that not supported? For example, where would returning zero columns be expected to break? When I tried disabling ensureAtLeastOneField(), the only thing I've seen break so far is the empty-schema check in IteratorValidatorBatchIterator. When that validator check is also disabled, all drill-java-exec unit tests run. (I'm currently checking later-run tests (drill-jdbc, etc.) and have not yet tried the regression tests.) but should probably check if a column is already setup in the output mutation and, if so, don't re-add the ensure at least one column. Thus probably requires a new method in output mutator (isEmpty or size) Yeah; I was wondering how to get that information given that only writing methods seemed to be available on the interface types used in ensureAtLeastOneField(). On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarc...@maprtech.com <mailto:dbarc...@maprtech.com>> wrote: What exactly is the purpose of ensureAtLeastOneField() in org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 · apache/drill <https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)? Is it still needed? Currently, it makes the first field of a map be a nullable-integer field if the JSON reader reads no rows. However, it does that /regardless/ of whether the first field already exists from earlier readers, causing a later reader and ScanBatch to signal a schema change when there wasn't really a schema change.
This is currently causing breakage in the attempt to fix the ScanBatch/IterOutcome problems underlying DRILL-2288. Example case: - There are three JSON files. The first and last to be read have the same schema. The middle file is empty. They are all read by the same ScanBatch. - The first JSON reader sets map fields. - The second JSON reader sees no rows, so its atLeastOneWrite flag isn't set, so its ensureAtLeastOneField() thinks it needs to add a field, but forcibly sets the first field to be a nullable-int field--/regardless/ of whether a first field exists, so it changes what the first reader set it to. - Then, somewhere in and/or downstream of the third reader (with in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors (mentioning the second reader's NullableIntVector vs. the original reader's type for the first field) and/or schema-change-not-supported errors because ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really change (between the first and third JSON files). Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or unsupported-schema-change errors, and did not cause any new errors in the java-exec unit tests. (I haven't checked other tests yet.) Also, ensureAtLeastOneField() (or next()) has a comment about making sure there's a field in order to return a record count, but next() returns the record count. Those two things make me wonder if ensureAtLeastOneField() is now obsolete. Can it be deleted now? Or is it the case that it is still needed, but it needs to check whether there are already any fields before (currently blindly) creating one? Thanks, Daniel -- Daniel Barclay MapR Technologies -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3885) Column alias "`f.c`" rejected if number of regions is > 1 in HBase unit tests
Daniel Barclay (Drill) created DRILL-3885: - Summary: Column alias "`f.c`" rejected if number of regions is > 1 in HBase unit tests Key: DRILL-3885 URL: https://issues.apache.org/jira/browse/DRILL-3885 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Drill rejects the column alias {{`f.c`}}, because of its period character, in this query: {noformat} SELECT row_key, convert_from(tableName.f.c, 'UTF8') `f.c` FROM hbase.`TestTable3` tableName WHERE row_key LIKE '08%0' OR row_key LIKE '%70' {noformat} in unit test {{TestHBaseFilterPushDown.testFilterPushDownRowKeyLike}} if the number of regions used in {{HBaseTestsSuite}} is set to something greater than one. One problem seems to be that the validation check is inconsistent, happening only if the data structure containing that alias gets serialized and deserialized. The rejection of that alias seems like a problem (at least from the SQL level), although it seems that it might be reasonable given some nearby code, suggesting that maybe names/expressions/something aren't encoded enough to handle name segments with periods. The exception stack trace is: {noformat} org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: UnsupportedOperationException: Field references must be singular names. 
Fragment 1:1 [Error Id: 34475f52-6f22-43be-9011-c31a84469781 on dev-linux2:31010] at org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60) at org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:386) at org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:291) at org.apache.drill.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:292) at org.apache.drill.BaseTestQuery.testSqlWithResults(BaseTestQuery.java:279) at org.apache.drill.hbase.BaseHBaseTest.runHBaseSQLlWithResults(BaseHBaseTest.java:86) at org.apache.drill.hbase.BaseHBaseTest.runHBaseSQLVerifyCount(BaseHBaseTest.java:90) at org.apache.drill.hbase.TestHBaseFilterPushDown.testFilterPushDownRowKeyLike(TestHBaseFilterPushDown.java:466) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.lang.reflect.Method.invoke(Method.java:606) at java.lang.reflect.Method.invoke(Method.java:606) Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: UnsupportedOperationException: Field references must be singular names. 
Fragment 1:1 [Error Id: 34475f52-6f22-43be-9011-c31a84469781 on dev-linux2:31010] at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118) at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110) at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47) at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:1) at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:1) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteCha
[jira] [Created] (DRILL-3863) TestBuilder.baseLineColumns(...) doesn't take net strings; parses somehow--can't test some names
Daniel Barclay (Drill) created DRILL-3863: - Summary: TestBuilder.baseLineColumns(...) doesn't take net strings; parses somehow--can't test some names Key: DRILL-3863 URL: https://issues.apache.org/jira/browse/DRILL-3863 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse {{TestBuilder}}'s {{baseLineColumns(String...)}} method doesn't take the given strings as net column names, and instead tries to parse them somehow, but doesn't parse them as the SQL parser would (and that method's Javadoc documentation doesn't seem to say how the strings are parsed/interpreted or indicate any third way of specifying arbitrary net column names). That means that certain column names _cannot be checked_ for (cannot be used in the result set being checked). For example, in Drill, the SQL delimited identifier "{{`Column B`}}" specifies a net column name of "{{Column B}}". However, passing that net column name (that is, a {{String}} representing that net column name) to {{baseLineColumns}} results in a strange parsing error. (See Test Class 1 and the error in Failure Trace 1.) Checking whether {{baseLineColumns}} takes SQL-level syntax for column names rather than net column names (by passing a string including the back-quote characters of the delimited identifier) seems to indicate that {{baseLineColumns}} doesn't take that syntax either. (See Test Class 2 and the three expected/returned records in Failure Trace 2.) That seems to mean that it's impossible to use {{baseLineColumns}} to validate certain column names (including the fairly simple/common case of alias names containing spaces for output formatting purposes). 
Test Class 1: {noformat} import org.junit.Test; public class TestTEMPFileNameBugs extends BaseTestQuery { @Test public void test1() throws Exception { testBuilder() .sqlQuery( "SELECT * FROM ( VALUES (1, 2) ) AS T(column_a, `Column B`)" ) .unOrdered() .baselineColumns("column_a", "Column B") .baselineValues(1, 2) .go(); } } {noformat} Failure Trace 1: {noformat} org.apache.drill.common.exceptions.ExpressionParsingException: Expression has syntax error! line 1:0:no viable alternative at input 'Column' at org.apache.drill.common.expression.parser.ExprParser.displayRecognitionError(ExprParser.java:169) at org.antlr.runtime.BaseRecognizer.reportError(BaseRecognizer.java:186) at org.apache.drill.common.expression.parser.ExprParser.lookup(ExprParser.java:5163) at org.apache.drill.common.expression.parser.ExprParser.atom(ExprParser.java:4370) at org.apache.drill.common.expression.parser.ExprParser.unaryExpr(ExprParser.java:4252) at org.apache.drill.common.expression.parser.ExprParser.xorExpr(ExprParser.java:3954) at org.apache.drill.common.expression.parser.ExprParser.mulExpr(ExprParser.java:3821) at org.apache.drill.common.expression.parser.ExprParser.addExpr(ExprParser.java:3689) at org.apache.drill.common.expression.parser.ExprParser.relExpr(ExprParser.java:3564) at org.apache.drill.common.expression.parser.ExprParser.equExpr(ExprParser.java:3436) at org.apache.drill.common.expression.parser.ExprParser.andExpr(ExprParser.java:3310) at org.apache.drill.common.expression.parser.ExprParser.orExpr(ExprParser.java:3185) at org.apache.drill.common.expression.parser.ExprParser.condExpr(ExprParser.java:3110) at org.apache.drill.common.expression.parser.ExprParser.expression(ExprParser.java:3041) at org.apache.drill.common.expression.parser.ExprParser.parse(ExprParser.java:206) at org.apache.drill.TestBuilder.parsePath(TestBuilder.java:202) at org.apache.drill.TestBuilder.baselineColumns(TestBuilder.java:333) at 
org.apache.drill.TestTEMPFileNameBugs.test1(TestTEMPFileNameBugs.java:30) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} Test Class 2: {noformat} import org.junit.Test; public class TestTEMPFileNameBugs extends BaseTestQuery { @Test public void test1() throws Exception { testBuilder() .sqlQuery( "SELECT * FROM ( VALUES (1, 2) ) AS T(column_a, `Column B`)" ) .unOrdered() .baselineColumns("column_a", "`Column B`") .baselineValues(1, 2) .go(); } } {noformat} Failure Trace 2: {noformat} java.lang.Exception: After matching 0 records, did not find expected record in result set: `Column B` : 2, `column_a` : 1, Some examples of ex
[jira] [Created] (DRILL-3859) Delimited identifier `*` breaks in aliases list--causes AssertionError saying "INTEGER"
Daniel Barclay (Drill) created DRILL-3859: - Summary: Delimited identifier `*` breaks in aliases list--causes AssertionError saying "INTEGER" Key: DRILL-3859 URL: https://issues.apache.org/jira/browse/DRILL-3859 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) When a delimited identifier whose body consists of a single asterisk ("{{`*`}}") is used in a subquery aliases list and the containing query's select list refers to a non-existent column, Drill throws an assertion error (and its message says only "INTEGER"). For example, see the third query and its error message in the following: {noformat} 0: jdbc:drill:zk=local> SELECT * FROM (VALUES (0, 0)) AS T(A, `*`); +++ | A | * | +++ | 0 | 0 | +++ 1 row selected (0.143 seconds) 0: jdbc:drill:zk=local> SELECT a FROM (VALUES (0, 0)) AS T(A, `*`); ++ | a | ++ | 0 | ++ 1 row selected (0.127 seconds) 0: jdbc:drill:zk=local> SELECT b FROM (VALUES (0, 0)) AS T(A, `*`); Error: SYSTEM ERROR: AssertionError: INTEGER [Error Id: 859d3ef9-b1e7-497b-b366-b64b2b592b69 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:zk=local> {noformat} It's not clear that the problem is in the SQL parser area (because another bug with {{`*`}} that _acts_ the same as a hypothetical parser problem strongly seems to be downstream of the parser). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3861) Apparent uncontrolled format string error in table name error reporting
Daniel Barclay (Drill) created DRILL-3861: - Summary: Apparent uncontrolled format string error in table name error reporting Key: DRILL-3861 URL: https://issues.apache.org/jira/browse/DRILL-3861 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Daniel Barclay (Drill) It seems that a data string is being used as a printf format string. In the following, note the percent character in the name of the table file (the file does not exist; the query apparently was meant to trigger an expected no-such-table error) and that the actual error mentions format conversion characters: {noformat} 0: jdbc:drill:zk=local> select * from `test%percent.json`; Sep 29, 2015 2:59:37 PM org.apache.calcite.sql.validate.SqlValidatorException SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'test%percent.json' not found Sep 29, 2015 2:59:37 PM org.apache.calcite.runtime.CalciteException SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 33: Table 'test%percent.json' not found Error: SYSTEM ERROR: UnknownFormatConversionException: Conversion = 'p' [Error Id: 8025e561-6ba1-4045-bbaa-a96cafc7f719 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:zk=local> {noformat} (Selecting SQL Parser component because I _think_ table/file existence is checked in validation called in or near the parsing step.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
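The failure pattern here is the classic uncontrolled-format-string bug: user data (a table name containing '%') reaches a formatting call in the format-string position, so the '%p' in the name is parsed as a conversion and throws UnknownFormatConversionException. A minimal reproduction and the usual fix (sketch only; not the actual Drill call site):

```java
// Minimal reproduction of the uncontrolled-format-string pattern.
class FormatStringSketch {
    static String unsafe(String tableName) {
        // BUG: the user data ends up inside the format string; "%p" in the
        // table name is parsed as an (unknown) conversion and throws.
        return String.format("Table '" + tableName + "' not found");
    }

    static String safe(String tableName) {
        // Fix: user data is always passed as an argument, never as the format.
        return String.format("Table '%s' not found", tableName);
    }
}
```

Calling unsafe("test%percent.json") throws java.util.UnknownFormatConversionException with "Conversion = 'p'", matching the error in the report; safe() produces the intended message.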
"../" in file pathnames - intend to block or not?
In file/directory pathnames for tables, does Drill intend to block use of "../" that traverses up beyond the root of the workspace (i.e., above /tmp for (default) dfs.tmp)? Daniel -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3864) TestBuilder "Unexpected column" message doesn't show records
Daniel Barclay (Drill) created DRILL-3864: - Summary: TestBuilder "Unexpected column" message doesn't show records Key: DRILL-3864 URL: https://issues.apache.org/jira/browse/DRILL-3864 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse When {{TestBuilder}} reports that the actual result set contains an unexpected column, it doesn't show any whole expected record (as it shows some expected record and some actual records for the "did not find expected record in result set" case). Showing a couple of whole expected records, rather than just reporting the unexpected column name(s), would speed up diagnosis of test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3860) Delimited identifier `*` breaks in select list--acts like plain asterisk token
Daniel Barclay (Drill) created DRILL-3860: - Summary: Delimited identifier `*` breaks in select list--acts like plain asterisk token Key: DRILL-3860 URL: https://issues.apache.org/jira/browse/DRILL-3860 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) At least when it appears in a SELECT list, a delimited identifier whose body consists of a single asterisk ("{{`*`}}") is not treated consistently with other delimited identifiers (that is, specifying a column whose name matches the body ("{{*}}").) For example, in the following, notice how in the first two queries, each select list delimited identifier selects the one expected column, but in the third query, instead of selecting the one expected column, it selected all columns (like the regular "{{*}}" in the fourth query): {noformat} 0: jdbc:drill:zk=local> SELECT `a` FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`); ++ | a | ++ | 1 | ++ 1 row selected (0.132 seconds) 0: jdbc:drill:zk=local> SELECT `.` FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`); ++ | . | ++ | 2 | ++ 1 row selected (0.152 seconds) 0: jdbc:drill:zk=local> SELECT `*` FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`); ++++ | a | . | * | ++++ | 1 | 2 | 3 | ++++ 1 row selected (0.136 seconds) 0: jdbc:drill:zk=local> SELECT * FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`); ++++ | a | . | * | ++++ | 1 | 2 | 3 | ++++ 1 row selected (0.128 seconds) 0: jdbc:drill:zk=local> {noformat} Although this acts the same as if the SQL parser treated the delimited identifier {{`*`}} as a plain asterisk token, that does not seem to be the actual mechanism for this behavior. (The problem seems to be further downstream.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: "../" in file pathnames - intend to block or not?
Jason Altekruse wrote: Yes, we want workspaces to be able to be used in conjunction with authentication to provide limited views of data to some users. Is this currently not being enforced? I'm not sure what would be enforced with authentication/impersonation turned on (especially, whether access is checked after all pathname resolution is done or is checked too early). I was just running in regular (no-impersonation) local mode and noticed that using "../" in a pathname can get to directories outside the workspace's root. Is that behavior expected or is that a bug? (Part of, or another way to ask, my question is whether we: - only intend the workspace to be like a default working directory (where you usually give downward-only relative names to files in its subtree, but might occasionally reach out of the subtree), or - intend the workspace to be more restricted.) Daniel On Tue, Sep 29, 2015 at 3:49 PM, Daniel Barclay <dbarc...@maprtech.com> wrote: In file/directory pathnames for tables, does Drill intend to block use of "../" that traverses up beyond the root of the workspace (i.e., above /tmp for (default) dfs.tmp)? Daniel -- Daniel Barclay MapR Technologies -- Daniel Barclay MapR Technologies
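The "more restricted" interpretation can be sketched in a few lines (helper name assumed; Drill's actual path handling goes through the Hadoop filesystem layer, not java.nio): resolve the requested table path against the workspace root, normalize away any "../" segments, and reject anything that escapes the root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of a workspace-confinement check: normalization happens before the
// containment test, so "../" traversal cannot escape the workspace root.
class WorkspacePathSketch {
    static boolean isInsideWorkspace(String workspaceRoot, String requested) {
        Path root = Paths.get(workspaceRoot).toAbsolutePath().normalize();
        Path resolved = root.resolve(requested).normalize();
        return resolved.startsWith(root);
    }
}
```

Under this check, "dir/table.json" inside dfs.tmp would be accepted, while "../etc/passwd" would be rejected rather than resolving above /tmp.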
[jira] [Created] (DRILL-3848) Increase timeout time on several tests that time out frequently.
Daniel Barclay (Drill) created DRILL-3848: - Summary: Increase timeout time on several tests that time out frequently. Key: DRILL-3848 URL: https://issues.apache.org/jira/browse/DRILL-3848 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Increase test timeout time a bit on: - TestTpchDistributedConcurrent - TestExampleQueries - TestFunctionsQuery -- This message was sent by Atlassian JIRA (v6.3.4#6332)
How get pull req. linked (auto-updating) to two JIRA issues?
For dealing with a branch addressing two (interrelated) JIRA issues, have we figured out what to do so that the GitHub pull request is linked to both JIRA issues (i.e., it auto-updates both)? (Some tries at including both DRILL- keys in the PR description failed to link to JIRA reports. Does anyone know exactly where GitHub recognizes JIRA keys?) Thanks, Daniel -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3814) Directory containing only unrecognized files reported as not found vs. taken as empty table
Daniel Barclay (Drill) created DRILL-3814: - Summary: Directory containing only unrecognized files reported as not found vs. taken as empty table Key: DRILL-3814 URL: https://issues.apache.org/jira/browse/DRILL-3814 Project: Apache Drill Issue Type: Bug Components: SQL Parser, Storage - Other Reporter: Daniel Barclay (Drill) Assignee: Aman Sinha A directory subtree all of whose descendant files have unrecognized extensions is reported as non-existent rather than treated as a table with zero rows. Is this intended? (The error message is the exact same error message that results if the user gets a directory name wrong and refers to a non-existent directory, making the message really confusing and misleading.) For example, for directory {{/tmp/unrecognized_files_directory}} containing only file {{/tmp/unrecognized_files_directory/junk.junk}}: {noformat} 0: jdbc:drill:zk=local> SELECT * FROM `dfs`.`tmp`.`unrecognized_files_directory`; Sep 20, 2015 11:16:34 PM org.apache.calcite.sql.validate.SqlValidatorException SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs.tmp.unrecognized_files_directory' not found Sep 20, 2015 11:16:34 PM org.apache.calcite.runtime.CalciteException SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 19: Table 'dfs.tmp.unrecognized_files_directory' not found Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 19: Table 'dfs.tmp.unrecognized_files_directory' not found [Error Id: 0ce9ba05-7f62-4063-a2c0-7d2b4f1f7967 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:zk=local> {noformat} Notice how that is the same message as for a non-existent directory: {noformat} 0: jdbc:drill:zk=local> SELECT * FROM `dfs`.`tmp`.`no_such_directory`; Sep 20, 2015 11:17:12 PM org.apache.calcite.sql.validate.SqlValidatorException SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs.tmp.no_such_directory' not found Sep 20, 2015 11:17:12 PM 
org.apache.calcite.runtime.CalciteException SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 19: Table 'dfs.tmp.no_such_directory' not found Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 19: Table 'dfs.tmp.no_such_directory' not found [Error Id: 49f423f1-5dfe-4435-8b72-78e0b80e on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:zk=local> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
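The distinction the report asks for can be sketched in a few lines. This is a hypothetical illustration (the class, method, and extension list are not Drill's actual scan code): a directory that exists but contains only files with unrecognized extensions classifies as an empty table, and only a genuinely missing path classifies as not found.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the behavior the report argues for: a directory
// that exists but holds only unrecognized files should scan as an empty
// (zero-row) table; only a truly missing path should be "table not found".
public class DirectoryScanSketch {
    static final List<String> KNOWN_EXTENSIONS = Arrays.asList(".json", ".csv", ".parquet");

    enum ScanResult { NOT_FOUND, EMPTY_TABLE, HAS_FILES }

    /** Classify a directory given whether it exists and its file names. */
    static ScanResult classify(boolean directoryExists, List<String> fileNames) {
        if (!directoryExists) {
            return ScanResult.NOT_FOUND;   // e.g. dfs.tmp.no_such_directory
        }
        boolean anyRecognized = fileNames.stream()
            .anyMatch(name -> KNOWN_EXTENSIONS.stream().anyMatch(name::endsWith));
        // Exists but nothing recognizable: an empty table, not a missing one.
        return anyRecognized ? ScanResult.HAS_FILES : ScanResult.EMPTY_TABLE;
    }
}
```

With this split, the two cases in the transcripts above would produce different diagnostics instead of the same "Table ... not found" message.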
Re: Resolving ScanBatch.next() behavior for 0-row readers; handling of NONE, OK_NEW_SCHEMA
Jacques Nadeau wrote: I think we should start with a much simpler discussion: If I query an empty file, what should we return? One thing to clarify is the difference between empty files (e.g., a zero-byte .json file) and other zero-row files (e.g., a non-empty file that is a Parquet file that represents no rows but still carries a schema). (We might also need to distinguish between types of empty files (e.g., .json files, where without any row objects we know nothing about the schema, vs. .csv files, where we can know that there's one logical column of type VARCHAR ARRAY, even though we don't know the length).) Here's one possible view of the JSON-style case of empty files: An empty file implies no rows of data and implies no columns in the schema. (I don't mean that it implies that there are no columns; I just mean that it does not imply any columns.) It also most probably does not imply the absence of any columns either--so it never conflicts with another schema. That non-implication of columns (i.e., the schema) means that:
- taking that file by itself as a table yields no rows and an empty schema, and
- taking that file as part of taking a subtree of files as a table means that the empty file never causes a conflict with the schema from other files in that subtree.
(The only thing that the empty file would imply would be the type (file name extension) of other files when an ancestor directory is taken as a table.
(That's assuming we don't allow mixing, say, JSON and CSV files in the same subtree.)) Daniel -- Jacques Nadeau CTO and Co-Founder, Dremio On Fri, Sep 18, 2015 at 3:26 PM, Daniel Barclay <dbarc...@maprtech.com> wrote: What sequence of RecordBatch.IterOutcome <https://github.com/dsbos/incubator-drill/blob/bugs/drill-3641/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java#L106> <https://github.com/dsbos/incubator-drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java#L41> values should ScanBatch's next() return for a reader (file/etc.) that has zero rows of data, and what does that sequence depend on (e.g., whether there's still a non-empty schema even though there are no rows, whether there are other files in the scan)? [See other questions at bottom.] I'm trying to resolve this question to fix DRILL-2288 <https://issues.apache.org/jira/browse/DRILL-2288>. Its initial symptom was that INFORMATION_SCHEMA queries that return zero rows because of pushed-down filtering yielded results that have zero columns instead of the expected columns. An additional symptom was that "SELECT A, B, *" from an empty JSON file yielded zero columns instead of the expected columns A and B (with zero rows). The immediate cause of the problem (the missing schema information) was how ScanBatch.next() handled readers that returned no rows: If a reader has no rows at all, then the first call to its next() method (from ScanBatch.next()) returns zero (indicating that there are no more rows, and, in this case, no rows at all), and ScanBatch.next()'s call to the reader's mutator's isNewSchema() returns true, indicating that the reader has a schema that ScanBatch has not yet processed (e.g., notified its caller about). The way ScanBatch.next()'s code checked those conditions, when the last reader had no rows at all, ScanBatch.next() returned IterOutcome.NONE.
However, when that /last/ reader was the /only/ reader, that returning of IterOutcome.NONE for a no-rows reader by ScanBatch.next() meant that next() never returned IterOutcome.OK_NEW_SCHEMA for that ScanBatch. That immediate return of NONE in turn meant that the downstream operator _never received a return value of OK_NEW_SCHEMA to trigger its schema processing_. (For example, in the DRILL-2288 JSON case, the project operator never constructed its own schema containing columns A and B plus whatever columns (none) came from the empty JSON file; in DRILL-2288's other case, the caller never propagated the statically known columns from the INFORMATION_SCHEMA table.) That returning of NONE without ever returning OK_NEW_SCHEMA also violates the (apparent) intended call/return protocol (sequence of IterOutcome values) for RecordBatch.next(). (See the draft Javadoc comments currently at RecordBatch.IterOutcome <https://github.com/dsbos/incubator-drill/blob/bugs/drill-3641/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java#L106>.) Therefore, it seems that ScanBatch.next() _must_ return OK_NEW_SCHEMA before returning NONE, instead of immediately returning NONE, for readers/files with zero rows for at least _some_ cases. (It must both notify the downstream caller that there is a schema /and/ give the caller a chance to read the schema (which is allowed after OK_NEW_SCHEMA is returned but not after NONE).) However, it is not clear exactly what that set of cases is. (It does not seem to be _all_ zero-row cas
[jira] [Created] (DRILL-3816) weird file-extension recognition behavior in directory subtree scanning
Daniel Barclay (Drill) created DRILL-3816: - Summary: weird file-extension recognition behavior in directory subtree scanning Key: DRILL-3816 URL: https://issues.apache.org/jira/browse/DRILL-3816 Project: Apache Drill Issue Type: Bug Components: Storage - Other Reporter: Daniel Barclay (Drill) Assignee: Jacques Nadeau In scanning of directory subtrees for files, recognition of known vs. unknown file extensions seems really screwy (not following any apparent pattern). For example:
- a suffix of {{.jsxon_not}}, as expected, is not recognized as a JSON file
- a suffix of {{.jsoxn_not}} unexpectedly _is_ taken as JSON
- a suffix of {{.jsonx_not}}, as expected, is not recognized as a JSON file
(Creating a directory containing only a non-empty JSON file ending with {{.json}} and another non-empty JSON file ending with one of the above suffixes sometimes reads both JSON files and sometimes reports a (presumably) expected error because of the mixed file extensions.) The result sometimes seems to also depend on the rest of the filename, presumably related to the order of listing of files. (It's not clear if it depends only on the order after filename sorting, or also depends on the order file names are listed by the OS.)
Here are more data points (using a JSON file named {{voter1.json}}):
- with {{voter2.xjson_not}} - read, as JSON
- with {{voter2.jxson_not}} - read, as JSON
- with {{voter2.jsxon_not}} - causes expected error
- with {{voter2.jsoxn_not}} - read, as JSON
- with {{voter2.jsonx_not}} - causes expected error
- with {{voter2.json_xnot}} - read, as JSON
- with {{voter2.json_nxot}} - read, as JSON
- with {{voter2.json_noxt}} - read, as JSON
- with {{voter2.json_notx}} - read, as JSON
- with {{voter2.jsonxnot}} - read, as JSON
- with {{voter2.jsonxot}} - read, as JSON
- with {{voter2.jsoxot}} - causes expected error
- with {{voter2.jxsxoxn}} - read, as JSON
- with {{voter2.xjxsxoxn}} - read, as JSON
- with {{voter2.xjxsxoxnx}} - causes expected error
- with {{voter2.xjxxoxn}} - read, as JSON
- with {{voter2.xjxxxn}} - read, as JSON
- with {{voter2.n}} - read, as JSON
- with {{voter2.}} - read, as JSON
- with {{voter2.xxx}} - read, as JSON
- with {{voter2.xx}} - read, as JSON
- with {{voter2.x}} - read, as JSON
- with {{voter2.}} - causes expected error
- with {{voter2.x}} - read, as JSON
- with {{voter2.xx}} - read, as JSON
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3815) unknown suffixes .not_json and .json_not treated differently (multi-file case)
Daniel Barclay (Drill) created DRILL-3815: - Summary: unknown suffixes .not_json and .json_not treated differently (multi-file case) Key: DRILL-3815 URL: https://issues.apache.org/jira/browse/DRILL-3815 Project: Apache Drill Issue Type: Bug Components: Storage - Other Reporter: Daniel Barclay (Drill) Assignee: Jacques Nadeau In scanning a directory subtree used as a table, unknown filename extensions seem to be treated differently depending on whether they're similar to known file extensions. The behavior suggests that Drill checks whether a file name _contains_ an extension's string rather than _ending_ with it. For example, given these subtrees with almost identical leaf file names:
{noformat}
$ find /tmp/testext_xx_json/
/tmp/testext_xx_json/
/tmp/testext_xx_json/voter2.not_json
/tmp/testext_xx_json/voter1.json
$ find /tmp/testext_json_xx/
/tmp/testext_json_xx/
/tmp/testext_json_xx/voter1.json
/tmp/testext_json_xx/voter2.json_not
$
{noformat}
the results of trying to use them as tables differ:
{noformat}
0: jdbc:drill:zk=local> SELECT * FROM `dfs.tmp`.`testext_xx_json`;
Sep 21, 2015 11:41:50 AM org.apache.calcite.sql.validate.SqlValidatorException
...
Error: VALIDATION ERROR: From line 1, column 17 to line 1, column 25: Table 'dfs.tmp.testext_xx_json' not found
[Error Id: 6fe41deb-0e39-43f6-beca-de27b39d276b on dev-linux2:31010] (state=,code=0)
0: jdbc:drill:zk=local> SELECT * FROM `dfs.tmp`.`testext_json_xx`;
+---+
| onecf |
+---+
| {"name":"someName1"} |
| {"name":"someName2"} |
+---+
2 rows selected (0.149 seconds)
{noformat}
(Other probing seems to indicate that there is also some sensitivity to whether the extension contains an underscore character.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
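The report's substring-vs-suffix hypothesis is easy to demonstrate in isolation. This sketch is not Drill's actual matching code; it just contrasts the two checks and shows that a "contains" match accepts {{voter2.json_not}} (the dot-extension substring appears mid-name) while rejecting {{voter2.not_json}} (no "." immediately before "json"), exactly the asymmetry observed above.

```java
// Minimal illustration (not Drill's code) of substring vs. suffix matching
// of a file extension. containsMatch models the suspected buggy behavior;
// suffixMatch models the expected behavior.
public class ExtensionMatchSketch {
    static boolean containsMatch(String fileName, String ext) {
        return fileName.contains("." + ext);   // matches ".json" anywhere
    }
    static boolean suffixMatch(String fileName, String ext) {
        return fileName.endsWith("." + ext);   // matches ".json" only at the end
    }
}
```

Under containsMatch, `voter2.json_not` is (wrongly) treated as JSON; under suffixMatch, only `voter1.json` is.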
[jira] [Created] (DRILL-3812) message for invalid compound name doesn't identify part that's bad
Daniel Barclay (Drill) created DRILL-3812: - Summary: message for invalid compound name doesn't identify part that's bad Key: DRILL-3812 URL: https://issues.apache.org/jira/browse/DRILL-3812 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Daniel Barclay (Drill) Assignee: Aman Sinha When a compound name (e.g., {{schema.subschema.table}}) is invalid, the error message doesn't say where it went bad (e.g., which part referred to something unknown and/or non-existent). For example, see the query and the "VALIDATION ERROR ..." line in the following:
{noformat}
0: jdbc:drill:zk=local> SELECT * FROM `dfs.NoSuchSchema`.`empty_directory`;
Sep 20, 2015 10:38:24 PM org.apache.calcite.sql.validate.SqlValidatorException
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs.NoSuchSchema.empty_directory' not found
Sep 20, 2015 10:38:24 PM org.apache.calcite.runtime.CalciteException
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 32: Table 'dfs.NoSuchSchema.empty_directory' not found
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 32: Table 'dfs.NoSuchSchema.empty_directory' not found
[Error Id: 2a298c8e-2923-4744-8f78-b0cf36c83799 on dev-linux2:31010] (state=,code=0)
{noformat}
A better error message would say that {{dfs.NoSuchSchema}} was not found (or that no {{NoSuchSchema}} was found in schema {{dfs}}). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
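The improvement the report asks for amounts to resolving the compound name one part at a time and reporting the first part that fails. The sketch below uses a plain nested Map as a hypothetical stand-in for Drill's real schema tree; the class and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;

// Sketch of part-by-part resolution of a compound name, so the error can
// name the first bad part instead of echoing the whole name. The nested
// Map is a hypothetical stand-in for the actual schema tree.
public class CompoundNameSketch {
    /** Returns null on success, or a message naming the first bad part. */
    static String resolve(Map<String, ?> root, List<String> parts) {
        Object current = root;
        StringBuilder resolvedSoFar = new StringBuilder();
        for (String part : parts) {
            if (!(current instanceof Map) || !((Map<?, ?>) current).containsKey(part)) {
                return "'" + part + "' not found"
                    + (resolvedSoFar.length() == 0 ? "" : " in schema '" + resolvedSoFar + "'");
            }
            current = ((Map<?, ?>) current).get(part);
            if (resolvedSoFar.length() > 0) resolvedSoFar.append('.');
            resolvedSoFar.append(part);
        }
        return null;  // fully resolved
    }
}
```

For `dfs.NoSuchSchema.empty_directory` this would report that `NoSuchSchema` was not found in schema `dfs`, matching the message the report suggests.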
Resolving ScanBatch.next() behavior for 0-row readers; handling of NONE, OK_NEW_SCHEMA
ould be checking the number of rows too, or OK_NEW_SCHEMA shouldn't be returned in as many subcases of the no-rows last-reader/file case. So, some open and potential questions seem to be:
1. Is it the case that a) any batch's next() should return OK_NEW_SCHEMA before it returns NONE, and callers/downstream batches should be able to count on getting OK_NEW_SCHEMA (e.g., to trigger setting up their downstream schemas), or that b) empty files can cause next() to return NONE without ever returning OK_NEW_SCHEMA, and therefore all downstream batch classes must handle getting NONE before they have set up their schemas?
2. For a file/source kind that has a schema even when there are no rows, should getting an empty file constitute a schema change? (On one hand there are no actual /rows/ (following the new schema) conflicting with any previous schema (and maybe rows), but on the other hand there is a non-empty /schema/ that can conflict when that's enough to matter.)
3. For a file/source kind that implies a schema only when there are rows (e.g., JSON), when should or shouldn't that be considered a schema change? If ScanBatch reads non-empty JSON file A, reads empty JSON file B, and reads non-empty JSON file C implying the same schema as A did, should that be considered a schema change or not? (When reading no-/empty-schema B, should ScanBatch keep the schema from A and check against that when it gets to C, effectively ignoring the existence of B completely?)
4. In ScanBatch.next(), when the last reader had no rows at all, when should next() return OK_NEW_SCHEMA? Always? /Iff/ the reader has a non-empty schema? Just enough to never return NONE before returning OK_NEW_SCHEMA (which means it acts differently for otherwise-identical empty files, depending on what happened with previous readers)? As in that last case except only if the reader has a non-empty schema?
Thanks, Daniel -- Daniel Barclay MapR Technologies
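Option 1(a) above can be modeled in miniature. This is a deliberately simplified, hypothetical sketch (not ScanBatch's actual code or types): even when the sole reader produces zero rows, next() emits OK_NEW_SCHEMA exactly once, so the downstream caller gets a chance to read the (possibly empty) schema before NONE terminates the stream.

```java
// Hypothetical, simplified model of a scan that honors the sequencing
// rule "OK_NEW_SCHEMA before NONE" even for zero-row readers.
public class ZeroRowScanSketch {
    enum IterOutcome { OK_NEW_SCHEMA, OK, NONE }

    private boolean schemaAnnounced = false;
    private int remainingBatches;

    ZeroRowScanSketch(int batchesWithRows) { this.remainingBatches = batchesWithRows; }

    IterOutcome next() {
        if (!schemaAnnounced) {
            schemaAnnounced = true;
            return IterOutcome.OK_NEW_SCHEMA;  // announce the schema even with 0 rows
        }
        if (remainingBatches > 0) {
            remainingBatches--;
            return IterOutcome.OK;             // a batch of rows
        }
        return IterOutcome.NONE;               // terminate only after schema was sent
    }
}
```

Under this model, an empty reader yields the sequence OK_NEW_SCHEMA, NONE rather than an immediate NONE, which is the protocol violation described in the DRILL-2288 discussion.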
Re: Is anyone having issues with the jdbc unit tests (ITTestShadedJar)?
Aman Sinha wrote: Here's a different issue I encountered running unit tests on a clean master branch on my Mac: Running org.apache.drill.exec.store.jdbc.TestJdbcPlugin But that's the JDBC storage plug-in, not the Drill JDBC driver and its tests. Daniel ... ... Tue Sep 15 20:01:36 PDT 2015 : Could not listen on port 2 on host 127.0.0.1: java.net.BindException: Address already in use It has to do with starting up derby: storage-jdbc asinha$ cat derby.log Tue Sep 15 22:13:01 PDT 2015 : Apache Derby Network Server - 10.11.1.1 - (1616546) started and ready to accept connections on port 1527 Tue Sep 15 22:13:01 PDT 2015 : Could not listen on port 2 on host 127.0.0.1: java.net.BindException: Address already in use An exception was thrown during network server startup. DRDA_ListenPort.S:Could not listen on port 2 on host 127.0.0.1: java.net.BindException: Address already in use On Tue, Sep 15, 2015 at 9:37 PM, Chris Westin <chriswesti...@gmail.com> wrote: Variability: for me so far 2 out of 2 times. No stack trace, but as above, when I try to reproduce it in an IDE "This seems to be because the test is getting an ExceptionInInitializer in DrillbitClassLoader because the app.class.path property isn't set (and then the resulting String.length() on its value throws an NPE)." I don't see app.class.path set anywhere in any pom.xml (so it's not getting set when I copy the surefire arguments into the IDE's launch configuration for the test, either). On Tue, Sep 15, 2015 at 9:09 PM, Jacques Nadeau <jacq...@dremio.com> wrote: It was tested on a clean machine a number of times. Any thoughts on the variability? Can you provide stack trace? On Sep 15, 2015 6:28 PM, "Sudheesh Katkam" <skat...@maprtech.com> wrote: Yes, I see this issue too. 
On Sep 15, 2015, at 5:53 PM, Chris Westin <chriswesti...@gmail.com> wrote: This seems to be because the test is getting an ExceptionInInitializer in DrillbitClassLoader because the app.class.path property isn't set (and then the resulting String.length() on its value throws an NPE). Bueller? On Tue, Sep 15, 2015 at 5:20 PM, Chris Westin < chriswesti...@gmail.com wrote: I just rebased, and twice in a row I've gotten wedged running org.apache.drill.jdbc.ITTestShadedJar -- Daniel Barclay MapR Technologies
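The NPE described in this thread is the classic failure mode of calling String.length() on the result of System.getProperty(...) when the property is unset (null). A minimal illustration of the failure and the defensive check (property name here is arbitrary):

```java
// Reproduces the failure mode discussed above: System.getProperty returns
// null for an unset property, and calling length() on it throws an NPE.
public class PropertySketch {
    static int unsafeLength(String propertyName) {
        return System.getProperty(propertyName).length();   // NPE if unset
    }
    static int safeLength(String propertyName) {
        String value = System.getProperty(propertyName);
        return value == null ? 0 : value.length();          // tolerate unset
    }
}
```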
Re: Is anyone having issues with the jdbc unit tests (ITTestShadedJar)?
> The level of complexity of the build process to get a test to correctly test > the right thing means jumping through a bunch of hoops to clear the > classpath and then use a special classloader. Hey, do we know if using a Maven integration test would make testing the JDBC-all Jar file easier? Presumably, integration testing supports having different dependencies and classpaths for a server that's started and stopped once vs. the set of client tests. If that's accurate, then, when we start moving some tests to be Maven integration tests, maybe this JDBC-all Jar test could be simplified a lot. Daniel Jacques Nadeau wrote: Ah, you're focused on testing from within the IDE? The level of complexity of the build process to get a test to correctly test the right thing means jumping through a bunch of hoops to clear the classpath and then use a special classloader. I can't imagine that you could get it to run correctly in an IDE. For example, Eclipse is very sloppy about keeping classpaths perfect versus what is declared in the pom file. The parameter you're looking for is generated by the ant plugin simply because that appears to be the way to get the value into an environment variable so that the inner classloader can load the drillbit for the test. The test: loads a drillbit in one classloader using the alternative classpath provided by the app.class.path variable. This is taken from what would have typically been the jvm level classpath. We then clear the jvm classpath to only include the test class, Junit and hamcrest. After the drillbit is initialized and we've run one query, we then add the jdbc all jar to the system classloader and open a connection to the drillbit and execute a query. The test is designed specifically to confirm that the requisite classes are correctly included in jdbc-all and that it will run correctly.
The test can't run without the shaded jar being generated and I can't imagine any of the IDEs have good enough understanding of the various maven plugins and options used that they would correctly work. Even if you found some changes that made the test execute in an IDE, I can't imagine that it would correctly manage all the classpath stuff. On Sep 15, 2015 9:37 PM, "Chris Westin" <chriswesti...@gmail.com> wrote: Variability: for me so far 2 out of 2 times. No stack trace, but as above, when I try to reproduce it in an IDE "This seems to be because the test is getting an ExceptionInInitializer in DrillbitClassLoader because the app.class.path property isn't set (and then the resulting String.length() on its value throws an NPE)." I don't see app.class.path set anywhere in any pom.xml (so it's not getting set when I copy the surefire arguments into the IDE's launch configuration for the test, either). On Tue, Sep 15, 2015 at 9:09 PM, Jacques Nadeau <jacq...@dremio.com> wrote: It was tested on a clean machine a number of times. Any thoughts on the variability? Can you provide stack trace? On Sep 15, 2015 6:28 PM, "Sudheesh Katkam" <skat...@maprtech.com> wrote: Yes, I see this issue too. On Sep 15, 2015, at 5:53 PM, Chris Westin <chriswesti...@gmail.com> wrote: This seems to be because the test is getting an ExceptionInInitializer in DrillbitClassLoader because the app.class.path property isn't set (and then the resulting String.length() on its value throws an NPE). Bueller? On Tue, Sep 15, 2015 at 5:20 PM, Chris Westin < chriswesti...@gmail.com wrote: I just rebased, and twice in a row I've gotten wedged running org.apache.drill.jdbc.ITTestShadedJar -- Daniel Barclay MapR Technologies
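The isolation technique Jacques describes can be sketched in much-simplified form: build a dedicated URLClassLoader from a classpath supplied out-of-band (the app.class.path system property, per the thread), with a null parent so application classes resolve only from that path and nothing leaks in from the test JVM's own classpath. This is an illustration of the mechanism, not the real test's plumbing.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Simplified sketch of classpath isolation via a dedicated URLClassLoader
// built from a path-list system property (e.g. app.class.path).
public class IsolationSketch {
    static URLClassLoader loaderFromPathProperty(String propertyName) throws Exception {
        String path = System.getProperty(propertyName, "");   // "" if unset, avoiding the NPE
        String[] entries = path.isEmpty() ? new String[0] : path.split(File.pathSeparator);
        URL[] urls = new URL[entries.length];
        for (int i = 0; i < entries.length; i++) {
            urls[i] = new File(entries[i]).toURI().toURL();
        }
        // null parent: delegate only to the bootstrap loader (JDK classes),
        // so nothing resolves from the surrounding application classpath.
        return new URLClassLoader(urls, null);
    }
}
```

JDK classes still load (via the bootstrap loader), but any class not on the supplied path fails to resolve, which is what lets such a test prove a shaded jar is self-sufficient.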
filesystem pathnames or (file) URI references?
For the file system plug-in, are Drill table name identifiers supposed to be taken as filesystem pathnames or as URI references? (Or is it sometimes one and sometimes the other, and, if so, when one and when the other?) For example, would the delimited identifier `10%20%30` refer to a file with simple name "10%20%30" or a file with simple name "10 0"? (Or, the other way around, to refer to a file whose simple name is "23:59:59", can one use simply `23:59:59` or must one use `./23:59:59`?) I ask because I see that a number of tests use the "file:" URI for a file instead of the filesystem pathname for the file. Daniel -- Daniel Barclay MapR Technologies
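The two interpretations in the question diverge exactly on percent escapes. A small illustration (the file: prefix and /tmp/ base are arbitrary scaffolding for the demo): interpreted as a URI reference, `10%20%30` decodes (%20 is a space, %30 is "0"); interpreted as a raw pathname, the characters are literal.

```java
import java.net.URI;

// Demonstrates URI-reference vs. raw-pathname interpretation of an
// identifier containing percent escapes.
public class PathVsUriSketch {
    /** Simple name obtained by treating the identifier as part of a file: URI. */
    static String asUriSimpleName(String identifier) throws Exception {
        String decodedPath = new URI("file:///tmp/" + identifier).getPath();
        return decodedPath.substring("/tmp/".length());
    }

    /** Simple name obtained by treating the identifier as a raw pathname. */
    static String asRawSimpleName(String identifier) {
        return identifier;  // no decoding at all
    }
}
```

So `10%20%30` names the file "10 0" under the URI reading and the file "10%20%30" under the pathname reading, which is the ambiguity the question raises.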
[jira] [Resolved] (DRILL-3658) Missing org.apache.hadoop in the JDBC jar
[ https://issues.apache.org/jira/browse/DRILL-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-3658. --- Resolution: Fixed > Missing org.apache.hadoop in the JDBC jar > - > > Key: DRILL-3658 > URL: https://issues.apache.org/jira/browse/DRILL-3658 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Reporter: Piotr Sokólski > Assignee: Daniel Barclay (Drill) >Priority: Blocker > Fix For: 1.2.0 > > > java.lang.ClassNotFoundException: local.org.apache.hadoop.io.Text is thrown > while trying to access a text field from a result set returned from Drill > while using the drill-jdbc-all.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError
[ https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-2482. --- Resolution: Fixed > JDBC : calling getObject when the actual column type is 'NVARCHAR' results in > NoClassDefFoundError > -- > > Key: DRILL-2482 > URL: https://issues.apache.org/jira/browse/DRILL-2482 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Reporter: Rahul Challapalli > Assignee: Daniel Barclay (Drill) >Priority: Blocker > Fix For: 1.2.0 > > > git.commit.id.abbrev=7b4c887 > I tried to call getObject(i) on a column which is of type varchar, drill > failed with the below error : > {code} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/io/Text > at > org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407) > at > org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386) > at > org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98) > at > org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137) > at > org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136) > at > net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351) > at Dummy.testComplexQuery(Dummy.java:94) > at Dummy.main(Dummy.java:30) > Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 
8 more > {code} > When the underlying type is a primitive, the getObject call succeeds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: anyone seen these errors on master ?
Abdelhakim Deneche Software Engineer <http://www.mapr.com/> Now Available - Free Hadoop On-Demand Training -- Daniel Barclay MapR Technologies
Drill data types correspondence table
For those who didn't see earlier drafts, here's a spreadsheet showing the correspondence between various kinds/levels of types in Drill (e.g., SQL data type names vs. Calcite Java enumerations vs. RPC-level Protobuf enumerations): Drill Data Types Correspondence Table <https://docs.google.com/spreadsheets/d/1NwfYScdOjKqM4Ow5tVVtFa0nNpd70x1Ew-hzS3bgyJw/edit?pli=1#gid=0> If you notice anything incorrect, or have answers to any of the loose ends (labeled with "TBD:" or "Q:" (for questions)), please comment on the document or e-mail me. Thanks, Daniel -- Daniel Barclay MapR Technologies
[jira] [Resolved] (DRILL-3617) Apply "shading" to JDBC-all Jar file to avoid version conflicts
[ https://issues.apache.org/jira/browse/DRILL-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-3617. --- Resolution: Fixed Fixed by fix for DRILL-3589. > Apply "shading" to JDBC-all Jar file to avoid version conflicts > --- > > Key: DRILL-3617 > URL: https://issues.apache.org/jira/browse/DRILL-3617 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Reporter: Daniel Barclay (Drill) >Assignee: Daniel Barclay (Drill) > Fix For: 1.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3502) JDBC driver can cause conflicts
[ https://issues.apache.org/jira/browse/DRILL-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-3502. --- Resolution: Fixed Fixed by fix for DRILL-3589. > JDBC driver can cause conflicts > --- > > Key: DRILL-3502 > URL: https://issues.apache.org/jira/browse/DRILL-3502 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.1.0 >Reporter: Stefán Baxter > Assignee: Daniel Barclay (Drill) > Fix For: 1.2.0 > > > Using the JDBC driver in Java projects is problematic as it contains older > versions of some popular libraries and since they are not isolated/shaded > they may conflict with newer versions being used in these projects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Beginners Query
Ajay Shriwastava wrote: ... I was checking org.apache.drill.jdbc.impl.DrillStatementImpl class and see that it's still using import net.hydromatic.avatica.AvaticaStatement; and the drill-jdbc pom has dependency net.hydromatic optiq-avatica 0.9-drill-r20 so I searched JIRA and found out that issue to rebase drill on calcite has been resolved. https://issues.apache.org/jira/browse/DRILL-1384 Can you help me understand that if it was rebased to calcite in 0.9 why is it referring to hydromatic in master branch. I am missing something obvious here so please excuse my ignorance. Only Drill's SQL parser and related code using Calcite were rebased to use a newer version of Calcite. Drill's JDBC driver was not part of that rebasing. (It doesn't use the newer version of Avatica that's in the version of Calcite on which the parser, etc., were rebased.) Daniel -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3768) HTML- and JavaScript-injection vulnerability (lack of HTML encoding)
Daniel Barclay (Drill) created DRILL-3768: - Summary: HTML- and JavaScript-injection vulnerability (lack of HTML encoding) Key: DRILL-3768 URL: https://issues.apache.org/jira/browse/DRILL-3768 Project: Apache Drill Issue Type: Bug Components: Client - HTTP Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse Priority: Critical The Web UI does not properly encode query text or error message text into HTML. This makes the Web UI vulnerable to JavaScript-injection attacks. Most importantly, the Web UI doesn't encode characters that are special in HTML, e.g., encoding "<" in the plain text to "&lt;" in the HTML text. This means that some queries containing a less-than character ("<") are displayed wrong. For example, submit this query and then look at its profile via the Web UI:
{noformat}
SELECT 1 <script>alert("Gotcha!")</script>
{noformat}
Another, though less serious, problem is that line breaks in plain text are not encoded into HTML (e.g., as "<br>"). That means that separate lines of error messages are run together, making them harder or impossible to parse correctly when seen in the Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
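The fix the report asks for is ordinary HTML encoding of user-controlled text (query text, error messages) before splicing it into a page. A minimal sketch of such an encoder (not the Web UI's actual code; a real fix would use an established escaping library):

```java
// Minimal HTML encoder covering the characters that are special in HTML
// element content and attribute values.
public class HtmlEscapeSketch {
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Applied to the injected query text, the script tag renders as inert text instead of executing in the viewer's browser.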
Re: Maven build failing on checkstyle
Ted Dunning wrote: ... I think that we should have a rule that every class should have javadoc on the class. Since it will take a long time to get to that state, we should probably start with whichever are the most important classes to document. That fuzzy set of "most important" classes probably includes:
- classes relevant to using Drill's application-programming interfaces (e.g., Drill's JDBC driver now (which already has significant documentation), and, in the future, DrillClient/etc.)
- classes relevant to developing plug-ins
- most classes at the roots of big and/or significant hierarchies (since they specify methods, contracts, and protocols used by many other things)
- other classes that specify methods, contracts, and protocols used by many other parts of Drill
Daniel P.S. Speaking of documentation... Could a committer approve and merge my DRILL-3641 IterOutcome doc. pull request at https://github.com/apache/drill/pull/113? (Only one file to review! only an enum class's documentation!) (It should help avoid future bugs like DRILL-2288 and DRILL-3569.) Also, could anyone take a look at https://github.com/apache/drill/pull/118? It's mostly just Javadoc edits on code somewhat related to storage plug-ins (things I encountered in starting to explore how to write a storage plug-in). Thanks, Daniel -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3760) Casting interval to string and back to interval fails
Daniel Barclay (Drill) created DRILL-3760: - Summary: Casting interval to string and back to interval fails Key: DRILL-3760 URL: https://issues.apache.org/jira/browse/DRILL-3760 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Reporter: Daniel Barclay (Drill) Assignee: Mehant Baid Casting from an interval type to {{VARCHAR(...)}} and then casting back to the same interval type yields data format errors, for example: {noformat} 0: jdbc:drill:drillbit=localhost> VALUES CAST( CAST( INTERVAL '1' MONTH AS VARCHAR(99) ) AS INTERVAL MONTH ); Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "0 years 1 month " [Error Id: 339d28df-b687-47f0-b6ce-1f7732e41660 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:drillbit=localhost> {noformat} The problem seems to be in casting from interval types to strings. The SQL standard specifies that the result string has the syntax of a SQL literal, but Drill currently uses some other syntax: {noformat} 0: jdbc:drill:drillbit=localhost> VALUES CAST( INTERVAL '1' YEAR AS VARCHAR(99) ); +---+ | EXPR$0 | +---+ | 1 year 0 months | +---+ 1 row selected (0.27 seconds) 0: jdbc:drill:drillbit=localhost> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
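The round-trip failure above can be reproduced in miniature with java.time (Drill does not use java.time here; this only illustrates why the output syntax matters): a canonical ISO-8601 interval string like "P1M" parses back, while a free-form rendering like "0 years 1 month " does not.

```java
import java.time.Period;
import java.time.format.DateTimeParseException;

// Demonstrates why a cast-to-string format must be re-parseable: ISO-8601
// period strings round-trip through Period.parse; free-form text does not.
public class IntervalRoundTripSketch {
    static boolean roundTrips(String intervalText) {
        try {
            Period.parse(intervalText);   // requires ISO-8601, e.g. "P1M"
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

This mirrors the bug: Drill's cast-to-VARCHAR output ("1 year 0 months") is not in the syntax its cast-from-VARCHAR expects, so CAST(CAST(... AS VARCHAR) AS INTERVAL ...) fails.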
[jira] [Created] (DRILL-3748) apache-release profile on single module fails
Daniel Barclay (Drill) created DRILL-3748: - Summary: apache-release profile on single module fails Key: DRILL-3748 URL: https://issues.apache.org/jira/browse/DRILL-3748 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Reporter: Daniel Barclay (Drill) Assignee: Steven Phillips Running Maven with the {{apache-release}} profile enabled fails when run on just the {{drill-jdbc-all}} module. (Building all of Drill with that profile seems to work, but trying to re-build just the {{drill-jdbc-all}} module fails. That means that the edit/regenerate/check loop for JDBC Javadoc documentation (coming with DRILL-3160) takes much longer, since one can't incrementally rebuild just the {{drill-jdbc-all}} module (with the {{apache-release}} profile, which is needed for that Javadoc).) Specifically, although this command works:
{noformat}
mvn install -DskipTests -Papache-release -Dgpg.skip=true
{noformat}
executing the following (even right after the above command) fails:
{noformat}
cd exec/jdbc-all
mvn install -DskipTests -Papache-release -Dgpg.skip=true
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3744) Resolve missing expected {@inheritDoc} results in JDBC Javadoc
Daniel Barclay (Drill) created DRILL-3744: - Summary: Resolve missing expected {@inheritDoc} results in JDBC Javadoc Key: DRILL-3744 URL: https://issues.apache.org/jira/browse/DRILL-3744 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Priority: Minor Review the JDBC Javadoc comments that use {{\{@inheritDoc\}}}. Results from running the regular Javadoc command (including via Maven) are different from the results that were expected (from composing/editing documentation using Eclipse's dynamic Javadoc view). (Beware--Eclipse's dynamic Javadoc view does not update correctly. What the view displays for a given state of the documentation comment depends on the order of changes (comment contents and cursor position) that got the view into that state, not just the state itself.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: The meaning of the methods in StoragePlugin and EasyFormatPlugin
I wrote: ... Below are some notes on the detailed requirements I had extracted from the code. ... I found a later copy of my (still rough) notes. See the Google Docs document at [Notes for] Instructions on Creating Storage Plug-ins <https://docs.google.com/document/d/1GqPV1oXMYoVAMihhfhTDObL4utg2NvCq4_obFPlar18/edit>. Daniel -- Daniel Barclay MapR Technologies
[jira] [Resolved] (DRILL-3661) Add/edit various JDBC Javadoc.
[ https://issues.apache.org/jira/browse/DRILL-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-3661. --- Resolution: Fixed > Add/edit various JDBC Javadoc. > -- > > Key: DRILL-3661 > URL: https://issues.apache.org/jira/browse/DRILL-3661 > Project: Apache Drill > Issue Type: Bug > Reporter: Daniel Barclay (Drill) > Assignee: Daniel Barclay (Drill) > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [GitHub] drill pull request: DRILL-3589: Update JDBC driver to shade and mi...
I wrote: Github user dsbos commented on the pull request: https://github.com/apache/drill/pull/116#issuecomment-136889943 Something seems to be broken. Never mind that comment. I seemed to have a local version/build mismatch. Daniel After rebasing your branch on my branch with my DRILL-3347 (Hadoop Test) and DRILL-3566 (Prep.Stmt.) fixes, I tried installing the resulting JDBC-all Jar file on Spotfire, but Spotfire's getting IndexOutOfBoundsExceptions somewhere within ResultSet.next(). I'll see if I can identify which shading might be causing that or what's going on in next(). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3730) Change JDBC driver's DrillConnectionConfig to interface
Daniel Barclay (Drill) created DRILL-3730: - Summary: Change JDBC driver's DrillConnectionConfig to interface Key: DRILL-3730 URL: https://issues.apache.org/jira/browse/DRILL-3730 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Change {{org.apache.drill.jdbc.DrillConnectionConfig}} (in Drill's published interface for the JDBC driver) from being a class to being an interface. Move the implementation (including the inheritance from {{net.hydromatic.avatica.ConnectionConfigImpl}}) from published-interface package {{org.apache.drill.jdbc}} to a class in implementation package {{org.apache.drill.jdbc.impl}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
DRILL-3548 - plug-in-exploration--related doc.
Jacques, Since you seem to be addressing plug-in--related documentation now, could you please review my patch for DRILL-3548 <https://issues.apache.org/jira/browse/DRILL-3548> in Pull Request #118 <https://github.com/apache/drill/pull/118>, or at least assimilate the plug-in--related parts of it (e.g., FormatPluginConfig.java, DrillConfig.java (doc. of drill-module.conf, etc), StoragePluginRegistry.java (clearer messages), DrillTable.java, AbstractSchema.java, RecordGenerator.java, etc.)? Thanks, Daniel -- Daniel Barclay MapR Technologies
Re: The meaning of the methods in StoragePlugin and EasyFormatPlugin
Edmon,

I see you're working on adding documentation about creating storage plug-ins. I was looking into that myself a little while ago, but wasn't able to continue. Below are some notes on the detailed requirements I had extracted from the code. Hopefully they'll be helpful in filling in your documentation of what's required to create a storage plug-in.

Daniel

Storage Plug-In Notes

Pieces needed for/aspects of a storage plug-in:

* Need storage plug-in configuration class:
  - per StoragePluginConfig (abstract class (currently))
  - (org.apache.drill.common.logical.StoragePluginConfig)
  - public class
  - public no-argument constructor? (modulo Jackson/@JsonTypeName?)
  - one per storage plug-in type?
  - What about Jackson serialization?
  - TODO: REFINE: plug-in type name (for ?) defaults to simple class name; can be specified by some kind of NAME property (caps? any?)
  - what are requirements on serializability?

* Need storage plug-in class:
  - per StoragePlugin (interface)
  - (org.apache.drill.exec.store.StoragePlugin)
  - public class (not clearly needed)
  - public constructor ...( SomeStoragePluginConfig, DrillContext, String), where:
    - SomeStoragePluginConfig is the _specific_ implementation class of StoragePluginConfig
  - multiple storage plug-ins can share one storage plug-in class--one constructor per StoragePluginConfig class

* Class path scanning requirement:
  - StoragePluginConfig and StoragePlugin implementation classes are found by classpath scanning
  - Need drill-module.conf file in root of classpath subtree containing classes to be found.
  - Normally need to append name of each package (immediately?) containing implementation classes to configuration property drill.exec.storage.packages.

* bootstrap-storage-plugins.json
  - Normally need to have bootstrap-storage-plugins.json file in same classpath root.
  - Normally have default configuration for plug-in in same classpath root's bootstrap-storage-plugins.json file.
  - Format seems to be Jackson's serialization of some kind of list of StoragePluginConfig:
    - Jackson seems to follow Java Beans getter/setter mapping rules (verified only for simple values--String, boolean)
    - (What else?)

* Schema, ROUGH:
  - Calcite's Schema
  - Drill's AbstractSchema
  - implementations of Calcite's Table interface must be subclasses of Drill's DrillTable

(Document that old code doesn't follow the recently clarified terminology used in (most of) the user documentation:
  - "storage plug-in" refers to the code itself (what plugs into Drill)
  - "storage plug-in configuration" refers to the configuration associated with names such as "cp" and "dfs"--different configurations of the file-system plug-in
  - "storage plug-in configuration name" refers to names such as "cp" and "dfs"
  - "storage plug-in type name" refers to ... (e.g., "file", "hive")
  - (old terms in code: "storage engine" (sometimes) means storage plug-in configuration name)
)

Pending questions:

- Q: What does the @JsonTypeInfo annotation on StoragePluginConfig do? Specifically, how exactly does it relate to "type" in 'type: "file"' and to "NAME" and "name" fields (JavaBeans/Jackson properties?) on plug-in classes? @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property="type") on StoragePluginConfig specifies that the logical type name is carried in a JavaBeans/Jackson property named "type" on subclasses.
- Q: What exactly does SystemTablePluginConfig's _public_ NAME field do?
- Q: What exactly does SystemTablePluginConfig's _public_ INSTANCE field do?

-- Daniel Barclay MapR Technologies
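To make the classpath-scanning and bootstrap notes above concrete, here is a sketch of the two classpath-root files for a hypothetical plug-in. The package name com.example.myplugin, the type name "myfs", and the configuration name "myplugin" are all illustrative, not from Drill's source. In drill-module.conf, the package is appended to the scanning property:

```
drill.exec.storage.packages += "com.example.myplugin"
```

and bootstrap-storage-plugins.json in the same classpath root supplies the default configuration, with the "type" value selecting the plug-in config class (via its Jackson type name):

```json
{
  "storage": {
    "myplugin": {
      "type": "myfs",
      "enabled": true
    }
  }
}
```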
Re: Hangout happening in 30mins! (at 10:00am Pacific)
This week's Drill Hangout will be happening in about 35 minutes. (https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc) Hsuan Yi Chu wrote: Come join the Drill community as we discuss what has been happening lately and what is in the pipeline. All are welcome, if you know about Drill, want to know more or just want to listen in. Link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc -- Daniel Barclay MapR Technologies
Review Request 37685: DRILL-2489: Throw exception from remaining methods for closed objects.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37685/ --- Review request for drill, Mehant Baid and Parth Chandra. Bugs: DRILL-2489 https://issues.apache.org/jira/browse/DRILL-2489 Repository: drill-git Description --- (Note: Patch depends on (needs to be applied after) patches for DRILL-3153, -3347, -3566, and -3661.) Refactored unit test to check all methods per interface. (Replaced individual, static test methods with bulk reflection-based checking.) [Drill2489CallsAfterCloseThrowExceptionsTest] Added DrillResultSetMetaDataImpl. Added method overrides to check state for remaining methods from Connection, Statement, PreparedStatement, ResultSet, ResultSetMetaData and DatabaseMetaData. Also: - renamed checkNotClosed to throwIfClosed. Diffs - exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillConnection.java 608bf05 exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillConnectionImpl.java 243e627 exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java 9d0c132 exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillJdbc41Factory.java 11191ae exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillPreparedStatementImpl.java 86683cb exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java 1b37dc1 exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetMetaDataImpl.java PRE-CREATION exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java 6cba58e exec/jdbc/src/test/java/org/apache/drill/jdbc/ConnectionTest.java 8735146 exec/jdbc/src/test/java/org/apache/drill/jdbc/ConnectionTransactionMethodsTest.java 1aff918 exec/jdbc/src/test/java/org/apache/drill/jdbc/StatementTest.java 3e64fcb exec/jdbc/src/test/java/org/apache/drill/jdbc/test/Drill2489CallsAfterCloseThrowExceptionsTest.java 01008b2 Diff: https://reviews.apache.org/r/37685/diff/ Testing --- Ran new tests in this patch. Also manually injected various errors to confirm detection. 
Ran existing tests; no new errors. Thanks, Daniel Barclay
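The "bulk reflection-based checking" approach described in the review request can be sketched in plain JDK code: stand in for a closed JDBC object with a dynamic proxy that throws on every call, then reflectively invoke each interface method and confirm the expected exception comes back. This is an illustrative reconstruction, not the actual Drill2489CallsAfterCloseThrowExceptionsTest code:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ClosedObjectCheck {
    // Default argument value for a parameter type (null for references).
    static Object defaultArg(Class<?> t) {
        if (!t.isPrimitive()) return null;
        if (t == boolean.class) return false;
        if (t == char.class) return '\0';
        if (t == byte.class) return (byte) 0;
        if (t == short.class) return (short) 0;
        if (t == int.class) return 0;
        if (t == long.class) return 0L;
        if (t == float.class) return 0f;
        return 0d; // double
    }

    /** Invokes every ResultSet method on a stand-in "closed" object and
     *  counts the methods that fail to throw SQLException. */
    public static int countViolations() {
        ResultSet closed = (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                throw new SQLException(method.getName() + ": already closed");
            });
        int violations = 0;
        for (Method m : ResultSet.class.getMethods()) {
            Class<?>[] types = m.getParameterTypes();
            Object[] args = new Object[types.length];
            for (int i = 0; i < types.length; i++) {
                args[i] = defaultArg(types[i]);
            }
            try {
                m.invoke(closed, args);
                violations++;                         // should have thrown
            } catch (InvocationTargetException e) {
                if (!(e.getCause() instanceof SQLException)) violations++;
            } catch (ReflectiveOperationException e) {
                violations++;
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations());
    }
}
```

Compared with one hand-written test method per JDBC method, this style automatically covers methods added to the interface later.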
[jira] [Created] (DRILL-3693) SQLLine/drill-localhost seems to demand 2.9GB just to start
Daniel Barclay (Drill) created DRILL-3693: - Summary: SQLLine/drill-localhost seems to demand 2.9GB just to start Key: DRILL-3693 URL: https://issues.apache.org/jira/browse/DRILL-3693 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Starting SQLLine/drill-localhost when not enough virtual memory is available seems to indicate that Drill's _client-side_ code is trying to allocate almost 2.9 GB: {noformat} $ ./distribution/target/apache-drill-1.2.0-SNAPSHOT/apache-drill-1.2.0-SNAPSHOT/bin/drill-localhost Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0006fff8, 2863661056, 0) failed; error='Cannot allocate memory' (errno=12) # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 2863661056 bytes for committing reserved memory. # An error report file with more information is saved as: # /home/dbarclay/work/git/incubator-drill/hs_err_pid21405.log $ {noformat} Is that intended? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3696) REGEXP_REPLACE doesn't document that replacement is pattern (not plain string)
Daniel Barclay (Drill) created DRILL-3696: - Summary: REGEXP_REPLACE doesn't document that replacement is pattern (not plain string) Key: DRILL-3696 URL: https://issues.apache.org/jira/browse/DRILL-3696 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3695) SYSTEM ERROR for REGEXP_REPLACE replacement pattern format error
Daniel Barclay (Drill) created DRILL-3695: - Summary: SYSTEM ERROR for REGEXP_REPLACE replacement pattern format error Key: DRILL-3695 URL: https://issues.apache.org/jira/browse/DRILL-3695 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Similar to the problem with REGEXP_REPLACE match patterns reported in DRILL-3694, REGEXP_REPLACE reports SYSTEM ERROR errors rather than specific (FUNCTION ERROR) errors for bad replacement pattern strings: {noformat} 0: jdbc:drill:drillbit=localhost> VALUES REGEXP_REPLACE( 'abc', 'b', '\'); Error: SYSTEM ERROR: StringIndexOutOfBoundsException: String index out of range: 1 [Error Id: 12f09e63-8dcb-4ab8-bfe6-183d81617c1e on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:drillbit=localhost> VALUES REGEXP_REPLACE( 'abc', 'b', '$'); Error: SYSTEM ERROR: StringIndexOutOfBoundsException: String index out of range: 1 [Error Id: 084ce8ce-8c11-4d53-82a4-be19aa9140b2 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:drillbit=localhost> VALUES REGEXP_REPLACE( 'abc', 'b', '$2'); Error: SYSTEM ERROR: IndexOutOfBoundsException: No group 2 [Error Id: 04d5e101-1f94-46df-8590-6f94aac9201c on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:drillbit=localhost> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
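In Java's regex API the replacement string has its own mini-syntax ($n group references and backslash escapes), which is the likely source of all three errors above. A sketch of the failure mode and of the usual guard, Matcher.quoteReplacement (independent of Drill's actual REGEXP_REPLACE implementation):

```java
import java.util.regex.Matcher;

public class ReplacementErrors {
    /** True if Java's regex engine rejects the replacement string. */
    public static boolean replacementFails(String input, String regex,
                                           String repl) {
        try {
            input.replaceAll(regex, repl);
            return false;
        } catch (RuntimeException e) {
            // e.g. IndexOutOfBoundsException("No group 2"), matching
            // the SYSTEM ERROR text in this report
            return true;
        }
    }

    public static void main(String[] args) {
        // "$2" is parsed as a group reference; pattern "b" has no group 2.
        System.out.println(replacementFails("abc", "b", "$2"));  // true
        // quoteReplacement() treats the replacement literally instead.
        System.out.println(
            "abc".replaceAll("b", Matcher.quoteReplacement("$2"))); // a$2c
    }
}
```

A specific catch of these exceptions (rather than a blanket one) is what would let Drill report a FUNCTION ERROR naming the bad replacement string.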
[jira] [Created] (DRILL-3694) SYSTEM ERROR for REGEXP_REPLACE regex format error
Daniel Barclay (Drill) created DRILL-3694: - Summary: SYSTEM ERROR for REGEXP_REPLACE regex format error Key: DRILL-3694 URL: https://issues.apache.org/jira/browse/DRILL-3694 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Giving a bad regular expression as the match-pattern argument string to REGEXP_REPLACE yields a SYSTEM ERROR (apparently from a non-specific catch of PatternSyntaxException from the Java implementation) rather than a more specific error (FUNCTION ERROR?) from explicit validation (e.g., via a specific catch of PatternSyntaxException from the Java implementation). For example: {noformat} 0: jdbc:drill:drillbit=localhost VALUES REGEXP_REPLACE( 'abc', '\', 'x'); Error: SYSTEM ERROR: PatternSyntaxException: Unexpected internal error near index 1 \ ^ [Error Id: 6a4dfb45-cd7b-4c24-b720-3813522254a4 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:drillbit=localhost {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
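The specific catch suggested above is straightforward in Java: Pattern.compile throws PatternSyntaxException for a malformed pattern, and catching that type (rather than a blanket Exception) preserves the index and description needed for a precise function-level error. A hedged sketch, not Drill's code:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternValidation {
    /** Returns null if the regex is valid, else a user-facing message. */
    public static String validate(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getIndex()/getDescription() support a specific FUNCTION ERROR
            // instead of a generic SYSTEM ERROR.
            return "Invalid regular expression at index " + e.getIndex()
                    + ": " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("abc") == null);  // true: valid pattern
        System.out.println(validate("\\") == null);   // false: lone backslash
    }
}
```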
[jira] [Created] (DRILL-3697) REGEXP_REPLACE doc. says POSIX reg. expr.; which? not Java?
Daniel Barclay (Drill) created DRILL-3697: - Summary: REGEXP_REPLACE doc. says POSIX reg. expr.; which? not Java? Key: DRILL-3697 URL: https://issues.apache.org/jira/browse/DRILL-3697 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) The {{REGEXP_REPLACE}} function documentation currently at [https://drill.apache.org/docs/string-manipulation/#regexp_replace] says that {{REGEXP_REPLACE}} uses POSIX regular expressions. Is that true (that Drill using POSIX regular expressions and not Java regular expressions)? If that's really true, are they BREs or EREs? Assuming it's actually Java regular expressions, the documentation should probably have a link to some appropriate target in the JDK Java doc (maybe http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html and http://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String-. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3686) no-ZooKeeper logging: Curator messages not in main file, no trying ZooKeeper from Drill
Daniel Barclay (Drill) created DRILL-3686: - Summary: no-ZooKeeper logging: Curator messages not in main file, no trying ZooKeeper from Drill Key: DRILL-3686 URL: https://issues.apache.org/jira/browse/DRILL-3686 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) When Drill is started when ZooKeeper is not running, the logging could be clearer. The log messages from Curator (e.g., "ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5568)") don't go to Drill's normal/main log file .../drillbit.log; instead they go to .../Drillbit.out. (They'd be easier to notice and find if they were in the main log file where most of the rest of Drill's logging output is.) Additionally, at least at the default logging level (for drillbit.sh), nothing in the main log says that Drill's about to try to connect to ZooKeeper. Seeing a "connecting to ZooKeeper" message without a following "connected to ZooKeeper" message in the main log would help point the reader to the secondary log, even if we can't/don't get the Curator log output into the main log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3687) NullPointerException from query with WITH, VALUES, and USING
Daniel Barclay (Drill) created DRILL-3687: - Summary: NullPointerException from query with WITH, VALUES, and USING Key: DRILL-3687 URL: https://issues.apache.org/jira/browse/DRILL-3687 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) The following query fails: {noformat} WITH q(key) AS (VALUES 1, 1) SELECT * FROM q q1 INNER JOIN q q2 USING (key) {noformat} The failure is a NullPointerException SYSTEM ERROR message: {noformat} Error: SYSTEM ERROR: NullPointerException [Error Id: ba74e744-7c8a-4ec8-b046-ac28ad0a03a4 on dev-linux2:31010] (state=,code=0) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2312) JDBC driver returning incorrect data after extended usage
[ https://issues.apache.org/jira/browse/DRILL-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-2312. --- Resolution: Cannot Reproduce Closing because attempts to reproduce this now on Drill pre-1.2 have failed and this bug was reported way back on version 0.7. JDBC driver returning incorrect data after extended usage - Key: DRILL-2312 URL: https://issues.apache.org/jira/browse/DRILL-2312 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Norris Lee Assignee: Daniel Barclay (Drill) Fix For: 1.2.0 After executing ~20-30 queries with the JDBC driver, the data returned from a Show files query is incorrect, particularly the isFile and isDirectory columns. The first item in the schema/directory will be correct, but subsequent items will report false for isFile and isDirectory. This was tested with just a simple program that just loops through executeQuery and prints out the values for isFile and isDirectory. The JDBC driver used was the Drill 0.7 snapshot. {code} isFile: true isDirectory: false isFile: false isDirectory: false isFile: false isDirectory: false isFile: false isDirectory: false isFile: false isDirectory: false {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3661) Add/edit various JDBC Javadoc.
Daniel Barclay (Drill) created DRILL-3661: - Summary: Add/edit various JDBC Javadoc. Key: DRILL-3661 URL: https://issues.apache.org/jira/browse/DRILL-3661 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3659) UnionAllRecordBatch infers wrongly from next() IterOutcome values
Daniel Barclay (Drill) created DRILL-3659: - Summary: UnionAllRecordBatch infers wrongly from next() IterOutcome values Key: DRILL-3659 URL: https://issues.apache.org/jira/browse/DRILL-3659 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) When UnionAllRecordBatch uses IterOutcome values returned from the next() method of upstream batches, it seems to be using those values wrongly (making incorrect inferences about what they mean). In particular, some switch statements seem to check for NONE vs. OK_NEW_SCHEMA in order to determine whether there are any rows (instead of explicitly checking the number of rows). However, OK_NEW_SCHEMA can be returned even when there are zero rows. The apparent latent bug in the union code blocks the fix for DRILL-2288 (having ScanBatch return OK_NEW_SCHEMA for a zero-rows case in which it was wrongly (per the IterOutcome protocol) returning NONE without first returning OK_NEW_SCHEMA). For details of IterOutcome values, see the Javadoc documentation of RecordBatch.IterOutcome (after DRILL-3641 is merged; until then, see https://github.com/apache/drill/pull/113). For an environment/code state that exposes the UnionAllRecordBatch problems, see https://github.com/dsbos/incubator-drill/tree/bugs/WORK_2288_etc, which includes: - a test that exposes the DRILL-2288 problem; - an enhanced IteratorValidatorBatchIterator, which now detects IterOutcome value sequence violations; and - a fixed (though not-yet-cleaned) version of ScanBatch that fixes the DRILL-2288 problem and thereby exposes the UnionAllRecordBatch problem (several test methods in each of TestUnionAll and TestUnionDistinct fail). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
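The correct pattern the report implies: treat the IterOutcome value only as a schema/stream signal and take row presence from the batch's record count. An illustrative (non-Drill) sketch of the difference, with a hypothetical stand-in for RecordBatch:

```java
public class IterOutcomeDemo {
    enum IterOutcome { NONE, OK, OK_NEW_SCHEMA }

    /** Hypothetical stand-in for a RecordBatch: outcome plus record count. */
    static class Batch {
        final IterOutcome outcome;
        final int recordCount;
        Batch(IterOutcome outcome, int recordCount) {
            this.outcome = outcome;
            this.recordCount = recordCount;
        }
    }

    // The buggy inference (the DRILL-3659 shape): assuming that
    // OK_NEW_SCHEMA implies the batch carries rows.
    static boolean hasRowsBuggy(Batch b) {
        return b.outcome == IterOutcome.OK_NEW_SCHEMA;
    }

    // Correct: OK_NEW_SCHEMA only signals that a schema was (re)established;
    // row presence must be checked explicitly via the record count.
    static boolean hasRowsCorrect(Batch b) {
        return b.outcome != IterOutcome.NONE && b.recordCount > 0;
    }

    public static void main(String[] args) {
        Batch emptyWithSchema = new Batch(IterOutcome.OK_NEW_SCHEMA, 0);
        System.out.println(hasRowsBuggy(emptyWithSchema));   // true  (wrong)
        System.out.println(hasRowsCorrect(emptyWithSchema)); // false (right)
    }
}
```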
[jira] [Created] (DRILL-3656) Accountor catch intended for ConfigException hides NullPointerException (?)
Daniel Barclay (Drill) created DRILL-3656: - Summary: Accountor catch intended for ConfigException hides NullPointerException (?) Key: DRILL-3656 URL: https://issues.apache.org/jira/browse/DRILL-3656 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) In org.apache.drill.exec.memory.Accountor's constructor, there is a catch(Exception e) ... clause that used to catch ConfigExceptions (when a requested configuration item wasn't known to the passed-in DrillConfig object, which occurred at least in some unit tests). However, now that catch clause is also catching NullPointerExceptions because (sometimes) the DrillConfig parameter is null (in some unit tests). It seems that: - that catch clause should specifically catch only ConfigException (so that it doesn't accidentally hide any unexpected exceptions), and - if the DrillConfig parameter is allowed to be null, the code should be handling that case explicitly with a test for null, not via a catch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
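The two suggested fixes can be sketched together: an explicit null check for the config, plus a catch that names only the expected exception type. This is an illustrative shape, not Accountor's actual code; the Config interface, the ConfigException class, and the key "drill.exec.memory.fragment.max" here are all hypothetical stand-ins:

```java
public class NarrowCatch {
    /** Stand-in for Drill's configuration-lookup failure. */
    static class ConfigException extends RuntimeException {
        ConfigException(String m) { super(m); }
    }

    /** Stand-in for the relevant slice of DrillConfig. */
    interface Config { long getLong(String key); }

    static long fragmentLimit(Config config, long dflt) {
        if (config == null) {
            return dflt;              // explicit, documented null handling
        }
        try {
            return config.getLong("drill.exec.memory.fragment.max");
        } catch (ConfigException e) { // only the expected failure is absorbed
            return dflt;
        }
        // A NullPointerException (or any other unexpected exception) now
        // propagates instead of being swallowed by a catch (Exception e).
    }

    public static void main(String[] args) {
        System.out.println(fragmentLimit(null, 42L)); // 42
    }
}
```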
[jira] [Created] (DRILL-3655) TIME - (minus) TIME doesn't work
Daniel Barclay (Drill) created DRILL-3655: - Summary: TIME - (minus) TIME doesn't work Key: DRILL-3655 URL: https://issues.apache.org/jira/browse/DRILL-3655 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) {noformat} 0: jdbc:drill:> VALUES CURRENT_TIME - CURRENT_TIME; Error: PARSE ERROR: From line 1, column 8 to line 1, column 34: Cannot apply '-' to arguments of type 'TIME(0) - TIME(0)'. Supported form(s): 'NUMERIC - NUMERIC' 'DATETIME_INTERVAL - DATETIME_INTERVAL' 'DATETIME - DATETIME_INTERVAL' [Error Id: ede6c073-ca82-4359-8adb-db413e051e29 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3641) Document RecordBatch.IterOutcome (enumerators and possible sequences)
Daniel Barclay (Drill) created DRILL-3641: - Summary: Document RecordBatch.IterOutcome (enumerators and possible sequences) Key: DRILL-3641 URL: https://issues.apache.org/jira/browse/DRILL-3641 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3627) Poor visibility of failure to connect to ZooKeeper
Daniel Barclay (Drill) created DRILL-3627: - Summary: Poor visibility of failure to connect to ZooKeeper Key: DRILL-3627 URL: https://issues.apache.org/jira/browse/DRILL-3627 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) When Drill starts up, if it can't connect to ZooKeeper right away, it doesn't seem to write anything to its logs (at least at the default logging level) to indicate that it is retrying connecting to ZooKeeper. Also, it doesn't write any "connecting to ZooKeeper" message normally paired with a "connected to ZooKeeper" message, which would at least make it easier to notice the ZooKeeper connection problem (when the first message appears but the second one does not). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3628) drillbit.sh output mentions only secondary .out, but not primary .log
Daniel Barclay (Drill) created DRILL-3628: - Summary: drillbit.sh output mentions only secondary .out, but not primary .log Key: DRILL-3628 URL: https://issues.apache.org/jira/browse/DRILL-3628 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Although the output of the drillbit.sh script mentions the secondary output file .../drillbit.out, it never mentions the primary output file .../drillbit.log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3626) Many references to
Daniel Barclay (Drill) created DRILL-3626: - Summary: Many references to Key: DRILL-3626 URL: https://issues.apache.org/jira/browse/DRILL-3626 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3617) Apply shading to JDBC-all Jar file to avoid version conflicts
Daniel Barclay (Drill) created DRILL-3617: - Summary: Apply shading to JDBC-all Jar file to avoid version conflicts Key: DRILL-3617 URL: https://issues.apache.org/jira/browse/DRILL-3617 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3614) drill
Daniel Barclay (Drill) created DRILL-3614: - Summary: drill Key: DRILL-3614 URL: https://issues.apache.org/jira/browse/DRILL-3614 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3612) Doc. says logging configuration at /conf/logback.xml
Daniel Barclay (Drill) created DRILL-3612: - Summary: Doc. says logging configuration at /conf/logback.xml Key: DRILL-3612 URL: https://issues.apache.org/jira/browse/DRILL-3612 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Daniel Barclay (Drill) Assignee: Bridget Bevens The Drill documentation page at https://drill.apache.org/docs/log-and-debug-introduction/ says: bq. Logback behavior is defined by configurations set in /conf/logback.xml. Isn't the location really conf/logback.xml relative to Drill's root installation directory (the apache-drill-n.n.n directory)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3611) Drill/client unstable in connection-closed state
Daniel Barclay (Drill) created DRILL-3611: - Summary: Drill/client unstable in connection-closed state Key: DRILL-3611 URL: https://issues.apache.org/jira/browse/DRILL-3611 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) When Drill and/or a client get into the state in which the client reports that the connection is closed, the error messages are not stable. In the following (a series of empty queries executed about a half a second apart), notice how sometimes the exception is a CONNECTION ERROR: ... closed unexpectedly exception and sometimes it is a SYSTEM ERROR: ChannelClosedException exception: {noformat} 0: jdbc:drill: 0: jdbc:drill: ; Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 (user client) closed unexpectedly. [Error Id: 0848c18e-64e9-41e2-90d9-3a0ffaebc14e ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: b465b0e7-55a2-4ef6-ad0e-01258468f4e7 ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: 0b50a10c-42eb-47b6-bc3d-9a42afe4cd28 ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: 9cd1fd96-0aed-4d06-b0ae-d48ddc70b91e ] (state=,code=0) 0: jdbc:drill: ; Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 (user client) closed unexpectedly. [Error Id: 222a5358-6b2e-49e1-a1ec-931cacbbdbd1 ] (state=,code=0) 0: jdbc:drill: ; Error: SYSTEM ERROR: ChannelClosedException [Error Id: fc589b70-dd10-4484-963a-21bc88147a0d ] (state=,code=0) 0: jdbc:drill: ; Error: CONNECTION ERROR: Connection /127.0.0.1:46726 -- /127.0.0.1:31010 (user client) closed unexpectedly. [Error Id: 19965e75-9f2e-4a73-b1d8-29d61e6ea31a ] (state=,code=0) 0: jdbc:drill: 0: jdbc:drill: 0: jdbc:drill: {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3496) Augment logging in DrillConfig and classpath scanning.
[ https://issues.apache.org/jira/browse/DRILL-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-3496. --- Resolution: Fixed Augment logging in DrillConfig and classpath scanning. -- Key: DRILL-3496 URL: https://issues.apache.org/jira/browse/DRILL-3496 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2815) Some PathScanner logging, misc. cleanup.
[ https://issues.apache.org/jira/browse/DRILL-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) resolved DRILL-2815. --- Resolution: Fixed Some PathScanner logging, misc. cleanup. Key: DRILL-2815 URL: https://issues.apache.org/jira/browse/DRILL-2815 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse Priority: Minor Fix For: 1.2.0 Attachments: DRILL-2815.5.patch.txt, DRILL-2815.6.patch.txt Add a little more logging to PathScanner; clean up a little. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3603) Document workaround for DRILL-2560 in JDBC Javadoc doc. (etc.)
Daniel Barclay (Drill) created DRILL-3603: - Summary: Document workaround for DRILL-2560 in JDBC Javadoc doc. (etc.) Key: DRILL-3603 URL: https://issues.apache.org/jira/browse/DRILL-3603 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Until DRILL-2560 and DRILL3267 are resolved, the Javadoc documentation for {{executeQuery(...))}} methods should document the workaround described in DRILL-2560's report. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Unable to connect to drill 1.1.0 using JDBC
Parth Chandra wrote: Yes and I probably reviewed it and missed it. We were missing a unit test which would have caught this. Yes; evidently we didn't have any test that used PreparedStatement enough to execute a query. Daniel It's being fixed ASAP. ( DRILL-3566 ) On Mon, Jul 27, 2015 at 9:23 PM, Jacques Nadeau jacq...@dremio.com wrote: Agreed. It should be fixed. Just trying to provide a workaround until it is fixed. It is unfortunate that it was broken in 1.1. Must have been all the JDBC refactoring. On Jul 27, 2015 7:57 PM, Anas Mesrah anas.mes...@gmail.com wrote: Hi Jacques, You are right, BUT there are many software products built on PreparedStatements and they may integrate with Drill. I am not sure about the current big BI products integrating with Drill; if you skip that, you are simply ignoring them. In addition, this was working on previous versions; I tested 1.0.0 and 0.8.0. I am sure something can be fixed. -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3569) TestBuilder.baseLineRecords(List&lt;Map&gt;) doesn't specify maps
Daniel Barclay (Drill) created DRILL-3569: - Summary: TestBuilder.baseLineRecords(List&lt;Map&gt;) doesn't specify maps Key: DRILL-3569 URL: https://issues.apache.org/jira/browse/DRILL-3569 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse In method TestBuilder.baseLineRecords(List&lt;Map&gt;), neither the parameter type nor the method documentation comment indicates how the maps represent materialized results. The Map part should have type parameters and/or the documentation should say something about how the maps in the list represent results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3570) TestBuilder.baselineRecords(List<Map>) doesn't handle null well
Daniel Barclay (Drill) created DRILL-3570: - Summary: TestBuilder.baselineRecords(List<Map>) doesn't handle null well Key: DRILL-3570 URL: https://issues.apache.org/jira/browse/DRILL-3570 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse TestBuilder.baselineRecords(List<Map>) accepts a call with null (doesn't reject it with a bad-parameter error) but does not accept it as a specification of a baseline (so null doesn't prevent the "Must provide some kind of baseline, either a baseline file or another query" error). Either it should reject null (and the parameter documentation should probably have "; not null" added), or, when it gets null, it should take that as providing some kind of baseline (maybe a baseline of zero rows (like an empty list), or maybe a don't-care baseline) (and the documentation should reflect that). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3571) TestBuilder: empty list to baselineRecords yields IndexOutOfBoundsException
Daniel Barclay (Drill) created DRILL-3571: - Summary: TestBuilder: empty list to baselineRecords yields IndexOutOfBoundsException Key: DRILL-3571 URL: https://issues.apache.org/jira/browse/DRILL-3571 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Reporter: Daniel Barclay (Drill) Assignee: Jason Altekruse Passing an empty list to TestBuilder.baselineRecords causes an IndexOutOfBoundsException in the expression baselineRecords.get(0) in compareMergedOnHeapVectors. If an empty list is not intended to be a valid argument, it should be rejected at the call to baselineRecords. Use case note: I was trying to specify, as a test expectation, that the results contain zero records. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: dropping qualified names in logger declarations
I can't seem to find any guidance for this particular issue. Yeah; I wouldn't expect anything specific to our peculiar pattern, other than a general endorsement of usually using imports and usually not using fully-qualified names. Is there any reason not to go with imports and non-qualified names (with the parts I mentioned below to retain the easy commenting/uncommenting to avoid unused imports and loggers)? If there's concern about having loggers in both styles for a long interim period: I think I could convert the declarations pretty rapidly (using some Emacs regular-expression bulk replacement), as long as I can get someone to approve and merge in the changes as quickly as we want the inconsistency to go away. Daniel Chris Westin wrote: I tried to follow up with Hakim's suggestion of consulting the checkstyle rules I expect to use (I've suggested before that we should start with Google's rules as a basis, and make a few tweaks), but unfortunately, on that day last week, SourceForge was down (that's where the rules are hosted). It's finally back, so here they are: http://checkstyle.sourceforge.net/google_style.html . I can't seem to find any guidance for this particular issue. On Wed, Jul 22, 2015 at 10:51 AM, Daniel Barclay dbarc...@maprtech.com wrote: Chris Westin wrote: For the special case of the logger, I kind of like it this way, because I can turn it off just by commenting out a single line (to get rid of unreferenced-variable warnings), or add it by pasting in or uncommenting a single line. In either case I don't have to worry about removing or adding the import line separately, which can be quite far away if there are a lot of imports. Why not use the modern Java feature intended for cases like this: have a @SuppressWarnings("unused") annotation on the logger member declaration if the declaration has been added but the member isn't used yet?
Then: - We can still avoid unused-variable warnings for logger members that have already been declared before there are any uses. - We no longer have to move up top to adjust an already-existing logger declaration when adding a logger use down in the code. (Yes, we should remove (or comment out) the annotation if the change isn't temporary, but we don't have to do that immediately just to continue compiling.) - We can now use code completion when adding the first logger call to a previously unused logger, since the declaration is real (not just a comment). - We can still comment out/uncomment only a single line (the annotation) to switch between the no-logger-uses and some-logger-use cases. (That is, if you don't want to have to re-add the suppression annotation if the last use of a logger is removed later, you can comment out, rather than delete, the annotation when the first logger use is added.) - We no longer have to either go adjust imports or use qualified names to avoid having to adjust imports. - We stop having unnecessarily qualified names in the code. - and, finally ...: - We stop having those names' extra visual clutter and length, which makes it harder to notice when the class literal ends up wrong. (Note the mention of pasting above.) Daniel On Tue, Jul 21, 2015 at 6:12 PM, Daniel Barclay dbarc...@maprtech.com wrote: For logger member declarations, can we drop the current pattern of using qualified names (like this: private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginRegistry.class); ) and allow using imports and non-qualified names (as we do for almost everything else)? Using qualified names adds a lot of visual noise, and pushes the class literal farther to the right, making it easier to fail to notice that it doesn't match the containing class. Thanks, Daniel -- Daniel Barclay MapR Technologies
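For concreteness, here is a minimal sketch of the declaration style being proposed. It uses java.util.logging only so the snippet is self-contained; Drill itself uses slf4j, and the class name is just a stand-in, but the import/annotation pattern is the same:

```java
import java.util.logging.Logger;

class StoragePluginRegistry {  // stand-in class name for illustration

    // Imported, non-qualified Logger type keeps the line short, so a class
    // literal that doesn't match the containing class is easier to notice.
    @SuppressWarnings("unused")  // comment out (or remove) once the logger is used
    private static final Logger logger =
        Logger.getLogger(StoragePluginRegistry.class.getName());

    public static void main(String[] args) {
        System.out.println("compiles cleanly, with no unused-variable warning");
    }
}
```

With slf4j, the only differences would be importing org.slf4j.Logger and org.slf4j.LoggerFactory and calling LoggerFactory.getLogger(StoragePluginRegistry.class).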
[jira] [Created] (DRILL-3557) Reading empty CSV file fails with SYSTEM ERROR
Daniel Barclay (Drill) created DRILL-3557: - Summary: Reading empty CSV file fails with SYSTEM ERROR Key: DRILL-3557 URL: https://issues.apache.org/jira/browse/DRILL-3557 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Trying to read an empty CSV file (a file containing zero bytes) fails with a system error: {noformat} 0: jdbc:drill:zk=local> SELECT * FROM `dfs.root`.`/tmp/empty.csv`; Error: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 0 has no read entries assigned [Error Id: f1da68f6-9749-45bc-956b-20cbc6d28894 on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:zk=local> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RecordBatch.MAX_BATCH_SIZE = 65536?
In org.apache.drill.exec.record.RecordBatch, MAX_BATCH_SIZE is 65536. Why isn't that 65535? Is a batch size of zero not possible? Daniel -- Daniel Barclay MapR Technologies
Re: dropping qualified names in logger declarations
Chris Westin wrote: For the special case of the logger, I kind of like it this way, because I can turn it off just by commenting out a single line (to get rid of unreferenced-variable warnings), or add it by pasting in or uncommenting a single line. In either case I don't have to worry about removing or adding the import line separately, which can be quite far away if there are a lot of imports. Why not use the modern Java feature intended for cases like this: have a @SuppressWarnings("unused") annotation on the logger member declaration if the declaration has been added but the member isn't used yet? Then: - We can still avoid unused-variable warnings for logger members that have already been declared before there are any uses. - We no longer have to move up top to adjust an already-existing logger declaration when adding a logger use down in the code. (Yes, we should remove (or comment out) the annotation if the change isn't temporary, but we don't have to do that immediately just to continue compiling.) - We can now use code completion when adding the first logger call to a previously unused logger, since the declaration is real (not just a comment). - We can still comment out/uncomment only a single line (the annotation) to switch between the no-logger-uses and some-logger-use cases. (That is, if you don't want to have to re-add the suppression annotation if the last use of a logger is removed later, you can comment out, rather than delete, the annotation when the first logger use is added.) - We no longer have to either go adjust imports or use qualified names to avoid having to adjust imports. - We stop having unnecessarily qualified names in the code. - and, finally ...: - We stop having those names' extra visual clutter and length, which makes it harder to notice when the class literal ends up wrong. (Note the mention of pasting above.) 
Daniel On Tue, Jul 21, 2015 at 6:12 PM, Daniel Barclay dbarc...@maprtech.com wrote: For logger member declarations, can we drop the current pattern of using qualified names (like this: private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginRegistry.class); ) and allow using imports and non-qualified names (as we do for almost everything else)? Using qualified names adds a lot of visual noise, and pushes the class literal farther to the right, making it easier to fail to notice that it doesn't match the containing class. Thanks, Daniel -- Daniel Barclay MapR Technologies
Positionable.setPosition(int) - always byte offset, or depends
In org.apache.drill.exec.vector.complex.Positionable's method setPosition(int index), is the index parameter always a byte offset from the beginning of something, or is it sometimes in some other units (records, blocks, etc.)? (Is it correct to document on Positionable.setPosition(int) that it's a byte offset, or is the intent that setPosition's contract leaves the units deferred to more-specific interfaces or classes?) Thanks, Daniel -- Daniel Barclay MapR Technologies
Re: question about UDF optimization
Should Drill be defaulting the other way? That is, instead of assuming pure unless declared otherwise (leading to wrong results in the case that the assumption is wrong (or the annotation was forgotten)), should Drill be assuming not pure unless declared pure (leading to only lower performance in the wrong-assumption case)? Daniel Jacques Nadeau wrote: There is an annotation on the function template. I don't have a laptop close but I believe it is something similar to isRandom. It basically tells Drill that this is a nondeterministic function. I will be more specific once I get back to my machine if you don't find it sooner. Jacques *Summary:* Drill is very aggressive about optimizing away calls to functions with constant arguments. I worry that could extend to per record batch optimization if I accidentally have constant values and even if that doesn't happen, it is a pain in the ass now largely because Drill is clever enough to see through my attempt to hide the constant nature of my parameters. *Question:* Is there a way to mark a UDF as not being a pure function? *Details:* I have written a UDF to generate a random number. It takes parameters that define the distribution. All seems well and good. I find, however, that the function is only called once (twice, actually apparently due to pipeline warmup) and then Drill optimizes away later calls, apparently because the parameters to the function are constant and Drill thinks my function is a pure function. If I make up some bogus data to pass in as a parameter, all is well and the function is called as much as I wanted. For instance, with the uniform distribution, my function takes two arguments, those being the minimum and maximum value to return. 
Here is what I see with constants for the min and max:

0: jdbc:drill:zk=local> select random(0,10) from (values 5,5,5,5) as tbl(x);
into eval
into eval
+---------------------+
|       EXPR$0        |
+---------------------+
| 1.7787372583008298  |
| 1.7787372583008298  |
| 1.7787372583008298  |
| 1.7787372583008298  |
+---------------------+

If I include an actual value, we see more interesting behavior even if the value is effectively constant:

0: jdbc:drill:zk=local> select random(0,x) from (values 5,5,5,5) as tbl(x);
into eval
into eval
into eval
into eval
+----------------------+
|        EXPR$0        |
+----------------------+
| 3.688377805419459    |
| 0.2827056410711032   |
| 2.3107479622644918   |
| 0.10813788169218574  |
+----------------------+
4 rows selected (0.088 seconds)

Even if I make the max value come along from the sub-query, I get the evil behavior, although the function is now surprisingly actually called three times, apparently to do with warming up the pipeline:

0: jdbc:drill:zk=local> select random(0,max_value) from (select 14 as max_value,x from (values 5,5,5,5) as tbl(x)) foo;
into eval
into eval
into eval
+---------------------+
|       EXPR$0        |
+---------------------+
| 13.404462063773702  |
| 13.404462063773702  |
| 13.404462063773702  |
| 13.404462063773702  |
+---------------------+
4 rows selected (0.121 seconds)

The UDF itself is boring and can be found at https://gist.github.com/tdunning/0c2cc2089e6cd8c030c0 So how can I defeat this behavior? -- Daniel Barclay MapR Technologies
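The failure mode above can be illustrated outside Drill with a minimal sketch (plain Java, no Drill classes; the method names here are invented for the illustration). It contrasts what a planner does when it assumes a function is pure, folding a constant-argument call to a single evaluation, with correct per-row evaluation of a nondeterministic function:

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

// Why folding a nondeterministic "constant-argument" call is wrong:
// the pure-function shortcut evaluates once and reuses the value per row.
class ConstantFoldingDemo {

    // What an optimizer may do when it believes fn is pure: evaluate once.
    static double[] foldAsPure(DoubleSupplier fn, int rows) {
        double once = fn.getAsDouble();
        double[] out = new double[rows];
        for (int i = 0; i < rows; i++) out[i] = once;
        return out;
    }

    // Correct handling of a nondeterministic function: evaluate per row.
    static double[] evaluatePerRow(DoubleSupplier fn, int rows) {
        double[] out = new double[rows];
        for (int i = 0; i < rows; i++) out[i] = fn.getAsDouble();
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        DoubleSupplier random0to10 = () -> rng.nextDouble() * 10;

        double[] folded = foldAsPure(random0to10, 4);
        double[] perRow = evaluatePerRow(random0to10, 4);

        // All folded values are identical -- the repeated 1.778... column above.
        System.out.println(folded[0] == folded[3]);
        // Per-row values differ (with overwhelming probability).
        System.out.println(perRow[0] != perRow[1] || perRow[1] != perRow[2]);
    }
}
```

As Jacques's reply indicates, the actual remedy in Drill is the nondeterminism attribute on the function-template annotation (the isRandom-style flag), which tells the planner not to apply the pure-function assumption to that UDF.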
MaterializedField
What exactly is materialized about class org.apache.drill.exec.record.MaterializedField? The name gave me the impression that it would be a field/column with its data materialized (as a materialized view has copies of data). However, MaterializedField doesn't seem to contain data values (just field metadata like the name/pathname and data type). So what exactly does the class represent? (What's materialized, and relative to what?) Daniel -- Daniel Barclay MapR Technologies
[jira] [Created] (DRILL-3511) Dev.-level message (or bug?): You tried to write a Bit ... ValueWriter ... NullableFloat8WriterImpl.
Daniel Barclay (Drill) created DRILL-3511: - Summary: Dev.-level message (or bug?): You tried to write a Bit ... ValueWriter ... NullableFloat8WriterImpl. Key: DRILL-3511 URL: https://issues.apache.org/jira/browse/DRILL-3511 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Reporter: Daniel Barclay (Drill) Assignee: Steven Phillips For a JSON file containing this: {noformat} {"x":[{"y":-1.1,"y":false}]} {noformat} a basic query yields an error message describing the problem in terms of Drill's implementation rather than in terms of the JSON data: {noformat} 0: jdbc:drill:zk=local> SELECT * FROM `dfs`.`/tmp/xxx.json`; Error: DATA_READ ERROR: You tried to write a Bit type when you are using a ValueWriter of type NullableFloat8WriterImpl. File /tmp/2924/data1x/a.json Record 1 Line 1 Column 24 Field Fragment 0:0 [Error Id: abe134c1-0a2c-4ce4-9f7d-1b68aad819fc on dev-linux2:31010] (state=,code=0) 0: jdbc:drill:zk=local> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3507) AbstractMapVector uses AbstractContainerVector's logger
Daniel Barclay (Drill) created DRILL-3507: - Summary: AbstractMapVector uses AbstractContainerVector's logger Key: DRILL-3507 URL: https://issues.apache.org/jira/browse/DRILL-3507 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) {{AbstractMapVector}} uses {{AbstractContainerVector}}'s {{logger}} field, and that logger field is not private. (There seem to be about 63 other cases of abnormally non-private logger fields.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3506) Remove logger fields from interfaces.
Daniel Barclay (Drill) created DRILL-3506: - Summary: Remove logger fields from interfaces. Key: DRILL-3506 URL: https://issues.apache.org/jira/browse/DRILL-3506 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) About 29 Java interfaces have extraneous logger fields like this: {noformat} public interface Counter { static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Counter.class); ... {noformat} See: common/src/main/java/org/apache/drill/common/logical/data/visitors/LogicalVisitor.java common/src/main/java/org/apache/drill/common/logical/FormatPluginConfig.java common/src/main/java/org/apache/drill/common/expression/LogicalExpression.java exec/java-exec/src/test/java/org/apache/drill/exec/compile/ExampleTemplate.java exec/java-exec/src/main/java/org/apache/drill/exec/disk/Spool.java exec/java-exec/src/main/java/org/apache/drill/exec/rpc/DrillRpcFuture.java exec/java-exec/src/main/java/org/apache/drill/exec/rpc/RpcOutcome.java exec/java-exec/src/main/java/org/apache/drill/exec/rpc/RpcConnectionHandler.java exec/java-exec/src/main/java/org/apache/drill/exec/memory/BufferAllocator.java exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/DataCollector.java exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/RawBatchBuffer.java exec/java-exec/src/main/java/org/apache/drill/exec/work/RootNodeDriver.java exec/java-exec/src/main/java/org/apache/drill/exec/cache/DrillSerializable.java exec/java-exec/src/main/java/org/apache/drill/exec/cache/Counter.java exec/java-exec/src/main/java/org/apache/drill/exec/cache/DistributedMap.java exec/java-exec/src/main/java/org/apache/drill/exec/cache/DistributedMultiMap.java exec/java-exec/src/main/java/org/apache/drill/exec/cache/DistributedCache.java exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/FileWork.java exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaFactory.java 
exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/CompleteWork.java exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordRecorder.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/BatchCreator.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/RootCreator.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/filter/Filterer.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/BatchIterator.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/WriteEntry.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/Prel.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/PrelVisitor.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
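A side note on why logger fields in interfaces are more than clutter: every field declared in a Java interface is implicitly public static final, so such a logger becomes part of the interface's public API and is shared by all implementers. A self-contained sketch (java.util.logging stands in for slf4j, and the interface name is a hypothetical stand-in for those listed above):

```java
import java.util.logging.Logger;

// Stand-in for interfaces like Counter above; the field below is
// implicitly public static final even though no modifiers are written.
interface CounterLike {
    Logger logger = Logger.getLogger(CounterLike.class.getName());
}

class InterfaceLoggerDemo {
    public static void main(String[] args) {
        int mods = CounterLike.class.getFields()[0].getModifiers();
        // Reflection confirms the implicit modifiers on the interface field.
        System.out.println(java.lang.reflect.Modifier.toString(mods));  // public static final
    }
}
```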
[jira] [Created] (DRILL-3496) Augment classpath scanning and DrillConfig logging.
Daniel Barclay (Drill) created DRILL-3496: - Summary: Augment classpath scanning and DrillConfig logging. Key: DRILL-3496 URL: https://issues.apache.org/jira/browse/DRILL-3496 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[plug-ins] AbstractGroupScan.getScanStats()
Somewhat (although not exactly) similarly to the case below, AbstractGroupScan has an implementation of GroupScan.clone(List<SchemaPath>), but that implementation doesn't do anything other than throw an exception: throw new UnsupportedOperationException(String.format("%s does not implement clone(columns) method!", this.getClass().getCanonicalName())); Is there any need for AbstractGroupScan to declare a non-abstract clone(List<SchemaPath>)? Is there any need for it to re-declare clone(List<SchemaPath>) at all? (Does it narrow the contract?) Daniel I wrote: Method org.apache.drill.exec.physical.base.AbstractGroupScan.getScanStats() has a body that throws a "This must be implemented" exception. Why isn't getScanStats() simply an abstract method? (Is there any case in which a subclass doesn't need to implement the method (i.e., where that method won't ever be called for that subclass)?) Daniel -- Daniel Barclay MapR Technologies
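The design tradeoff in question can be sketched in isolation (the class and method names here are invented stand-ins, not Drill's actual API): an abstract method makes the compiler reject any subclass that forgets to implement it, while a non-abstract throwing stub lets subclasses skip it and defers the failure to runtime, which is defensible only if some subclasses legitimately never receive the call:

```java
import java.util.List;

abstract class GroupScanLike {  // hypothetical stand-in for AbstractGroupScan

    // Option 1: abstract method -- a subclass that omits it fails to compile.
    abstract String getScanStats();

    // Option 2: non-abstract stub, in the style of AbstractGroupScan's
    // clone(columns) -- a subclass that omits it compiles, and the
    // omission surfaces only if and when the method is actually called.
    GroupScanLike cloneWith(List<String> columns) {
        throw new UnsupportedOperationException(
            getClass().getName() + " does not implement clone(columns) method!");
    }
}

class ScanStubDemo {
    public static void main(String[] args) {
        GroupScanLike scan = new GroupScanLike() {
            @Override
            String getScanStats() { return "rows=0"; }
        };
        System.out.println(scan.getScanStats());
        try {
            scan.cloneWith(List.of("a"));
        } catch (UnsupportedOperationException e) {
            System.out.println("stub threw at runtime, not compile time");
        }
    }
}
```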
decoding JsonTypeInfo on StoragePluginConfig
What exactly does the following Jackson @JsonTypeInfo annotation on org.apache.drill.common.logical.StoragePluginConfig do?: @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type") public abstract class StoragePluginConfig { Also, would that annotation still have the desired effect if StoragePluginConfig were an interface? Daniel -- Daniel Barclay MapR Technologies
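For reference, that annotation configures Jackson's polymorphic type handling: use = Id.NAME says the concrete subclass is identified by a logical type name (supplied via @JsonTypeName or explicit subtype registration), and include = As.PROPERTY with property = "type" says that name travels as an ordinary "type" property of the JSON object, which Jackson reads back to pick the subclass on deserialization. So a plugin config round-trips through JSON shaped roughly like the following (the field names and values here are illustrative, not copied from an actual Drill storage configuration):

```json
{
  "type": "file",
  "enabled": true,
  "connection": "file:///"
}
```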