[jira] [Resolved] (DRILL-4010) In HBase reader, create child vectors for referenced HBase columns to avoid spurious schema changes

2015-12-19 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-4010.
---
Resolution: Fixed

Resolved as part of DRILL-2288 patch.

> In HBase reader, create child vectors for referenced HBase columns to avoid 
> spurious schema changes
> ---
>
> Key: DRILL-4010
> URL: https://issues.apache.org/jira/browse/DRILL-4010
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - HBase
>    Reporter: Daniel Barclay (Drill)
>    Assignee: Daniel Barclay (Drill)
>
> {{HBaseRecordReader}} needs to create child vectors for all 
> referenced/requested columns.
> Currently, if a fragment reads only HBase rows that don't have a particular 
> referenced column (within a given column family), downstream code adds a 
> dummy column of type {{NullableIntVector}} (as a child in the {{MapVector}} 
> for the containing HBase column family).
> If any other fragment reads an HBase row that _does_ contain the referenced 
> column, that fragment's reader will create a child 
> {{NullableVarBinaryVector}} for the referenced column.
> When the data from those two fragments comes together, Drill detects a schema 
> change, even though logically there isn't really any schema change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3659) UnionAllRecordBatch infers wrongly from next() IterOutcome values

2015-12-19 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3659.
---
   Resolution: Fixed
Fix Version/s: (was: 1.5.0)

Resolved as part of DRILL-2288 patch.

> UnionAllRecordBatch infers wrongly from next() IterOutcome values
> -
>
> Key: DRILL-3659
> URL: https://issues.apache.org/jira/browse/DRILL-3659
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>        Reporter: Daniel Barclay (Drill)
>    Assignee: Daniel Barclay (Drill)
>
> When UnionAllRecordBatch uses IterOutcome values returned from the next() 
> method of upstream batches, it seems to be using those values wrongly (making 
> incorrect inferences about what they mean).
> In particular, some switch statements seem to check for NONE vs. 
> OK_NEW_SCHEMA in order to determine whether there are any rows (instead of 
> explicitly checking the number of rows).  However, OK_NEW_SCHEMA can be 
> returned even when there are zero rows.
> The apparent latent bug in the union code blocks the fix for DRILL-2288 
> (having ScanBatch return OK_NEW_SCHEMA for a zero-rows case in which it was 
> wrongly (per the IterOutcome protocol) returning NONE without first returning 
> OK_NEW_SCHEMA).
> 
> For details of IterOutcome values, see the Javadoc documentation of 
> RecordBatch.IterOutcome (after DRILL-3641 is merged; until then, see 
> https://github.com/apache/drill/pull/113).
> For an environment/code state that exposes the UnionAllRecordBatch problems, 
> see https://github.com/dsbos/incubator-drill/tree/bugs/WORK_2288_etc, which 
> includes:
> - a test that exposes the DRILL-2288 problem;
> - an enhanced IteratorValidatorBatchIterator, which now detects IterOutcome 
> value sequence violations; and
> - a fixed (though not-yet-cleaned) version of ScanBatch that fixes the 
> DRILL-2288 problem and thereby exposes the UnionAllRecordBatch problem 
> (several test methods in each of TestUnionAll and TestUnionDistinct fail).
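
For illustration, a minimal sketch of the correct inference (FirstBatchPoller 
is hypothetical, not the actual UnionAllRecordBatch code): OK_NEW_SCHEMA only 
announces a schema, so the presence of rows must be checked explicitly:

{noformat}
import org.apache.drill.exec.record.RecordBatch;
import org.apache.drill.exec.record.RecordBatch.IterOutcome;

class FirstBatchPoller {
  /** Polls upstream until a batch with rows arrives; returns false at end of data. */
  static boolean fetchFirstNonEmptyBatch(RecordBatch upstream) {
    while (true) {
      IterOutcome outcome = upstream.next();
      switch (outcome) {
        case NONE:
          return false;                   // upstream exhausted; no (more) rows
        case OK_NEW_SCHEMA:               // announces a schema; may carry 0 rows
        case OK:
          if (upstream.getRecordCount() > 0) {
            return true;
          }
          break;                          // zero-row batch: keep polling
        default:                          // STOP, NOT_YET, OUT_OF_MEMORY, etc.
          throw new IllegalStateException("unhandled IterOutcome: " + outcome);
      }
    }
  }
}
{noformat}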



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3641) Document RecordBatch.IterOutcome (enumerators and possible sequences)

2015-12-19 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3641.
---
Resolution: Fixed

Resolved as part of DRILL-2288 patch.

> Document RecordBatch.IterOutcome (enumerators and possible sequences)
> -
>
> Key: DRILL-3641
> URL: https://issues.apache.org/jira/browse/DRILL-3641
> Project: Apache Drill
>  Issue Type: Bug
>    Reporter: Daniel Barclay (Drill)
>        Assignee: Daniel Barclay (Drill)
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3955) Possible bug in creation of Drill columns for HBase column families

2015-12-19 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3955.
---
Resolution: Fixed

Resolved as part of DRILL-2288 patch.

> Possible bug in creation of Drill columns for HBase column families
> ---
>
> Key: DRILL-3955
> URL: https://issues.apache.org/jira/browse/DRILL-3955
> Project: Apache Drill
>  Issue Type: Bug
>    Reporter: Daniel Barclay (Drill)
>        Assignee: Daniel Barclay (Drill)
>
> If all of the rows read by a given {{HBaseRecordReader}} have no HBase 
> columns in a given HBase column family, {{HBaseRecordReader}} doesn't create 
> a Drill column for that HBase column family.
> Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill 
> column exists for that HBase column family, that {{setupNewSchema}} creates a 
> dummy Drill column using the usual {{NullableIntVector}} type.  In 
> particular, it is not a map vector as {{HBaseRecordReader}} creates when it 
> sees an HBase column family.
> Should {{HBaseRecordReader}} and/or something around setting up for reading 
> HBase (including setting up that {{ProjectRecordBatch}}) make sure that all 
> HBase column families are represented with map vectors so that 
> {{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?
> The problem is that, currently, when an HBase table is read in two separate 
> fragments, one fragment (seeing rows with columns in the column family) can 
> get a map vector for the column family while the other (seeing only rows with 
> no columns in the column family) can get the {{NullableIntVector}}.  
> Downstream code that receives the two batches ends up with an unresolved 
> conflict, yielding IndexOutOfBoundsExceptions as in DRILL-3954.
> It's not clear whether there is only one bug--that downstream code doesn't 
> resolve {{NullableIntVector}} dummy fields right (DRILL-TBD)--or two--that 
> the HBase reading code should set up a Drill column for every HBase column 
> family (regardless of whether it has any columns in the rows that were read) 
> and that downstream code doesn't resolve {{NullableIntVector}} dummy fields 
> (resolution is applicable to sources other than just HBase).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4045) FLATTEN case in testNestedFlatten yields no rows (test didn't detect)

2015-11-05 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-4045:
-

 Summary: FLATTEN case in testNestedFlatten yields no rows (test 
didn't detect)
 Key: DRILL-4045
 URL: https://issues.apache.org/jira/browse/DRILL-4045
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Daniel Barclay (Drill)


The case of using {{FLATTEN}} on nested lists appearing in 
{{TestComplexTypeReader.testNestedFlatten()}} yields no rows.

Part of the problem is that in the code generated by {{FlattenRecordBatch}}, 
the methods are empty.

(That test method doesn't check the results, so, prior to the DRILL-2288 work, 
the problem was not detected.

However, with the DRILL-2288 fixes, the flatten problem causes an 
{{IllegalArgumentException}} (logically, an assertion exception) in 
{{RecordBatchLoader}}, so the test is being disabled (with @Ignore) as part of 
DRILL-2288.)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4010) In HBase reader, create child vectors for referenced HBase columns to avoid spurious schema changes

2015-11-02 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-4010:
-

 Summary: In HBase reader, create child vectors for referenced 
HBase columns to avoid spurious schema changes
 Key: DRILL-4010
 URL: https://issues.apache.org/jira/browse/DRILL-4010
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - HBase
Reporter: Daniel Barclay (Drill)


{{HBaseRecordReader}} needs to create child vectors for all 
referenced/requested columns.

Currently, if a fragment reads only HBase rows that don't have a particular 
referenced column (within a given column family), downstream code adds a dummy 
column of type {{NullableIntVector}} (as a child in the {{MapVector}} for the 
containing HBase column family).

If any other fragment reads an HBase row that _does_ contain the referenced 
column, that fragment's reader will create a child {{NullableVarBinaryVector}} 
for the referenced column.

When the data from those two fragments comes together, Drill detects a schema 
change, even though logically there isn't really any schema change.
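
For illustration, a minimal sketch (not the actual DRILL-2288 patch) of the 
pre-creation being asked for; {{ProjectedColumnInitializer}} and the 
family-to-qualifiers map are hypothetical, while {{MapVector.addOrGet}} is the 
idempotent child-creation call the vectors already provide:

{noformat}
import java.util.List;
import java.util.Map;

import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.drill.common.types.Types;
import org.apache.drill.exec.vector.NullableVarBinaryVector;
import org.apache.drill.exec.vector.complex.MapVector;

class ProjectedColumnInitializer {
  /** Pre-creates a child vector for every projected (family, qualifier) pair
   *  so that all fragments expose identical schemas. */
  static void ensureChildren(Map<String, MapVector> familyVectors,
                             Map<String, List<String>> projectedColumns) {
    for (Map.Entry<String, List<String>> family : projectedColumns.entrySet()) {
      MapVector familyVector = familyVectors.get(family.getKey());
      if (familyVector == null) {
        continue;  // this fragment read no rows touching this family
      }
      for (String qualifier : family.getValue()) {
        // addOrGet is idempotent: it returns the existing child or creates a
        // NullableVarBinaryVector, so a fragment that sees no values for the
        // column exposes the same child vector as a fragment that does.
        familyVector.addOrGet(qualifier,
            Types.optional(MinorType.VARBINARY), NullableVarBinaryVector.class);
      }
    }
  }
}
{noformat}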




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4001) Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)

2015-10-31 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-4001:
-

 Summary: Empty vectors from previous batch left by 
MapVector.load(...)/RecordBatchLoader.load(...)
 Key: DRILL-4001
 URL: https://issues.apache.org/jira/browse/DRILL-4001
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


In certain cases, {{MapVector.load(...)}} (called by 
{{RecordBatchLoader.load(...)}}) returns with some map child vectors having a 
length of zero instead of having a length matching the length of sibling 
vectors and the number of records in the batch.  (This caused some of the 
{{IndexOutOfBoundsException}} errors seen in fixing DRILL-2288.)

The condition seems to be that a child field (e.g., an HBase column in an 
HBase column family) appears in an earlier batch and does not appear in a 
later batch.  

(The HBase column's child vector gets created (in the MapVector for the HBase 
column family) during loading of the earlier batch.  During loading of the 
later batch, all vectors get reset to zero length, and then only vectors for 
fields _appearing in the batch message being loaded_ get loaded and set to the 
length of the batch--other vectors created from earlier messages/{{load}} 
calls are left with a length of zero (instead of, say, being filled with nulls 
to the length of their siblings and the current record batch).)

See the TODO(DRILL-) mark and workaround in {{MapVector.getObject(int)}}.
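
For illustration, one possible repair pass, sketched on the assumption that 
absent children are padded after {{load(...)}}; {{StaleChildRepair}} is 
hypothetical and is not the workaround referenced above:

{noformat}
import org.apache.drill.exec.vector.ValueVector;
import org.apache.drill.exec.vector.complex.MapVector;

class StaleChildRepair {
  /** After load(...), pads children that the incoming batch did not mention. */
  static void padAbsentChildren(MapVector map, int recordCount) {
    for (ValueVector child : map) {      // a vector iterates its children
      if (child.getAccessor().getValueCount() < recordCount) {
        child.allocateNew();
        // For nullable vectors, freshly allocated positions read as null, so
        // setting the value count fills the gap with nulls up to recordCount.
        child.getMutator().setValueCount(recordCount);
      }
    }
  }
}
{noformat}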





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4002) Result check doesn't execute in TestNewMathFunctions.runTest(...)

2015-10-31 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-4002:
-

 Summary: Result check doesn't execute in 
TestNewMathFunctions.runTest(...) 
 Key: DRILL-4002
 URL: https://issues.apache.org/jira/browse/DRILL-4002
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)


In {{TestNewMathFunctions}}, method {{runTest}}'s check of the result does not 
execute.

Method {{runTest(...)}} skips the first record batch--which currently contains 
the results to be checked.

The loop immediately after that, which checks any subsequent batches, never 
executes.

Additionally, the test has no self-check assertions (e.g., that a second batch 
existed) to detect that its assumptions are no longer valid.
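
For illustration, a sketch of the shape such a self-checking test could take 
({{BatchVerification}} and the method names are hypothetical); it examines the 
first batch too and asserts that something was actually verified:

{noformat}
import static org.junit.Assert.assertTrue;

import java.util.List;

import org.apache.drill.exec.record.RecordBatchLoader;
import org.apache.drill.exec.rpc.user.QueryDataBatch;

class BatchVerification {
  /** Verifies every batch, including the first, and self-checks coverage. */
  static void verifyAllBatches(List<QueryDataBatch> results,
                               RecordBatchLoader loader) throws Exception {
    int verifiedBatches = 0;
    for (QueryDataBatch batch : results) {   // do NOT skip the first batch
      loader.load(batch.getHeader().getDef(), batch.getData());
      if (loader.getRecordCount() > 0) {
        // ... the function-result checks would go here ...
        verifiedBatches++;
      }
      loader.clear();
      batch.release();
    }
    assertTrue("no batch with results was verified", verifiedBatches > 0);
  }
}
{noformat}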



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3998) Check skipping of .clear and .release in BaseTestQuery.printResult(...) (bug?)

2015-10-30 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3998:
-

 Summary: Check skipping of .clear and .release in 
BaseTestQuery.printResult(...) (bug?)
 Key: DRILL-3998
 URL: https://issues.apache.org/jira/browse/DRILL-3998
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)


In {{BaseTestQuery.printResult(...)}}, if a loaded record batch has no records, 
the code skips calling not only the printout method but also the 
{{RecordBatchLoader.clear()}} and {{QueryDataBatch.release()}} methods.  Is 
that correct?

(At some point in debugging DRILL-2288, that skipping of {{clear}} and 
{{release}} seemed to cause reporting of a memory leak.)
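
For illustration, a sketch of the leak-safe pattern the question implies 
({{LeakSafePrinter}} is hypothetical): {{clear()}} and {{release()}} move into 
a finally block so they run even for zero-record batches:

{noformat}
import java.util.List;

import org.apache.drill.exec.record.RecordBatchLoader;
import org.apache.drill.exec.rpc.user.QueryDataBatch;
import org.apache.drill.exec.util.VectorUtil;

class LeakSafePrinter {
  /** Prints each batch, releasing buffers even for zero-record batches. */
  static int printResult(RecordBatchLoader loader, List<QueryDataBatch> results)
      throws Exception {
    int rowCount = 0;
    for (QueryDataBatch batch : results) {
      try {
        loader.load(batch.getHeader().getDef(), batch.getData());
        rowCount += loader.getRecordCount();
        if (loader.getRecordCount() > 0) {    // skip only the printing
          VectorUtil.showVectorAccessibleContent(loader);
        }
      } finally {
        loader.clear();                       // always clear ...
        batch.release();                      // ... and always release
      }
    }
    return rowCount;
  }
}
{noformat}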





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Maven/checkstyle changes

2015-10-28 Thread Daniel Barclay

Hey, what changed in the Maven setup regarding checkstyle?

I used to be able to run "mvn validate" to run checkstyle to find
style violations up front (e.g., before starting a long test run).

However, that doesn't seem to work any more.  It looks like checkstyle
is run later, but it's not clear in what Maven phase or step.


How do we run checkstyle (without building everything) now?

Thanks,
Daniel
--
Daniel Barclay
MapR Technologies


Re: [DISCUSS] Processing non-printable characters in Drill

2015-10-22 Thread Daniel Barclay

Khurram Faraaz wrote:

... It looks like Drill processes
non-printable characters in both cases, with and without the new text
reader (exec.storage.enable_new_text_reader)

Should we throw an error since these are non-printable characters?

No, I don't think so.  Does there seem to be any need to reject non-printable 
characters?


...

Content from the csv file used in test
1,^A
2,^B
3,^C
4,^D
5,^E
6,^F

0: jdbc:drill:schema=dfs.tmp> select * from `nonPrintables.csv`;
+-+
| columns |
+-+
| ["1","\u0001"]  |
| ["2","\u0002"]  |
| ["3","\u0003"]  |
| ["4","\u0004"]  |
| ["5","\u0005"]  |
| ["6","\u0006"]  |
+-+
6 rows selected (0.521 seconds)

0: jdbc:drill:schema=dfs.tmp> select columns[1] from `nonPrintables.csv`;
+-+
| EXPR$0  |
+-+
||
||
||
||
||
||
+-+
6 rows selected (0.382 seconds)

Note what's going on there (re the difference between those two outputs):

In the first case, the strings with unprintable characters go through Drill's 
conversion of a value of a complex type (e.g., VARCHAR ARRAY) to a JSON string 
(in order to have a string to return through the JDBC API).  That conversion 
encodes string (VARCHAR) values as JSON string tokens, using JSON's escape 
sequences for the unprintable characters.  Finally, the resultant JSON string 
(the whole string of JSON, not the JSON string token) is displayed by SQLLine 
or the web UI or whatever.  (And don't forget the step of your copying and 
pasting into your message.)

In the second case, the core part of Drill is directly returning the character 
strings from the data through the JDBC API.  Then, SQLLine or the web UI or 
whatever is deciding how to display those strings--including how to handle any 
special, e.g., unprintable, characters.  Evidently, SQLLine doesn't render 
unprintable characters into some visible form.  It probably just writes them to 
your terminal's output stream.  Since your terminal doesn't render them 
especially either, the characters still aren't visible, and when you copied to 
paste to compose your e-mail message, there was nothing from those special 
characters to copy.

(Actually, the non-printable characters are slightly visible--note how the six lines with visually 
blank values have terminating vertical-bar characters that don't line up with the other terminating 
"+" or "|" characters.)


From the point of view of the core part of Drill, it's up to the client of the 
JDBC API to decide how to display values, including character strings with 
unprintable characters.  (The JDBC API returns the Java representations (String 
objects) of the VARCHAR values.)


However, from the point of view of users, SQLLine (and Drill's web UI too) 
should render all values visibly, including character strings with unprintable 
characters.

(They should also render byte strings competently, e.g., rendering in hex the 
bytes themselves rather than displaying in hex the hash code of the Java byte 
array object that contains (a specific copy of) the bytes of the byte 
string(!).)
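
For illustration, a minimal sketch of the visible rendering argued for here; 
ControlCharRenderer is hypothetical and reflects nothing about SQLLine's 
actual internals:

class ControlCharRenderer {
  /** Renders control characters as Unicode escapes so they are visible. */
  static String renderVisibly(String value) {
    StringBuilder sb = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (Character.isISOControl(c)) {
        sb.append(String.format("\\u%04x", (int) c));  // e.g. ^A becomes "\\u0001"
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}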


Daniel

--
Daniel Barclay
MapR Technologies



[jira] [Created] (DRILL-3954) HBase tests use only 1 region, don't detect bug(s) in dummy-column NullableIntVector creation/resolution

2015-10-19 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3954:
-

 Summary: HBase tests use only 1 region, don't detect bug(s) in 
dummy-column NullableIntVector creation/resolution
 Key: DRILL-3954
 URL: https://issues.apache.org/jira/browse/DRILL-3954
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - HBase
Reporter: Daniel Barclay (Drill)


Currently, the HBase tests (e.g., {{TestHBaseFilterPushDown}}) use only one 
region.

That causes them to miss detecting a bug in creating and/or resolving dummy 
fields ({{NullableIntVectors}} for referenced but non-existent fields) 
somewhere between reading from HBase and {{ProjectRecordBatch.setupNewSchema}} 
(or maybe two separate bugs).

Reproduction:

In HBaseTestsSuite, change the line:
{noformat}
UTIL.startMiniHBaseCluster(1, 1);
{noformat}
to:
{noformat}
UTIL.startMiniHBaseCluster(1, 3);
{noformat}
and change the line:
{noformat}
TestTableGenerator.generateHBaseDataset1(admin, TEST_TABLE_1, 1);
{noformat}
to:
{noformat}
TestTableGenerator.generateHBaseDataset1(admin, TEST_TABLE_1, 3);
{noformat}

Run unit test class {{TestHBaseFilterPushDown}}.

Depending on which region gets processed first (it's non-deterministic), test 
methods {{testFilterPushDownOrRowKeyEqualRangePred}} and 
{{testFilterPushDownMultiColumns}} get exceptions like this:

{noformat}
java.lang.IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))
at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:189)
at io.netty.buffer.DrillBuf.chk(DrillBuf.java:211)
at io.netty.buffer.DrillBuf.getByte(DrillBuf.java:746)
at 
org.apache.drill.exec.vector.UInt1Vector$Accessor.get(UInt1Vector.java:364)
at 
org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.isSet(NullableVarBinaryVector.java:391)
at 
org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.isNull(NullableVarBinaryVector.java:387)
at 
org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.getObject(NullableVarBinaryVector.java:411)
at 
org.apache.drill.exec.vector.NullableVarBinaryVector$Accessor.getObject(NullableVarBinaryVector.java:1)
at 
org.apache.drill.exec.vector.complex.MapVector$Accessor.getObject(MapVector.java:313)
at 
org.apache.drill.exec.util.VectorUtil.showVectorAccessibleContent(VectorUtil.java:166)
at org.apache.drill.BaseTestQuery.printResult(BaseTestQuery.java:487)
at 
org.apache.drill.hbase.BaseHBaseTest.printResultAndVerifyRowCount(BaseHBaseTest.java:95)
at 
org.apache.drill.hbase.BaseHBaseTest.runHBaseSQLVerifyCount(BaseHBaseTest.java:91)
at 
org.apache.drill.hbase.TestHBaseFilterPushDown.testFilterPushDownMultiColumns(TestHBaseFilterPushDown.java:592)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.lang.reflect.Method.invoke(Method.java:606)
{noformat}

See DRILL-TBD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3955) Possible bug in creation of Drill columns for HBase column families

2015-10-19 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3955:
-

 Summary: Possible bug in creation of Drill columns for HBase 
column families
 Key: DRILL-3955
 URL: https://issues.apache.org/jira/browse/DRILL-3955
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


If all of the rows read by a given {{HBaseRecordReader}} have no HBase columns 
in a given HBase column family, {{HBaseRecordReader}} doesn't create a Drill 
column for that HBase column family.

Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill 
column exists for that HBase column family, that {{setupNewSchema}} creates a 
dummy Drill column using the usual {{NullableIntVector}} type.  In particular, 
it is not a map vector as {{HBaseRecordReader}} creates when it sees an HBase 
column family.

Should {{HBaseRecordReader}} and/or something around setting up for reading 
HBase (including setting up that {{ProjectRecordBatch}}) make sure that all 
HBase column families are represented with map vectors so that 
{{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?


The problem is that, currently, when an HBase table is read in two separate 
fragments, one fragment (seeing rows with columns in the column family) can get 
a map vector for the column family while the other (seeing only rows with no 
columns in the column family) can get the {{NullableIntVector}}.  Downstream 
code that receives the two batches ends up with an unresolved conflict, 
yielding IndexOutOfBoundsExceptions as in DRILL-3954.

It's not clear whether there is only one bug--that downstream code doesn't 
resolve {{NullableIntVector}} dummy fields right (DRILL-TBD)--or two--that the 
HBase reading code should set up a Drill column for every HBase column family 
(regardless of whether it has any columns in the rows that were read) and that 
downstream code doesn't resolve {{NullableIntVector}} dummy fields (resolution 
is applicable to sources other than just HBase).
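
For illustration, a minimal sketch of the second possible fix (a MAP vector 
per declared column family, created before any row is read); FamilyVectorSetup 
and familyNames are hypothetical, not the actual DRILL-2288 change, and the 
addField call shown is the OutputMutator method readers already use:

{noformat}
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.drill.common.types.Types;
import org.apache.drill.exec.exception.SchemaChangeException;
import org.apache.drill.exec.physical.impl.OutputMutator;
import org.apache.drill.exec.record.MaterializedField;
import org.apache.drill.exec.vector.complex.MapVector;

class FamilyVectorSetup {
  /** Creates a MAP vector for each declared column family up front, so a
   *  fragment that reads only rows without columns in a family still exposes
   *  a MAP (not a dummy NullableIntVector) for that family. */
  static void createFamilyMaps(OutputMutator output, Iterable<String> familyNames)
      throws SchemaChangeException {
    for (String family : familyNames) {
      output.addField(
          MaterializedField.create(family, Types.required(MinorType.MAP)),
          MapVector.class);
    }
  }
}
{noformat}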


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: intermittent IndexOutOfBoundsException from uninitialized null-bits vector?

2015-10-17 Thread Daniel Barclay

I wrote:

> Is f2's type of INT:OPTIONAL correct?
>
> Why wouldn't f2's type be MAP:REQUIRED? Even if the HBase reader didn't see 
> any HBase columns in HBase column family f2, doesn't it still know that f2 is 
> a column family and shouldn't it still set Drill's f2 column to be a map and 
> not of type INT:OPTIONAL?

Okay, I see now where that part (f2's being INT:OPTIONAL) is coming from:

One of the HBase scans doesn't read any rows with HBase columns in column 
family f2, so HBaseRecordReader doesn't know about column family f2 and doesn't 
create any Drill column f2.

Drill column f2 is referred to in the query but doesn't exist in the above 
scan's fragment, so some code in ProjectRecordBatch.setupNewSchema() creates it 
with type INT:OPTIONAL.

I can't tell whether:

- The HBase reader should be getting _all_ column families.
  (Probably not--doing so could wastefully get unneeded data.)
- ProjectRecordBatch.setupNewSchema() is wrongly creating f2 with the current 
  assumed type (INT:OPTIONAL) for referenced-but-absent fields.
  (Probably not--it doesn't know about HBase or that f2 is an HBase column 
  family, so it can't know to make the Drill f2 column a map.)
- Something downstream in the fragment that receives data from the two HBase 
  scan fragments isn't properly resolving the INT:OPTIONAL version of f2 (from 
  the unknown-in-that-fragment reference to f2) with the MAP:REQUIRED version 
  of f2 (from the fragment that did see f2), e.g., resolving/converting f2 in 
  the batch from the no-f2 scan to MAP:REQUIRED.
  (Unclear how likely--it's not clear how downstream code could know when to 
  take INT:OPTIONAL as a dummy type and ignore it and when to take it as the 
  real type and report a schema conflict.)
- The real problem is simply that ambiguously reusing INT:OPTIONAL for unknown 
  columns, instead of using something separate and dedicated, is fundamentally 
  flawed, and that core problem should be fixed.
  (Seems likely.)
  Can we just create some sibling of our enumerator INT and use that for 
  referenced-but-otherwise-unknown columns? (Does any need to also correspond 
  to a Calcite enumerator prevent that approach (without a Calcite change)?)
  If that doesn't work, would creating a new sibling major type work?
  With a dedicated indication of the referenced-but-of-unknown-type case, code 
  could easily tell which seeming schema differences were actually simply 
  resolvable (see the sketch below).
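
For illustration, a self-contained sketch of that resolution; Kind is a 
stand-in enum, since the proposed dedicated enumerator does not exist in 
TypeProtos.MinorType today:

enum Kind { INT, VARBINARY, MAP, UNKNOWN }

class DummyTypeResolution {
  static Kind resolve(Kind left, Kind right) {
    if (left == Kind.UNKNOWN) {
      return right;  // the dummy side yields to the real type
    }
    if (right == Kind.UNKNOWN) {
      return left;
    }
    if (left != right) {  // only now is it a genuine schema conflict
      throw new IllegalStateException("schema conflict: " + left + " vs " + right);
    }
    return left;
  }
}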

Daniel



Daniel Barclay wrote:

[With corrected stack trace.]

Chris,

Actually, now I am seeing a non-deterministic IOOBE like yours (with length = 
1).

Note in the call stack way below that it's coming from an isNull() method.  The 
isNull() method was called with an index of 0 when the top-level vector 
container or whatever had one row.

It looks like the subvector used to track null values didn't get filled in 
right.  (I can't tell yet if it also means that the value-printing code in the 
HBase test is missing something about a schema change.)

This is from an HBase query with "WHERE row_key = 'a2' or row_key between 'b5' and 'b6'".  A batch 
of 2 rows resulting from the "between ..." part comes first--non-deterministically (it was 
consistent for many runs, but it switched just now in a run for adding the schemas to this message)--and then 
the batch of one row for the "= 'a2'" part seems messed up in the second column family (f2):

The first batch's schema is (note f2's type):
BatchSchema [fields=[`row_key`(VARBINARY:REQUIRED)[`$offsets$`(UINT4:REQUIRED)], `f`(MAP:REQUIRED)[`f`.`c1`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c1`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c2`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c2`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c3`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c3`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c4`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c4`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c5`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c5`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c6`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c6`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f`.`c8`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f`.`c8`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]]], `f2`(MAP:REQUIRED)[`f2`.`c1`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), 
`f2`.`c1`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c3`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c3`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c5`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c5`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c7`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c7`(VARBINARY:OPTIONAL)[`$offsets$`(UINT4:REQUIRED)]], `f2`.`c9`(VARBINARY:OPTIONAL)[`$bits$`(UINT1:REQUIRED), `f2`.`c9`(VARBINARY:OPTIONAL)[`$offsets$`(UI

intermittent IndexOutOfBoundsException from uninitialized null-bits vector?

2015-10-16 Thread Daniel Barclay
FrameworkMethod$1(ReflectiveCallable).run() line: 12
FrameworkMethod.invokeExplosively(Object, Object...) line: 44
JUnit4TestRunnerDecorator.executeTestMethod(FrameworkMethod, Object, Object...) 
line: 120
JUnit4TestRunnerDecorator.invokeExplosively(FrameworkMethod, Object, Object...) 
line: 65
MockFrameworkMethod.invokeExplosively(Invocation, Object, Object...) line: 29
GeneratedMethodAccessor133.invoke(Object, Object[]) line: not available
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
Method.invoke(Object, Object...) line: 606
MethodReflection.invokeWithCheckedThrows(Object, Method, Object...) line: 95
MockMethodBridge.callMock(Object, boolean, String, String, String, int, int, 
boolean, Object[]) line: 76
MockMethodBridge.invoke(Object, Method, Object[]) line: 41
(FrameworkMethod).invokeExplosively(Object, Object...) 
line: 44
InvokeMethod.evaluate() line: 17
RunBefores.evaluate() line: 26
RunAfters.evaluate() line: 27
TestWatcher$1.evaluate() line: 55
TestWatcher$1.evaluate() line: 55
TestWatcher$1.evaluate() line: 55
ExpectedException$ExpectedExceptionStatement.evaluate() line: 168
TestWatcher$1.evaluate() line: 55
RunRules.evaluate() line: 20
BlockJUnit4ClassRunner(ParentRunner).runLeaf(Statement, Description, 
RunNotifier) line: 271
BlockJUnit4ClassRunner.runChild(FrameworkMethod, RunNotifier) line: 70
BlockJUnit4ClassRunner.runChild(Object, RunNotifier) line: 50
ParentRunner$3.run() line: 238
ParentRunner$1.schedule(Runnable) line: 63
BlockJUnit4ClassRunner(ParentRunner).runChildren(RunNotifier) line: 236
ParentRunner.access$000(ParentRunner, RunNotifier) line: 53
ParentRunner$2.evaluate() line: 229
RunBefores.evaluate() line: 26
RunAfters.evaluate() line: 27
BlockJUnit4ClassRunner(ParentRunner).run(RunNotifier) line: 309
JUnit4TestClassReference(JUnit4TestReference).run(TestExecution) line: 50
TestExecution.run(ITestReference[]) line: 38
RemoteTestRunner.runTests(String[], String, TestExecution) line: 459
RemoteTestRunner.runTests(TestExecution) line: 675
RemoteTestRunner.run() line: 382
RemoteTestRunner.main(String[]) line: 192

Daniel


Chris Westin wrote:

I seem to recall you were telling me about a new IOOB that you're seeing, was 
that you?
Is it this?

Execution Failures:
/root/drillAutomation/framework-master/framework/resources/Functional/aggregates/tpcds_variants/csv/aggregate26.q
Query:
select cast(case columns[0] when '' then 0 else columns[0] end as int) as 
soldd, cast(case columns[1] when '' then 0 else columns[1] end as bigint) as 
soldt, cast(case columns[2] when '' then 0 else columns[2] end as float) as 
itemsk, cast(case columns[3] when '' then 0 else columns[3] end as 
decimal(18,9)) as custsk, cast(case columns[4] when '' then 0 else columns[4] 
end as varchar(20)) as cdemo, columns[5] as hdemo, columns[6] as addrsk, 
columns[7] as storesk, columns[8] as promo, columns[9] as tickn, sum(case 
columns[10] when '' then 0 else cast(columns[10] as int) end) as quantities 
from `store_sales.dat` group by cast(case columns[0] when '' then 0 else 
columns[0] end as int), cast(case columns[1] when '' then 0 else columns[1] end 
as bigint), cast(case columns[2] when '' then 0 else columns[2] end as float), 
cast(case columns[3] when '' then 0 else columns[3] end as decimal(18,9)), 
cast(case columns[4] when '' then 0 else columns[4] end as varchar(20)), 
columns[5], columns[6], columns[7], columns[8], columns[9] order by soldd desc, 
soldt desc, itemsk desc limit 20
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, 
length: 1 (expected: range(0, 0))

Fragment 0:0

[Error Id: 2322e296-4fff-4770-a778-c20ea4d7 on atsqa6c61.qa.lab:31010]
at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
at 
oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:160)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:203)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))

Fragment 0:0



--
Daniel Barclay
MapR Technologies



intermittent IndexOutOfBoundsException from uninitialized null-bits vector?

2015-10-16 Thread Daniel Barclay
Method.invoke(Object, Object...) line: 606
FrameworkMethod$1.runReflectiveCall() line: 47
FrameworkMethod$1(ReflectiveCallable).run() line: 12
FrameworkMethod.invokeExplosively(Object, Object...) line: 44
JUnit4TestRunnerDecorator.executeTestMethod(FrameworkMethod, Object, Object...) 
line: 120
JUnit4TestRunnerDecorator.invokeExplosively(FrameworkMethod, Object, Object...) 
line: 65
MockFrameworkMethod.invokeExplosively(Invocation, Object, Object...) line: 29
GeneratedMethodAccessor133.invoke(Object, Object[]) line: not available
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
Method.invoke(Object, Object...) line: 606
MethodReflection.invokeWithCheckedThrows(Object, Method, Object...) line: 95
MockMethodBridge.callMock(Object, boolean, String, String, String, int, int, 
boolean, Object[]) line: 76
MockMethodBridge.invoke(Object, Method, Object[]) line: 41
(FrameworkMethod).invokeExplosively(Object, Object...) 
line: 44
InvokeMethod.evaluate() line: 17
RunBefores.evaluate() line: 26
RunAfters.evaluate() line: 27
TestWatcher$1.evaluate() line: 55
TestWatcher$1.evaluate() line: 55
TestWatcher$1.evaluate() line: 55
ExpectedException$ExpectedExceptionStatement.evaluate() line: 168
TestWatcher$1.evaluate() line: 55
RunRules.evaluate() line: 20
BlockJUnit4ClassRunner(ParentRunner).runLeaf(Statement, Description, 
RunNotifier) line: 271
BlockJUnit4ClassRunner.runChild(FrameworkMethod, RunNotifier) line: 70
BlockJUnit4ClassRunner.runChild(Object, RunNotifier) line: 50
ParentRunner$3.run() line: 238
ParentRunner$1.schedule(Runnable) line: 63
BlockJUnit4ClassRunner(ParentRunner).runChildren(RunNotifier) line: 236
ParentRunner.access$000(ParentRunner, RunNotifier) line: 53
ParentRunner$2.evaluate() line: 229
RunBefores.evaluate() line: 26
RunAfters.evaluate() line: 27
BlockJUnit4ClassRunner(ParentRunner).run(RunNotifier) line: 309
JUnit4TestClassReference(JUnit4TestReference).run(TestExecution) line: 50
TestExecution.run(ITestReference[]) line: 38
RemoteTestRunner.runTests(String[], String, TestExecution) line: 459
RemoteTestRunner.runTests(TestExecution) line: 675
RemoteTestRunner.run() line: 382
RemoteTestRunner.main(String[]) line: 192

Daniel


Chris Westin wrote:

I seem to recall you were telling me about a new IOOB that you're seeing, was 
that you?
Is it this?

Execution Failures:
/root/drillAutomation/framework-master/framework/resources/Functional/aggregates/tpcds_variants/csv/aggregate26.q
Query:
select cast(case columns[0] when '' then 0 else columns[0] end as int) as 
soldd, cast(case columns[1] when '' then 0 else columns[1] end as bigint) as 
soldt, cast(case columns[2] when '' then 0 else columns[2] end as float) as 
itemsk, cast(case columns[3] when '' then 0 else columns[3] end as 
decimal(18,9)) as custsk, cast(case columns[4] when '' then 0 else columns[4] 
end as varchar(20)) as cdemo, columns[5] as hdemo, columns[6] as addrsk, 
columns[7] as storesk, columns[8] as promo, columns[9] as tickn, sum(case 
columns[10] when '' then 0 else cast(columns[10] as int) end) as quantities 
from `store_sales.dat` group by cast(case columns[0] when '' then 0 else 
columns[0] end as int), cast(case columns[1] when '' then 0 else columns[1] end 
as bigint), cast(case columns[2] when '' then 0 else columns[2] end as float), 
cast(case columns[3] when '' then 0 else columns[3] end as decimal(18,9)), 
cast(case columns[4] when '' then 0 else columns[4] end as varchar(20)), 
columns[5], columns[6]
, columns[7], columns[8], columns[9] order by soldd desc, soldt desc, itemsk 
desc limit 20
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, 
length: 1 (expected: range(0, 0))

Fragment 0:0

[Error Id: 2322e296-4fff-4770-a778-c20ea4d7 on atsqa6c61.qa.lab:31010]
at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
at 
oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:160)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:203)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))

Fragment 0:0



--
Daniel Barclay
MapR Technologies



current support/non-support for empty JSON files mixed in with regular JSON files

2015-10-08 Thread Daniel Barclay

Do we intend/expect Drill to _currently_ handle having empty (zero-byte)
JSON files mixed in with regular JSON files?


The in-progress fix for DRILL-2288 hits problems with some cases of
having empty JSON files mixed in with some non-empty JSON files.

However, it's not clear whether those cases are currently supported
(and DRILL-2288 changes need to not break that support) or those cases
are not supported yet anyway, so changes needed to fix DRILL-2288 don't
need to leave Drill in the state of handling such empty JSON files.

Thanks,
Daniel
--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3902) Bad error message: core cause not included in text; maybe wrong kind

2015-10-06 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3902:
-

 Summary: Bad error message:  core cause not included in text; 
maybe wrong kind
 Key: DRILL-3902
 URL: https://issues.apache.org/jira/browse/DRILL-3902
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


When an attempt to use an empty directory as a table causes Drill to fail by 
hitting an IndexOutOfBoundsException, the final error message includes the text 
from the IndexOutOfBoundsException's getMessage()--but fails to mention 
IndexOutOfBoundsException itself (or equivalent information):

{noformat}
0: jdbc:drill:zk=localhost:2181> SELECT *   FROM 
`dfs`.`root`.`/tmp/empty_directory`;
Error: VALIDATION ERROR: Index: 0, Size: 0


[Error Id: 66ff61ed-ea41-4af9-87c5-f91480ef1b21 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=localhost:2181> 
{noformat}

Also, since this isn't a coherent/intentional validation error but an internal 
error, shouldn't this be a SYSTEM ERROR message?

(Does the SYSTEM ERROR case include the exception class name in the message?)
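
For illustration, a sketch of the reclassification suggested above 
({{ErrorClassification}} is hypothetical): {{UserException.systemError}} 
produces a SYSTEM ERROR and, as the other SYSTEM ERROR messages in this archive 
show, surfaces the exception class name:

{noformat}
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ErrorClassification {
  private static final Logger logger =
      LoggerFactory.getLogger(ErrorClassification.class);

  static RuntimeException asSystemError(IndexOutOfBoundsException e) {
    // systemError (unlike validationError) classifies the failure as internal
    // and includes the exception class name in the reported message.
    return UserException.systemError(e)
        .message("%s", e.getMessage())  // pass text as data, not as a format
        .build(logger);
  }
}
{noformat}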

Daniel


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: ensureAtLeastOneField: obsolete? needed?

2015-10-05 Thread Daniel Barclay

Jacques,

> ... it ... should probably check if a column is already set up in the output 
> mutator and, if so, not re-add the ensure-at-least-one column. That probably 
> requires a new method in the output mutator (isEmpty or size)

Some questions on the various moving pieces:

 Does ensureAtLeastOneField() need to check only for the 
fieldWriter.integer(...) call, or does it also need to check for the earlier 
fieldWriter.map(...) call? (I'm not clear on what cases that loop can 
encounter.)

Would the method go (be declared) on MapWriter (re ensureAtLeastOneField()'s 
fieldWriter variable) or on ComplexWriter (re its writer parameter) (or 
something else)?

Which data should the method implementation be consulting?  For example, I see 
that SingleMapWriter has both container and fields members, and traversing down 
container (at least in the current case I'm looking at) leads to fields 
primary, secondary, and delegate (in MapWithOrdinal). Which is the primary/best 
source for whether there's any column (or subcolumn?) yet?
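
For illustration, a sketch of the guard being discussed, with hypothetical 
parameters standing in for the not-yet-existing isEmpty/size method on the 
output mutator:

import org.apache.drill.exec.vector.complex.writer.BaseWriter;

class EnsureFieldGuard {
  // outputMutatorIsEmpty stands in for the hypothetical isEmpty()/size()
  // method on the output mutator discussed above; firstColumnName is the
  // first projected column.
  static void ensureAtLeastOneField(BaseWriter.ComplexWriter writer,
                                    boolean outputMutatorIsEmpty,
                                    String firstColumnName) {
    if (!outputMutatorIsEmpty) {
      return;  // an earlier reader already created a field; don't clobber it
    }
    // existing behavior: force one dummy nullable-int field so a projection
    // that found no columns still produces output
    writer.rootAsMap().integer(firstColumnName);
  }
}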

Thanks,
Daniel



Jacques Nadeau wrote:


Its purpose is to make sure that a projection that finds no columns still 
produces output.  Think record {a:4,b:6}

If I say 'select c from t', the reader (without this functionality) returns 
zero columns (not supported).  As such, ensureAtLeastOneField adds a column.  I 
believe it is still needed but should probably check if a column is already 
set up in the output mutator and, if so, not re-add the ensure-at-least-one 
column. That probably requires a new method in the output mutator (isEmpty or 
size)

On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarc...@maprtech.com 
<mailto:dbarc...@maprtech.com>> wrote:

What exactly is the purpose of ensureAtLeastOneField() 
org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at 
fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 · apache/drill 
<https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)?

Is it still needed?


Currently, it makes the first field of a map be a nullable-integer field if 
the JSON reader reads no rows. However, it does that regardless of whether 
the first field already exists from earlier readers, causing a later reader and 
ScanBatch to signal a schema change when there wasn't really a schema change.  
This is currently causing breakage in the attempt to fix the 
ScanBatch/IterOutcome problems underlying DRILL-2288.

Example case:
- There are three JSON files.  The first and last to be read have the same 
schema.  The middle file is empty.  They are all read by the same ScanBatch.
- The first JSON reader sets map fields.
- The second JSON reader sees no rows, so its atLeastOneWrite flag isn't 
set, so its ensureAtLeastOneField() thinks it needs to add a field, but 
forcibly sets the first field to be a nullable-int field--regardless of 
whether a first field exists, so it changes what the first reader set it to.
- Then, somewhere in and/or downstream of the third reader (with 
in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors 
(mentioning the second reader's NullableIntVector vs. the original reader's 
type for the first field) and/or schema-change-not-supported errors because 
ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really 
change (between the first and third JSON files).


Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or 
unsupported-schema-change errors, and did not cause any new errors in the 
java-exec unit tests. (I haven't checked other tests yet.)

Also, ensureAtLeastOneField() (or next()) has a comment about making sure 
there's a field in order to return a record count, but next() returns the 
record count.

Those two things make me wonder if ensureAtLeastOneField() is now obsolete.

Can it be deleted now?

Or is it the case that it is still needed, but it needs to check whether 
there are already any fields before (currently blindly) creating one?


Thanks,
Daniel

-- 
Daniel Barclay

MapR Technologies




--
Daniel Barclay
MapR Technologies



Re: New Slack setup for Devs and Users

2015-10-05 Thread Daniel Barclay

Is Slack threaded as e-mail is?  (If not, and you can't see messages grouped by 
subject (that is, if it's the case that you can't see the tree of messages for 
a subject without wading through irrelevant messages that occurred at 
intervening times), how is Slack better than using the existing e-mail lists?)

Also, should we really be siphoning discussion away from the public Apache 
Drill mailing lists and into the private Slack environment?  (Or did I miss 
some public way to see what's in Slack?)

Daniel



Jacques Nadeau wrote:

Hey Guys,

We've been using Slack a lot internally and have found it very useful. I
setup a new slack for Drill developers and users. I've set it to
automatically accept new users from @apache.org as well as @mapr.com,
@maprtech.com and @dremio.com. I'm more than happy to whitelist other
domains (but slack won't let me enable general domains such as gmail and
yahoo).

If you aren't on one of the whitelisted domains, just send me an email and
I'll invite you (or add your domain as appropriate).

Remember, these channels should be used for help as opposed to making
design decisions. My goal is also to post a digest once a week from the
channels back on to the list so that the information is publicly available.

I set up two initial channels: #user and #dev.

Let's see if this makes things easier for everybody.

To Join, go to: https://drillers.slack.com/signup


--
Jacques Nadeau
CTO and Co-Founder, Dremio




--
Daniel Barclay
MapR Technologies



Re: ensureAtLeastOneField: obsolete? needed?

2015-10-04 Thread Daniel Barclay

[Also adding forgotten cc to dev. list]

Jacques,

Jacques Nadeau wrote:


Its purpose is to make sure that a projection that finds no columns still 
produces output.  Think record {a:4,b:6}

If I say 'select c from t', the reader (without this functionality) returns 
zero columns (not supported).  As such, ensureAtLeastOneField adds a column.  I 
believe it is still needed


In what sense is that not supported?  For example, where would returning zero 
columns be expected to break?

When I tried disabling ensureAtLeastOneField(), the only thing I've seen break 
so far is the empty-schema check in IteratorValidatorBatchIterator. When that 
validator check is also disabled, all drill-java-exec unit tests run.  (I'm 
currently checking later-run tests (drill-jdbc, etc.) and have not yet tried 
the regression tests.)


but should probably check if a column is already set up in the output mutator 
and, if so, not re-add the ensure-at-least-one column. That probably requires 
a new method in the output mutator (isEmpty or size)


Yeah; I was wondering how to get that information given that only writing 
methods seemed to be available on the interface types used in 
ensureAtLeastOneField().

On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarc...@maprtech.com 
<mailto:dbarc...@maprtech.com>> wrote:

What exactly is the purpose of ensureAtLeastOneField() 
org.apache.drill.exec.vector.complex.fn.JsonReader (drill/JsonReader.java at 
fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 · apache/drill 
<https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)?

Is it still needed?


Currently, it makes the first field of a map be a nullable-integer field if 
the JSON reader reads no rows. However, it does that regardless of whether 
the first field already exists from earlier readers, causing a later reader and 
ScanBatch to signal a schema change when there wasn't really a schema change.  
This is currently causing breakage in the attempt to fix the 
ScanBatch/IterOutcome problems underlying DRILL-2288.

Example case:
- There are three JSON files.  The first and last to be read have the same 
schema.  The middle file is empty.  They are all read by the same ScanBatch.
- The first JSON reader sets map fields.
- The second JSON reader sees no rows, so its atLeastOneWrite flag isn't 
set, so its ensureAtLeastOneField() thinks it needs to add a field, but 
forcibly sets the first field to be a nullable-int field--regardless of 
whether a first field exists, so it changes what the first reader set it to.
- Then, somewhere in and/or downstream of the third reader (with 
in-progress ScanBatch fixes in place), Drill gets incompatible-vector errors 
(mentioning the second reader's NullableIntVector vs. the original reader's 
type for the first field) and/or schema-change-not-supported errors because 
ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really 
change (between the first and third JSON files).


Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or 
unsupported-schema-change errors, and did not cause any new errors in the 
java-exec unit tests. (I haven't checked other tests yet.)

Also, ensureAtLeastOneField() (or next()) has a comment about making sure 
there's a field in order to return a record count, but next() returns the 
record count.

Those two things make me wonder if ensureAtLeastOneField() is now obsolete.

Can it be deleted now?

Or is it the case that it is still needed, but it needs to check whether 
there are already any fields before (currently blindly) creating one?


Thanks,
    Daniel

-- 
Daniel Barclay

    MapR Technologies




--
Daniel Barclay
MapR Technologies



[jira] [Created] (DRILL-3885) Column alias "`f.c`" rejected if number of regions is > 1 in HBase unit tests

2015-10-01 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3885:
-

 Summary: Column alias "`f.c`" rejected if number of regions is > 1 
in HBase unit tests
 Key: DRILL-3885
 URL: https://issues.apache.org/jira/browse/DRILL-3885
 Project: Apache Drill
  Issue Type: Bug
    Reporter: Daniel Barclay (Drill)


Drill rejects the column alias {{`f.c`}}, because of its period character, in 
this query:

{noformat}
SELECT
  row_key, convert_from(tableName.f.c, 'UTF8') `f.c`
FROM
  hbase.`TestTable3` tableName
WHERE
  row_key LIKE '08%0' OR row_key LIKE '%70'
{noformat}

in unit test {{TestHBaseFilterPushDown.testFilterPushDownRowKeyLike}} if the 
number of regions used in {{HBaseTestsSuite}} is set to something greater than 
one.

One problem seems to be that the validation check is inconsistent, happening 
only if the data structure containing that alias gets serialized and 
deserialized.

The rejection of that alias seems like a problem (at least from the SQL level), 
although it might be reasonable given some nearby code, which suggests that 
maybe names/expressions/something aren't encoded enough to handle name segments 
with periods. 

The exception stack trace is:
{noformat}
org.apache.drill.exec.rpc.RpcException: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
UnsupportedOperationException: Field references must be singular names.

Fragment 1:1

[Error Id: 34475f52-6f22-43be-9011-c31a84469781 on dev-linux2:31010]
at 
org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60)
at 
org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:386)
at 
org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:291)
at 
org.apache.drill.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:292)
at 
org.apache.drill.BaseTestQuery.testSqlWithResults(BaseTestQuery.java:279)
at 
org.apache.drill.hbase.BaseHBaseTest.runHBaseSQLlWithResults(BaseHBaseTest.java:86)
at 
org.apache.drill.hbase.BaseHBaseTest.runHBaseSQLVerifyCount(BaseHBaseTest.java:90)
at 
org.apache.drill.hbase.TestHBaseFilterPushDown.testFilterPushDownRowKeyLike(TestHBaseFilterPushDown.java:466)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.lang.reflect.Method.invoke(Method.java:606)
Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: UnsupportedOperationException: Field references must be singular names.

Fragment 1:1

[Error Id: 34475f52-6f22-43be-9011-c31a84469781 on dev-linux2:31010]
at 
org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
at 
org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110)
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:1)
at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61)
at 
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233)
at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:1)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteCha

[jira] [Created] (DRILL-3863) TestBuilder.baseLineColumns(...) doesn't take net strings; parses somehow--can't test some names

2015-09-29 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3863:
-

 Summary: TestBuilder.baseLineColumns(...) doesn't take net 
strings; parses somehow--can't test some names
 Key: DRILL-3863
 URL: https://issues.apache.org/jira/browse/DRILL-3863
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse


{{TestBuilder}}'s {{baseLineColumns(String...)}} method doesn't take the given 
strings as net column names, and instead tries to parse them somehow, but 
doesn't parse them as the SQL parser would (and that method's Javadoc 
documentation doesn't seem to say how the strings are parsed/interpreted or 
indicate any third way of specifying arbitrary net column names).

That means that certain column names _cannot be checked_ for (cannot be used in 
the result set being checked).

For example, in Drill, the SQL delimited identifier  "{{`Column B`}}"  
specifies a net column name of "{{Column B}}".  However, passing that net 
column name (that is, a {{String}} representing that net column name) to 
{{baseLineColumns}} results in a strange parsing error.  (See Test Class 1 and 
the error in Failure Trace 1.)

Checking whether {{baseLineColumns}} takes SQL-level syntax for column names 
rather than net column names (by passing a string including the back-quote 
characters of the delimited identifier) seems to indicate that 
{{baseLineColumns}} doesn't take that syntax either.  (See Test Class 2 
and the three expected/returned records in Failure Trace 2.)

That seems to mean that it's impossible to use {{baseLineColumns}} to validate 
certain column names (including the fairly simple/common case of alias names 
containing spaces for output formatting purposes).


Test Class 1:
{noformat}
import org.junit.Test;

public class TestTEMPFileNameBugs extends BaseTestQuery {

  @Test
  public void test1() throws Exception {
testBuilder()
.sqlQuery( "SELECT * FROM ( VALUES (1, 2) ) AS T(column_a, `Column B`)" )
.unOrdered()
.baselineColumns("column_a", "Column B")
.baselineValues(1, 2)
.go();
  }
}
{noformat}

Failure Trace 1:
{noformat}
org.apache.drill.common.exceptions.ExpressionParsingException: Expression has 
syntax error! line 1:0:no viable alternative at input 'Column'
at 
org.apache.drill.common.expression.parser.ExprParser.displayRecognitionError(ExprParser.java:169)
at org.antlr.runtime.BaseRecognizer.reportError(BaseRecognizer.java:186)
at 
org.apache.drill.common.expression.parser.ExprParser.lookup(ExprParser.java:5163)
at 
org.apache.drill.common.expression.parser.ExprParser.atom(ExprParser.java:4370)
at 
org.apache.drill.common.expression.parser.ExprParser.unaryExpr(ExprParser.java:4252)
at 
org.apache.drill.common.expression.parser.ExprParser.xorExpr(ExprParser.java:3954)
at 
org.apache.drill.common.expression.parser.ExprParser.mulExpr(ExprParser.java:3821)
at 
org.apache.drill.common.expression.parser.ExprParser.addExpr(ExprParser.java:3689)
at 
org.apache.drill.common.expression.parser.ExprParser.relExpr(ExprParser.java:3564)
at 
org.apache.drill.common.expression.parser.ExprParser.equExpr(ExprParser.java:3436)
at 
org.apache.drill.common.expression.parser.ExprParser.andExpr(ExprParser.java:3310)
at 
org.apache.drill.common.expression.parser.ExprParser.orExpr(ExprParser.java:3185)
at 
org.apache.drill.common.expression.parser.ExprParser.condExpr(ExprParser.java:3110)
at 
org.apache.drill.common.expression.parser.ExprParser.expression(ExprParser.java:3041)
at 
org.apache.drill.common.expression.parser.ExprParser.parse(ExprParser.java:206)
at org.apache.drill.TestBuilder.parsePath(TestBuilder.java:202)
at org.apache.drill.TestBuilder.baselineColumns(TestBuilder.java:333)
at 
org.apache.drill.TestTEMPFileNameBugs.test1(TestTEMPFileNameBugs.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:606)
{noformat}

Test Class 2:
{noformat}
package org.apache.drill;

import org.junit.Test;

public class TestTEMPFileNameBugs extends BaseTestQuery {

  @Test
  public void test1() throws Exception {
    testBuilder()
        .sqlQuery( "SELECT * FROM ( VALUES (1, 2) ) AS T(column_a, `Column B`)" )
        .unOrdered()
        .baselineColumns("column_a", "`Column B`")
        .baselineValues(1, 2)
        .go();
  }
}
{noformat}

Failure Trace 2:
{noformat}

java.lang.Exception: After matching 0 records, did not find expected record in 
result set: `Column B` : 2, `column_a` : 1, 
{noformat}
[jira] [Created] (DRILL-3859) Delimited identifier `*` breaks in aliases list--causes AssertionError saying "INTEGER"

2015-09-29 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3859:
-

 Summary: Delimited identifier `*` breaks in aliases list--causes 
AssertionError saying "INTEGER"
 Key: DRILL-3859
 URL: https://issues.apache.org/jira/browse/DRILL-3859
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


When a delimited identifier whose body consists of a single asterisk 
("{{`*`}}") is used in a subquery aliases list and the containing query's 
select list refers to a non-existent column, Drill throws an assertion error 
(and its message says only "INTEGER").

For example, see the third query and its error message in the following:

{noformat}
0: jdbc:drill:zk=local> SELECT * FROM (VALUES (0, 0)) AS T(A, `*`);
+----+----+
| A  | *  |
+----+----+
| 0  | 0  |
+----+----+
1 row selected (0.143 seconds)
0: jdbc:drill:zk=local> SELECT a FROM (VALUES (0, 0)) AS T(A, `*`);
+----+
| a  |
+----+
| 0  |
+----+
1 row selected (0.127 seconds)
0: jdbc:drill:zk=local> SELECT b FROM (VALUES (0, 0)) AS T(A, `*`);
Error: SYSTEM ERROR: AssertionError: INTEGER


[Error Id: 859d3ef9-b1e7-497b-b366-b64b2b592b69 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=local>
{noformat}

It's not clear that the problem is in the SQL parser area (because another bug 
with {{`*`}} that _acts_ the same as a hypothetical parser problem strongly 
appears to be downstream of the parser).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3861) Apparent uncontrolled format string error in table name error reporting

2015-09-29 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3861:
-

 Summary: Apparent uncontrolled format string error in table name 
error reporting
 Key: DRILL-3861
 URL: https://issues.apache.org/jira/browse/DRILL-3861
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Reporter: Daniel Barclay (Drill)


It seems that a data string is being used as a printf format string.

In the following, note the percent character in the name of the table file 
(which does not exist; the query was apparently meant to cause an expected 
no-such-table error) and that the actual error mentions format conversion 
characters:

{noformat}
0: jdbc:drill:zk=local> select * from `test%percent.json`;
Sep 29, 2015 2:59:37 PM org.apache.calcite.sql.validate.SqlValidatorException 

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
'test%percent.json' not found
Sep 29, 2015 2:59:37 PM org.apache.calcite.runtime.CalciteException 
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 
15 to line 1, column 33: Table 'test%percent.json' not found
Error: SYSTEM ERROR: UnknownFormatConversionException: Conversion = 'p'


[Error Id: 8025e561-6ba1-4045-bbaa-a96cafc7f719 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=local> 
{noformat}

(Selecting the SQL Parser component because I _think_ table/file existence is 
checked in validation called in or near the parsing step.)
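
A minimal Java sketch of this bug class (hypothetical code, not the actual 
Drill source) shows how such an error arises and the usual fix:

{noformat}
String tableName = "test%percent.json";  // user-derived data

// Buggy: the data's "%p" is read as a (nonexistent) conversion, throwing
// java.util.UnknownFormatConversionException: Conversion = 'p'
String bad = String.format("Table '" + tableName + "' not found");

// Safe: pass the data as an argument, never as part of the format string.
String good = String.format("Table '%s' not found", tableName);
{noformat}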



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


"../" in file pathnames - intend to block or not?

2015-09-29 Thread Daniel Barclay

In file/directory pathnames for tables, does Drill intend to block use of "../" 
that traverses up beyond the root of the workspace (i.e., above /tmp for (default) 
dfs.tmp)?

Daniel

--
Daniel Barclay
MapR Technologies



[jira] [Created] (DRILL-3864) TestBuilder "Unexpected column" message doesn't show records

2015-09-29 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3864:
-

 Summary: TestBuilder "Unexpected column" message doesn't show 
records
 Key: DRILL-3864
 URL: https://issues.apache.org/jira/browse/DRILL-3864
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse


When {{TestBuilder}} reports that the actual result set contains an unexpected 
column, it doesn't show any whole expected records (as it does for the "did not 
find expected record in result set" case, where it shows an expected record and 
some actual records).

Showing a couple of whole expected records, rather than just reporting the 
unexpected column name(s), would speed up diagnosis of test failures.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3860) Delimited identifier `*` breaks in select list--acts like plain asterisk token

2015-09-29 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3860:
-

 Summary: Delimited identifier `*` breaks in select list--acts like 
plain asterisk token
 Key: DRILL-3860
 URL: https://issues.apache.org/jira/browse/DRILL-3860
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


At least when it appears in a SELECT list, a delimited identifier whose body 
consists of a single asterisk ("{{`*`}}") is not treated consistently with 
other delimited identifiers (that is, it does not specify the column whose 
name matches the body ("{{*}}")).

For example, in the following, notice how in the first two queries, each select-
list delimited identifier selects the one expected column, but in the third 
query, instead of selecting the one expected column, it selects all columns 
(like the regular "{{*}}" in the fourth query):

{noformat}
0: jdbc:drill:zk=local> SELECT `a` FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`);
+----+
| a  |
+----+
| 1  |
+----+
1 row selected (0.132 seconds)
0: jdbc:drill:zk=local> SELECT `.` FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`);
+----+
| .  |
+----+
| 2  |
+----+
1 row selected (0.152 seconds)
0: jdbc:drill:zk=local> SELECT `*` FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`);
+----+----+----+
| a  | .  | *  |
+----+----+----+
| 1  | 2  | 3  |
+----+----+----+
1 row selected (0.136 seconds)
0: jdbc:drill:zk=local> SELECT * FROM (VALUES (1, 2, 3)) AS T(a, `.`, `*`);
+----+----+----+
| a  | .  | *  |
+----+----+----+
| 1  | 2  | 3  |
+----+----+----+
1 row selected (0.128 seconds)
0: jdbc:drill:zk=local> 
{noformat}

Although this acts the same as if the SQL parser treated the delimited 
identifier {{`*`}} as a plain asterisk token, that does not seem to be the 
actual mechanism for this behavior.  (The problem seems to be further 
downstream.)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: "../" in file pathnames - intend to block or not?

2015-09-29 Thread Daniel Barclay

Jason Altekruse wrote:

Yes, we want workspaces to be able to be used in conjunction with
authentication to provide limited views of data to some users. Is this
currently not being enforced?


I'm not sure what would be enforced with authentication/impersonation
turned on (especially, whether access is checked after all pathname
resolution is done or is checked too early).

I was just running in regular (no-impersonation) local mode and noticed
that using "../" in a pathname can get to directories outside the
workspace's root.

Is that behavior expected or is that a bug?

(Part of my question, or another way to ask it, is whether we:
- only intend the workspace to be like a default working directory
  (where you usually give downward-only relative names to files in its
  subtree, but might occasionally reach out of the subtree), or
- intend the workspace to be more restricted (see the sketch below).)
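
A minimal containment check for the more-restricted reading might look
like the following (a sketch with hypothetical names, assuming
local-filesystem paths; not Drill's actual code):

  import java.nio.file.Path;
  import java.nio.file.Paths;

  static boolean isInsideWorkspace(String workspaceRoot, String tableName) {
    Path root = Paths.get(workspaceRoot).toAbsolutePath().normalize();
    Path resolved = root.resolve(tableName).normalize();
    return resolved.startsWith(root);  // false once "../" escapes the root
  }

  // isInsideWorkspace("/tmp", "data/votes.json") -> true
  // isInsideWorkspace("/tmp", "../etc/passwd")   -> false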


Daniel




On Tue, Sep 29, 2015 at 3:49 PM, Daniel Barclay <dbarc...@maprtech.com>
wrote:


In file/directory pathnames for tables, does Drill intend to block use of
"../" that traverses up beyond the root of the workspace (i.e., above /tmp
for (default) dfs.tmp)?

Daniel

--
Daniel Barclay
MapR Technologies







--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3848) Increase timeout time on several tests that time out frequently.

2015-09-28 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3848:
-

 Summary: Increase timeout time on several tests that time out 
frequently.
 Key: DRILL-3848
 URL: https://issues.apache.org/jira/browse/DRILL-3848
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)


Increase test timeout time a bit on: 
- TestTpchDistributedConcurrent
- TestExampleQueries
- TestFunctionsQuery




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


How get pull req. linked (auto-updating) to two JIRA issues?

2015-09-27 Thread Daniel Barclay

For dealing with a branch addressing two (interrelated) JIRA issues, have we 
figured out what to do so that the GitHub pull request is linked to both JIRA 
issues (i.e., it auto-updates both)?

(Some tries at including both DRILL- keys in the PR description failed to 
link to JIRA reports.  Does anyone know exactly where GitHub recognizes JIRA 
keys?)

Thanks,
Daniel

--
Daniel Barclay
MapR Technologies



[jira] [Created] (DRILL-3814) Directory containing only unrecognized files reported as not found vs. taken as empty table

2015-09-21 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3814:
-

 Summary: Directory containing only unrecognized files reported as 
not found vs. taken as empty table
 Key: DRILL-3814
 URL: https://issues.apache.org/jira/browse/DRILL-3814
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser, Storage - Other
Reporter: Daniel Barclay (Drill)
Assignee: Aman Sinha


A directory subtree all of whose descendant files have unrecognized extensions 
is reported as non-existent rather than treated as a table with zero rows.

Is this intended? 

(The error message is exactly the same message that results if the user 
gets a directory name wrong and refers to a non-existent directory, making the 
message really confusing and misleading.)

For example, for directory {{/tmp/unrecognized_files_directory}} containing 
only file {{/tmp/unrecognized_files_directory/junk.junk}}:

{noformat}
0: jdbc:drill:zk=local> SELECT * FROM 
`dfs`.`tmp`.`unrecognized_files_directory`;
Sep 20, 2015 11:16:34 PM org.apache.calcite.sql.validate.SqlValidatorException 

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
'dfs.tmp.unrecognized_files_directory' not found
Sep 20, 2015 11:16:34 PM org.apache.calcite.runtime.CalciteException 
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 
15 to line 1, column 19: Table 'dfs.tmp.unrecognized_files_directory' not found
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 19: Table 
'dfs.tmp.unrecognized_files_directory' not found


[Error Id: 0ce9ba05-7f62-4063-a2c0-7d2b4f1f7967 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=local> 
{noformat}

Notice how that is the same message as for a non-existent directory:

{noformat}
0: jdbc:drill:zk=local> SELECT * FROM `dfs`.`tmp`.`no_such_directory`;
Sep 20, 2015 11:17:12 PM org.apache.calcite.sql.validate.SqlValidatorException 

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
'dfs.tmp.no_such_directory' not found
Sep 20, 2015 11:17:12 PM org.apache.calcite.runtime.CalciteException 
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 
15 to line 1, column 19: Table 'dfs.tmp.no_such_directory' not found
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 19: Table 
'dfs.tmp.no_such_directory' not found


[Error Id: 49f423f1-5dfe-4435-8b72-78e0b80e on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=local> 
{noformat}






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Resolving ScanBatch.next() behavior for 0-row readers; handling of NONE, OK_NEW_SCHEMA

2015-09-21 Thread Daniel Barclay

Jacques Nadeau wrote:

I think we should start with a much simpler discussion:

If I query an empty file, what should we return?


One thing to clarify is the difference between empty files (e.g., a
zero-byte .json file) and other zero-row files (e.g., a non-empty file
that is a Parquet file that represents no rows but still carries a
schema).

(We might also need to distinguish between types of empty files (e.g.,
.json files, where without any row objects we know nothing about the
schema, vs. .csv files, where we can know that there's one logical
column of type VARCHAR ARRAY, even though we don't know the length).)


Here's one possible view of the JSON-style case of empty files:

An empty file implies no rows of data and implies no columns in the
schema.  (I don't mean that it implies that there are no columns; I
just mean that it does not imply any columns.)  It also most probably
does not imply the absence of any columns either--so it never
conflicts with another schema.

That non-implication of columns (i.e., the schema) means that:
- taking that file by itself as a table yields no rows and an empty
  schema, and
- taking that file as part of taking a subtree of files as a table
  means that the empty file never causes a conflict with the schema
  from other files in that subtree.

(The only thing that the empty file would imply would be the type (file
name extension) of other files when an ancestor directory is taken as a
table.  (That's assuming we don't allow mixing, say, JSON and CSV files
in the same subtree.))


Daniel





--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Sep 18, 2015 at 3:26 PM, Daniel Barclay <dbarc...@maprtech.com>
wrote:


What sequence of RecordBatch.IterOutcome
<https://github.com/dsbos/incubator-drill/blob/bugs/drill-3641/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java#L106>
<https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java#L41>
values should ScanBatch's next() return for a reader (file/etc.) that has
zero rows of data, and what does that sequence depend on (e.g., whether
there's still a non-empty schema even though there are no rows, whether
there are other files in the scan)?  [See other questions at bottom.]


I'm trying to resolve this question to fix DRILL-2288
<https://issues.apache.org/jira/browse/DRILL-2288>. Its initial symptom
was that INFORMATION_SCHEMA queries that return zero rows because of
pushed-down filtering yielded results that have zero columns instead of the
expected columns.  An additional symptom was that "SELECT A, B, *" from an
empty JSON file yielded zero columns instead of the expected columns A and
B (with zero rows).

The immediate cause of the problem (the missing schema information) was
how ScanBatch.next() handled readers that returned no rows:

If a reader has no rows at all, then the first call to its next() method
(from ScanBatch.next()) returns zero (indicating that there are no more
rows, and, in this case, no rows at all), and ScanBatch.next()'s call to
the reader's mutator's isNewSchema() returns true, indicating that the
reader has a schema that ScanBatch has not yet processed (e.g., notified
its caller about).

The way ScanBatch.next()'s code checked those conditions, when the last
reader had no rows at all, ScanBatch.next() returned IterOutcome.NONE.

However, when that /last/ reader was the /only/ reader, that returning of
IterOutcome.NONE for a no-rows reader by ScanBatch.next() meant that next()
never returned IterOutcome.OK_NEW_SCHEMA for that ScanBatch.

That immediate return of NONE in turn meant that the downstream operator
_never received a return value of OK_NEW_SCHEMA to trigger its schema
processing_.  (For example, in the DRILL-2288 JSON case, the project
operator never constructed its own schema containing columns A and B plus
whatever columns (none) came from the empty JSON file; in DRILL-2288's
other case, the caller never propagated the statically known columns from
the INFORMATION_SCHEMA table.)

That returning of NONE without ever returning OK_NEW_SCHEMA also violates
the (apparent) intended call/return protocol (sequence of IterOutcome
values) for RecordBatch.next(). (See the draft Javadoc comments currently
at RecordBatch.IterOutcome
<https://github.com/dsbos/incubator-drill/blob/bugs/drill-3641/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java#L106>.)



Therefore, it seems that ScanBatch.next() _must_ return OK_NEW_SCHEMA
before returning NONE, instead of immediately returning NONE, for
readers/files with zero rows for at least _some_ cases.  (It must both
notify the downstream caller that there is a schema /and/ give the caller a
chance to read the schema (which is allowed after OK_NEW_SCHEMA is returned
but not after NONE).)
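
A minimal sketch, with hypothetical names such as setupNewSchema and
processRecords (not actual Drill code), of a downstream operator's read
loop may make that dependency concrete; the operator sets up its output
schema only upon OK_NEW_SCHEMA, so an upstream that returns NONE without
first returning OK_NEW_SCHEMA leaves it with no schema at all:

  while (true) {
    RecordBatch.IterOutcome outcome = upstream.next();
    switch (outcome) {
      case OK_NEW_SCHEMA:
        setupNewSchema(upstream.getSchema());  // can happen with zero rows
        // fall through: process any rows that came with the new schema
      case OK:
        processRecords(upstream.getRecordCount());
        break;
      case NONE:
        return;  // if NONE arrives first, no schema was ever set up
      default:
        throw new IllegalStateException("unexpected: " + outcome);
    }
  }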

However, it is not clear exactly what that set of cases is.  (It does not
seem to be _all_ zero-row cases.)

[jira] [Created] (DRILL-3816) weird file-extension recognition behavior in directory subtree scanning

2015-09-21 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3816:
-

 Summary: weird file-extension recognition behavior in directory 
subtree scanning
 Key: DRILL-3816
 URL: https://issues.apache.org/jira/browse/DRILL-3816
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Reporter: Daniel Barclay (Drill)
Assignee: Jacques Nadeau


In scanning of directory subtrees for files, recognition of known vs. unknown 
file extensions seems really screwy (not following any apparent pattern). 

For example:
- a suffix of {{.jsxon_not}}, as expected, is not recognized as a JSON file
- a suffix of {{.jsoxn_not}} unexpectedly _is_ taken as JSON
- a suffix of {{.jsonx_not}}, as expected, is not recognized as a JSON file

(Creating a directory containing only a non-empty JSON file ending with 
{{.json}} and another non-empty JSON file ending with one of the above suffixes 
sometimes reads both JSON files and sometimes reports a (presumably) expected 
error because of the mixed file extensions.)

The result sometimes seems to also depend on the rest of the filename, 
presumably related to the order of listing of files.  (It's not clear if it 
depends only on the order after filename sorting, or also depends on the order 
file names are listed by the OS.)

Here are more data points (using a JSON file named {{voter1.json}}): 

- with {{voter2.xjson_not}} - read, as JSON
- with {{voter2.jxson_not}} - read, as JSON
- with {{voter2.jsxon_not}} - causes expected error
- with {{voter2.jsoxn_not}} - read, as JSON
- with {{voter2.jsonx_not}} - causes expected error
- with {{voter2.json_xnot}} - read, as JSON
- with {{voter2.json_nxot}} - read, as JSON
- with {{voter2.json_noxt}} - read, as JSON
- with {{voter2.json_notx}} - read, as JSON
- with {{voter2.jsonxnot}}  - read, as JSON
- with {{voter2.jsonxot}}   - read, as JSON
- with {{voter2.jsoxot}}    - causes expected error
- with {{voter2.jxsxoxn}}   - read, as JSON
- with {{voter2.xjxsxoxn}}  - read, as JSON
- with {{voter2.xjxsxoxnx}} - causes expected error
- with {{voter2.xjxxoxn}}   - read, as JSON
- with {{voter2.xjxxxn}}    - read, as JSON
- with {{voter2.n}}         - read, as JSON
- with {{voter2.}}          - read, as JSON
- with {{voter2.xxx}}       - read, as JSON
- with {{voter2.xx}}        - read, as JSON
- with {{voter2.x}}         - read, as JSON
- with {{voter2.}}          - causes expected error
- with {{voter2.x}}         - read, as JSON
- with {{voter2.xx}}        - read, as JSON






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3815) unknown suffixes .not_json and .json_not treated differently (multi-file case)

2015-09-21 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3815:
-

 Summary: unknown suffixes .not_json and .json_not treated 
differently (multi-file case)
 Key: DRILL-3815
 URL: https://issues.apache.org/jira/browse/DRILL-3815
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Reporter: Daniel Barclay (Drill)
Assignee: Jacques Nadeau


In scanning a directory subtree used as a table, unknown filename extensions 
seem to be treated differently depending on whether they're similar to known 
file extensions.  The behavior suggests that Drill checks whether a file name 
_contains_ an extension's string rather than _ending_ with it. 
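
A minimal sketch of that suspected bug class (hypothetical code, not the 
actual Drill matching logic):

{noformat}
String fileName = "voter2.json_not";
boolean buggy  = fileName.contains(".json");  // true  -- matches too loosely
boolean proper = fileName.endsWith(".json");  // false -- the intended test
{noformat}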

For example, given these subtrees with almost identical leaf file names:

{noformat}
$ find /tmp/testext_xx_json/
/tmp/testext_xx_json/
/tmp/testext_xx_json/voter2.not_json
/tmp/testext_xx_json/voter1.json
$ find /tmp/testext_json_xx/
/tmp/testext_json_xx/
/tmp/testext_json_xx/voter1.json
/tmp/testext_json_xx/voter2.json_not
$ 
{noformat}

the results of trying to use them as tables differ:

{noformat}
0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_xx_json`;
Sep 21, 2015 11:41:50 AM org.apache.calcite.sql.validate.SqlValidatorException 

...
Error: VALIDATION ERROR: From line 1, column 17 to line 1, column 25: Table 
'dfs.tmp.testext_xx_json' not found


[Error Id: 6fe41deb-0e39-43f6-beca-de27b39d276b on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_json_xx`;
+-----------------------+
| onecf                 |
+-----------------------+
| {"name":"someName1"}  |
| {"name":"someName2"}  |
+-----------------------+
2 rows selected (0.149 seconds)
{noformat}

(Other probing seems to indicate that there is also some sensitivity to whether 
the extension contains an underscore character.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3812) message for invalid compound name doesn't identify part that's bad

2015-09-20 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3812:
-

 Summary: message for invalid compound name doesn't identify part 
that's bad
 Key: DRILL-3812
 URL: https://issues.apache.org/jira/browse/DRILL-3812
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Reporter: Daniel Barclay (Drill)
Assignee: Aman Sinha


When a compound name (e.g., {{schema.subschema.table}}) is invalid, the error 
message doesn't say where it went bad (e.g., which part referred to something 
unknown and/or non-existent).  For example, see the query and the "VALIDATION 
ERROR ..." line in the following:

{noformat}
0: jdbc:drill:zk=local> SELECT * FROM `dfs.NoSuchSchema`.`empty_directory`;
Sep 20, 2015 10:38:24 PM org.apache.calcite.sql.validate.SqlValidatorException 

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
'dfs.NoSuchSchema.empty_directory' not found
Sep 20, 2015 10:38:24 PM org.apache.calcite.runtime.CalciteException 
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 
15 to line 1, column 32: Table 'dfs.NoSuchSchema.empty_directory' not found
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 32: Table 
'dfs.NoSuchSchema.empty_directory' not found


[Error Id: 2a298c8e-2923-4744-8f78-b0cf36c83799 on dev-linux2:31010] 
(state=,code=0)
{noformat}

A better error message would say that {{dfs.NoSuchSchema}} was not found (or 
that no {{NoSuchSchema}} was found in schema {{dfs}}).





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Resolving ScanBatch.next() behavior for 0-row readers; handling of NONE, OK_NEW_SCHEMA

2015-09-18 Thread Daniel Barclay
[...] should be checking the number of 
rows too, or OK_NEW_SCHEMA shouldn't be returned in as many subcases of the 
no-rows last-reader/file case.


So, some open and potential questions seem to be:

1. Is it the case that a) any batch's next() should return OK_NEW_SCHEMA before 
it returns NONE, and callers/downstream batches should be able to count on 
getting OK_NEW_SCHEMA (e.g., to trigger setting up their downstream schemas), 
or that b) empty files can cause next() to return NONE without ever returning 
OK_NEW_SCHEMA , and therefore all downstream batch classes must handle getting 
NONE before they have set up their schemas?
2. For a file/source kind that has a schema even when there are no rows, should 
getting an empty file constitute a schema change?  (On one hand there are no 
actual /rows/ (following the new schema) conflicting with any previous schema 
(and maybe rows), but on the other hand there is a non-empty /schema /that can 
conflict when that's enough to matter.)
3. For a file/source kind that implies a schema only when there are rows (e.g., 
JSON), when should or shouldn't that be considered a schema change?  If 
ScanBatch reads non-empty JSON file A, reads empty JSON file B, and reads 
non-empty JSON file C implying the same schema as A did, should that be 
considered a schema change or not?  (When reading no-/empty-schema B, 
should ScanBatch keep the schema from A and check against that when it gets 
to C, effectively ignoring the existence of B completely?)
4. In ScanBatch.next(), when the last reader had no rows at all, when should 
next() return OK_NEW_SCHEMA? always? /iff/ the reader has a non-empty schema?  
just enough to never return NONE before returning OK_NEW_SCHEMA (which means it 
acts differently for otherwise-identical empty files, depending on what 
happened with previous readers)?  as in that last case except only if the 
reader has a non-empty schema?

Thanks,
Daniel

--
Daniel Barclay
MapR Technologies



Re: Is anyone having issues with the jdbc unit tests (ITTestShadedJar)?

2015-09-16 Thread Daniel Barclay

Aman Sinha wrote:

Here's a different issue I encountered running unit tests on a clean master
branch on my Mac:

Running org.apache.drill.exec.store.jdbc.TestJdbcPlugin


But that's the JDBC storage plug-in, not the Drill JDBC driver and
its tests.

Daniel




  ...
  ...
Tue Sep 15 20:01:36 PDT 2015 : Could not listen on port 2 on host
127.0.0.1:
  java.net.BindException: Address already in use

It has to do with starting up derby:

storage-jdbc asinha$ cat derby.log

Tue Sep 15 22:13:01 PDT 2015 : Apache Derby Network Server - 10.11.1.1 -
(1616546) started and ready to accept connections on port 1527
Tue Sep 15 22:13:01 PDT 2015 : Could not listen on port 2 on host
127.0.0.1:
  java.net.BindException: Address already in use
An exception was thrown during network server startup.
DRDA_ListenPort.S:Could not listen on port 2 on host 127.0.0.1:
  java.net.BindException: Address already in use


On Tue, Sep 15, 2015 at 9:37 PM, Chris Westin <chriswesti...@gmail.com>
wrote:


Variability: for me so far 2 out of 2 times.

No stack trace, but as above, when I try to reproduce it in an IDE
"This seems to be because the test is getting an ExceptionInInitializer in
DrillbitClassLoader because the app.class.path property isn't set (and then
the resulting String.length() on its value throws an NPE)."

I don't see app.class.path set anywhere in any pom.xml (so it's not getting
set when I copy the surefire arguments into the IDE's launch configuration
for the test, either).


On Tue, Sep 15, 2015 at 9:09 PM, Jacques Nadeau <jacq...@dremio.com>
wrote:


It was tested on a clean machine a number of times. Any thoughts on the
variability? Can you provide stack trace?
On Sep 15, 2015 6:28 PM, "Sudheesh Katkam" <skat...@maprtech.com> wrote:


Yes, I see this issue too.


On Sep 15, 2015, at 5:53 PM, Chris Westin <chriswesti...@gmail.com>

wrote:


This seems to be because the test is getting an

ExceptionInInitializer

in

DrillbitClassLoader because the app.class.path property isn't set

(and

then

the resulting String.length() on its value throws an NPE).

Bueller?

On Tue, Sep 15, 2015 at 5:20 PM, Chris Westin <

chriswesti...@gmail.com



wrote:


I just rebased, and twice in a row I've gotten wedged running
org.apache.drill.jdbc.ITTestShadedJar















--
Daniel Barclay
MapR Technologies


Re: Is anyone having issues with the jdbc unit tests (ITTestShadedJar)?

2015-09-16 Thread Daniel Barclay


> The level of complexity of the build process to get a test to correctly
> test the right thing means jumping through a bunch of hoops to clear the
> classpath and then use a special classloader.

Hey, do we know if using a Maven integration test would make testing
the JDBC-all Jar file easier?

Presumably, integration testing supports having different dependencies
and classpaths for a server that's started and stopped once vs. the set
of client tests.

If that's accurate, then, when we start moving some tests to be
Maven integration tests, maybe this JDBC-all Jar test could be simplified
a lot.

Daniel




Jacques Nadeau wrote:

Ah, you're focused on testing from within the IDE?

The level of complexity of the build process to get a test to correctly test
the right thing means jumping through a bunch of hoops to clear the
classpath and then use a special classloader. I can't imagine that you
could get it to run correctly in an IDE. For example, Eclipse is very
sloppy about keeping classpaths perfect versus what is declared in the pom
file.

The parameter you're looking for is generated by the ant plugin simply
because that appears the way to get the value into an environment variable
so that the inner classloader can load the drillbit for the test.

The test: loads a drillbit in one classloader using the alternative
classpath provided by the app.class.path variable. This is taken from what
would have typically been the JVM-level classpath. We then clear the JVM
classpath to include only the test class, JUnit, and Hamcrest. After the
drillbit is initialized and we've run one query, we then add the jdbc-all
jar to the system classloader and open a connection to the drillbit and
execute a query. The test is designed specifically to confirm that the
requisite classes are correctly included in jdbc-all and that it will run
correctly. The test can't run without the shaded jar being generated, and I
can't imagine that any of the IDEs have a good enough understanding of the
various Maven plugins and options used that they would work correctly. Even
if you found some changes that made the test execute in an IDE, I can't
imagine that it would correctly manage all the classpath stuff.
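
For anyone trying to follow that, here is a rough sketch of the
arrangement (hypothetical helper parseClasspath and jar path; not the
actual test code):

  import java.io.File;
  import java.net.URL;
  import java.net.URLClassLoader;

  // 1. Drillbit side: build a loader from the alternative classpath passed
  //    via the app.class.path property; null parent = isolated from the
  //    (cleared) JVM classpath.
  URL[] drillbitClasspath =
      parseClasspath(System.getProperty("app.class.path"));
  URLClassLoader drillbitLoader = new URLClassLoader(drillbitClasspath, null);
  Class<?> drillbitClass =
      drillbitLoader.loadClass("org.apache.drill.exec.server.Drillbit");
  // ... start the drillbit reflectively and run one query ...

  // 2. Client side: add only the shaded jdbc-all jar, so the test fails
  //    unless every class the driver needs was relocated into that jar.
  URLClassLoader jdbcLoader = new URLClassLoader(
      new URL[] { new File("exec/jdbc-all/target/drill-jdbc-all.jar")
          .toURI().toURL() },
      ClassLoader.getSystemClassLoader());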
On Sep 15, 2015 9:37 PM, "Chris Westin" <chriswesti...@gmail.com> wrote:


Variability: for me so far 2 out of 2 times.

No stack trace, but as above, when I try to reproduce it in an IDE
"This seems to be because the test is getting an ExceptionInInitializer in
DrillbitClassLoader because the app.class.path property isn't set (and then
the resulting String.length() on its value throws an NPE)."

I don't see app.class.path set anywhere in any pom.xml (so it's not getting
set when I copy the surefire arguments into the IDE's launch configuration
for the test, either).


On Tue, Sep 15, 2015 at 9:09 PM, Jacques Nadeau <jacq...@dremio.com>
wrote:


It was tested on a clean machine a number of times. Any thoughts on the
variability? Can you provide stack trace?
On Sep 15, 2015 6:28 PM, "Sudheesh Katkam" <skat...@maprtech.com> wrote:


Yes, I see this issue too.


On Sep 15, 2015, at 5:53 PM, Chris Westin <chriswesti...@gmail.com>

wrote:


This seems to be because the test is getting an

ExceptionInInitializer

in

DrillbitClassLoader because the app.class.path property isn't set

(and

then

the resulting String.length() on its value throws an NPE).

Bueller?

On Tue, Sep 15, 2015 at 5:20 PM, Chris Westin <

chriswesti...@gmail.com



wrote:


I just rebased, and twice in a row I've gotten wedged running
org.apache.drill.jdbc.ITTestShadedJar















--
Daniel Barclay
MapR Technologies


filesystem pathnames or (file) URI references?

2015-09-16 Thread Daniel Barclay

For the file system plug-in, are Drill table name identifiers supposed
to be taken as filesystem pathnames or as URI references?  (Or is it
sometimes one and sometimes the other, and, if so, when one and when
the other?)

For example, would the delimited identifier `10%20%30` refer to a file
with the simple name "10%20%30" or to a file with the simple name "10 0"?  (Or,
the other way around: to refer to a file whose simple name is "23:59:59",
can one use simply `23:59:59`, or must one use `./23:59:59`?)
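
To make the two readings concrete (a Java sketch for illustration,
ignoring the checked exception that URLDecoder.decode declares):

  import java.net.URLDecoder;

  String body = "10%20%30";
  String asFilesystemName = body;                            // "10%20%30"
  String asUriReference = URLDecoder.decode(body, "UTF-8");  // "10 0"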


I ask because I see that a number of tests use the "file:" URI for
a file instead of the filesystem pathname for the file.


Daniel
--
Daniel Barclay
MapR Technologies


[jira] [Resolved] (DRILL-3658) Missing org.apache.hadoop in the JDBC jar

2015-09-15 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3658.
---
Resolution: Fixed

> Missing org.apache.hadoop in the JDBC jar
> -
>
> Key: DRILL-3658
> URL: https://issues.apache.org/jira/browse/DRILL-3658
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Piotr Sokólski
>    Assignee: Daniel Barclay (Drill)
>Priority: Blocker
> Fix For: 1.2.0
>
>
> java.lang.ClassNotFoundException: local.org.apache.hadoop.io.Text is thrown 
> while trying to access a text field from a result set returned from Drill 
> while using the drill-jdbc-all.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError

2015-09-15 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-2482.
---
Resolution: Fixed

> JDBC : calling getObject when the actual column type is 'NVARCHAR' results in 
> NoClassDefFoundError
> --
>
> Key: DRILL-2482
> URL: https://issues.apache.org/jira/browse/DRILL-2482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Rahul Challapalli
>    Assignee: Daniel Barclay (Drill)
>Priority: Blocker
> Fix For: 1.2.0
>
>
> git.commit.id.abbrev=7b4c887
> I tried to call getObject(i) on a column which is of type varchar, drill 
> failed with the below error :
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/io/Text
>   at 
> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407)
>   at 
> org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386)
>   at 
> org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98)
>   at 
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137)
>   at 
> org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136)
>   at 
> net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
>   at Dummy.testComplexQuery(Dummy.java:94)
>   at Dummy.main(Dummy.java:30)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 8 more
> {code}
> When the underlying type is a primitive, the getObject call succeeds



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: anyone seen these errors on master ?

2015-09-15 Thread Daniel Barclay
--
Daniel Barclay
MapR Technologies


Drill data types correspondence table

2015-09-14 Thread Daniel Barclay

For those who didn't see earlier drafts, here's a spreadsheet showing the 
correspondence between various kinds/levels of types in Drill (e.g., SQL data 
type names vs. Calcite Java enumerations vs. RPC-level Protobuf enumerations):

Drill Data Types Correspondence Table 
<https://docs.google.com/spreadsheets/d/1NwfYScdOjKqM4Ow5tVVtFa0nNpd70x1Ew-hzS3bgyJw/edit?pli=1#gid=0>

If you notice anything incorrect, or have answers to any of the loose ends (labeled with 
"TBD:" or "Q:" (for questions)), please comment on the document or e-mail me.

Thanks,
Daniel

--
Daniel Barclay
MapR Technologies



[jira] [Resolved] (DRILL-3617) Apply "shading" to JDBC-all Jar file to avoid version conflicts

2015-09-14 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3617.
---
Resolution: Fixed

Fixed by fix for DRILL-3589.

> Apply "shading" to JDBC-all Jar file to avoid version conflicts
> ---
>
> Key: DRILL-3617
> URL: https://issues.apache.org/jira/browse/DRILL-3617
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3502) JDBC driver can cause conflicts

2015-09-14 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3502.
---
Resolution: Fixed

Fixed by fix for DRILL-3589.

> JDBC driver can cause conflicts
> ---
>
> Key: DRILL-3502
> URL: https://issues.apache.org/jira/browse/DRILL-3502
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
>Reporter: Stefán Baxter
>    Assignee: Daniel Barclay (Drill)
> Fix For: 1.2.0
>
>
> Using the JDBC driver in Java projects is problematic as it contains older 
> versions of some popular libraries and since they are not isolated/shaded 
> they may conflict with newer versions being used in these projects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Beginners Query

2015-09-11 Thread Daniel Barclay

Ajay Shriwastava wrote:

...

I was checking the org.apache.drill.jdbc.impl.DrillStatementImpl class and see
that it's still using

import net.hydromatic.avatica.AvaticaStatement;

and the drill-jdbc pom has dependency


<dependency>
  <groupId>net.hydromatic</groupId>
  <artifactId>optiq-avatica</artifactId>
  <version>0.9-drill-r20</version>
</dependency>


so I searched JIRA and found out that issue to rebase drill on calcite has
been resolved.
https://issues.apache.org/jira/browse/DRILL-1384

Can you help me understand why, if it was rebased to Calcite in 0.9, it is
still referring to hydromatic in the master branch?  I am missing something
obvious here, so please excuse my ignorance.


Only Drill's SQL parser and related code using Calcite were rebased to use a
newer version of Calcite.

Drill's JDBC driver was not part of that rebasing.  (It doesn't use
the newer version of Avatica that's in the version of Calcite on which
the parser, etc., were rebased.)

Daniel
--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3768) HTML- and JavaScript-injection vulnerability (lack of HTML encoding)

2015-09-11 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3768:
-

 Summary: HTML- and JavaScript-injection vulnerability (lack of 
HTML encoding)
 Key: DRILL-3768
 URL: https://issues.apache.org/jira/browse/DRILL-3768
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - HTTP
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse
Priority: Critical


The Web UI does not properly encode query text or error message text into HTML. 
 This makes the Web UI vulnerable to JavaScript-injection attacks.


Most importantly, the Web UI doesn't encode characters that are special in 
HTML, e.g., encoding "<" in that plain text to "&lt;" in the HTML text.

This means that some queries containing a less-than character ("<") are 
displayed wrong.  For example, submit this query and then look at its profile 
via the Web UI:

{noformat}
SELECT 1 <script>alert("Gotcha!")</script> 
{noformat}



Another, though less serious, problem is that line breaks in plain text are not 
encoded into HTML (e.g., as "<br>").

That means that separate lines of error messages are run together, making them 
harder or impossible to parse correctly when seen in the Web UI.
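
A minimal sketch of the kind of encoding needed (a hypothetical helper; an 
existing HTML-escaping library would do as well):

{noformat}
static String encodeForHtml(String plainText) {
  StringBuilder sb = new StringBuilder(plainText.length());
  for (char c : plainText.toCharArray()) {
    switch (c) {
      case '&':  sb.append("&amp;");  break;
      case '<':  sb.append("&lt;");   break;
      case '>':  sb.append("&gt;");   break;
      case '"':  sb.append("&quot;"); break;
      case '\n': sb.append("<br>");   break;  // also fixes run-together lines
      default:   sb.append(c);
    }
  }
  return sb.toString();
}
{noformat}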






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Maven build failing on checkstyle

2015-09-10 Thread Daniel Barclay

Ted Dunning wrote:

...
I think that we should have a rule that every class should have javadoc on
the class.


Since it will take a long time to get to that state, we should
probably start with whichever are the most important classes to
document.

That fuzzy set of "most important" classes probably includes:

- classes relevant to using Drill's application-programming
  interfaces (e.g., Drill's JDBC driver now (which already has
  significant documentation), and, in the future, DrillClient/etc.)

- classes relevant to developing plug-ins

- most classes at the roots of big and/or significant hierarchies
  (since they specify methods, contracts, and protocols used by many
  other things)

- other classes whose methods, contracts, and protocols are used by many
  other parts of Drill

Daniel


P.S.  Speaking of documentation...

Could a committer approve and merge my DRILL-3641 IterOutcome doc.
pull request at https://github.com/apache/drill/pull/113?  (Only
one file to review! only an enum class's documentation!)  (It
should help avoid future bugs like DRILL-2288 and DRILL-3569.)

Also, could anyone take a look at
https://github.com/apache/drill/pull/118?  It's mostly just Javadoc
edits on code somewhat related to storage plug-ins (things I
encountered in starting to explore how to write a storage plug-in).

Thanks,
Daniel


--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3760) Casting interval to string and back to interval fails

2015-09-10 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3760:
-

 Summary: Casting interval to string and back to interval fails
 Key: DRILL-3760
 URL: https://issues.apache.org/jira/browse/DRILL-3760
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Daniel Barclay (Drill)
Assignee: Mehant Baid


Casting from an interval type to {{VARCHAR(...)}} and then casting back to the 
same interval type yields data format errors, for example:

{noformat}
0: jdbc:drill:drillbit=localhost> VALUES CAST( CAST( INTERVAL '1' MONTH AS 
VARCHAR(99) ) AS INTERVAL MONTH );
Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "0 years 1 month 
"


[Error Id: 339d28df-b687-47f0-b6ce-1f7732e41660 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:drillbit=localhost> 
{noformat}

The problem seems to be in casting from interval types to strings.  The SQL 
standard specifies that the result string has the syntax of a SQL literal, but 
Drill currently uses some other syntax:

{noformat}
0: jdbc:drill:drillbit=localhost> VALUES CAST( INTERVAL '1' YEAR AS VARCHAR(99) 
);
+---+
|  EXPR$0   |
+---+
| 1 year 0 months   |
+---+
1 row selected (0.27 seconds)
0: jdbc:drill:drillbit=localhost> 
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3748) apache-release profile on single module fails

2015-09-08 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3748:
-

 Summary: apache-release profile on single module fails
 Key: DRILL-3748
 URL: https://issues.apache.org/jira/browse/DRILL-3748
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)
Assignee: Steven Phillips


Running Maven with the {{apache-release}} profile enabled fails when run on 
just the {{drill-jdbc-all}} module.

(Building all of Drill with that profile seems to work, but trying to re-build 
just the {{drill-jdbc-all}} module fails.  That means that the 
edit/regenerate/check loop for JDBC Javadoc documentation (coming with 
DRILL-3160) takes much longer, since one can't incrementally rebuild just the 
{{drill-jdbc-all}} module (with the {{apache-release}} profile, which is needed 
for that Javadoc).)

Specifically, although this command works:

{noformat}
mvn install -DskipTests -Papache-release -Dgpg.skip=true
{noformat}

executing the following (even right after the above command) fails:

{noformat}
cd exec/jdbc-all
mvn install -DskipTests -Papache-release -Dgpg.skip=true
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3744) Resolve missing expected {@inheritDoc} results in JDBC Javadoc

2015-09-06 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3744:
-

 Summary: Resolve missing expected {@inheritDoc} results in JDBC 
Javadoc
 Key: DRILL-3744
 URL: https://issues.apache.org/jira/browse/DRILL-3744
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)
Priority: Minor


Review the JDBC Javadoc comments that use {{\{@inheritDoc\}}}.

Results from running the regular Javadoc command (including via Maven) are 
different from the results that were expected (from composing/editing 
documentation using Eclipse's dynamic Javadoc view).

(Beware--Eclipse's dynamic Javadoc view does not update correctly.  What the 
view displays for a given state of the documentation comment depends on the 
order of changes (comment contents and cursor position) that got the view into 
that state, not just the state itself.)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: The meaning of the methods in StoragePlugin and EasyFormatPlugin

2015-09-03 Thread Daniel Barclay

I wrote:

... Below are some notes on the detailed requirements I had extracted from
the code.  ...

I found a later copy of my (still rough) notes.

See the Google Docs document at
[Notes for] Instructions on Creating Storage Plug-ins 
<https://docs.google.com/document/d/1GqPV1oXMYoVAMihhfhTDObL4utg2NvCq4_obFPlar18/edit>.

Daniel

--
Daniel Barclay
MapR Technologies



[jira] [Resolved] (DRILL-3661) Add/edit various JDBC Javadoc.

2015-09-02 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3661.
---
Resolution: Fixed

> Add/edit various JDBC Javadoc.
> --
>
> Key: DRILL-3661
> URL: https://issues.apache.org/jira/browse/DRILL-3661
> Project: Apache Drill
>  Issue Type: Bug
>    Reporter: Daniel Barclay (Drill)
>        Assignee: Daniel Barclay (Drill)
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [GitHub] drill pull request: DRILL-3589: Update JDBC driver to shade and mi...

2015-09-01 Thread Daniel Barclay

I wrote:

Github user dsbos commented on the pull request:

 https://github.com/apache/drill/pull/116#issuecomment-136889943

 Something seems to be broken.


Never mind that comment.  I seemed to have a local version/build mismatch.
Daniel






 After rebasing your branch on my branch with my DRILL-3347 (Hadoop Test) 
and DRILL-3566 (Prep.Stmt.) fixes, I tried installing the resulting JDBC-all 
Jar file on Spotfire, but Spotfire's getting IndexOutOfBoundsExceptions 
somewhere within ResultSet.next().

 I'll see if I can identify which shading might be causing that or what's 
going on in next().



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3730) Change JDBC driver's DrillConnectionConfig to interface

2015-09-01 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3730:
-

 Summary: Change JDBC driver's DrillConnectionConfig to interface
 Key: DRILL-3730
 URL: https://issues.apache.org/jira/browse/DRILL-3730
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)


Change {{org.apache.drill.jdbc.DrillConnectionConfig}} (in Drill's published 
interface for the JDBC driver) from being a class to being an interface.

Move the implementation (including the inheritance from 
{{net.hydromatic.avatica.ConnectionConfigImpl}}) from published-interface 
package {{org.apache.drill.jdbc}} to a class in implementation package
{{org.apache.drill.jdbc.impl}}.
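
A sketch of the intended shape (accessor names here are illustrative only, 
not the actual published method list):

{noformat}
// Published interface (package org.apache.drill.jdbc):
public interface DrillConnectionConfig {
  boolean isLocal();
  // ... other accessors, with signatures unchanged from the old class ...
}

// Implementation (package org.apache.drill.jdbc.impl), keeping the Avatica
// inheritance out of the published package:
//   class DrillConnectionConfigImpl
//       extends net.hydromatic.avatica.ConnectionConfigImpl
//       implements DrillConnectionConfig { ... }
{noformat}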





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


DRILL-3548 - plug-in-exploration--related doc.

2015-08-31 Thread Daniel Barclay

Jacques,

Since you seem to be addressing plug-in--related documentation now, could you please review 
my patch for DRILL-3548 <https://issues.apache.org/jira/browse/DRILL-3548> in Pull 
Request #118 <https://github.com/apache/drill/pull/118>, or at least assimilate the 
plug-in--related parts of it (e.g., FormatPluginConfig.java, DrillConfig.java (doc. of 
drill-module.conf, etc), StoragePluginRegistry.java (clearer messages), DrillTable.java, 
AbstractSchema.java, RecordGenerator.java, etc.)?

Thanks,
Daniel

--
Daniel Barclay
MapR Technologies



Re: The meaning of the methods in StoragePlugin and EasyFormatPlugin

2015-08-31 Thread Daniel Barclay

Edmon,

I see you're working on adding documentation about creating storage
plug-ins.

I was looking into that myself a little while ago, but wasn't able to
continue.

Below are some notes on the detailed requirements I had extracted from
the code.  Hopefully they'll be helpful in filling in your documentation
of what's required to create a storage plug-in.

Daniel


Storage Plug-In Notes

Pieces needed for/aspects of a storage plug-in:

* Need storage plug-in configuration class:
  - per StoragePluginConfig (abstract class (currently))
- (org.apache.drill.common.logical.StoragePluginConfig)
  - public class
  - public no-argument constructor? (modulo Jackson/@JsonTypeName?)
  - one per storage plug-in type?
  - What about Jackson serialization?
- TODO: REFINE: plug-in type name (for ?) defaults to simple class name; can
  be specified by some kind of NAME property (caps?  any?)
 - what are requirements on serializability?

* Need storage plug-in class:
  - per StoragePlugin (interface)
    - (org.apache.drill.exec.store.StoragePlugin)
  - public class (not clearly needed)
  - public constructor ...( SomeStoragePluginConfig, DrillContext, String),
where:
- StoragePluginConfig is _specific_ implementation class of
  StoragePluginConfig
  - multiple storage plug-ins can share one storage plug-in class--one
constructor per StoragePluginConfig class


* Class path scanning requirement:
  - StoragePluginConfig and StoragePlugin implementation classes are found by
classpath scanning
  - Need drill-module.conf file in root of classpath subtree containing classes
to be found.
  - Normally need to append name of each package (immediately?) containing
implementation classes to configuration property
drill.exec.storage.packages.

* bootstrap-storage-plugins.json
  - Normally need to have bootstrap-storage-plugins.json file in same classpath
root.
  - Normally have default configuration for plug-in in same classpath root's
bootstrap-storage-plugins.json file.
  - Format seems to be Jackson's serialization of some kind of list of
StoragePluginConfig:
    - Jackson seems to follow Java Beans getter/setter mapping rules
  (verified only for simple values--String, boolean)
- (What else?)

* Schema, ROUGH:
  - Calcite's Schema
  - Drill's AbstractSchema
  - implementations of Calcite's Table interface must be subclasses of Drill's
DrillTable


(Document that old code doesn't follow recently clarified terminology in (most
of) user documentation:
- "storage plug-in" refers to the code itself (what plugs into Drill)
- "storage plug-in configuration" refers to the configuration associated with
  names such as "cp" and "dfs"--different configurations of the file-system
  plug-in
- "storage plug-in configuration name" refers to names such as "cp" and "dfs"
- "storage plug-in type name" refers to ... (e.g., "file", "hive")
- (old terms in code: "storage engine" (sometimes) means storage plug-in
  configuration name)
)

Pending questions:
- Q: What does the @JsonTypeInfo annotation on StoragePluginConfig do?
  Specifically, how exactly does it relate to "type" in 'type: "file"' and to
  "NAME" and "name" fields (JavaBeans/Jackson properties?) on plug-in classes?

  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME,
include = JsonTypeInfo.As.PROPERTY, property="type") on  StoragePluginConfig
specifies the JavaBeans/Jackson property named "type" on subclasses

- Q: What exactly does SystemTablePluginConfig's _public_ NAME field do?
- Q: What exactly does SystemTablePluginConfig's _public_ INSTANCE field do?
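
Putting the notes above together, a minimal, unverified sketch of the two
classes (all names here are hypothetical placeholders; constructor shape per
the notes, with DrillbitContext assumed as the context type the notes call
"DrillContext"):

  import com.fasterxml.jackson.annotation.JsonTypeName;
  import org.apache.drill.common.logical.StoragePluginConfig;

  // Configuration class; the @JsonTypeName value is the "type" used in
  // bootstrap-storage-plugins.json.
  @JsonTypeName("myplugin")
  public class MyPluginConfig extends StoragePluginConfig {
    @Override
    public boolean equals(Object o) { return o instanceof MyPluginConfig; }

    @Override
    public int hashCode() { return getClass().hashCode(); }
  }

  // Plug-in class; found by classpath scanning if its package is listed in
  // drill.exec.storage.packages (per the notes).
  public class MyStoragePlugin extends AbstractStoragePlugin {
    private final MyPluginConfig config;

    public MyStoragePlugin(MyPluginConfig config, DrillbitContext context,
        String name) {
      this.config = config;
    }

    @Override
    public MyPluginConfig getConfig() { return config; }

    @Override
    public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent)
        throws IOException {
      // register this plug-in's schema tree here (omitted in this sketch)
    }
  }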







--
Daniel Barclay
MapR Technologies


Re: Hangout happening in 30mins! (at 10:00am Pacific)

2015-08-25 Thread Daniel Barclay

This week's Drill Hangout will be happening in about 35 minutes.

(https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc)


Hsuan Yi Chu wrote:

Come join the Drill community as we discuss what has been happening lately
and what is in the pipeline. All are welcome, if you know about Drill, want
to know more or just want to listen in.

Link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc




--
Daniel Barclay
MapR Technologies


Review Request 37685: DRILL-2489: Throw exception from remaining methods for closed objects.

2015-08-25 Thread Daniel Barclay

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37685/
---

Review request for drill, Mehant Baid and Parth Chandra.


Bugs: DRILL-2489
https://issues.apache.org/jira/browse/DRILL-2489


Repository: drill-git


Description
---

(Note:  Patch depends on (needs to be applied after) patches for DRILL-3153, 
-3347, -3566, and -3661.)

Refactored unit test to check all methods per interface.  (Replaced individual,
static test methods with bulk reflection-based checking.)
[Drill2489CallsAfterCloseThrowExceptionsTest]

Added DrillResultSetMetaDataImpl.

Added method overrides to check state for remaining methods from Connection,
Statement, PreparedStatement, ResultSet, ResultSetMetaData and DatabaseMetaData.

Also:
- renamed checkNotClosed to throwIfClosed.
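
A sketch of that pattern (hypothetical excerpt, not the literal patch):

  private void throwIfClosed() throws SQLException {
    if (isClosed()) {
      throw new AlreadyClosedSqlException("Statement is already closed.");
    }
  }

  @Override
  public ResultSet executeQuery(String sql) throws SQLException {
    throwIfClosed();  // every overridden method checks state first
    return super.executeQuery(sql);
  }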


Diffs
-

  exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillConnection.java 608bf05 
  exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillConnectionImpl.java 
243e627 
  
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java
 9d0c132 
  exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillJdbc41Factory.java 
11191ae 
  
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillPreparedStatementImpl.java
 86683cb 
  exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java 
1b37dc1 
  
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetMetaDataImpl.java
 PRE-CREATION 
  exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java 
6cba58e 
  exec/jdbc/src/test/java/org/apache/drill/jdbc/ConnectionTest.java 8735146 
  
exec/jdbc/src/test/java/org/apache/drill/jdbc/ConnectionTransactionMethodsTest.java
 1aff918 
  exec/jdbc/src/test/java/org/apache/drill/jdbc/StatementTest.java 3e64fcb 
  
exec/jdbc/src/test/java/org/apache/drill/jdbc/test/Drill2489CallsAfterCloseThrowExceptionsTest.java
 01008b2 

Diff: https://reviews.apache.org/r/37685/diff/


Testing
---

Ran new tests in this patch.  Also manually injected various errors to confirm 
detection.

Ran existing tests; no new errors.


Thanks,

Daniel Barclay



[jira] [Created] (DRILL-3693) SQLLine/drill-localhost seems to demand 2.9GB just to start

2015-08-23 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3693:
-

 Summary: SQLLine/drill-localhost seems to demand 2.9GB just to 
start
 Key: DRILL-3693
 URL: https://issues.apache.org/jira/browse/DRILL-3693
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


Starting SQLLine/drill-localhost when not enough virtual memory is available 
seems to indicate that Drill's _client-side_ code is trying to allocate almost 
2.9 GB:

{noformat}
$ 
./distribution/target/apache-drill-1.2.0-SNAPSHOT/apache-drill-1.2.0-SNAPSHOT/bin/drill-localhost
 
Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x0006fff8, 2863661056, 0) failed; error='Cannot 
allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2863661056 bytes for 
committing reserved memory.
# An error report file with more information is saved as:
# /home/dbarclay/work/git/incubator-drill/hs_err_pid21405.log
$
{noformat}

Is that intended?  




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3696) REGEXP_REPLACE doesn't document that replacement is pattern (not plain string)

2015-08-23 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3696:
-

 Summary: REGEXP_REPLACE doesn't document that replacement is 
pattern (not plain string)
 Key: DRILL-3696
 URL: https://issues.apache.org/jira/browse/DRILL-3696
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3695) SYSTEM ERROR for REGEXP_REPLACE replacement pattern format error

2015-08-23 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3695:
-

 Summary: SYSTEM ERROR for REGEXP_REPLACE replacement pattern 
format error
 Key: DRILL-3695
 URL: https://issues.apache.org/jira/browse/DRILL-3695
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


Similar to the problem with REGEXP_REPLACE match patterns reported in 
DRILL-3694, REGEXP_REPLACE reports SYSTEM ERROR errors rather than specific 
(FUNCTION ERROR) errors for bad replacement pattern strings:

{noformat}
0: jdbc:drill:drillbit=localhost> VALUES REGEXP_REPLACE( 'abc', 'b', '\');
Error: SYSTEM ERROR: StringIndexOutOfBoundsException: String index out of 
range: 1


[Error Id: 12f09e63-8dcb-4ab8-bfe6-183d81617c1e on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:drillbit=localhost> VALUES REGEXP_REPLACE( 'abc', 'b', '$');
Error: SYSTEM ERROR: StringIndexOutOfBoundsException: String index out of 
range: 1


[Error Id: 084ce8ce-8c11-4d53-82a4-be19aa9140b2 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:drillbit=localhost> VALUES REGEXP_REPLACE( 'abc', 'b', '$2');
Error: SYSTEM ERROR: IndexOutOfBoundsException: No group 2


[Error Id: 04d5e101-1f94-46df-8590-6f94aac9201c on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:drillbit=localhost>
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3694) SYSTEM ERROR for REGEXP_REPLACE regex format error

2015-08-23 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3694:
-

 Summary: SYSTEM ERROR for REGEXP_REPLACE regex format error
 Key: DRILL-3694
 URL: https://issues.apache.org/jira/browse/DRILL-3694
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


Giving a bad regular expression as the match-pattern argument string to 
REGEXP_REPLACE yields a SYSTEM ERROR (apparently from a non-specific catch of 
PatternSyntaxException from the Java implementation) rather than a more 
specific error (FUNCTION ERROR?) from explicit validation (e.g., via a specific 
catch of PatternSyntaxException from the Java implementation).

For example:

{noformat}
0: jdbc:drill:drillbit=localhost> VALUES REGEXP_REPLACE( 'abc', '\', 'x');
Error: SYSTEM ERROR: PatternSyntaxException: Unexpected internal error near 
index 1
\
 ^


[Error Id: 6a4dfb45-cd7b-4c24-b720-3813522254a4 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:drillbit=localhost>
{noformat}
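
One possible shape for the suggested explicit validation (a sketch, assuming
Drill's UserException builder; 'regexArgument' and 'logger' stand in for
whatever the surrounding function implementation provides):

{noformat}
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

import org.apache.drill.common.exceptions.UserException;

// Compile the match pattern up front and convert a bad pattern into a
// function-level error instead of letting it surface as a SYSTEM ERROR.
Pattern compiled;
try {
  compiled = Pattern.compile(regexArgument);
} catch (PatternSyntaxException e) {
  throw UserException.functionError(e)
      .message("Invalid regular expression '%s' passed to REGEXP_REPLACE",
               regexArgument)
      .build(logger);
}
{noformat}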




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3697) REGEXP_REPLACE doc. says POSIX reg. expr.; which? not Java?

2015-08-23 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3697:
-

 Summary: REGEXP_REPLACE doc. says POSIX reg. expr.; which? not 
Java?
 Key: DRILL-3697
 URL: https://issues.apache.org/jira/browse/DRILL-3697
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


The {{REGEXP_REPLACE}} function documentation currently at 
[https://drill.apache.org/docs/string-manipulation/#regexp_replace] says that 
{{REGEXP_REPLACE}} uses POSIX regular expressions.  

Is that true (that Drill uses POSIX regular expressions and not Java regular 
expressions)?

If that's really true, are they BREs or EREs?

Assuming it's actually Java regular expressions, the documentation should 
probably have a link to some appropriate target in the JDK Javadoc (maybe 
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html and 
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String-).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3686) no-ZooKeeper logging: Curator messages not in main file, no trying ZooKeeper from Drill

2015-08-21 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3686:
-

 Summary: no-ZooKeeper logging: Curator messages not in main file, 
no trying ZooKeeper from Drill
 Key: DRILL-3686
 URL: https://issues.apache.org/jira/browse/DRILL-3686
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


When Drill is started when ZooKeeper is not running, the logging could be 
clearer.

The log messages from Curator (e.g., "ERROR org.apache.curator.ConnectionState 
- Connection timed out for connection string (localhost:2181) and timeout 
(5000) / elapsed (5568)") don't go to Drill's normal/main log file 
.../drillbit.log; instead they go to .../drillbit.out. 

(They'd be easier to notice and find if they were in the main log file where 
most of the rest of Drill's logging output is.)


Additionally, at least at the default logging level (for drillbit.sh), nothing 
in the main log says that Drill is about to try to connect to ZooKeeper.  
Seeing a "connecting to ZooKeeper" message without a following "connected to 
ZooKeeper" message in the main log would help point the reader to the 
secondary log, even if we can't/don't get the Curator log output into the main 
log file.







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3687) NullPointerException from query with WITH, VALUES, and USING

2015-08-21 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3687:
-

 Summary: NullPointerException from query with WITH, VALUES, and 
USING
 Key: DRILL-3687
 URL: https://issues.apache.org/jira/browse/DRILL-3687
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


The following query fails:

{noformat}
WITH q(key) AS (VALUES 1, 1)  SELECT *  FROM q q1 INNER JOIN q q2  USING (key)
{noformat}

The failure is a NullPointerException SYSTEM ERROR message:


{noformat}
Error: SYSTEM ERROR: NullPointerException


[Error Id: ba74e744-7c8a-4ec8-b046-ac28ad0a03a4 on dev-linux2:31010] 
(state=,code=0)
{noformat}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2312) JDBC driver returning incorrect data after extended usage

2015-08-21 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-2312.
---
Resolution: Cannot Reproduce

Closing because attempts to reproduce this now on Drill pre-1.2 have failed and 
this bug was reported way back on version 0.7. 

 JDBC driver returning incorrect data after extended usage
 -

 Key: DRILL-2312
 URL: https://issues.apache.org/jira/browse/DRILL-2312
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Norris Lee
Assignee: Daniel Barclay (Drill)
 Fix For: 1.2.0


 After executing ~20-30 queries with the JDBC driver, the data returned from a 
 "show files" query is incorrect, particularly the isFile and isDirectory 
 columns. The first item in the schema/directory will be correct, but 
 subsequent items will report false for isFile and isDirectory.
 This was tested with a simple program that just loops through 
 executeQuery and prints out the values for isFile and isDirectory. The JDBC 
 driver used was the Drill 0.7 snapshot.
 {code}
 isFile: true
 isDirectory: false
 isFile: false
 isDirectory: false
 isFile: false
 isDirectory: false
 isFile: false
 isDirectory: false
 isFile: false
 isDirectory: false
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3661) Add/edit various JDBC Javadoc.

2015-08-17 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3661:
-

 Summary: Add/edit various JDBC Javadoc.
 Key: DRILL-3661
 URL: https://issues.apache.org/jira/browse/DRILL-3661
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3659) UnionAllRecordBatch infers wrongly from next() IterOutcome values

2015-08-17 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3659:
-

 Summary: UnionAllRecordBatch infers wrongly from next() 
IterOutcome values
 Key: DRILL-3659
 URL: https://issues.apache.org/jira/browse/DRILL-3659
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


When UnionAllRecordBatch uses IterOutcome values returned from the next() 
method of upstream batches, it seems to be using those values wrongly (making 
incorrect inferences about what they mean).

In particular, some switch statements seem to check for NONE vs. OK_NEW_SCHEMA 
in order to determine whether there are any rows (instead of explicitly 
checking the number of rows).  However, OK_NEW_SCHEMA can be returned even when 
there are zero rows.
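
To illustrate (a sketch only, not the actual UnionAllRecordBatch code):

{noformat}
// Sketch of the problematic inference:
RecordBatch.IterOutcome outcome = upstream.next();
switch (outcome) {
  case OK_NEW_SCHEMA:
    // Wrong: assumes at least one row, but OK_NEW_SCHEMA can arrive with
    // zero rows.
    break;
  case NONE:
    break;
  default:
    break;
}

// Safer: check the row count explicitly.
if (outcome == RecordBatch.IterOutcome.OK_NEW_SCHEMA
    && upstream.getRecordCount() == 0) {
  // A schema arrived but no rows did; don't treat this as "no data".
}
{noformat}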

The apparent latent bug in the union code blocks the fix for DRILL-2288 (having 
ScanBatch return OK_NEW_SCHEMA for a zero-rows case in which it was wrongly 
(per the IterOutcome protocol) returning NONE without first returning 
OK_NEW_SCHEMA).


For details of IterOutcome values, see the Javadoc documentation of 
RecordBatch.IterOutcome (after DRILL-3641 is merged; until then, see 
https://github.com/apache/drill/pull/113).

For an environment/code state that exposes the UnionAllRecordBatch problems, 
see https://github.com/dsbos/incubator-drill/tree/bugs/WORK_2288_etc, which 
includes:
- a test that exposes the DRILL-2288 problem;
- an enhanced IteratorValidatorBatchIterator, which now detects IterOutcome 
value sequence violations; and
- a fixed (though not-yet-cleaned) version of ScanBatch that fixes the 
DRILL-2288 problem and thereby exposes the UnionAllRecordBatch problem (several 
test methods in each of TestUnionAll and TestUnionDistinct fail).






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3656) Accountor catch intended for ConfigException hides NullPointerException (?)

2015-08-15 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3656:
-

 Summary: Accountor catch intended for ConfigException hides 
NullPointerException (?)
 Key: DRILL-3656
 URL: https://issues.apache.org/jira/browse/DRILL-3656
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


In org.apache.drill.exec.memory.Accountor's constructor, there is a 
catch(Exception e) ... clause that used to catch ConfigExceptions (when a 
requested configuration item wasn't known to the passed-in DrillConfig object, 
which occurred at least in some unit tests).

However, now that catch clause is also catching NullPointerExceptions because 
(sometimes) the DrillConfig parameter is null (in some unit tests).

It seems that:
- that catch clause should specifically catch only ConfigException (so that it 
doesn't accidentally hide any unexpected exceptions), and
- if the DrillConfig parameter is allowed to be null, the code should be 
handling that case explicitly with a test for null, not via a catch.
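
Roughly (a sketch only; the configuration key and default are placeholders,
not the actual Accountor code):

{noformat}
long limit;
if (config == null) {
  limit = DEFAULT_LIMIT;  // placeholder: handle the allowed-null case explicitly
} else {
  try {
    limit = config.getLong("drill.exec.some.allocation.key");  // placeholder key
  } catch (ConfigException e) {  // com.typesafe.config.ConfigException only
    limit = DEFAULT_LIMIT;
  }
}
{noformat}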




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3655) TIME - (minus) TIME doesn't work

2015-08-14 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3655:
-

 Summary: TIME - (minus) TIME doesn't work
 Key: DRILL-3655
 URL: https://issues.apache.org/jira/browse/DRILL-3655
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


0: jdbc:drill:> VALUES CURRENT_TIME - CURRENT_TIME;
Error: PARSE ERROR: From line 1, column 8 to line 1, column 34: Cannot apply 
'-' to arguments of type '<TIME(0)> - <TIME(0)>'. Supported form(s): '<NUMERIC> 
- <NUMERIC>'
'<DATETIME_INTERVAL> - <DATETIME_INTERVAL>'
'<DATETIME> - <DATETIME_INTERVAL>'


[Error Id: ede6c073-ca82-4359-8adb-db413e051e29 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3641) Document RecordBatch.IterOutcome (enumerators and possible sequences)

2015-08-13 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3641:
-

 Summary: Document RecordBatch.IterOutcome (enumerators and 
possible sequences)
 Key: DRILL-3641
 URL: https://issues.apache.org/jira/browse/DRILL-3641
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3627) Poor visibility of failure to connect to ZooKeeper

2015-08-11 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3627:
-

 Summary: Poor visibility of failure to connect to ZooKeeper
 Key: DRILL-3627
 URL: https://issues.apache.org/jira/browse/DRILL-3627
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


When Drill starts up, if it can't connect to ZooKeeper right away, it doesn't 
seem to write anything to its logs (at least at the default logging level) to 
indicate that it is retrying connecting to ZooKeeper.

Also, it doesn't write any "connecting to ZooKeeper" message normally paired 
with a "connected to ZooKeeper" message, which would at least make it easier to 
notice the ZooKeeper connection problem (when the first message appears but the 
second one does not).






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3628) drillbit.sh output mentions only secondary .out, but not primary .log

2015-08-11 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3628:
-

 Summary: drillbit.sh output mentions only secondary .out, but not 
primary .log
 Key: DRILL-3628
 URL: https://issues.apache.org/jira/browse/DRILL-3628
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


Although the output of the drillbit.sh script mentions the secondary output 
file .../drillbit.out, it never mentions the primary output file 
.../drillbit.log.







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3626) Many references to

2015-08-11 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3626:
-

 Summary: Many references to 
 Key: DRILL-3626
 URL: https://issues.apache.org/jira/browse/DRILL-3626
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3617) Apply shading to JDBC-all Jar file to avoid version conflicts

2015-08-07 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3617:
-

 Summary: Apply shading to JDBC-all Jar file to avoid version 
conflicts
 Key: DRILL-3617
 URL: https://issues.apache.org/jira/browse/DRILL-3617
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3614) drill

2015-08-06 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3614:
-

 Summary: drill
 Key: DRILL-3614
 URL: https://issues.apache.org/jira/browse/DRILL-3614
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3612) Doc. says logging configuration at /conf/logback.xml

2015-08-06 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3612:
-

 Summary: Doc. says logging configuration at /conf/logback.xml
 Key: DRILL-3612
 URL: https://issues.apache.org/jira/browse/DRILL-3612
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Daniel Barclay (Drill)
Assignee: Bridget Bevens


The Drill documentation page at 
https://drill.apache.org/docs/log-and-debug-introduction/ says:

bq. Logback behavior is defined by configurations set in /conf/logback.xml. 

Isn't the location really conf/logback.xml relative to Drill's root 
installation directory (the apache-drill-n.n.n directory)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3611) Drill/client unstable in connection-closed state

2015-08-05 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3611:
-

 Summary: Drill/client unstable in connection-closed state
 Key: DRILL-3611
 URL: https://issues.apache.org/jira/browse/DRILL-3611
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


When Drill and/or a client get into the state in which the client reports that 
the connection is closed, the error messages are not stable.  

In the following (a series of empty queries executed about half a second 
apart), notice how sometimes the exception is a "CONNECTION ERROR: ... closed 
unexpectedly" exception and sometimes it is a "SYSTEM ERROR: 
ChannelClosedException" exception:


{noformat}
0: jdbc:drill:>
0: jdbc:drill:> ;
Error: CONNECTION ERROR: Connection /127.0.0.1:46726 <--> /127.0.0.1:31010 
(user client) closed unexpectedly.


[Error Id: 0848c18e-64e9-41e2-90d9-3a0ffaebc14e ] (state=,code=0)
0: jdbc:drill:> ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: b465b0e7-55a2-4ef6-ad0e-01258468f4e7 ] (state=,code=0)
0: jdbc:drill:> ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: 0b50a10c-42eb-47b6-bc3d-9a42afe4cd28 ] (state=,code=0)
0: jdbc:drill:> ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: 9cd1fd96-0aed-4d06-b0ae-d48ddc70b91e ] (state=,code=0)
0: jdbc:drill:> ;
Error: CONNECTION ERROR: Connection /127.0.0.1:46726 <--> /127.0.0.1:31010 
(user client) closed unexpectedly.


[Error Id: 222a5358-6b2e-49e1-a1ec-931cacbbdbd1 ] (state=,code=0)
0: jdbc:drill:> ;
Error: SYSTEM ERROR: ChannelClosedException


[Error Id: fc589b70-dd10-4484-963a-21bc88147a0d ] (state=,code=0)
0: jdbc:drill:> ;
Error: CONNECTION ERROR: Connection /127.0.0.1:46726 <--> /127.0.0.1:31010 
(user client) closed unexpectedly.


[Error Id: 19965e75-9f2e-4a73-b1d8-29d61e6ea31a ] (state=,code=0)
0: jdbc:drill:>
0: jdbc:drill:>
0: jdbc:drill:>
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3496) Augment logging in DrillConfig and classpath scanning.

2015-08-04 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-3496.
---
Resolution: Fixed

 Augment logging in DrillConfig and classpath scanning.
 --

 Key: DRILL-3496
 URL: https://issues.apache.org/jira/browse/DRILL-3496
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2815) Some PathScanner logging, misc. cleanup.

2015-08-04 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) resolved DRILL-2815.
---
Resolution: Fixed

 Some PathScanner logging, misc. cleanup.
 

 Key: DRILL-2815
 URL: https://issues.apache.org/jira/browse/DRILL-2815
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse
Priority: Minor
 Fix For: 1.2.0

 Attachments: DRILL-2815.5.patch.txt, DRILL-2815.6.patch.txt


 Add a little more  logging to PathScanner; clean up a little.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3603) Document workaround for DRILL-2560 in JDBC Javadoc doc. (etc.)

2015-08-04 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3603:
-

 Summary: Document workaround for DRILL-2560 in JDBC Javadoc doc. 
(etc.)
 Key: DRILL-3603
 URL: https://issues.apache.org/jira/browse/DRILL-3603
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


Until DRILL-2560 and DRILL-3267 are resolved, the Javadoc documentation for 
{{executeQuery(...)}} methods should document the workaround described in 
DRILL-2560's report.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Unable to connect to drill 1.1.0 using JDBC

2015-07-28 Thread Daniel Barclay

Parth Chandra wrote:

Yes and I probably reviewed it and missed it. We were missing a unit test
which would have caught this.


Yes; evidently we didn't have any test that used PreparedStatement enough
to execute a query.

Daniel



It's being fixed ASAP. ( DRILL-3566 )



On Mon, Jul 27, 2015 at 9:23 PM, Jacques Nadeau jacq...@dremio.com wrote:


Agreed.  It should be fixed.  Just trying to provide a workaround until it
is fixed.  It is unfortunate that it was broken in 1.1. Must have been all the
jdbc refactoring.
On Jul 27, 2015 7:57 PM, Anas Mesrah anas.mes...@gmail.com wrote:


Hi Jacques,

You are right BUT there are many software products built on PreparedStatements
and they may integrate with Drill. I am not sure about the current big BI
products integrating with Drill; if you skip that you are simply ignoring
them. In addition, this is working on previous versions; I tested 1.0.0 and
0.8.0. I am sure something can be fixed.








--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3569) TestBuilder.baseLineRecords(List&lt;Map&gt;) doesn't specify maps

2015-07-28 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3569:
-

 Summary: TestBuilder.baseLineRecords(List&lt;Map&gt;) doesn't specify 
maps
 Key: DRILL-3569
 URL: https://issues.apache.org/jira/browse/DRILL-3569
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse


In method TestBuilder.baseLineRecords(List&lt;Map&gt;), neither the parameter type 
nor the method documentation comment indicates how the maps represent 
materialized results.

The Map part should have type parameters and/or the documentation should say 
something about how the maps in the list represent results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3570) TestBuilder.baselineRecords(List&lt;Map&gt;) doesn't handle null well

2015-07-28 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3570:
-

 Summary: TestBuilder.baselineRecords(List&lt;Map&gt;) doesn't handle null 
well
 Key: DRILL-3570
 URL: https://issues.apache.org/jira/browse/DRILL-3570
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse


TestBuilder.baselineRecords(List&lt;Map&gt;) accepts a call with null (doesn't reject 
it with a bad-parameter error) but does not accept it as a specification of a 
baseline (one that would prevent the "Must provide some kind of baseline, 
either a baseline file or another query" error).

Either it should reject null (and the parameter documentation should probably 
have "; not null" added), or, when it gets null, it should take that as 
providing some kind of baseline (maybe a baseline of zero rows (like an empty 
list), or maybe a don't-care baseline), and the documentation should reflect 
that.
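
If the reject-null option is taken, the check might look something like this 
(a sketch using Guava's Preconditions, not the actual TestBuilder code):

{noformat}
import com.google.common.base.Preconditions;

public TestBuilder baselineRecords(List<Map<String, Object>> records) {
  Preconditions.checkArgument(records != null,
      "baselineRecords(...) requires a non-null list of records");
  this.baselineRecords = records;
  return this;
}
{noformat}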






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3571) TestBuilder: empty list to baselineRecords yields IndexOutOfBoundsException

2015-07-28 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3571:
-

 Summary: TestBuilder: empty list to baselineRecords yields 
IndexOutOfBoundsException
 Key: DRILL-3571
 URL: https://issues.apache.org/jira/browse/DRILL-3571
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Daniel Barclay (Drill)
Assignee: Jason Altekruse


Passing an empty list to TestBuilder.baselineRecords causes an 
IndexOutOfBoundsException in the expression baselineRecords.get(0) in 
compareMergedOnHeapVectors.

If baselineRecords's list is not intended to be an empty list, that should be 
rejected at the call to baselineRecords.

Use case note:  I was trying to specify, for a test, that the results contain 
zero records.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: dropping qualified names in logger declarations

2015-07-27 Thread Daniel Barclay

  I can't seem to find any guidance for this particular issue.

Yeah; I wouldn't expect anything specific to our peculiar pattern,
other than a general endorsement of usually using imports and usually
not using fully-qualified names.

Is there any reason not to go with imports and non-qualified names
(with the parts I mentioned below to retain the easy commenting/
uncommenting to avoid unused imports and loggers)?


If there's concern about having loggers in both styles for a long
interim period:  I think I could convert the declarations pretty
rapidly (using some Emacs regular-expression bulk replacement), as
long as I can get someone to approve and merge in the changes as
quickly as we want the inconsistency to go away.

Daniel


Chris Westin wrote:

I tried to follow up with Hakim's suggestion of consulting the checkstyle
rules I expect to use (I've suggested before that we should start with
Google's rules as a basis, and make a few tweaks), but unfortunately, on
that day last week, SourceForge was down (that's where the rules are
hosted). It's finally back, so here they are
http://checkstyle.sourceforge.net/google_style.html . I can't seem to find
any guidance for this particular issue.

On Wed, Jul 22, 2015 at 10:51 AM, Daniel Barclay dbarc...@maprtech.com
wrote:



Chris Westin wrote:


For the special case of the logger, I kind of like it this way, because I
can turn it off just by commenting out a single line (to get rid of
unreferenced variable warnings),or add it by pasting in or uncommenting a
single line. In either case I don't have to worry about removing or adding
the import line separately, which can be quite far away if there are a lot
of imports.



Why not use the modern Java feature intended for cases like this:  have
a @SuppressWarnings("unused") annotation on the logger member
declaration if the declaration has been added but the member isn't used
yet?

Then:
- We can still avoid unused-variable warnings for logger members that
   have already been declared before there are any uses.
- We no longer have to move up top to adjust an already-existing logger
   declaration when adding a logger use down in the code.
   (Yes, we should remove (or comment out) the annotation if the change
   isn't temporary, but we don't have to do that immediately just to
   continue compiling.)
- We can now use code completion when adding the first logger call to a
   previously unused logger, since the declaration is real (not just a
   comment).
- We can still comment out/uncomment only a single line (the annotation)
   to switch between the no-logger-uses and some-logger-use cases.  (That
   is, if you don't want to have to re-add the suppression annotation if
   the last use of a logger is removed later, you can comment out, rather
   than delete, the annotation when the first logger use is added.)
- We no longer have to either go adjust imports or use qualified names
   to avoid having to adjust imports.
- We stop having unnecessarily qualified names in the code.
- and, finally ...:
- We stop having those names' extra visual clutter and length, which
   makes it harder to notice when the class literal ends up wrong.
   (Note the mention of pasting above.)

Daniel







On Tue, Jul 21, 2015 at 6:12 PM, Daniel Barclay dbarc...@maprtech.com
wrote:

  For logger member declarations, can we drop the current pattern of using

qualified names (like this:

private static final org.slf4j.Logger logger =
org.slf4j.LoggerFactory.getLogger(StoragePluginRegistry.class);

) and allow using imports and non-qualified names (as we do for almost
everything else)?


Using qualified names adds a lot of visual noise, and pushes the class
literal farther to the right, making it easier to fail to notice that
it doesn't match the containing class.

Thanks,
Daniel
--
Daniel Barclay
MapR Technologies






--
Daniel Barclay
MapR Technologies






--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3557) Reading empty CSV file fails with SYSTEM ERROR

2015-07-26 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3557:
-

 Summary: Reading empty CSV file fails with SYSTEM ERROR
 Key: DRILL-3557
 URL: https://issues.apache.org/jira/browse/DRILL-3557
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


Trying to read an empty CSV file (a file containing zero bytes) fails with a 
system error:

{noformat}
0: jdbc:drill:zk=local> SELECT * FROM `dfs.root`.`/tmp/empty.csv`;
Error: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 0 has no read 
entries assigned


[Error Id: f1da68f6-9749-45bc-956b-20cbc6d28894 on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=local>
{noformat}






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RecordBatch.MAX_BATCH_SIZE = 65536?

2015-07-25 Thread Daniel Barclay

In org.apache.drill.exec.record.RecordBatch, MAX_BATCH_SIZE is 65536.

Why isn't that 65535?  Is a batch size of zero not possible?


Daniel
--
Daniel Barclay
MapR Technologies


Re: dropping qualified names in logger declarations

2015-07-22 Thread Daniel Barclay


Chris Westin wrote:

For the special case of the logger, I kind of like it this way, because I
can turn it off just by commenting out a single line (to get rid of
unreferenced variable warnings),or add it by pasting in or uncommenting a
single line. In either case I don't have to worry about removing or adding
the import line separately, which can be quite far away if there are a lot
of imports.


Why not use the modern Java feature intended for cases like this:  have
a @SuppressWarnings("unused") annotation on the logger member
declaration if the declaration has been added but the member isn't used
yet?

Then:
- We can still avoid unused-variable warnings for logger members that
  have already been declared before there are any uses.
- We no longer have to move up top to adjust an already-existing logger
  declaration when adding a logger use down in the code.
  (Yes, we should remove (or comment out) the annotation if the change
  isn't temporary, but we don't have to do that immediately just to
  continue compiling.)
- We can now use code completion when adding the first logger call to a
  previously unused logger, since the declaration is real (not just a
  comment).
- We can still comment out/uncomment only a single line (the annotation)
  to switch between the no-logger-uses and some-logger-use cases.  (That
  is, if you don't want to have to re-add the suppression annotation if
  the last use of a logger is removed later, you can comment out, rather
  than delete, the annotation when the first logger use is added.)
- We no longer have to either go adjust imports or use qualified names
  to avoid having to adjust imports.
- We stop having unnecessarily qualified names in the code.
- and, finally ...:
- We stop having those names' extra visual clutter and length, which
  makes it harder to notice when the class literal ends up wrong.
  (Note the mention of pasting above.)
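
Concretely, the proposed style would look something like this (a sketch):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class StoragePluginRegistry {
      @SuppressWarnings("unused")  // remove (or comment out) once logger is used
      private static final Logger logger =
          LoggerFactory.getLogger(StoragePluginRegistry.class);
      // (rest of class)
    }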

Daniel






On Tue, Jul 21, 2015 at 6:12 PM, Daniel Barclay dbarc...@maprtech.com
wrote:


For logger member declarations, can we drop the current pattern of using
qualified names (like this:

   private static final org.slf4j.Logger logger =
org.slf4j.LoggerFactory.getLogger(StoragePluginRegistry.class);

) and allow using imports and non-qualified names (as we do for almost
everything else)?


Using qualified names adds a lot of visual noise, and pushes the class
literal farther to the right, making it easier to fail to notice that
it doesn't match the containing class.

Thanks,
Daniel
--
Daniel Barclay
MapR Technologies






--
Daniel Barclay
MapR Technologies


Positionable.setPosition(int) - always byte offset, or depends

2015-07-22 Thread Daniel Barclay

In org.apache.drill.exec.vector.complex.Positionable's method
setPosition(int index), is the index parameter always a byte offset
from the beginning of something, or is it sometimes in some other
units (records, blocks, etc.)?

(Is it correct to document on Positionable.setPosition(int) that it's
a byte offset, or is the intent that the setPosition's contract leaves
the units deferred to more-specific interfaces or classes?)

Thanks,
Daniel
--
Daniel Barclay
MapR Technologies


Re: question about UDF optimization

2015-07-21 Thread Daniel Barclay


Should Drill be defaulting the other way?

That is, instead of assuming pure unless declared otherwise (leading to
wrong results in the case that the assumption is wrong (or the annotation
was forgotten)), should Drill be assuming not pure unless declared pure
(leading to only lower performance in the wrong-assumption case)?
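
(For reference, the nondeterministic marking Jacques mentions would presumably
look something like this -- a sketch, with illustrative class name and holders:

    import org.apache.drill.exec.expr.DrillSimpleFunc;
    import org.apache.drill.exec.expr.annotations.FunctionTemplate;
    import org.apache.drill.exec.expr.annotations.Output;
    import org.apache.drill.exec.expr.annotations.Param;
    import org.apache.drill.exec.expr.holders.Float8Holder;

    @FunctionTemplate(name = "random",
                      scope = FunctionTemplate.FunctionScope.SIMPLE,
                      isRandom = true,  // marks the function as nondeterministic
                      nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
    public class RandomUniform implements DrillSimpleFunc {
      @Param  Float8Holder min;
      @Param  Float8Holder max;
      @Output Float8Holder out;

      public void setup() { }

      public void eval() {
        out.value = min.value + java.lang.Math.random() * (max.value - min.value);
      }
    }
)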

Daniel



Jacques Nadeau wrote:

There is an annotation on the function template.  I don't have a laptop
close but I believe it is something similar to isRandom. It basically tells
Drill that this is a nondeterministic function. I will be more specific
once I get back to my machine if you don't find it sooner.

Jacques
*Summary:*

Drill is very aggressive about optimizing away calls to functions with
constant arguments. I worry that could extend to per record batch
optimization if I accidentally have constant values and even if that
doesn't happen, it is a pain in the ass now largely because Drill is clever
enough to see through my attempt to hide the constant nature of my
parameters.

*Question:*

Is there a way to mark a UDF as not being a pure function?

*Details:*

I have written a UDF to generate a random number.  It takes parameters that
define the distribution.  All seems well and good.

I find, however, that the function is only called once (twice, actually
apparently due to pipeline warmup) and then Drill optimizes away later
calls, apparently because the parameters to the function are constant and
Drill thinks my function is a pure function.  If I make up some bogus data
to pass in as a parameter, all is well and the function is called as much
as I wanted.

For instance, with the uniform distribution, my function takes two
arguments, those being the minimum and maximum value to return.  Here is
what I see with constants for the min and max:

0: jdbc:drill:zk=local> select random(0,10) from (values 5,5,5,5) as tbl(x);
into eval
into eval
+---------------------+
|       EXPR$0        |
+---------------------+
| 1.7787372583008298  |
| 1.7787372583008298  |
| 1.7787372583008298  |
| 1.7787372583008298  |
+---------------------+


If I include an actual value, we see more interesting behavior even if the
value is effectively constant:

0: jdbc:drill:zk=local> select random(0,x) from (values 5,5,5,5) as tbl(x);
into eval
into eval
into eval
into eval
+----------------------+
|        EXPR$0        |
+----------------------+
| 3.688377805419459    |
| 0.2827056410711032   |
| 2.3107479622644918   |
| 0.10813788169218574  |
+----------------------+
4 rows selected (0.088 seconds)


Even if I make the max value come along from the sub-query, I get the evil
behavior although the function is now surprisingly actually called three
times, apparently to do with warming up the pipeline:

0: jdbc:drill:zk=local> select random(0,max_value) from (select 14 as
max_value,x from (values 5,5,5,5) as tbl(x)) foo;
into eval
into eval
into eval
+---------------------+
|       EXPR$0        |
+---------------------+
| 13.404462063773702  |
| 13.404462063773702  |
| 13.404462063773702  |
| 13.404462063773702  |
+---------------------+
4 rows selected (0.121 seconds)

The UDF itself is boring and can be found at
https://gist.github.com/tdunning/0c2cc2089e6cd8c030c0

So how can I defeat this behavior?




--
Daniel Barclay
MapR Technologies


MaterializedField

2015-07-17 Thread Daniel Barclay

What exactly is materialized about class
org.apache.drill.exec.record.MaterializedField?

The name gave me the impression that it would be a field/column with
its data materialized (as a materialized view has copies of data).

However, MaterializedField doesn't seem to contain data values (just
field metadata like the name/pathname and data type).

So what exactly does the class represent?  (What's materialized,
and relative to what?)

Daniel
--
Daniel Barclay
MapR Technologies


[jira] [Created] (DRILL-3511) Dev.-level message (or bug?): You tried to write a Bit ... ValueWriter ... NullableFloat8WriterImpl.

2015-07-17 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3511:
-

 Summary: Dev.-level message (or bug?): You tried to write a Bit 
... ValueWriter ... NullableFloat8WriterImpl.
 Key: DRILL-3511
 URL: https://issues.apache.org/jira/browse/DRILL-3511
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Reporter: Daniel Barclay (Drill)
Assignee: Steven Phillips


For a JSON file containing this:

{noformat}
{x:[{y:-1.1,y:false}]}
{noformat}

a basic query yields an error message describing the problem in terms of 
Drill's implementation rather than in terms of the JSON data:

{noformat}
0: jdbc:drill:zk=local> SELECT * FROM `dfs`.`/tmp/xxx.json`;
Error: DATA_READ ERROR: You tried to write a Bit type when you are using a 
ValueWriter of type NullableFloat8WriterImpl.

File  /tmp/2924/data1x/a.json
Record  1
Line  1
Column  24
Field  
Fragment 0:0

[Error Id: abe134c1-0a2c-4ce4-9f7d-1b68aad819fc on dev-linux2:31010] 
(state=,code=0)
0: jdbc:drill:zk=local>
{noformat}






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3507) AbstractMapVector uses AbstractContainerVector's logger

2015-07-16 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3507:
-

 Summary: AbstractMapVector uses AbstractContainerVector's logger
 Key: DRILL-3507
 URL: https://issues.apache.org/jira/browse/DRILL-3507
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


{{AbstractMapVector}} uses {{AbstractContainerVector}}'s {{logger}} field, and 
that logger field is not private.

(There seem to be about 63 other cases of abnormally non-private logger 
fields.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3506) Remove logger fields from interfaces.

2015-07-16 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3506:
-

 Summary: Remove logger fields from interfaces.
 Key: DRILL-3506
 URL: https://issues.apache.org/jira/browse/DRILL-3506
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)


About 29 Java interfaces have extraneous logger fields like this:

{noformat}
public interface Counter {
  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(Counter.class);
...
{noformat}
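
The fix would presumably be to delete these and, where a logger is actually
wanted, declare it in the implementing class instead (a sketch; the
implementation class and method names are illustrative):

{noformat}
// In the interface: no logger field at all.
public interface Counter {
  long get();  // (existing methods unchanged)
}

// In an implementation that actually logs:
public class SomeCounterImpl implements Counter {
  private static final org.slf4j.Logger logger =
      org.slf4j.LoggerFactory.getLogger(SomeCounterImpl.class);

  public long get() {
    logger.debug("get() called");
    return 0;
  }
}
{noformat}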

See:
common/src/main/java/org/apache/drill/common/logical/data/visitors/LogicalVisitor.java
common/src/main/java/org/apache/drill/common/logical/FormatPluginConfig.java
common/src/main/java/org/apache/drill/common/expression/LogicalExpression.java
exec/java-exec/src/test/java/org/apache/drill/exec/compile/ExampleTemplate.java
exec/java-exec/src/main/java/org/apache/drill/exec/disk/Spool.java
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/DrillRpcFuture.java
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/RpcOutcome.java
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/RpcConnectionHandler.java
exec/java-exec/src/main/java/org/apache/drill/exec/memory/BufferAllocator.java
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/DataCollector.java
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/RawBatchBuffer.java
exec/java-exec/src/main/java/org/apache/drill/exec/work/RootNodeDriver.java
exec/java-exec/src/main/java/org/apache/drill/exec/cache/DrillSerializable.java
exec/java-exec/src/main/java/org/apache/drill/exec/cache/Counter.java
exec/java-exec/src/main/java/org/apache/drill/exec/cache/DistributedMap.java
exec/java-exec/src/main/java/org/apache/drill/exec/cache/DistributedMultiMap.java
exec/java-exec/src/main/java/org/apache/drill/exec/cache/DistributedCache.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/FileWork.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaFactory.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/CompleteWork.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordRecorder.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/BatchCreator.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/RootCreator.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/filter/Filterer.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/BatchIterator.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/WriteEntry.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/Prel.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/PrelVisitor.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3496) Augment classpath scanning and DrillConfig logging.

2015-07-14 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-3496:
-

 Summary: Augment classpath scanning and DrillConfig logging.
 Key: DRILL-3496
 URL: https://issues.apache.org/jira/browse/DRILL-3496
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[plug-ins] AbstractGroupScan.getScanStats()

2015-07-13 Thread Daniel Barclay

Somewhat (although not exactly) similarly to the case below, AbstractGroupScan 
has an implementation of GroupScan.clone(List&lt;SchemaPath&gt;), but that 
implementation doesn't do anything other than throw an exception:

   throw new UnsupportedOperationException(String.format(
       "%s does not implement clone(columns) method!",
       this.getClass().getCanonicalName()));

Is there any need for AbstractGroupScan to declare a non-abstract 
clone(List&lt;SchemaPath&gt;)?  Is there any need for it to re-declare 
clone(List&lt;SchemaPath&gt;) at all?  (Does it narrow the contract?)
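
(If the re-declaration really is needed, declaring it abstract would at least
move the failure from run time to compile time -- a sketch:

    import java.util.List;
    import org.apache.drill.common.expression.SchemaPath;

    public abstract class AbstractGroupScan implements GroupScan {
      // (other members unchanged)
      @Override
      public abstract GroupScan clone(List<SchemaPath> columns);
    }
)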

Daniel

I wrote:

Method org.apache.drill.exec.physical.base.AbstractGroupScan.getScanStats()
has a body that throws a "This must be implemented" exception.

Why isn't getScanStats() simply an abstract method?

(Is there any case in which a subclass doesn't need to implement the
method (i.e., where that method won't ever be called for that subclass)?)


Daniel



--
Daniel Barclay
MapR Technologies



decoding JsonTypeInfo on StoragePluginConfig

2015-07-13 Thread Daniel Barclay

What exactly does the following Jackson @JsonTypeInfo annotation on
org.apache.drill.common.logical.StoragePluginConfig do?:


  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, 
property = "type")
  public abstract class StoragePluginConfig {


Also, would that annotation still have the desired effect if
StoragePluginConfig were an interface?
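
(My working hypothesis: it makes Jackson read and write a "type" property
whose value names the concrete subtype, so that, e.g., a stored configuration
like

    {
      "type": "file",
      ...
    }

deserializes to whichever StoragePluginConfig subclass is registered under the
name "file".  Corrections welcome.)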


Daniel
--
Daniel Barclay
MapR Technologies

