Re: [ANNOUNCE]: New committer: Ankush Kapur

2020-08-07 Thread Bohdan Kazydub
Congratulations Ankush!

On Fri, Aug 7, 2020 at 3:49 PM Ankush Kapur  wrote:

> Thank you, looking forward to it.
>
> On Thu, Aug 6, 2020, 11:06 AM Igor Guzenko 
> wrote:
>
> > Congratulations Ankush!
> >
> > On Thu, Aug 6, 2020 at 8:12 AM weijie tong 
> > wrote:
> >
> >> Congratulations Ankush!
> >>
> >> On Thu, Aug 6, 2020 at 2:37 AM Charles Givre  wrote:
> >>
> >> > The Project Management Committee (PMC) for Apache Drill has invited
> >> > Ankush Kapur to become a committer and we are pleased to announce that
> >> > he has accepted.
> >> >
> >> > Being a committer enables easier contribution to the project since
> >> > there is no need to go through the patch submission process. This
> >> > should enable better productivity. Being a PMC member enables assisting
> >> > with the management and guiding the direction of the project.
> >> >
> >> > Welcome Ankush!
> >> > -- C
> >> >
> >> >
> >> >
> >>
> >
>


[jira] [Created] (DRILL-7764) Cleanup warning messages in GuavaPatcher class

2020-07-03 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7764:
-

 Summary: Cleanup warning messages in GuavaPatcher class
 Key: DRILL-7764
 URL: https://issues.apache.org/jira/browse/DRILL-7764
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Currently GuavaPatcher contains
{code}
logger.warn("Unable to patch Guava classes.", e);
{code}
which outputs the whole exception stack trace to the logs, which is unnecessarily alarming.

This log message will be changed to 
{code}
logger.warn("Unable to patch Guava classes: {}", e.getMessage());
logger.debug("Exception:", e);
{code}
logging the stack trace only in debug mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7759) Code compilation exception for queries containing (untyped) NULL

2020-06-30 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7759:
-

 Summary: Code compilation exception for queries containing 
(untyped) NULL
 Key: DRILL-7759
 URL: https://issues.apache.org/jira/browse/DRILL-7759
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7750) Drill fails to read KeyStore password from Credential provider

2020-06-17 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7750:
-

 Summary: Drill fails to read KeyStore password from Credential 
provider
 Key: DRILL-7750
 URL: https://issues.apache.org/jira/browse/DRILL-7750
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub
 Fix For: 1.18


When core-site.xml has keystore or truststore specific properties along with 
Hadoop's CredentialProvider path, e.g.:
{code}
<configuration>
...
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/etc/conf/ssl_truststore</value>
  </property>
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.reload.interval</name>
    <value>1</value>
  </property>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>/etc/conf/ssl_keystore</value>
  </property>
  <property>
    <name>ssl.server.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>jceks://file/etc/conf/ssl_server.jceks</value>
  </property>
</configuration>
{code}
Drill fails to start.
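
For reference, the standard Hadoop API for resolving such a password is 
{{Configuration.getPassword()}}, which consults the providers listed in 
{{hadoop.security.credential.provider.path}} before falling back to the plain 
config value. A minimal sketch (illustrative only, not Drill's actual startup code):
{code}
import org.apache.hadoop.conf.Configuration;

public class KeystorePasswordCheck {
  public static void main(String[] args) throws Exception {
    // core-site.xml with the properties above is expected on the classpath
    Configuration conf = new Configuration();
    // getPassword() looks the alias up in the configured credential providers
    // first, then falls back to a plain property of the same name.
    char[] password = conf.getPassword("ssl.server.keystore.password");
    System.out.println(password == null
        ? "alias not found in credential providers or config"
        : "keystore password resolved");
  }
}
{code}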



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7694) Register drill.queries.* counter metrics on Drillbit startup

2020-04-09 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7694:
-

 Summary: Register drill.queries.* counter metrics on Drillbit 
startup 
 Key: DRILL-7694
 URL: https://issues.apache.org/jira/browse/DRILL-7694
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New PMC member: Bohdan Kazydub

2020-02-13 Thread Bohdan Kazydub
Thank you very much for greetings!
It is an honor to be part of the Drill community.

On Thu, Jan 30, 2020 at 11:36 PM Denys Ordynskiy 
wrote:

> Congratulations Bohdan!
>
> Kind regards,
> Denys Ordynskiy
>
> On Thu, Jan 30, 2020 at 9:38 PM Pritesh Maker 
> wrote:
>
> > Congrats Bohdan!
> >
> > On Thu, Jan 30, 2020 at 6:13 AM Vitalii Diravka 
> > wrote:
> >
> > > Congrats Bohdan! Well deserved!
> > >
> > > Kind regards
> > > Vitalii
> > >
> > >
> > > On Thu, Jan 30, 2020 at 7:53 AM Igor Guzenko <
> ihor.huzenko@gmail.com
> > >
> > > wrote:
> > >
> > > > Congratulations, Bohdan!
> > > >
> > > > Kind regards,
> > > > Igor
> > > >
> > > > On Wed, Jan 29, 2020 at 11:22 PM Volodymyr Vysotskyi <
> > > volody...@apache.org
> > > > >
> > > > wrote:
> > > >
> > > > > Congrats, Bohdan!
> > > > >
> > > > > Kind regards,
> > > > > Volodymyr Vysotskyi
> > > > >
> > > > >
> > > > > On Wed, Jan 29, 2020 at 8:39 PM Paul Rogers
> >  > > >
> > > > > wrote:
> > > > >
> > > > > > Congratulations Bohdan, well deserved!
> > > > > > - Paul
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wednesday, January 29, 2020, 09:41:21 AM PST, Arina
> > > Ielchiieva <
> > > > > > ar...@apache.org> wrote:
> > > > > >
> > > > > >  I am pleased to announce that Drill PMC invited Bohdan Kazydub
> to
> > > the
> > > > > PMC
> > > > > > and
> > > > > > he has accepted the invitation.
> > > > > >
> > > > > > Congratulations Bohdan and welcome!
> > > > > >
> > > > > > - Arina
> > > > > > (on behalf of Drill PMC)
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (DRILL-7565) ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files

2020-02-03 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7565:
-

 Summary: ANALYZE TABLE ... REFRESH METADATA does not work for 
empty Parquet files
 Key: DRILL-7565
 URL: https://issues.apache.org/jira/browse/DRILL-7565
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Bohdan Kazydub
Assignee: Vova Vysotskyi


The following query does not create metadata for an empty Parquet table:
{code}
@Test
  public void testAnalyzeEmptyParquetTable() throws Exception {

String tableName = "parquet/empty/simple/empty_simple.parquet";

try {
  client.alterSession(ExecConstants.METASTORE_ENABLED, true);
  testBuilder()
  .sqlQuery("ANALYZE TABLE dfs.`%s` REFRESH METADATA", tableName)
  .unOrdered()
  .baselineColumns("ok", "summary")
  .baselineValues(true, String.format("Collected / refreshed metadata for table [dfs.default.%s]", tableName))
  .go();
} finally {
  run("analyze table dfs.`%s` drop metadata if exists", tableName);
  client.resetSession(ExecConstants.METASTORE_ENABLED);
}
  }
{code}
but yields
{code}
java.lang.AssertionError: Different number of records returned 
Expected :1
Actual   :0



at 
org.apache.drill.test.DrillTestWrapper.compareResults(DrillTestWrapper.java:862)
at 
org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:567)
at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:171)
at org.apache.drill.test.TestBuilder.go(TestBuilder.java:145)
at 
org.apache.drill.exec.store.parquet.TestEmptyParquet.testSelectWithDisabledMetastore(TestEmptyParquet.java:430)
at java.lang.Thread.run(Thread.java:748)
{code}


When the expected result set is changed to empty 
({{TestBuilder#expectsEmptyResultSet()}}), a {{SHOW TABLES}} command issued after 
{{ANALYZE TABLE ...}} does not show any table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: DICT keys in projection

2020-01-22 Thread Bohdan Kazydub
Hi Paul,

Regarding your point "We can also handle a map projection: `a.b` which
matches:

* A (possibly repeated) map
* A (possibly repeated) DICT with VARCHAR keys
* A UNION (because a union might contain a possibly-repeated map)
* A LIST (because the list can contain a union which might contain a
possibly-repeated map)":

I am not sure why `a.b` is possible for REPEATED MAP - this looks like a
shortcut of some sort. I mean, it looks wrong with respect to data types,
doesn't it? Consider an example in Java: `Map<String, Integer>[] a = ...;
Object result = a.get("b");` does not yield an array of Integer; let's pretend
the 'Map' represents Drill's MAP. But this notation
could have been an alias to some 'function', like `Integer[] array =
collect((Map<String, Integer>[]) a, "b")`. This does not work for REPEATED
MAP in Drill currently, though such behaviour is present in Hive. (I am not
saying this is wrong to support it for a REPEATED MAP, it may be useful.)
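
To spell the analogy out in plain Java (the collect helper below is hypothetical,
not an existing Drill or Hive API; it just shows what the shortcut would do):

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RepeatedMapShortcut {
  // Hypothetical helper: gather the value stored under `key` from every map
  // of a repeated (array-of-maps) column, which is how the shortcut behaves.
  static List<Integer> collect(Map<String, Integer>[] maps, String key) {
    return Arrays.stream(maps).map(m -> m.get(key)).collect(Collectors.toList());
  }
}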

In the case of REPEATED DICT we _may_ choose not to support such a
"shortcut", but instead provide UDFs with the needed functionality.

Regarding using keys in filters: I think it is a good idea to provide UDFs
for such needs. Hive, for example, has the following functions for (Hive's) MAP
[1] (see "Collection Functions"):
array<K> map_keys(Map<K,V>)
array<V> map_values(Map<K,V>)


But yes, we must treat projections as general as possible until the real
schema is known and this is a hard task.


[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-OperatorsonComplexTypes


Re: DICT keys in projection

2020-01-21 Thread Bohdan Kazydub
Hi Paul,

the DICT projection acts as a placeholder now; the real implementation is
to be added when we completely switch to EVF. (Actually, at first I used the
map projection but added the dict projection to separate these two.) These two
projections are currently the same - it can be removed now and added back later
if needed.

About key types other than VARCHAR and INT: they are supported, depending
on usage.

During planning, if the queried table supports conversion of its row structure
to `RelDataType` (`RelDataType
org.apache.calcite.schema.Table#getRowType(RelDataTypeFactory)`), then a query
like `` SELECT `dict2`[123.4] FROM cp.`employee.json` `` (and likewise for other
primitive key types) will work as expected (see
`org.apache.drill.exec.hive.complex_types.TestHiveMaps`, there are many
tests projecting dict values with different key types). The `key`
will be represented as an `ArraySegment` in the case of INT, SMALLINT, TINYINT
and as a `NamedSegment` otherwise, but these same `PathSegment`s contain the
original key value, which is then used when generating code for the projection.

The other 'implementation' is when we query a Parquet file, for example:
there is currently no such conversion to `RelDataType`, and during
planning each column is considered `ANY`. If one uses the example you
provided, `` SELECT `dict2`[123.4] FROM cp.`employee.json` ``, it will fail
with the same error. But a DOUBLE key can be passed as VARCHAR - '123.4' -
and it will be converted to the appropriate type, based on
`DictVector#getKeyType()`, though this is not as efficient as in the case
above, because the key is converted for each row. Surely, some improvements
can be made.

Hope this answers your questions; do ask again if any clarification is
needed.

On Tue, Jan 21, 2020 at 12:11 AM Paul Rogers 
wrote:

> Hi Bohdan,
>
> Thanks for your explanation.  My question comes from a little project I'm
> working on to handle projection in EVF. Queries go through two major steps:
> planing and execution. At the planning stage we use SQL syntax for the
> project list. For example:
>
> explain plan for SELECT a, e.`map`.`member`, `dict`['key'], `array`[10]
> FROM cp.`employee.json` e
>
> The planner sends an execution plan to operators. The project list appears
> in JSON. For the above:
>
>"columns" : [ "`a`", "`map`.`member`", "`dict`.`key`", "`array`[10]" ],
>
> We see that the JSON works as you described:
>
> * The SQL map "map.member" syntax is converted to "`map`.`member`" in the
> JSON plan.
>
> * The SQL DICT "`dict`['key']" syntax is converted to a form identical to
> maps: "`dict`.`key`".
>
> * The SQL DICT/array "`array`[10]" syntax is converted to "`array`[10]" in
> JSON.
>
> That is, on the execution side, we can't tell the difference between a MAP
> and a DICT request. We also can't tell the difference between an Array and
> DICT request. Apparently, because of this, the Schema Path parser does not
> recognize DICT syntax.
>
> Given the way projection works, "a.b" and "a['b']" are identical: either
> works for both a map or a DICT with VARCHAR keys.
>
> I was confused because the "ProjectionType" and "RequestedColumn" classes
> were extended with a DICT projection type. But, as we just saw, it is
> impossible to ever use that projection type (the dict['key'] syntax) in the
> execution engine.
>
> Shall I just remove special support for DICT projection, and just say that
> map and array projection are both compatible with a DICT column?
>
> One other related question. As I recall, a DICT allows any scalar type as
> a key. We saw that VARCHAR keys are converted to map references, INT keys
> are converted to array references. But, what about DOUBLE keys (recognizing
> that such keys are a bad idea):
>
> explain plan for SELECT `dict2`[123.4] FROM cp.`employee.json`
>
> VALIDATION ERROR: From line 1, column 25 to line 1, column 38: Cannot
> apply 'ITEM' to arguments of type 'ITEM(, )'
>
>
> So. We only support INT and VARCHAR keys in DICT when used with literals.
> Is this intentional?
>
> Obviously, to change this behavior, we'd have to change how columns are
> stored in JSON and we'd have to change the schema path parser. Doing so
> would impact all code that uses schema paths (including the projection
> stuff I'm working on.)
>
> Thanks,
> - Paul
>
>
>
> On Monday, January 20, 2020, 12:02:29 AM PST, Bohdan Kazydub <
> bohdan.kazy...@gmail.com> wrote:
>
>  Hi Paul,
>
> `SELECT myMap.x ...` and `SELECT myMap['x'] ...` is treated the same in
> Drill - schema path parser recogn

Re: DICT keys in projection

2020-01-20 Thread Bohdan Kazydub
Hi Paul,

`SELECT myMap.x ...` and `SELECT myMap['x'] ...` are treated the same in
Drill - the schema path parser recognizes both as `myMap.x`.
The same is true for DICT - both the Python-like `myDict['key1']` and
`myDict.key1` are allowed for projecting DICT values, but in the schema path
the column is stored just as in the case of MAP -
`myDict.key1` - as you can see, there is no distinction between MAP and DICT
based on the schema path alone. (Note that one can't project the `key` in a DICT -
`SELECT myDict.key ...` will be treated as if the `value` identified by a `key`
whose value is 'key' is projected, as in Java's `Map map = ...;
Object value = map.get("key");`.) When a key is an integer, the schema
path is the same as in the case of an array.

Is this what you meant by "schema path parser does not recognize the
syntax" or do you get an error?

On Mon, Jan 20, 2020 at 5:16 AM Paul Rogers 
wrote:

> Hi All,
>
> What did we decide to do about projecting DICT values? Drill allows us to
> project specific MAP members:
>
> SELECT myMap.x ...
>
> And, Drill allows projecting array members:
>
> SELECT myArray[3] ...
>
> I thought there was discussion of allowing Python-like syntax for
> projecting DICT values:
>
> SELECT myDict['key1'] ...
>
> I tried this with no quotes, single-quotes and back-tick quotes. Seems
> that the schema path parser does not recognize the syntax.
>
> Is there some trick that I'm missing?
>
> Thanks,
> - Paul
>
>


Re: [ANNOUNCE] New Committer: Denys Ordynskiy

2020-01-03 Thread Bohdan Kazydub
Congratulations, Denys! Well deserved!

On Fri, Jan 3, 2020 at 9:35 AM Kunal Khatua  wrote:

> Congratulations, Denys!
>
> ~ Kunal
>
> On Wed, Jan 1, 2020 at 12:19 AM Pritesh Maker 
> wrote:
>
> > Congrats, Denys!
> >
> > Pritesh
> >
> > 
> > From: Igor Guzenko 
> > Sent: Tuesday, December 31, 2019 2:39 PM
> > To: dev
> > Cc: user
> > Subject: Re: [ANNOUNCE] New Committer: Denys Ordynskiy
> >
> > Congratulations Denys! Well done!
> >
> > On Mon, Dec 30, 2019 at 8:24 PM Paul Rogers 
> > wrote:
> >
> > > Congratulations Denys!
> > >
> > > - Paul
> > >
> > >
> > >
> > > On Monday, December 30, 2019, 4:25:49 AM PST, Arina Ielchiieva <
> > > ar...@apache.org> wrote:
> > >
> > > The Project Management Committee (PMC) for Apache Drill has invited
> Denys
> > > Ordynskiy to become a committer, and we are pleased to announce that he
> > has
> > > accepted.
> > >
> > > Denys has been contributing to Drill for more than a year. He did many
> > > contributions as a QA, he found, tested and verified important bugs and
> > > features. Recently he has actively participated in Hadoop 3 migration
> > > verification and actively tested current and previous releases. He also
> > > contributed into drill-test-framework to automate Drill tests.
> > >
> > > Welcome Denys, and thank you for your contributions!
> > >
> > > - Arina
> > > (on behalf of Drill PMC)
> > >
> >
>


[jira] [Created] (DRILL-7509) Incorrect TupleSchema is created when

2020-01-02 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7509:
-

 Summary: Incorrect TupleSchema is created when 
 Key: DRILL-7509
 URL: https://issues.apache.org/jira/browse/DRILL-7509
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Apache Drill 1.17.0 - RC1

2019-12-20 Thread Bohdan Kazydub
Checked queries on Parquet files containing complex types.

+1.

On Fri, Dec 20, 2019 at 6:37 PM Arina Yelchiyeva 
wrote:

> Hi Holger,
>
> Thanks for participating in the release verification.
>
> Only regressions or some really significant issues can be release
> blockers.
> As Anton mentioned, since these RC jars are present in previous Drill
> versions, this is not a regression.
> I believe Drill has a dependency on these libraries (maybe transitive) and I
> am not sure we can remove them; maybe the version can be upgraded.
> It would be good if you could check and file a Jira if the issue is still
> relevant.
>
> Kind regards,
> Arina
>
> > On Dec 20, 2019, at 6:18 PM, Anton Gozhiy  wrote:
> >
> > Hi Holger,
> > These RC files were present in Drill 1.16.0, so this is not a regression.
> > And about JDBC connection problem, could you please file a JIRA with more
> > details?
> >
> > On Fri, Dec 20, 2019 at 6:02 PM  wrote:
> >
> >> - Tested custom authenticator with JDBC connect to Drill, and was unable
> >> to connect (connection hangs, ACL disabled, couldn't take a deeper look
> >> into it).
> >> + Custom authenticator login with local sqlline has been successful
> >> - Found RC files for 3rd party jars in the binary archive, which
> shouldn't
> >> be there in a release version (from my perspective):
> >>
> >> apache-drill-1.17.0/jars/3rdparty/kerb-client-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerby-config-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-common-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-crypto-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-util-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-core-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerby-asn1-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerby-pkix-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerby-util-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-simplekdc-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-server-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-identity-1.0.0-RC2.jar
> >> apache-drill-1.17.0/jars/3rdparty/kerb-admin-1.0.0-RC2.jar
> >>
> >> From me:
> >> -1 (non-binding)
> >>
> >> BR
> >> Holger
> >>
> >
> >
> > --
> > Sincerely, Anton Gozhiy
> > anton5...@gmail.com
>
>


Re: [ANNOUNCE] New PMC member: Ihor Guzenko

2019-12-13 Thread Bohdan Kazydub
Congratulations, well deserved!

On Fri, Dec 13, 2019 at 3:50 PM Anton Gozhiy  wrote:

> Congratulations, well deserved!
>
> On Fri, Dec 13, 2019 at 3:42 PM Arina Yelchiyeva <
> arina.yelchiy...@gmail.com>
> wrote:
>
> > Congratulations, Ihor!
> >
> > > On Dec 13, 2019, at 3:38 PM, Volodymyr Vysotskyi  >
> > wrote:
> > >
> > > I am pleased to announce that Drill PMC invited Ihor Guzenko to
> > > the PMC and he has accepted the invitation.
> > >
> > > Congratulations Ihor and welcome!
> > >
> > > - Vova
> > > (on behalf of Drill PMC)
> >
> >
>
> --
> Sincerely, Anton Gozhiy
> anton5...@gmail.com
>


[jira] [Created] (DRILL-7453) Update joda-time to 2.10.5 to have correct time zone info

2019-11-21 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7453:
-

 Summary: Update joda-time to 2.10.5 to have correct time zone info
 Key: DRILL-7453
 URL: https://issues.apache.org/jira/browse/DRILL-7453
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


As Brazil decided not to follow the DST changes for 2019 
(https://www.timeanddate.com/news/time/brazil-scraps-dst.html), update 
joda-time to the latest {{2.10.5}} version, which contains the most recent tzdb 
info.
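
For example, a quick check with the updated library (illustrative snippet, not 
part of the fix itself):
{code}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class BrazilTzCheck {
  public static void main(String[] args) {
    DateTimeZone saoPaulo = DateTimeZone.forID("America/Sao_Paulo");
    // With up-to-date tz data the offset in November 2019 stays at -03:00,
    // whereas stale data would still apply the scrapped DST shift to -02:00.
    DateTime afterScrappedDst = new DateTime(2019, 11, 15, 12, 0, saoPaulo);
    System.out.println(afterScrappedDst + " offsetMillis="
        + saoPaulo.getOffset(afterScrappedDst));
  }
}
{code}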




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (DRILL-2000) Hive generated parquet files with maps show up in drill as map(key value)

2019-09-19 Thread Bohdan Kazydub (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub resolved DRILL-2000.
---
Fix Version/s: (was: Future)
   Resolution: Fixed

Fixed in scope of DRILL-7096

> Hive generated parquet files with maps show up in drill as map(key value)
> -
>
> Key: DRILL-2000
> URL: https://issues.apache.org/jira/browse/DRILL-2000
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
>Reporter: Ramana Inukonda Nagaraj
>    Assignee: Bohdan Kazydub
>Priority: Major
>
> Created a parquet file in hive having the following DDL
> hive> desc alltypesparquet; 
> OK
> c1 int 
> c2 boolean 
> c3 double 
> c4 string 
> c5 array 
> c6 map 
> c7 map 
> c8 struct
> c9 tinyint 
> c10 smallint 
> c11 float 
> c12 bigint 
> c13 array>  
> c15 struct>
> c16 array,n:int>> 
> Time taken: 0.076 seconds, Fetched: 15 row(s)
> Columns which are maps, such as c6, show up as 
> 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`;
> ++
> | c6 |
> ++
> | {"map":[]} |
> | {"map":[]} |
> | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} |
> ++
> 3 rows selected (0.078 seconds)
> hive> select c6 from alltypesparquet;   
> NULL
> NULL
> {1:"x",2:"y"}
> Ignore the wrong values, I have raised DRILL-1997 for the same. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7373) Fix problems involving reading from DICT type

2019-09-11 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7373:
-

 Summary: Fix problems involving reading from DICT type
 Key: DRILL-7373
 URL: https://issues.apache.org/jira/browse/DRILL-7373
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Add better support for different key types ({{boolean}}, {{decimal}}, 
{{float}}, {{double}}, etc.) when retrieving values by key from a {{DICT}} column 
while querying a data source whose field types are known during the query 
validation phase (such as a Hive table), so that the actual key object instance is 
created in generated code and passed to the given {{DICT}} reader instead of 
generating its value for every row based on an {{int}} ({{ArraySegment}}) or 
{{String}} ({{NamedSegment}}) value.

This may be achieved by storing the original literal value of the passed key (as 
{{Object}}) in {{PathSegment}} together with its type (as {{MajorType}}) and using 
them during code generation when reading the {{DICT}}'s values by key in 
{{EvaluationVisitor}}.

Also, fix an NPE in some cases involving reading values from {{DICT}} 
and fix wrong results when reading complex structures using many ITEM operators 
(i.e., [] brackets), e.g.
{code}
SELECT rid, mc.map_arr_map['key01'][1]['key01.1'] p16 FROM hive.map_complex_tbl 
mc
{code}
where {{map_arr_map}} is of following type: {{MAP>>}}




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-08-23 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7359:
-

 Summary: Add support for DICT type in RowSet Framework
 Key: DRILL-7359
 URL: https://issues.apache.org/jira/browse/DRILL-7359
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2019-08-22 Thread Bohdan Kazydub
Congratulations, Charles!

On Thu, Aug 22, 2019 at 12:26 PM Volodymyr Vysotskyi 
wrote:

> Congratulations, Charles!
>
> And thanks Arina for your effort to grow the community and make the project
> much better!
> It will be hard to keep the bar so high, but for now, I know who tops my
> chart of PMC chairs :)
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Thu, Aug 22, 2019 at 10:39 AM Igor Guzenko 
> wrote:
>
> > Congratulations, Charles! Great job!
> >
> > On Thu, Aug 22, 2019 at 10:28 AM Arina Ielchiieva 
> > wrote:
> >
> > > Hi all,
> > >
> > > It has been an honor to serve as Drill Chair during the past year but it's
> > > high time for the new one...
> > >
> > > I am very pleased to announce that the Drill PMC has voted to elect
> > Charles
> > > Givre as the new PMC chair of Apache Drill. He has also been approved
> > > unanimously by the Apache Board in last board meeting.
> > >
> > > Congratulations, Charles!
> > >
> > > Kind regards,
> > > Arina
> > >
> >
>


Re: [ANNOUNCE] New Committer: Anton Gozhyi

2019-07-29 Thread Bohdan Kazydub
Congratulations Anton!

On Mon, Jul 29, 2019 at 12:44 PM Igor Guzenko 
wrote:

> Congratulations Anton!
>
> On Mon, Jul 29, 2019 at 12:09 PM denysord88  wrote:
>
> > Congratulations Anton! Well deserved!
> >
> > On 07/29/2019 12:02 PM, Volodymyr Vysotskyi wrote:
> > > The Project Management Committee (PMC) for Apache Drill has invited
> Anton
> > > Gozhyi to become a committer, and we are pleased to announce that he
> has
> > > accepted.
> > >
> > > Anton Gozhyi has been contributing to Drill for more than a year and a
> > > half. He made significant contributions as a QA, including reporting
> > > non-trivial issues and working on automation of Drill tests. All the
> > issues
> > > reported by Anton have a clear description of the problem, steps to
> > > reproduce and expected behavior. Besides contributions as a QA, Anton
> > made
> > > high-quality fixes into Drill.
> > >
> > > Welcome Anton, and thank you for your contributions!
> > >
> > > - Volodymyr
> > > (on behalf of Drill PMC)
> > >
> >
> >
>


Re: [ANNOUNCE] New Committer: Igor Guzenko

2019-07-23 Thread Bohdan Kazydub
Congratulations Igor!

On Mon, Jul 22, 2019 at 5:02 PM Arina Ielchiieva  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Igor
> Guzenko to become a committer, and we are pleased to announce that he has
> accepted.
>
> Igor has been contributing to Drill for 9 months and made a number of
> significant contributions, including cross join syntax support, Hive views
> support, as well as improving performance for Hive show schema and unit
> tests. Currently he is working on supporting Hive complex types
> [DRILL-3290]. He has already added support for the list type and is working
> on struct and canonical map.
>
> Welcome Igor, and thank you for your contributions!
>
> - Arina
> (on behalf of the Apache Drill PMC)
>


Re: Apache Drill Hangout - July 9, 2019

2019-07-10 Thread Bohdan Kazydub
Hi Weijie,

It'd be nice to hear about your recent work, but it looks like the regular
Hangout time is not convenient for you.
Maybe you could give a talk at the next Hangout session?
If you're still willing to do so, please reply to this email with a suggested
time that works for you, so that the Apache Drill
community can decide how to proceed (i.e. we find a convenient time
that works for everyone interested in the topic).

Kind regards,
Bohdan Kazydub

On Tue, Jul 9, 2019 at 2:37 AM weijie tong  wrote:

> I could give a short talk about my recent work about parallel HashJoin and
> something others.
>
> On Mon, Jul 8, 2019 at 7:28 PM Bohdan Kazydub 
> wrote:
>
> > Hi Drillers,
> >
> > We will have our bi-weekly hangout tomorrow, July 9th, at 10 AM PST
> > (link: https://meet.google.com/yki-iqdf-tai ).
> >
> > If there are any topics you would like to discuss during the hangout
> please
> > respond to this email.
> >
> > Kind regards,
> > Bohdan Kazydub
> >
>


Apache Drill Hangout - July 9, 2019

2019-07-08 Thread Bohdan Kazydub
Hi Drillers,

We will have our bi-weekly hangout tomorrow, July 9th, at 10 AM PST
(link: https://meet.google.com/yki-iqdf-tai ).

If there are any topics you would like to discuss during the hangout please
respond to this email.

Kind regards,
Bohdan Kazydub


[jira] [Created] (DRILL-7312) Allow case sensitivity for column names when it is supported by storage format

2019-07-01 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-7312:
-

 Summary: Allow case sensitivity for column names when it is 
supported by storage format
 Key: DRILL-7312
 URL: https://issues.apache.org/jira/browse/DRILL-7312
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub


After the upgrade to Calcite 1.20.0 (DRILL-7200), there is the following issue:
If an HBase table has 2 columns whose names are equal when case is ignored but 
differ when case is considered, e.g. a table has columns 'F' and 'f', the following 
query
{code}
select * from hbase.`TestTableMultiCF` t
{code}
fails with the following exception
{code}
 (org.apache.calcite.runtime.CalciteContextException) At line 1, column 8: 
Column 'F' is ambiguous
sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
sun.reflect.NativeConstructorAccessorImpl.newInstance():62
sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
java.lang.reflect.Constructor.newInstance():423
org.apache.calcite.runtime.Resources$ExInstWithCause.ex():463
org.apache.calcite.sql.SqlUtil.newContextException():824
org.apache.calcite.sql.SqlUtil.newContextException():809
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():4805
org.apache.calcite.sql.validate.DelegatingScope.fullyQualify():496
org.apache.calcite.sql.validate.SqlValidatorImpl.findTableColumnPair():3501
org.apache.calcite.sql.validate.SqlValidatorImpl.isRolledUpColumn():3535
org.apache.calcite.sql.validate.SqlValidatorImpl.expandStar():519
org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem():429
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelectList():4069
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3376
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():995
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():955
org.apache.calcite.sql.SqlSelect.validate():216
{code}

Since HBase is case-sensitive with regard to column names, Drill should support 
this as well when querying an HBase table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSSION] DRILL-7097 Rename MapVector to StructVector

2019-06-04 Thread Bohdan Kazydub
Hi Paul,

if I understood you correctly, you are talking about an implementation of a
"true map" as a list of STRUCT (which is currently named MAP in Drill). While
this implementation is viable, we still need to introduce a new type for
such a "true map", as REPEATED MAP is still a different data type. That is,
while a "true map" can be implemented using REPEATED MAP under the hood,
these are not the same types (e.g., what if a user wants to use REPEATED MAP
(AKA repeated struct) and not a "true map").
Is my understanding correct?

The approach found in [1] was taken similarly to what is done in Hive [2], as I
find it clearer and it does not meddle with MAP's innards.

Also worth mentioning is this open [WIP] PR [3] into Apache Arrow
(opened a few days ago), which introduces a MapVector and uses the approach
you suggested.

[1]
https://docs.google.com/presentation/d/1FG4swOrkFIRL7qjiP7PSOPy8a1vnxs5Z9PM3ZfRPRYo/edit#slide=id.p
[2]
https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/MapColumnVector.java#L30
[3]https://github.com/apache/arrow/pull/

On Tue, Jun 4, 2019 at 2:59 AM Paul Rogers 
wrote:

> Hi Igor,
>
> Glad the community was able to provide a bit of help.
>
> Let's talk about another topic. You said: "And main purpose will be
> hiding of repeated map meta keys
> ("key","value") and simulation of real map functionality."
>
> On the one hand, we are all accustomed to thinking of a Java (or Python)
> map as a black box: store (key, value) pairs, retrieve values by key. This
> is the programming view. I wonder, however, if it is the best SQL view.
>
> Drill is, of course, SQL-based. It may be easier to bring the data to SQL
> than to bring SQL to the data. SQL works on tables (relations) and is very
> powerful when doing so. Standard SQL does not, however, provide tools to
> work with dictionaries. (There is an extension, SQL++, that might provide
> such extensions. But, even if Drill supported SQL++, no front-end tools
> provides such support AFAIK.)
>
> So, how do we bring the DICT type to SQL? We do so by noting that a DICT
> is really a little table of (key, value) pairs (with a uniqueness
> constraint on the key.) Once we adopt this view, we can apply (I hope!) the
> nested table mechanism recently added to Drill.
>
> This means that the user DOES want to know the name of the key and value
> columns: they are columns in a tuple (relation) that can be joined and
> filtered. Suppose each customer has a DICT of contact information with keys
> as "office", "home", "cell",... and values as the phone number. You can use
> SQL to find the office numbers:
>
>
> SELECT custName, contactInfo.value as phone WHERE contactInfo.key =
> "office"...
>
>
> So, rather than wanting to hide the (key, value) structure of a DICT, we
> could argue that exposing that structure allows the DICT to look like a
> relation, and thus exploit existing Drill features. In fact, this may make
> Drill more powerful when working Hive maps than is Hive itself (If Hive
> treats maps as opaque objects.)
>
>
> You also showed the SQLLine output you would like for a DICT column. This
> example exposes a "lie" (a short-cut) that Sqlline exploits. SqlLine asks
> Drill to convert a column to a Java Object of some sort, then SqlLine calls
> toString() on that object to produce the value you see in SqlLine output.
>
> Some examples. An array (repeated) column is a set of values. Drill
> converts the repeated value to a Java array, which toString() converts to
> something like "[1, 2, 3]". The same is true of MAP: Drill converts it to a
> Java Map, toString converts it to a JSON-like presentation.
>
> So, your DICT (or repeated map) type should provide a getObject() method
> that converts the repeated map to a Java Map. SqlLine will convert the map
> object to the display format you showed in your example. (My guess is that
> a repeated map today produces an array of Java Map objects: you want a
> single Java Map built from the key/value pairs.)
>
>
> A JDBC user can use the getObject() method to retrieve a Java Map
> representation of a Drill DICT. (This functionality is not available in
> ODBC AFAIK.) The same is true for anyone brave enough to use the native
> Drill client API.
>
>
> Thanks,
> - Paul
>
>
>
> On Monday, June 3, 2019, 7:08:42 AM PDT, Igor Guzenko <
> ihor.huzenko@gmail.com> wrote:
>
>  Hi all,
>
> So finally, I'm going to abandon the renaming ticket DRILL-7097 and
> related PR (1803).
>
> Next, the DRILL-7096 should be rewritten to cover addition of new DICT
> type. But, if I understand correctly,
> based on repeated vector, now result for new type will be returned like:
>
> row |  dict_column MAP
>
> --
>   1  | [{"key":1, "value":"v1"}, {"key":2, "value":"v2"} ]
>   2  | [{"key":0, "value":"v7"}, {"key":2, "value":"v2"}, {"key":4,
> "value":"v4"} ]
>   3  | [{"key":-1, "v

[jira] [Created] (DRILL-7200) Update Calcite to 1.19

2019-04-24 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-7200:
-

 Summary: Update Calcite to 1.19
 Key: DRILL-7200
 URL: https://issues.apache.org/jira/browse/DRILL-7200
 Project: Apache Drill
  Issue Type: Task
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Calcite has released the 1.19.0 version. Upgrade Calcite dependency in Drill to 
the newest version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Hangout Discussion Topics for 04-16-2019

2019-04-15 Thread Bohdan Kazydub
Hello,
Igor and I would like to discuss Hive Complex types support.

Thanks,
Bohdan

On Mon, Apr 15, 2019 at 8:47 PM Charles Givre  wrote:

> I’d like to promote the Drill track for ApacheCon.
>
> Sent from my iPhone
>
> > On Apr 15, 2019, at 13:09, Jyothsna Reddy 
> wrote:
> >
> > Hello Everyone,
> > Does anyone have any topics for tomorrow's hangout?
> >
> > We will start the hangout at 10 AM PST (link
> > http://meet.google.com/yki-iqdf-tai).
> >
> > Thank you,
> > Jyothsna
>


Re: [DISCUSS] 1.16.0 release

2019-03-19 Thread Bohdan Kazydub
Hello Sorabh,

I'm currently working on a small bug when querying views from S3 storage
with plain authentication [DRILL-7079].
I'd like to include it into Drill 1.16. I'll need 1-2 days for making the
PR as well.

Thanks,
Bohdan


On Tue, Mar 19, 2019 at 1:55 PM Igor Guzenko 
wrote:

> Hello Sorabh,
>
> I'm currently working on small improvement for Hive schema show tables
> [DRILL-7115].
> I'd like to include it into Drill 1.16. I'll need 1-2 days for making the
> PR.
>
> Thanks,
> Igor
>
> On Mon, Mar 18, 2019 at 11:25 PM Kunal Khatua  wrote:
> >
> > Khurram
> >
> > Currently, for DRILL-7061, the feature has been disabled. If I'm able to
> get a commit within the week for DRILL-6960, this will be addressed in that.
> >
> > DRILL-7110 is something that is required as well, to allow DRILL-6960 to
> be accepted.
> >
> > ~ Kunal
> > On 3/18/2019 1:20:16 PM, Khurram Faraaz  wrote:
> > Can we also have the fix fox DRILL-7061
> > in Drill 1.16 ?
> >
> > Thanks,
> > Khurram
> >
> > On Mon, Mar 18, 2019 at 11:40 AM Karthikeyan Manivannan
> > kmanivan...@mapr.com> wrote:
> >
> > > Please include DRILL-7107
> > > https://issues.apache.org/jira/browse/DRILL-7107>
> > > I will open the PR today.
> > > It is a small change and fixes a basic usability issue.
> > >
> > > Thanks.
> > >
> > > Karthik
> > >
> > > On Thu, Mar 14, 2019 at 4:50 PM Charles Givre wrote:
> > >
> > > > One more… DRILL-7014 is basically done and I’d like to see that get
> into
> > > > Drill 1.16.
> > > >
> > > > > On Mar 14, 2019, at 19:44, Charles Givre wrote:
> > > > >
> > > > > Who should I add as a reviewer for 7032?
> > > > >
> > > > >> On Mar 14, 2019, at 19:42, Sorabh Hamirwasia
> > > >
> > > > wrote:
> > > > >>
> > > > >> Hi Charles,
> > > > >> Can you please add reviewer for DRILL-7032 ?
> > > > >> For DRILL-6970 the PR is closed by the author, I have pinged in
> JIRA
> > > > asking
> > > > >> to re-open so that it can be reviewed.
> > > > >>
> > > > >> Thanks,
> > > > >> Sorabh
> > > > >>
> > > > >> On Thu, Mar 14, 2019 at 4:29 PM Charles Givre
> > > wrote:
> > > > >>
> > > > >>> Hi Sorabh,
> > > > >>> I have 3 PRs that are almost done, awaiting final review.
> > > Drill-7077,
> > > > >>> DRILL-7032, DRILL-7021. I owe @ariina some fixes for 7077, but
> I’m
> > > > waiting
> > > > >>> for review of the others. Also, there is that DRILL-6970 about
> the
> > > > buffer
> > > > >>> overflows in the logRegex reader that isn’t mine but I’d like to
> see
> > > > >>> included.
> > > > >>> Thanks,
> > > > >>> —C
> > > > >>>
> > > >  On Mar 14, 2019, at 13:13, Sorabh Hamirwasia
> > > sohami.apa...@gmail.com
> > > > >
> > > > >>> wrote:
> > > > 
> > > >  Hi Arina,
> > > >  Thanks for your response. With ETA of two weeks we are looking
> at
> > > end
> > > > of
> > > >  the month or beginning next month. I will wait until Monday for
> > > > others to
> > > >  respond and then will finalize on a cut-off date.
> > > > 
> > > >  Thanks,
> > > >  Sorabh
> > > > 
> > > >  On Wed, Mar 13, 2019 at 4:28 AM Arina Ielchiieva
> > > > >>> arina.yelchiy...@gmail.com>
> > > >  wrote:
> > > > 
> > > > > Hi Sorabh,
> > > > >
> > > > > thanks for volunteering to do the release.
> > > > >
> > > > > Paul and I are working on file schema provisioning for text
> file
> > > > storage
> > > > > which is aimed for 1.16. To wrap up the work we need to
> deliver two
> > > > >>> Jiras:
> > > > > DRILL-7095 and DRILL-7011. ETA: 2 weeks.
> > > > > Please plan the release date accordingly.
> > > > >
> > > > > Kind regards,
> > > > > Arina
> > > > >
> > > > > On Tue, Mar 12, 2019 at 9:16 PM Sorabh Hamirwasia
> > > > >>> sohami.apa...@gmail.com
> > > > >>
> > > > > wrote:
> > > > >
> > > > >> Hi All,
> > > > >> It's around two and a half month since we did 1.15.0 release
> for
> > > > Apache
> > > > >> Drill. Based on our 3 months release cadence I think it's
> time to
> > > > >>> discuss
> > > > >> our next release. I will volunteer to manage the next release.
> > > > >>
> > > > >> **Below is the current JIRA stats:*
> > > > >> *[1] Open#: 15*
> > > > >>
> > > > >> - Would be great if everyone can look into their assigned
> tickets
> > > > and
> > > > >> update the fix version as needed. Please keep the ones which
> you
> > > > find
> > > > >> must
> > > > >> have and can be completed sooner.
> > > > >>
> > > > >> *[2] InProgress#: 11*
> > > > >>
> > > > >> - If you think we *must* include any issues from this list
> then
> > > > >>> please
> > > > >> reply on this thread. Also would be great to know how much
> time
> > > you
> > > > >> think
> > > > >> is needed for these issues. Based on that we can take a call
> which
> > > > >>> one
> > > > >> to
> > > > >> target for this release.
> > > > >>
> > > > >> *[3] Reviewable#: 14*
> > > > >>
> > > > >> - All the review

[jira] [Resolved] (DRILL-6430) Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or Locally

2019-03-19 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub resolved DRILL-6430.
---
   Resolution: Done
Fix Version/s: (was: 1.17.0)
   1.16.0

> Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or 
> Locally
> --
>
> Key: DRILL-6430
> URL: https://issues.apache.org/jira/browse/DRILL-6430
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>    Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.16.0
>
>
> This is required for resource management since we will likely remove many 
> options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7038) Queries on partitioned columns scan the entire datasets

2019-02-13 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-7038:
-

 Summary: Queries on partitioned columns scan the entire datasets
 Key: DRILL-7038
 URL: https://issues.apache.org/jira/browse/DRILL-7038
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub
 Fix For: 1.16.0


For tables with hive-style partitions like
{code}
/table/2018/Q1
/table/2018/Q2
/table/2019/Q1
etc.
{code}
if any of the following queries is run:
{code}
select distinct dir0 from dfs.`/table`
{code}
{code}
select dir0 from dfs.`/table` group by dir0
{code}
it will actually scan every single record in the table rather than just getting 
a list of directories at the dir0 level. This applies even when cached metadata 
is available. This is a big penalty, especially as the datasets grow.

To avoid such situations, a logical prune rule can be used to collect partition 
columns (`dir0`), either from the metadata cache (if available) or from the group 
scan, and drop unnecessary files from being read (see the directory-listing sketch 
below). The rule will be applied under the following conditions:
1) all queried columns are partition columns, and
2) either {{DISTINCT}} or {{GROUP BY}} operations are performed.
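
For illustration only (not the actual rule implementation), the distinct values of 
`dir0` are already available from a plain directory listing, without touching any 
records:
{code}
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Dir0Listing {
  // Collects first-level subdirectory names, i.e. the candidate dir0 values.
  public static Set<String> distinctDir0(String tableRoot) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Set<String> values = new TreeSet<>();
    for (FileStatus status : fs.listStatus(new Path(tableRoot))) {
      if (status.isDirectory()) {
        values.add(status.getPath().getName());
      }
    }
    return values;
  }
}
{code}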



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Apache Drill Hangout - 05 Feb, 2019

2019-02-05 Thread Bohdan Kazydub
Hi Drillers,
The bi-weekly Apache Drill hangout is scheduled for today, Tuesday, Feb
5th, at 10 AM PST. The original plan is for Sorabh & Hanumath to talk about
Resource Management. If there are any other topics or questions, feel free
to reply or raise during the hangout.

The hangout link: https://meet.google.com/yki-iqdf-tai?authuser=3

P.s. Sorry for late notification.


Re: January Apache Drill board report

2019-02-01 Thread Bohdan Kazydub
Hi, Arina

Just to clarify, one of the options, the one "to return null for empty
string" (`drill.exec.functions.cast_empty_string_to_null`),
has been present for some time (since Drill 0.8.0 at least) and affected CASTs
from string to numeric types only (if the option is set to true, an empty
string in a CAST is treated as NULL).
In 1.15 the option was expanded to also affect CASTs to other types and the TO_*
functions.


On Fri, Feb 1, 2019 at 9:24 AM Abhishek Girish  wrote:

> +1. Looks good!
>
> On Thu, Jan 31, 2019 at 9:15 AM Vitalii Diravka 
> wrote:
>
> > +1
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Thu, Jan 31, 2019 at 6:18 PM Aman Sinha  wrote:
> >
> > > Thanks for putting this together, Arina.
> > > The Drill Developer Day and Meetup were separate events, so you can
> split
> > > them up.
> > >   - A half day Drill Developer Day was held on Nov 14.  A variety of
> > > technical design issues were discussed.
> > >   - A Drill user meetup was held on the same evening.  2 presentations
> -
> > > one on use case for Drill and one about indexing support in Drill were
> > > presented.
> > >
> > > Rest of the report LGTM.
> > >
> > > -Aman
> > >
> > >
> > > On Thu, Jan 31, 2019 at 7:58 AM Arina Ielchiieva 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > please take a look at the draft board report for the last quarter and
> > let
> > > > me know if you have any comments.
> > > >
> > > > Thanks,
> > > > Arina
> > > >
> > > > =
> > > >
> > > > ## Description:
> > > >  - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and
> Cloud
> > > > Storage.
> > > >
> > > > ## Issues:
> > > >  - There are no issues requiring board attention at this time.
> > > >
> > > > ## Activity:
> > > >  - Since the last board report, Drill has released version 1.15.0,
> > > >including the following enhancements:
> > > >- Add capability to do index based planning and execution
> > > >- CROSS join support
> > > >- INFORMATION_SCHEMA FILES and FUNCTIONS were added
> > > >- Support for TIMESTAMPADD and TIMESTAMPDIFF functions
> > > >- Ability to secure znodes with custom ACLs
> > > >- Upgrade to SQLLine 1.6
> > > >- Parquet filter pushdown for VARCHAR and DECIMAL data types
> > > >- Support JPPD (Join Predicate Push Down)
> > > >- Lateral join functionality was enabled by default
> > > >- Multiple Web UI improvements to simplify the use of options and
> > > submit
> > > > queries
> > > >- Query performance with the semi-join functionality was improved
> > > >- Support for aliases in the GROUP BY clause
> > > >- Options to return null for empty string and prevents Drill from
> > > > returning
> > > >  a result set for DDL statements
> > > >- Storage plugin names became case-insensitive
> > > >
> > > > - Drill developer meet up was held on November 14, 2018.
> > > >
> > > > ## Health report:
> > > >  - The project is healthy. Development activity
> > > >as reflected in the pull requests and JIRAs is good.
> > > >  - Activity on the dev and user mailing lists are stable.
> > > >  - Three committers were added in the last period.
> > > >
> > > > ## PMC changes:
> > > >
> > > >  - Currently 23 PMC members.
> > > >  - No new PMC members added in the last 3 months
> > > >  - Last PMC addition was Charles Givre on Mon Sep 03 2018
> > > >
> > > > ## Committer base changes:
> > > >
> > > >  - Currently 51 committers.
> > > >  - New commmitters:
> > > > - Hanumath Rao Maduri was added as a committer on Thu Nov 01 2018
> > > > - Karthikeyan Manivannan was added as a committer on Fri Dec 07
> > 2018
> > > > - Salim Achouche was added as a committer on Mon Dec 17 2018
> > > >
> > > > ## Releases:
> > > >
> > > >  - 1.15.0 was released on Mon Dec 31 2018
> > > >
> > > > ## Mailing list activity:
> > > >
> > > >  - dev@drill.apache.org:
> > > > - 415 subscribers (down -12 in the last 3 months):
> > > > - 2066 emails sent to list (2653 in previous quarter)
> > > >
> > > >  - iss...@drill.apache.org:
> > > > - 18 subscribers (up 0 in the last 3 months):
> > > > - 2480 emails sent to list (3228 in previous quarter)
> > > >
> > > >  - u...@drill.apache.org:
> > > > - 592 subscribers (down -5 in the last 3 months):
> > > > - 249 emails sent to list (310 in previous quarter)
> > > >
> > > >
> > > > ## JIRA activity:
> > > >
> > > >  - 196 JIRA tickets created in the last 3 months
> > > >  - 171 JIRA tickets closed/resolved in the last 3 months
> > > >
> > >
> >
>


[jira] [Created] (DRILL-6993) VARBINARY length is ignored on cast

2019-01-22 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6993:
-

 Summary: VARBINARY length is ignored on cast
 Key: DRILL-6993
 URL: https://issues.apache.org/jira/browse/DRILL-6993
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


{{VARBINARY}} precision is not set when casting to {{VARBINARY}} with a specified 
length.
For example, test case 
{code}
  String query = "select cast(r_name as varbinary(31)) as vb from cp.`tpch/region.parquet`";
  MaterializedField field = new ColumnBuilder("vb", 
TypeProtos.MinorType.VARBINARY)
  .setMode(TypeProtos.DataMode.OPTIONAL)
  .setWidth(31)
  .build();
  BatchSchema expectedSchema = new SchemaBuilder()
  .add(field)
  .build();

  // Validate schema
  testBuilder()
  .sqlQuery(query)
  .schemaBaseLine(expectedSchema)
  .go();
{code}
will fail with
{code}
java.lang.Exception: Schema path or type mismatch for column #0:
Expected schema path: vb
Actual   schema path: vb
Expected type: MajorType[minor_type: VARBINARY mode: OPTIONAL precision: 31 
scale: 0]
Actual   type: MajorType[minor_type: VARBINARY mode: OPTIONAL]
{code}
while for other types, like {{VARCHAR}}, it seems to work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6962) Function coalesce returns an Error when none of the columns in coalesce exist in a parquet file

2019-01-10 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6962:
-

 Summary: Function coalesce returns an Error when none of the 
columns in coalesce exist in a parquet file
 Key: DRILL-6962
 URL: https://issues.apache.org/jira/browse/DRILL-6962
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


As Drill is schema-free, the COALESCE function is expected to return a result and 
not error out even if none of the columns being referred to exist in the files 
being queried.

Here is an example for 2 columns, `unk_col` and `unk_col2`, which do not exist 
in the parquet files
{code:java}
select coalesce(unk_col, unk_col2) from dfs.`/tmp/parquetfiles`;
Error: SYSTEM ERROR: CompileException: Line 56, Column 27: Assignment 
conversion not possible from type 
“org.apache.drill.exec.expr.holders.NullableIntHolder” to type 
“org.apache.drill.exec.vector.UntypedNullHolder”

Fragment 1:0

[Error Id: 7b9193fb-289b-4fbf-a52a-2b93b01f0cd0 on dkvm2c:31010] (state=,code=0)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6922) QUERY-level options are shown on Profiles tab

2018-12-21 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6922:
-

 Summary: QUERY-level options are shown on Profiles tab
 Key: DRILL-6922
 URL: https://issues.apache.org/jira/browse/DRILL-6922
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Option `exec.return_result_set_for_ddl` is shown on the Web UI's Profiles tab even 
when it was not set explicitly. The issue occurs because the option is set at the 
query level internally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6875) Drill doesn't try to update connection for S3 after session expired

2018-12-18 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub resolved DRILL-6875.
---
Resolution: Not A Bug

> Drill doesn't try to update connection for S3 after session expired
> ---
>
> Key: DRILL-6875
> URL: https://issues.apache.org/jira/browse/DRILL-6875
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Denys Ordynskiy
>    Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: drillbit.log, not_a_bug_drillbit.log
>
>
> *Steps to reproduce:*
> - Drill has S3 storage plugin.
> - Open sqlline and run query to S3.
> - Leave sqlline opened for more than 12 hours.
> - In opened sqlline run query to S3.
> *Expected result:*
> Drill should update authorization session and successfully execute query.
> *Actual result:*
> Sqlline returns an error:
> *{color:#d04437}Error: VALIDATION ERROR: Forbidden (Service: Amazon S3; 
> Status Code: 403; Error Code: 403 Forbidden; Request ID: 4A94DD331A035625; S3 
> Extended Request ID: 
> uy94YdRpQ3ZriCz9xbnDi0yinB4O9kGrH7XPAURhjh8WZoxsbawojQA6v7mfvu920yOYbEI5WP8=)
> [Error Id: 4b44a83b-0e47-45a4-92e3-75f94f5a70cb on maprhost:31010] 
> (state=,code=0){color}*
> *Reopening sqlline doesn't help to get S3 access.*
> *Access problem can be solved only by restarting Drill.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6834) New option to disable result set on CTAS, create view and drop table/view

2018-11-07 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6834:
-

 Summary: New option to disable result set on CTAS, create view and 
drop table/view
 Key: DRILL-6834
 URL: https://issues.apache.org/jira/browse/DRILL-6834
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


There are some tools (Unica, dBeaver, TalenD) that do not expect to obtain a 
result set for a CTAS query. As a result, the query gets canceled. Hive, on the 
other hand, does not return a result set for the query, and these tools work well.

To improve Drill's integration with such tools, a session option 
`exec.fetch_resultset` will be introduced. If the option is enabled (set to 
`true`), Drill's behaviour will be unchanged. If the option is disabled (set to 
`false`), CTAS, create view and drop table/view queries will not return a result 
set but will show summary messages instead.
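
For context, this is how a JDBC-based tool typically detects whether the server 
returned a result set; a minimal sketch (the connection URL and table name are 
just examples):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CtasResultSetCheck {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
         Statement stmt = conn.createStatement()) {
      // Statement.execute() returns true if the server produced a result set.
      // Tools that treat CTAS as DDL expect false here, which is what the
      // disabled option is meant to provide.
      boolean hasResultSet = stmt.execute(
          "CREATE TABLE dfs.tmp.`ctas_example` AS SELECT 1 AS id");
      System.out.println("result set returned: " + hasResultSet);
    }
  }
}
{code}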



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6817) Update to_number function to be consistent with CAST function

2018-10-30 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6817:
-

 Summary: Update to_number function to be consistent with CAST 
function
 Key: DRILL-6817
 URL: https://issues.apache.org/jira/browse/DRILL-6817
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


When `drill.exec.functions.cast_empty_string_to_null` is enabled, casting an empty 
string ('') to numeric types returns NULL. If `to_number` is used to convert an 
empty string to a number, an UnsupportedOperationException is thrown.

The aim is to make these functions (CASTs and `to_number`) work consistently, as 
is done for the date/time functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6815) Improve code generation to handle functions with NullHandling.NULL_IF_NULL better

2018-10-29 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6815:
-

 Summary: Improve code generation to handle functions with 
NullHandling.NULL_IF_NULL better
 Key: DRILL-6815
 URL: https://issues.apache.org/jira/browse/DRILL-6815
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub


If a (simple) function is declared with the NULL_IF_NULL null handling strategy 
(`nulls = NullHandling.NULL_IF_NULL`), additional code is generated which checks 
whether any of the inputs is NULL (not set). If so, the output is set to NULL; 
otherwise the function's code is executed and, at the end, the output value is 
marked as set whenever ANY of the inputs is OPTIONAL (see 
[DrillSimpleFuncHolder|https://github.com/apache/drill/blob/8edeb49873d1a1710cfe28e0b49364d07eb1aef4/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillSimpleFuncHolder.java#L143]).

The problem is that this behavior makes it impossible to make the output value 
NULL from within the [function's evaluation 
body|https://github.com/apache/drill/blob/7b0c9034753a8c5035fd1c0f1f84a37b376e6748/exec/java-exec/src/main/java/org/apache/drill/exec/expr/DrillSimpleFunc.java#L22], 
which may prove useful in certain situations, e.g. when the input is an empty 
string and the output should be NULL in that case. Sometimes this results in 
two separate functions instead of one with NULL_IF_NULL. It does not follow the 
[Principle of Least 
Astonishment|https://en.wikipedia.org/wiki/Principle_of_least_astonishment], as 
effectively it behaves more like "null if and only if null", and the documentation 
for NULL_IF_NULL is currently
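For illustration, a made-up UDF showing where the limitation bites (the function 
name and behaviour are hypothetical; this is not an existing Drill function):

{code:java}
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.IntHolder;
import org.apache.drill.exec.expr.holders.VarCharHolder;

@FunctionTemplate(name = "str_len_or_null",   // hypothetical function name
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class StrLenOrNull implements DrillSimpleFunc {

  @Param  VarCharHolder in;  // required holder: NULL inputs are handled by generated code
  @Output IntHolder out;     // required holder: there is no null indicator to clear

  public void setup() { }

  public void eval() {
    // Desired behaviour: return NULL for an empty string. With NULL_IF_NULL there is
    // no way to signal that from here -- the output holder has no isSet field and the
    // generated wrapper marks the output as set when any input is OPTIONAL. Today the
    // only way out is to switch to NullHandling.INTERNAL and nullable holders.
    out.value = in.end - in.start;
  }
}
{code}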



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6771) Queries on Hive 2.3.x fails with SYSTEM ERROR: ArrayIndexOutOfBoundsException

2018-10-29 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub resolved DRILL-6771.
---
Resolution: Fixed

> Queries on Hive 2.3.x fails with SYSTEM ERROR: ArrayIndexOutOfBoundsException
> -
>
> Key: DRILL-6771
> URL: https://issues.apache.org/jira/browse/DRILL-6771
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Hive
>Affects Versions: 1.15.0
> Environment: Hive 2.3.3
> MapR 6.1.0
>Reporter: Abhishek Girish
>Assignee: Bohdan Kazydub
>Priority: Critical
> Fix For: 1.15.0
>
>
> Query: Functional/partition_pruning/hive/general/plan/orc1.q
> {code}
> select * from hive.orc_create_people_dp where state = 'Ca'
> java.sql.SQLException: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 6
> {code}
> Stack Trace:
> {code}
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Error while applying rule Prel.ScanPrule, 
> args [rel#2103503:DrillScanRel.LOGICAL.ANY([]).[](table=[hive, 
> orc_create_people_dp],groupscan=HiveScan [table=Table(dbName:default, 
> tableName:orc_create_people_dp), columns=[`id`, `first_name`, `last_name`, 
> `address`, `state`, `**`], numPartitions=1, partitions= 
> [Partition(values:[Ca])], 
> inputDirectories=[maprfs:/drill/testdata/hive_storage/orc_create_people_dp/state=Ca],
>  confProperties={}])]
> org.apache.drill.exec.work.foreman.Foreman.run():300
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (java.lang.RuntimeException) Error while applying rule 
> Prel.ScanPrule, args 
> [rel#2103503:DrillScanRel.LOGICAL.ANY([]).[](table=[hive, 
> orc_create_people_dp],groupscan=HiveScan [table=Table(dbName:default, 
> tableName:orc_create_people_dp), columns=[`id`, `first_name`, `last_name`, 
> `address`, `state`, `**`], numPartitions=1, partitions= 
> [Partition(values:[Ca])], 
> inputDirectories=[maprfs:/drill/testdata/hive_storage/orc_create_people_dp/state=Ca],
>  confProperties={}])]
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():236
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():648
> org.apache.calcite.tools.Programs$RuleSetProgram.run():339
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():425
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():455
> org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan():68
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
> org.apache.drill.exec.work.foreman.Foreman.runSQL():584
> org.apache.drill.exec.work.foreman.Foreman.run():272
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) Failed 
> to get InputSplits
> org.apache.drill.exec.store.hive.HiveMetadataProvider.getInputSplits():182
> org.apache.drill.exec.store.hive.HiveScan.getInputSplits():288
> org.apache.drill.exec.store.hive.HiveScan.getMaxParallelizationWidth():197
> org.apache.drill.exec.planner.physical.ScanPrule.onMatch():42
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():212
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():648
> org.apache.calcite.tools.Programs$RuleSetProgram.run():339
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():425
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():455
> org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan():68
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
> org.apache.drill.exec.work.foreman.Foreman.runSQL():584
> org.apache.drill.exec.work.foreman.Foreman.run():272
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (java.lang.RuntimeException) ORC split generation failed with 
> exception: java.lang.ArrayIndexOutOfBoundsException: 6
> org.apache.hadoop.hive.ql.

[jira] [Created] (DRILL-6810) Disable NULL_IF_NULL NullHandling for functions with ComplexWriter

2018-10-23 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6810:
-

 Summary: Disable NULL_IF_NULL NullHandling for functions with 
ComplexWriter
 Key: DRILL-6810
 URL: https://issues.apache.org/jira/browse/DRILL-6810
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Currently NullHandling.NULL_IF_NULL is allowed for UDFs with an @Output of type 
org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter, but no 
null handling is actually performed for such functions, which leads to confusion. 
The problem is that ComplexWriter holds list/map values and Drill does not yet 
support NULL values for those types (there is an issue to allow null maps/lists, 
[DRILL-4824|https://issues.apache.org/jira/browse/DRILL-4824]).
For such functions, support for NULL_IF_NULL will be disabled, as is already done 
for aggregate functions, and NullHandling.INTERNAL should be used instead.
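An illustrative sketch of the pattern such functions should follow instead 
(hypothetical function, not part of Drill): NullHandling.INTERNAL plus an 
explicit isSet check inside eval():

{code:java}
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter;

@FunctionTemplate(name = "bytes_to_list",   // hypothetical function name
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)
public class BytesToList implements DrillSimpleFunc {

  @Param  NullableVarCharHolder in;
  @Output ComplexWriter writer;

  public void setup() { }

  public void eval() {
    // ComplexWriter cannot emit a NULL list/map (DRILL-4824), so the function decides
    // itself what a NULL input maps to -- here, an empty list.
    org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter list = writer.rootAsList();
    list.startList();
    if (in.isSet == 1) {
      for (int i = in.start; i < in.end; i++) {
        list.bigInt().writeBigInt(in.buffer.getByte(i));
      }
    }
    list.endList();
  }
}
{code}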



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6783) CAST string literal as INTERVAL MONTH/YEAR works inconsistently when selecting from a table with multiple rows

2018-10-08 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6783:
-

 Summary: CAST string literal as INTERVAL MONTH/YEAR works 
inconsistently when selecting from a table with multiple rows
 Key: DRILL-6783
 URL: https://issues.apache.org/jira/browse/DRILL-6783
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Casting a string literal to INTERVAL MONTH or INTERVAL YEAR produces different 
values for each row (actually repeating with a period of 4) when selecting data 
from a table with more than one row.

For example:

{code}

0: jdbc:drill:zk=local> select cast('P314M' as interval month) from 
cp.`employee.json` limit 10;
+--+
|  EXPR$0  |
+--+
| 26 years 2 months    |
| 81089877 years 5 months  |
| 1714858 years 8 months   |
| 6698 years 8 months  |
| 26 years 2 months    |
| 81089877 years 5 months  |
| 1714858 years 8 months   |
| 6698 years 8 months  |
| 26 years 2 months    |
| 81089877 years 5 months  |
+--+
10 rows selected (0.186 seconds)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6768) Improve to_date, to_time and to_timestamp and corresponding cast functions to handle empty string when `drill.exec.functions.cast_empty_string_to_null` option is enabled

2018-10-03 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6768:
-

 Summary: Improve to_date, to_time and to_timestamp and 
corresponding cast functions to handle empty string when 
`drill.exec.functions.cast_empty_string_to_null` option is enabled
 Key: DRILL-6768
 URL: https://issues.apache.org/jira/browse/DRILL-6768
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


When the `drill.exec.functions.cast_empty_string_to_null` option is enabled, the 
`to_date`, `to_time` and `to_timestamp` functions will return NULL when a NULL or 
empty string value is passed, so that they work uniformly with the numeric types 
and CASE clauses no longer litter queries.

CASTs will be handled in a similar way:

||Value to cast||Current behaviour||New behaviour||
|NULL|NULL|NULL|
|'' (empty string)|Error in many cases (except numeric types)|NULL|
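To illustrate the CASE clauses this removes, a sketch with made-up table and 
column names:

{code:java}
public class EmptyStringDateQueries {
  // Before the change every string-to-date conversion has to be guarded by hand:
  static final String BEFORE =
      "SELECT CASE WHEN hire_date = '' THEN CAST(NULL AS DATE) "
    + "ELSE to_date(hire_date, 'yyyy-MM-dd') END AS hired "
    + "FROM dfs.tmp.`employees.csvh`";

  // After the change (with the option enabled) the guard becomes unnecessary:
  // to_date(''), to_time(''), to_timestamp('') and CAST('' AS DATE) all return NULL.
  static final String AFTER =
      "SELECT to_date(hire_date, 'yyyy-MM-dd') AS hired FROM dfs.tmp.`employees.csvh`";

  public static void main(String[] args) {
    System.out.println(BEFORE);
    System.out.println(AFTER);
  }
}
{code}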

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [IDEAS] Drill start up quotes

2018-09-12 Thread Bohdan Kazydub
A query result is never late, nor is it early, it arrives precisely when it
means to! (Gandalf)

Kind regards,
Bohdan Kazydub

On Wed, Sep 12, 2018 at 1:14 PM, Vova Vysotskyi  wrote:

> Two things are infinite: the universe and drill; and I'm not sure about the
> universe. (Albert Einstein)
> If drill hasn't profoundly shocked you, you haven't understood it yet.
> (Niels Bohr)
> Drill gives you meaning and purpose, and life is empty without it. (Stephen
> Hawking)
> Drill must go on. (Queen)
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Tue, Sep 11, 2018 at 11:36 PM Oleksandr Kalinin 
> wrote:
>
> > Some random ideas, not sure how appropriate - feel free to choose/modify
> > as necessary :)
> >
> > In use since 37 thousand years (according to Wikipedia, initial use of
> > rotary instruments by homo sapiens is dated about 35000 BC)
> > It’s not just Bosch or Black+Decker (Two major elec. drill brands)
> > This drill bit is made of bits
> > This product is free of steel, cobalt and titanium (Typical drill bit
> > materials)
> > Let’s drill something more solid than concrete
> > Eye and hearing protection are not required when using this drill
> > If only Mr Arnot knew ... (elec. drill inventor)
> >
> > Cheers,
> > Alex
> >
> > > On 11 Sep 2018, at 19:27, Arina Yelchiyeva  >
> > wrote:
> > >
> > > Some quotes ideas:
> > >
> > > drill never goes out of style
> > > everything is easier with drill
> > >
> > > Kunal,
> > > regarding config, sounds reasonable, I'll do that.
> > >
> > > Kind regards,
> > > Arina
> > >
> > >
> > > On Tue, Sep 11, 2018 at 12:17 AM Benedikt Koehler <
> eigenarb...@gmail.com
> > >
> > > wrote:
> > >
> > >> You told me to drill sergeant! (Forrest Gump)
> > >>
> > >> Benedikt
> > >> @furukama
> > >>
> > >>
> > >> Kunal Khatua  schrieb am Mo. 10. Sep. 2018 um
> 21:01:
> > >>
> > >>> +1 on the suggestion.
> > >>>
> > >>> I would also suggest that we change the backend implementation of the
> > >>> quotes to refer to a properties file (within the classpath) rather
> than
> > >>> have it hard coded within the SqlLine package.  This will ensure that
> > new
> > >>> quotes can be added with every release without the need to touch the
> > >>> SqlLine fork for Drill.
> > >>>
> > >>> ~ Kunal
> > >>> On 9/10/2018 7:06:59 AM, Arina Ielchiieva  wrote:
> > >>> Hi all,
> > >>>
> > >>> we are close to SqlLine 1.5.0 upgrade which now has the mechanism to
> > >>> preserve Drill customizations. This one does include multiline
> support
> > >> but
> > >>> the next release might.
> > >>> You all know that one of the Drill customizations is quotes at
> > startup. I
> > >>> was thinking we might want to fresh up the list a little bit.
> > >>>
> > >>> Here is the current list:
> > >>>
> > >>> start your sql engine
> > >>> this isn't your grandfather's sql
> > >>> a little sql for your nosql
> > >>> json ain't no thang
> > >>> drill baby drill
> > >>> just drill it
> > >>> say hello to my little drill
> > >>> what ever the mind of man can conceive and believe, drill can query
> > >>> the only truly happy people are children, the creative minority and
> > drill
> > >>> users
> > >>> a drill is a terrible thing to waste
> > >>> got drill?
> > >>> a drill in the hand is better than two in the bush
> > >>>
> > >>> If anybody has new serious / funny / philosophical / creative quotes
> > >>> ideas, please share and we can consider adding them to the existing
> > list.
> > >>>
> > >>> Kind regards,
> > >>> Arina
> > >>>
> > >> --
> > >>
> > >> --
> > >> Dr. Benedikt Köhler
> > >> Kreuzweg 4 • 82131 Stockdorf
> > >> Mobil: +49 170 333 0161 • Telefon: +49 89 857 45 84
> > >> Mail: bened...@eigenarbeit.org
> > >>
> >
>


[jira] [Created] (DRILL-6724) Convert IndexOutOfBounds exception to UserException with context data where possible

2018-08-31 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6724:
-

 Summary: Convert IndexOutOfBounds exception to UserException with 
context data where possible
 Key: DRILL-6724
 URL: https://issues.apache.org/jira/browse/DRILL-6724
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Sometimes when an IndexOutOfBoundsException is exposed to users it is not clear 
what caused the problem. Instead, this exception may be converted to a more 
useful UserException containing context data such as filename, line, position, 
etc. (depending on what is available from a reader of a given type).

A possible approach is to add a method to the RecordReader interface which 
produces a UserException of type ErrorType.DATA_READ with context data, so that 
whenever such a UserException is needed it can be easily obtained by invoking 
that method.
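A rough sketch of what such a helper might produce (the method and its context 
fields are illustrative, not an agreed API):

{code:java}
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReadErrorExample {
  private static final Logger logger = LoggerFactory.getLogger(ReadErrorExample.class);

  // Wraps a low-level IndexOutOfBoundsException into a DATA_READ UserException,
  // attaching whatever context the reader has available (file, line, position, ...).
  public static UserException dataReadError(Throwable cause, String fileName,
                                            long lineNumber, long position) {
    return UserException.dataReadError(cause)
        .message("Failure while reading input")
        .addContext("File", fileName)
        .addContext("Line", String.valueOf(lineNumber))
        .addContext("Position", String.valueOf(position))
        .build(logger);
  }
}
{code}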



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6689) Include query user information to drillbit.log

2018-08-15 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6689:
-

 Summary: Include query user information to drillbit.log
 Key: DRILL-6689
 URL: https://issues.apache.org/jira/browse/DRILL-6689
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Currently, query information is logged as

INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id _queryId_: 
explain plan for select * from dfs./tmp/customer where name like 'Jo%'

In order to make it easier to track user activity, the username of the user who 
issued the query could be included, in the following format:

INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
_queryId_ *issued by _username_*: explain plan for select * from 
dfs./tmp/customer where name like 'Jo%'
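A sketch of the corresponding parameterized log call (the surrounding class is 
illustrative; in Drill the values would come from Foreman's query context and 
session):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueryTextLogging {
  private static final Logger logger = LoggerFactory.getLogger(QueryTextLogging.class);

  // queryIdString, userName and sql stand in for whatever Foreman already has at hand.
  public static void logQueryText(String queryIdString, String userName, String sql) {
    logger.info("Query text for query with id {} issued by {}: {}", queryIdString, userName, sql);
  }
}
{code}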



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[DISCUSS] Add Hadoop Credentials API support for Drill S3 storage plugin

2018-08-02 Thread Bohdan Kazydub
Hi all,

Currently, to access the S3A filesystem, the `fs.s3a.secret.key` and
`fs.s3a.access.key` properties have to be configured either in the S3 storage
plugin or in core-site.xml, in plaintext. This approach is considered
insecure. To eliminate the need to store passwords in plaintext, the
CredentialProvider API [1] may be used to read secret keys from an
encrypted store.

Here is a document with implementation details:
https://docs.google.com/document/d/1ow4v5HOh0qJh-5KsZHqSjohM2ukGSayEd9360tHZZvo/edit#
.
And here is an open issue for the improvement:
https://issues.apache.org/jira/browse/DRILL-6662

Any thoughts?

[1]
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html

Kind regards,
Bohdan


[jira] [Created] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-02 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6662:
-

 Summary: Access AWS access key ID and secret access key using 
Credential Provider API for S3 storage plugin
 Key: DRILL-6662
 URL: https://issues.apache.org/jira/browse/DRILL-6662
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Hadoop provides the [CredentialProvider 
API|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html], 
which allows passwords and other sensitive secrets to be stored in an external 
provider rather than in configuration files in plaintext.

Currently the S3 storage plugin reads the passwords, namely 'fs.s3a.access.key' 
and 'fs.s3a.secret.key', stored in clear text in the Configuration via its get() 
method. To give users the ability to remove clear-text S3 passwords from 
configuration files, the Configuration.getPassword() method should be used 
instead; users then configure the 'hadoop.security.credential.provider.path' 
property, which points to a file containing encrypted passwords, rather than the 
two aforementioned properties.

With this approach, credential providers are checked first; if the secret is not 
provided there, or no providers are configured, there is a fallback to secrets 
configured in clear text (unless 'hadoop.security.credential.clear-text-fallback' 
is set to "false"), making the change backwards-compatible.
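A minimal sketch of the intended lookup (the class and the provider path are 
illustrative; the property names are the standard S3A keys):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class S3SecretLookup {

  // getPassword() consults the configured credential providers first and only then
  // falls back to a clear-text value stored in the Configuration (if allowed).
  public static String getSecret(Configuration conf, String key) throws IOException {
    char[] secret = conf.getPassword(key);
    return secret == null ? null : new String(secret);
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Illustrative path to a keystore created with `hadoop credential create ...`.
    conf.set("hadoop.security.credential.provider.path", "jceks://file/etc/conf/s3.jceks");
    System.out.println("access key resolved: " + (getSecret(conf, "fs.s3a.access.key") != null));
    System.out.println("secret key resolved: " + (getSecret(conf, "fs.s3a.secret.key") != null));
  }
}
{code}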



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6606) Hash Join returns incorrect data types when joining subqueries with limit 0

2018-07-13 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6606:
-

 Summary: Hash Join returns incorrect data types when joining 
subqueries with limit 0
 Key: DRILL-6606
 URL: https://issues.apache.org/jira/browse/DRILL-6606
 Project: Apache Drill
  Issue Type: Task
Reporter: Bohdan Kazydub


PreparedStatement for query
{code:sql}
SELECT l.l_quantity, l.l_shipdate, o.o_custkey
FROM (SELECT * FROM cp.`tpch/lineitem.parquet` LIMIT 0) l
    JOIN (SELECT * FROM cp.`tpch/orders.parquet` LIMIT 0) o 
    ON l.l_orderkey = o.o_orderkey
LIMIT 0
{code}
is created with wrong types (nullable INTEGER) for all selected columns, no 
matter what their actual types are. The behavior reproduces with hash join only 
and is very likely caused by DRILL-6027, as the query worked fine before that 
feature was implemented.

To reproduce the problem, put the aforementioned query into the 
TestPreparedStatementProvider#joinOrderByQuery() test method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6580) Upgrade Curator libraries

2018-07-04 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6580:
-

 Summary: Upgrade Curator libraries
 Key: DRILL-6580
 URL: https://issues.apache.org/jira/browse/DRILL-6580
 Project: Apache Drill
  Issue Type: Task
Reporter: Bohdan Kazydub


As of [DRILL-6534|https://issues.apache.org/jira/browse/DRILL-6534] Drill uses 
explicit versions of ZooKeeper (ZK) and Curator. ZK's version was upgraded, so it 
is better to upgrade Curator as well, since it manages ZK.

Note: I've tried upgrading Curator to 2.12.0, 3.3.0 and 4.0.1 (4.0.1 has a hard 
dependency on ZK 3.5.x) and everything seems to work, except that the 
curator-client artifact is much larger than the current one (~2+ MB vs. 70 kB), 
so the jdbc-all maxSize would have to be increased.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6574) Add LIMIT(0) on top of SCAN for a prepare statement

2018-07-02 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6574:
-

 Summary: Add LIMIT(0) on top of SCAN for a prepare statement
 Key: DRILL-6574
 URL: https://issues.apache.org/jira/browse/DRILL-6574
 Project: Apache Drill
  Issue Type: Task
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Currently prepared statements use LIMIT 0 to get the result schema. Adding 
LIMIT(0) on top of SCAN causes early termination of the query.

Create an option "planner.enable_limit0_on_scan", enabled by default, and change 
the "planner.enable_limit0_optimization" option to be enabled by default.

LIMIT(0) on SCAN is disabled for UNION and complex functions, i.e. UNION and 
complex functions need data to produce the result schema.

If a function is unsupported, the plan won't be affected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6541) Upgrade ZooKeeper patch version to 3.4.11 for mapr profile

2018-06-26 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6541:
-

 Summary: Upgrade ZooKeeper patch version to 3.4.11 for mapr profile
 Key: DRILL-6541
 URL: https://issues.apache.org/jira/browse/DRILL-6541
 Project: Apache Drill
  Issue Type: Task
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.11

2018-06-25 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6534:
-

 Summary: Upgrade ZooKeeper patch version to 3.4.11
 Key: DRILL-6534
 URL: https://issues.apache.org/jira/browse/DRILL-6534
 Project: Apache Drill
  Issue Type: Task
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6491) Prevent merge join for full outer join at planning stage

2018-06-12 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6491:
-

 Summary: Prevent merge join for full outer join at planning stage
 Key: DRILL-6491
 URL: https://issues.apache.org/jira/browse/DRILL-6491
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6473) Upgrade Drill 1.14 with Hive 2.3 for mapr profile

2018-06-06 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6473:
-

 Summary: Upgrade Drill 1.14 with Hive 2.3 for mapr profile
 Key: DRILL-6473
 URL: https://issues.apache.org/jira/browse/DRILL-6473
 Project: Apache Drill
  Issue Type: Task
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)