Re: query from hbase issue

2016-05-23 Thread qiang li
I upgraded to the latest CDH version, which is HBase 1.2.0-cdh5.7.0, and tested it
again. The results are correct now. Thanks for the help.

Even so, I think the query plan can still be optimized.

Here is what I think can be improved:
a. specify the columns only when needed
b. remove the RowFilter when the scan's startRow and stopRow already cover the condition.
For example: select row_key from hbase.browser_action a where
a.row_key > '0' and a.row_key < '1'. I think SQL like this would run
faster without the RowFilter (see the sketch below).
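
A minimal sketch of suggestion (b) at the HBase client level, assuming the
standard HBase 1.x API (this is the shape of scan the plan could emit instead
of a RowFilter; note that setStartRow is inclusive while the SQL uses a strict
'>', so an exact translation would append a zero byte to the lower bound):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanRangeSketch {
    // Express row_key > '0' AND row_key < '1' as a key range with no RowFilter:
    // the region server seeks straight to the range instead of reading every
    // row and filtering it afterwards.
    static Scan rangeScan() {
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("0")); // inclusive lower bound
        scan.setStopRow(Bytes.toBytes("1"));  // exclusive upper bound
        return scan;
    }
}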



2016-05-24 9:29 GMT+08:00 qiang li :

> Yes, it seems like the same issue. I will upgrade it and test again. But
> do you think we can update the physical plan too? If we only want to query
> one qualifier, then the columns of the plan should only contain that
> qualifier instead of "*". Maybe that plan would query faster. Am I right?
>
> 2016-05-23 23:38 GMT+08:00 Krystal Nguyen :
>
>> Hi Qiang,
>>
>> Looks like you might be encountering this issue:
>> https://issues.apache.org/jira/browse/DRILL-4271
>>
>> Thanks
>>
>> On Sun, May 22, 2016 at 8:38 PM, qiang li  wrote:
>>
>> > I tested it step by step again, and finally I found out that the issue
>> > happens only if the number of qualifiers is more than 3.
>> >
>> > It's weird, but this is the result I got.
>> >
>> > I tested about 10 thousand rows of data. The length of the event is 6;
>> > the code I used to test is like below:
>> >
>> > String[] earr = action.getEvent().geteArr();
>> > for(int i=0;i<6;i++){
>> > put.addColumn(family, Bytes.toBytes("e"+i),
>> Bytes.toBytes(earr[i]));
>> > }
>> >
>> > Then I tested step by step like below:
>> >
>> > put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
>> > put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
>> > put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
>> > put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>> >  group by a.`v`.`e0`;
>> > +-----+-------+
>> > |  k  |   p   |
>> > +-----+-------+
>> > | e0  | 3856  |
>> > +-----+-------+
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
>> > +-----+-------+
>> > |  k  |   p   |
>> > +-----+-------+
>> > | e0  | 9094  |
>> > +-----+-------+
>> >
>> >
>> > If I put the qualifier "e3" into the table (that is, more than 3
>> > qualifiers), then the issue is reproduced.
>> >
>> >
>> > 2016-05-23 9:23 GMT+08:00 qiang li :
>> >
>> > > Sorry for the late reply.
>> > >
>> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>> > >
>> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen :
>> > >
>> > >> Qiang, Can you please let us know the hbase version and hadoop
>> > >> distribution
>> > >> version that you are using.
>> > >>
>> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <
>> kngu...@maprtech.com>
>> > >> wrote:
>> > >>
>> > >> > Can you please let us know the hbase version and hadoop
>> distribution
>> > >> > version that you are using.
>> > >> >
>> > >> >
>> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li 
>> > wrote:
>> > >> >
>> > >> >> Khurram, I am sending the mail again; the last mail forgot to cc
>> > >> >> user@drill.apache.org
>> > >> >>
>> > >> >> The main process is the same, but my rowkey is more complicated.
>> > >> >> Here are the details I tested.
>> > >> >> The rowkey is like this: [salt 1byte string] + [day 8byte string] +
>> > >> >> [event] + [uid long] + [ts long]
>> > >> >> I also have other qualifiers; only qualifier v:v is an integer, the
>> > >> >> others are strings.
>> > >> >>
>> > >> >> example:
>> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>> > >> >> ROW                                                      COLUMN+CELL
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
>> > >> >>      column=v:e0, timestamp=1461839343076, value=pay
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
>> > >> >>      column=v:e1, timestamp=1461839343076, value=bijia
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
>> > >> >>      column=v:e2, timestamp=1461839343076, value=browser
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
>> > >> >>      column=v:e3, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
>> > >> >>      column=v:e4, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.b
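
For reference, a rowkey with the layout quoted above ([salt 1byte string] +
[day 8byte string] + [event] + [uid long] + [ts long]) could be assembled like
this. A sketch only, using the standard HBase Bytes utility; the salt/day/event
values echo the scanned row, while any concrete uid/ts would be made up:

import org.apache.hadoop.hbase.util.Bytes;

public class RowKeySketch {
    // Concatenate the five parts in key order; Bytes.toBytes(long) is
    // big-endian, which preserves the sort order the scans above rely on.
    static byte[] rowKey(String salt, String day, String event, long uid, long ts) {
        return Bytes.add(
                Bytes.add(Bytes.toBytes(salt), Bytes.toBytes(day), Bytes.toBytes(event)),
                Bytes.add(Bytes.toBytes(uid), Bytes.toBytes(ts)));
    }
    // e.g. rowKey("0", "20160404", "pay.bijia.browser", someUid, someTs)
    // produces keys shaped like the ones printed above.
}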

Re: "user" as a reserved word

2016-05-23 Thread Jinfeng Ni
mydb=# select  "user" from t1;
 user
--
 ABC

I should take back what I said. With a quoted identifier, Postgres
behaves differently from Drill. Both interpretations seem
reasonable, since the identifier could represent two different things.


On Mon, May 23, 2016 at 7:41 PM, Zelaine Fong  wrote:
> Jinfeng,
>
> What does postgres return for the following query in your example?
>
> select "user" from t1;
>
> -- Zelaine
>
> On Mon, May 23, 2016 at 7:39 PM, John Omernik  wrote:
>
>> Hmm, you are correct, I don't have to like it :) but there is both logic
>> and precedent here. Thanks for following up
>>
>> John
>>
>> On Monday, May 23, 2016, Jinfeng Ni  wrote:
>>
>> > A quoted identifier is still an identifier (Drill uses the back tick as
>> > its quote character). Per the SQL standard, the identifiers CURRENT_USER /
>> > USER / CURRENT_SESSION / etc. are implicit function calls; no () is required.
>> >
>> > I checked Postgres, and it seems to have the same behavior.
>> >
>> > mydb=# create table t1 (id int, "user" varchar(10));
>> >
>> > mydb=# insert into t1 values(100, 'ABC');
>> > INSERT 0 1
>> >
>> > mydb=# select * from t1;
>> >  id  | user
>> > -+--
>> >  100 | ABC
>> > (1 row)
>> >
>> > mydb=# select user from t1;
>> >  current_user
>> > --
>> >  postgres
>> > (1 row)
>> >
>> > mydb=# select t1.user from t1;
>> >  user
>> > --
>> >  ABC
>> > (1 row)
>> >
>> >
>> >
>> > On Mon, May 23, 2016 at 5:12 PM, John Omernik > > > wrote:
>> > > Can (should) things inside back ticks be callable? I guess this makes a
>> > > very difficult situation from a usability standpoint because user is a
>> > not
>> > > uncommon column name (think security logs, web logs, etc) yet in the
>> > > current setup there is lots of possibility for assumptions on calling
>> > back
>> > > tick user back tick and without an error users may have wrong, but
>> > "error"
>> > > free results.
>> > > On May 23, 2016 4:54 PM, "Jinfeng Ni" > > > wrote:
>> > >
>> > >> The problem here is that identifier 'user' is not only a reserved
>> > >> word, but also represents a special function ==  current_user() call.
>> > >> The identifier 'user', whether it's quoted or not, could mean either
>> > >> column name or the function call.  Without the table alias, it could
>> > >> be ambiguous to sql parser. The table alias informs the parser that
>> > >> this identifier is not a function call, but a regular identifier, thus
>> > >> removes the ambiguity.
>> > >>
>> > >> This is different from other cases you use quoted reserved word to
>> > >> represent a column name, since those reserved words do not represent a
>> > >> special function, thus no ambiguity.
>> > >>
>> > >> select `update`, `insert` from dfs.tmp.`1.json`;
>> > >> +---------+---------+
>> > >> | update  | insert  |
>> > >> +---------+---------+
>> > >> | abc     | 100     |
>> > >> +---------+---------+
>> > >>
>> > >>
>> > >>
>> > >> On Mon, May 23, 2016 at 10:44 AM, John Omernik > > > wrote:
>> > >> > Ya, as I am testing, this works, however, the users of the system
>> > expect
>> > >> to
>> > >> > be able to use `user` and while I can provide them instructions to
>> > use a
>> > >> > table alias, I am very worried that they will forget and since it
>> > doesn't
>> > >> > error, but instead puts in a different string, this could lead to
>> bad
>> > >> > downstream results...
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Mon, May 23, 2016 at 12:41 PM, John Omernik > > > wrote:
>> > >> >
>> > >> >> I filed https://issues.apache.org/jira/browse/DRILL-4692
>> > >> >>
>> > >> >> I see an alias would work as a temporary fix, but this should be
>> > >> >> addressed (I wonder if other words may have a problem too?)
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
>> > >> >> aengelbre...@maprtech.com > wrote:
>> > >> >>
>> > >> >>> Hmm interesting.
>> > >> >>>
>> > >> >>> As a workaround just use a table alias when referencing the
>> column.
>> > >> >>>
>> > >> >>>
>> > >> >>> Might be good to see if there is a JIRA for this, or file one if not.
>> > >> >>>
>> > >> >>> --Andries
>> > >> >>>
>> > >> >>> > On May 23, 2016, at 10:28 AM, John Omernik > > > wrote:
>> > >> >>> >
>> > >> >>> > I have data with a field name user.
>> > >> >>> >
>> > >> >>> > When I select, with backticks, it doesn't show the field, but
>> > >> instead my
>> > >> >>> > current logged in user...
>> > >> >>> >
>> > >> >>> >
>> > >> >>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit
>> 10;
>> > >> >>> >
>> > >> >>> >
>> > >> >>> > Shouldn't the backticks allow me to reference the field
>> properly?
>> > >> >>> >
>> > >> >>> > John
>> > >> >>>
>> > >> >>>
>> > >> >>
>> > >>
>> >
>>
>>
>> --
>> Sent from my iThing
>>


Re: "user" as a reserved word

2016-05-23 Thread Zelaine Fong
Jinfeng,

What does postgres return for the following query in your example?

select "user" from t1;

-- Zelaine

On Mon, May 23, 2016 at 7:39 PM, John Omernik  wrote:

> Hmm, you are correct, I don't have to like it :) but there is both logic
> and precedent here. Thanks for following up
>
> John
>
> On Monday, May 23, 2016, Jinfeng Ni  wrote:
>
> > A quoted identifier is still an identifier (Drill uses the back tick as
> > its quote character). Per the SQL standard, the identifiers CURRENT_USER /
> > USER / CURRENT_SESSION / etc. are implicit function calls; no () is required.
> >
> > I checked Postgres, and it seems to have the same behavior.
> >
> > mydb=# create table t1 (id int, "user" varchar(10));
> >
> > mydb=# insert into t1 values(100, 'ABC');
> > INSERT 0 1
> >
> > mydb=# select * from t1;
> >  id  | user
> > -+--
> >  100 | ABC
> > (1 row)
> >
> > mydb=# select user from t1;
> >  current_user
> > --
> >  postgres
> > (1 row)
> >
> > mydb=# select t1.user from t1;
> >  user
> > --
> >  ABC
> > (1 row)
> >
> >
> >
> > On Mon, May 23, 2016 at 5:12 PM, John Omernik  > > wrote:
> > > Can (should) things inside back ticks be callable? I guess this makes a
> > > very difficult situation from a usability standpoint because user is a
> > not
> > > uncommon column name (think security logs, web logs, etc) yet in the
> > > current setup there is lots of possibility for assumptions on calling
> > back
> > > tick user back tick and without an error users may have wrong, but
> > "error"
> > > free results.
> > > On May 23, 2016 4:54 PM, "Jinfeng Ni"  > > wrote:
> > >
> > >> The problem here is that identifier 'user' is not only a reserved
> > >> word, but also represents a special function ==  current_user() call.
> > >> The identifier 'user', whether it's quoted or not, could mean either
> > >> column name or the function call.  Without the table alias, it could
> > >> be ambiguous to sql parser. The table alias informs the parser that
> > >> this identifier is not a function call, but a regular identifier, thus
> > >> removes the ambiguity.
> > >>
> > >> This is different from other cases you use quoted reserved word to
> > >> represent a column name, since those reserved words do not represent a
> > >> special function, thus no ambiguity.
> > >>
> > >> select `update`, `insert` from dfs.tmp.`1.json`;
> > >> +---------+---------+
> > >> | update  | insert  |
> > >> +---------+---------+
> > >> | abc     | 100     |
> > >> +---------+---------+
> > >>
> > >>
> > >>
> > >> On Mon, May 23, 2016 at 10:44 AM, John Omernik  > > wrote:
> > >> > Ya, as I am testing, this works, however, the users of the system
> > expect
> > >> to
> > >> > be able to use `user` and while I can provide them instructions to
> > use a
> > >> > table alias, I am very worried that they will forget and since it
> > doesn't
> > >> > error, but instead puts in a different string, this could lead to
> bad
> > >> > downstream results...
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Mon, May 23, 2016 at 12:41 PM, John Omernik  > > wrote:
> > >> >
> > >> >> I filed https://issues.apache.org/jira/browse/DRILL-4692
> > >> >>
> > >> >> I see an alias would work as a temporary fix, but this should be
> > >> >> addressed (I wonder if other words may have a problem too?)
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
> > >> >> aengelbre...@maprtech.com > wrote:
> > >> >>
> > >> >>> Hmm interesting.
> > >> >>>
> > >> >>> As a workaround just use a table alias when referencing the
> column.
> > >> >>>
> > >> >>>
> > >> >>> Might be good to see if there is a JIRA for this, or file one if not.
> > >> >>>
> > >> >>> --Andries
> > >> >>>
> > >> >>> > On May 23, 2016, at 10:28 AM, John Omernik  > > wrote:
> > >> >>> >
> > >> >>> > I have data with a field name user.
> > >> >>> >
> > >> >>> > When I select, with backticks, it doesn't show the field, but
> > >> instead my
> > >> >>> > current logged in user...
> > >> >>> >
> > >> >>> >
> > >> >>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit
> 10;
> > >> >>> >
> > >> >>> >
> > >> >>> > Shouldn't the backticks allow me to reference the field
> properly?
> > >> >>> >
> > >> >>> > John
> > >> >>>
> > >> >>>
> > >> >>
> > >>
> >
>
>
> --
> Sent from my iThing
>


Re: "user" as a reserved word

2016-05-23 Thread John Omernik
Hmm, you are correct, I don't have to like it :) but there is both logic
and precedent here. Thanks for following up

John

On Monday, May 23, 2016, Jinfeng Ni  wrote:

> A quoted identifier is still an identifier (Drill uses the back tick as
> its quote character). Per the SQL standard, the identifiers CURRENT_USER /
> USER / CURRENT_SESSION / etc. are implicit function calls; no () is required.
>
> I checked Postgres, and it seems to have the same behavior.
>
> mydb=# create table t1 (id int, "user" varchar(10));
>
> mydb=# insert into t1 values(100, 'ABC');
> INSERT 0 1
>
> mydb=# select * from t1;
>  id  | user
> -+--
>  100 | ABC
> (1 row)
>
> mydb=# select user from t1;
>  current_user
> --
>  postgres
> (1 row)
>
> mydb=# select t1.user from t1;
>  user
> --
>  ABC
> (1 row)
>
>
>
> On Mon, May 23, 2016 at 5:12 PM, John Omernik  > wrote:
> > Can (should) things inside back ticks be callable? I guess this makes a
> > very difficult situation from a usability standpoint because user is a
> not
> > uncommon column name (think security logs, web logs, etc) yet in the
> > current setup there is lots of possibility for assumptions on calling
> back
> > tick user back tick and without an error users may have wrong, but
> "error"
> > free results.
> > On May 23, 2016 4:54 PM, "Jinfeng Ni"  > wrote:
> >
> >> The problem here is that identifier 'user' is not only a reserved
> >> word, but also represents a special function ==  current_user() call.
> >> The identifier 'user', whether it's quoted or not, could mean either
> >> column name or the function call.  Without the table alias, it could
> >> be ambiguous to sql parser. The table alias informs the parser that
> >> this identifier is not a function call, but a regular identifier, thus
> >> removes the ambiguity.
> >>
> >> This is different from other cases you use quoted reserved word to
> >> represent a column name, since those reserved words do not represent a
> >> special function, thus no ambiguity.
> >>
> >> select `update`, `insert` from dfs.tmp.`1.json`;
> >> +---------+---------+
> >> | update  | insert  |
> >> +---------+---------+
> >> | abc     | 100     |
> >> +---------+---------+
> >>
> >>
> >>
> >> On Mon, May 23, 2016 at 10:44 AM, John Omernik  > wrote:
> >> > Ya, as I am testing, this works, however, the users of the system
> expect
> >> to
> >> > be able to use `user` and while I can provide them instructions to
> use a
> >> > table alias, I am very worried that they will forget and since it
> doesn't
> >> > error, but instead puts in a different string, this could lead to bad
> >> > downstream results...
> >> >
> >> >
> >> >
> >> >
> >> > On Mon, May 23, 2016 at 12:41 PM, John Omernik  > wrote:
> >> >
> >> >> I filed https://issues.apache.org/jira/browse/DRILL-4692
> >> >>
> >> >> I see an alias would work as a temporary fix, but this should be addressed (I
> >> >> wonder if other words may have a problem too?)
> >> >>
> >> >>
> >> >>
> >> >> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
> >> >> aengelbre...@maprtech.com > wrote:
> >> >>
> >> >>> Hmm interesting.
> >> >>>
> >> >>> As a workaround just use a table alias when referencing the column.
> >> >>>
> >> >>>
> >> >>> Might be good to see if there is a JIRA for this, or file one if not.
> >> >>>
> >> >>> --Andries
> >> >>>
> >> >>> > On May 23, 2016, at 10:28 AM, John Omernik  > wrote:
> >> >>> >
> >> >>> > I have data with a field name user.
> >> >>> >
> >> >>> > When I select, with backticks, it doesn't show the field, but
> >> instead my
> >> >>> > current logged in user...
> >> >>> >
> >> >>> >
> >> >>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;
> >> >>> >
> >> >>> >
> >> >>> > Shouldn't the backticks allow me to reference the field properly?
> >> >>> >
> >> >>> > John
> >> >>>
> >> >>>
> >> >>
> >>
>


-- 
Sent from my iThing


Re: query from hbase issue

2016-05-23 Thread qiang li
Yes, it seems like the same issue. I will upgrade it and test again. But
do you think we can update the physical plan too? If we only want to query
one qualifier, then the columns of the plan should only contain that
qualifier instead of "*". Maybe that plan would query faster. Am I right?
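
At the HBase client level, the projection being asked about is the difference
between scanning the whole family and requesting one qualifier. A sketch with
the standard client API, using the family/qualifier names from this thread
(whether Drill's plan actually pushes this down is exactly the open question
here):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ProjectionSketch {
    static Scan e0Only() {
        Scan scan = new Scan();
        // Project only v.e0 instead of "*": the region server then returns one
        // cell per row rather than every qualifier in the family.
        scan.addColumn(Bytes.toBytes("v"), Bytes.toBytes("e0"));
        return scan;
    }
}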

2016-05-23 23:38 GMT+08:00 Krystal Nguyen :

> Hi Qiang,
>
> Looks like you might be encountering this issue:
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
>
> On Sun, May 22, 2016 at 8:38 PM, qiang li  wrote:
>
> > I tested it step by step again, and finally I found out that the issue
> > happens only if the number of qualifiers is more than 3.
> >
> > It's weird, but this is the result I got.
> >
> > I tested about 10 thousand rows of data. The length of the event is 6; the
> > code I used to test is like below:
> >
> > String[] earr = action.getEvent().geteArr();
> > for(int i=0;i<6;i++){
> > put.addColumn(family, Bytes.toBytes("e"+i),  Bytes.toBytes(earr[i]));
> > }
> >
> > Then I tested step by step like below:
> >
> > put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
> > put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
> > put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
> > put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
> >
> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
> >  group by a.`v`.`e0`;
> > +-----+-------+
> > |  k  |   p   |
> > +-----+-------+
> > | e0  | 3856  |
> > +-----+-------+
> >
> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
> > +-----+-------+
> > |  k  |   p   |
> > +-----+-------+
> > | e0  | 9094  |
> > +-----+-------+
> >
> >
> > If I put the qualifier "e3" into the table (that is, more than 3 qualifiers),
> > then the issue is reproduced.
> >
> >
> > 2016-05-23 9:23 GMT+08:00 qiang li :
> >
> > > Sorry for the late reply.
> > >
> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
> > >
> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen :
> > >
> > >> Qiang, Can you please let us know the hbase version and hadoop
> > >> distribution
> > >> version that you are using.
> > >>
> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen  >
> > >> wrote:
> > >>
> > >> > Can you please let us know the hbase version and hadoop distribution
> > >> > version that you are using.
> > >> >
> > >> >
> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li 
> > wrote:
> > >> >
> > >> >> Khurram, I am sending the mail again; the last mail forgot to cc
> > >> >> user@drill.apache.org
> > >> >>
> > >> >> The main process is the same, but my rowkey is more complicated.
> > >> >> Here are the details I tested.
> > >> >> The rowkey is like this: [salt 1byte string] + [day 8byte string] +
> > >> >> [event] + [uid long] + [ts long]
> > >> >> I also have other qualifiers; only qualifier v:v is an integer, the
> > >> >> others are strings.
> > >> >>
> > >> >> example:
> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> > >> >> ROW                                                      COLUMN+CELL
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e0, timestamp=1461839343076, value=pay
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e1, timestamp=1461839343076, value=bijia
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e2, timestamp=1461839343076, value=browser
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e3, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e4, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e5, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:h, timestamp=145977120, value=20
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:m, timestamp=145977120, value=0
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:v, timestamp=145977120, value=\x00\x00\x00\x17

Re: "user" as a reserved word

2016-05-23 Thread Jinfeng Ni
A quoted identifier is still an identifier (Drill uses the back tick as
its quote character). Per the SQL standard, the identifiers CURRENT_USER /
USER / CURRENT_SESSION / etc. are implicit function calls; no () is required.

I checked Postgres, and it seems to have the same behavior.

mydb=# create table t1 (id int, "user" varchar(10));

mydb=# insert into t1 values(100, 'ABC');
INSERT 0 1

mydb=# select * from t1;
 id  | user
-+--
 100 | ABC
(1 row)

mydb=# select user from t1;
 current_user
--
 postgres
(1 row)

mydb=# select t1.user from t1;
 user
--
 ABC
(1 row)



On Mon, May 23, 2016 at 5:12 PM, John Omernik  wrote:
> Can (should) things inside back ticks be callable? I guess this makes a
> very difficult situation from a usability standpoint because user is a not
> uncommon column name (think security logs, web logs, etc) yet in the
> current setup there is lots of possibility for assumptions on calling back
> tick user back tick and without an error users may have wrong, but "error"
> free results.
> On May 23, 2016 4:54 PM, "Jinfeng Ni"  wrote:
>
>> The problem here is that identifier 'user' is not only a reserved
>> word, but also represents a special function ==  current_user() call.
>> The identifier 'user', whether it's quoted or not, could mean either
>> column name or the function call.  Without the table alias, it could
>> be ambiguous to sql parser. The table alias informs the parser that
>> this identifier is not a function call, but a regular identifier, thus
>> removes the ambiguity.
>>
>> This is different from other cases you use quoted reserved word to
>> represent a column name, since those reserved words do not represent a
>> special function, thus no ambiguity.
>>
>> select `update`, `insert` from dfs.tmp.`1.json`;
>> +---------+---------+
>> | update  | insert  |
>> +---------+---------+
>> | abc     | 100     |
>> +---------+---------+
>>
>>
>>
>> On Mon, May 23, 2016 at 10:44 AM, John Omernik  wrote:
>> > Ya, as I am testing, this works, however, the users of the system expect
>> to
>> > be able to use `user` and while I can provide them instructions to use a
>> > table alias, I am very worried that they will forget and since it doesn't
>> > error, but instead puts in a different string, this could lead to bad
>> > downstream results...
>> >
>> >
>> >
>> >
>> > On Mon, May 23, 2016 at 12:41 PM, John Omernik  wrote:
>> >
>> >> I filed https://issues.apache.org/jira/browse/DRILL-4692
>> >>
>> >> I see an alias would work as a temporary fix, but this should be addressed (I
>> >> wonder if other words may have a problem too?)
>> >>
>> >>
>> >>
>> >> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
>> >> aengelbre...@maprtech.com> wrote:
>> >>
>> >>> Hmm interesting.
>> >>>
>> >>> As a workaround just use a table alias when referencing the column.
>> >>>
>> >>>
>> >>> Might be good to see if there is a JIRA for this, or file one if not.
>> >>>
>> >>> --Andries
>> >>>
>> >>> > On May 23, 2016, at 10:28 AM, John Omernik  wrote:
>> >>> >
>> >>> > I have data with a field name user.
>> >>> >
>> >>> > When I select, with backticks, it doesn't show the field, but
>> instead my
>> >>> > current logged in user...
>> >>> >
>> >>> >
>> >>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;
>> >>> >
>> >>> >
>> >>> > Shouldn't the backticks allow me to reference the field properly?
>> >>> >
>> >>> > John
>> >>>
>> >>>
>> >>
>>


Re: "user" as a reserved word

2016-05-23 Thread John Omernik
Can (should) things inside back ticks be callable? I guess this makes for a
very difficult situation from a usability standpoint, because user is a
not-uncommon column name (think security logs, web logs, etc.), yet in the
current setup there is a lot of room for wrong assumptions when calling
back-tick user back-tick, and without an error users may have wrong, but
"error"-free, results.
On May 23, 2016 4:54 PM, "Jinfeng Ni"  wrote:

> The problem here is that identifier 'user' is not only a reserved
> word, but also represents a special function ==  current_user() call.
> The identifier 'user', whether it's quoted or not, could mean either
> column name or the function call.  Without the table alias, it could
> be ambiguous to sql parser. The table alias informs the parser that
> this identifier is not a function call, but a regular identifier, thus
> removes the ambiguity.
>
> This is different from other cases you use quoted reserved word to
> represent a column name, since those reserved words do not represent a
> special function, thus no ambiguity.
>
> select `update`, `insert` from dfs.tmp.`1.json`;
> +---------+---------+
> | update  | insert  |
> +---------+---------+
> | abc     | 100     |
> +---------+---------+
>
>
>
> On Mon, May 23, 2016 at 10:44 AM, John Omernik  wrote:
> > Ya, as I am testing, this works, however, the users of the system expect
> to
> > be able to use `user` and while I can provide them instructions to use a
> > table alias, I am very worried that they will forget and since it doesn't
> > error, but instead puts in a different string, this could lead to bad
> > downstream results...
> >
> >
> >
> >
> > On Mon, May 23, 2016 at 12:41 PM, John Omernik  wrote:
> >
> >> I filed https://issues.apache.org/jira/browse/DRILL-4692
> >>
> >> I see an alias would work as a temporary fix, but this should be addressed (I
> >> wonder if other words may have a problem too?)
> >>
> >>
> >>
> >> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
> >> aengelbre...@maprtech.com> wrote:
> >>
> >>> Hmm interesting.
> >>>
> >>> As a workaround just use a table alias when referencing the column.
> >>>
> >>>
> >>> Might be good to see if there is a JIRA for this, or file one if not.
> >>>
> >>> --Andries
> >>>
> >>> > On May 23, 2016, at 10:28 AM, John Omernik  wrote:
> >>> >
> >>> > I have data with a field name user.
> >>> >
> >>> > When I select, with backticks, it doesn't show the field, but
> instead my
> >>> > current logged in user...
> >>> >
> >>> >
> >>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;
> >>> >
> >>> >
> >>> > Shouldn't the backticks allow me to reference the field properly?
> >>> >
> >>> > John
> >>>
> >>>
> >>
>


Re: Hangout Frequency

2016-05-23 Thread Parth Chandra
The overwhelming response (?!) seems to have been to agree to have the
hangout every other week.
So the next hangout will be Tuesday 5/31.

See you all then.

On Fri, May 20, 2016 at 7:34 PM, Aman Sinha  wrote:

> Every other week sounds good to me.  It is a substantial commitment to do
> one every week.
> Many useful discussions already happen on the dev and user mailing lists.
>
> On Fri, May 20, 2016 at 12:44 PM, Parth Chandra 
> wrote:
>
> > Drill Users, Devs,
> >
> >   Attendance at the hangouts has been getting sparse and it seems like
> the
> > hangouts are too frequent. I'd like to propose that we move to having
> > hangouts every other week.
> >
> >   What do folks think?
> >
> > Parth
> >
>


Re: CORS for Apache Drill

2016-05-23 Thread Hanifi GUNES
Great. CrossOriginFilter ships with the jetty-servlets package, so you will
need to add it as a dependency in the exec pom file at [1].

You may also get additional community support if you target the dev@ list for
your code/implementation-related questions.


Let me know.
-Hanifi

1: https://github.com/apache/drill/blob/master/exec/java-exec/pom.xml
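
The dependency itself would look roughly like this in that pom (a sketch: the
groupId/artifactId are the published coordinates of the class linked below,
but the version line is an assumption and should reuse whatever Jetty version
the Drill build already defines):

<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-servlets</artifactId>
  <!-- assumed: reuse the Jetty version property the build already defines -->
  <version>${jetty.version}</version>
</dependency>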

2016-05-21 9:06 GMT-07:00 Wojciech Nowak :

> Hello!
>
> I have started working on the issue related to enabling CORS:
> I created an issue in Jira [1],
> and a branch with an initial commit [2], but I have trouble importing the
> Jetty class CrossOriginFilter.
> Can you guide me on how to import that dependency, which should normally
> be located at org/eclipse/jetty/servlets/CrossOriginFilter.java?
> Source code of that class [3]
>
> [1] - https://issues.apache.org/jira/browse/DRILL-4690
> [2] -
> https://github.com/PythonicNinja/drill/commit/73d659ae23e455464a3530fd18b2eff3ba192a30
> [3] -
> https://github.com/eclipse/jetty.project/blob/master/jetty-servlets/src/main/java/org/eclipse/jetty/servlets/CrossOriginFilter.java
>
>
> —
> kind regards,
> Wojciech Nowak
>
>


Re: "user" as a reserved word

2016-05-23 Thread Jinfeng Ni
The problem here is that the identifier 'user' is not only a reserved
word, but also represents a special function == a current_user() call.
The identifier 'user', whether it's quoted or not, could mean either a
column name or the function call. Without a table alias, it can
be ambiguous to the SQL parser. The table alias informs the parser that
this identifier is not a function call, but a regular identifier, and thus
removes the ambiguity.

This is different from other cases where you use a quoted reserved word to
represent a column name, since those reserved words do not represent a
special function, and thus there is no ambiguity.

select `update`, `insert` from dfs.tmp.`1.json`;
+---------+---------+
| update  | insert  |
+---------+---------+
| abc     | 100     |
+---------+---------+
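
Concretely, the alias workaround for Drill looks like this against a
hypothetical table with a `user` column (the file name is made up; the first
form resolves to the USER function and returns the session user, exactly as
reported in this thread, while the aliased form returns the column):

select `user` from dfs.tmp.`weblogs.json`;
select t.`user` from dfs.tmp.`weblogs.json` t;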



On Mon, May 23, 2016 at 10:44 AM, John Omernik  wrote:
> Ya, as I am testing, this works, however, the users of the system expect to
> be able to use `user` and while I can provide them instructions to use a
> table alias, I am very worried that they will forget and since it doesn't
> error, but instead puts in a different string, this could lead to bad
> downstream results...
>
>
>
>
> On Mon, May 23, 2016 at 12:41 PM, John Omernik  wrote:
>
>> I filed https://issues.apache.org/jira/browse/DRILL-4692
>>
>> I see an alias would work as a temporary fix, but this should be addressed (I
>> wonder if other words may have a problem too?)
>>
>>
>>
>> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
>> aengelbre...@maprtech.com> wrote:
>>
>>> Hmm interesting.
>>>
>>> As a workaround just use a table alias when referencing the column.
>>>
>>>
>>> Might be good to see if there is a JIRA for this, or file one if not.
>>>
>>> --Andries
>>>
>>> > On May 23, 2016, at 10:28 AM, John Omernik  wrote:
>>> >
>>> > I have data with a field name user.
>>> >
>>> > When I select, with backticks, it doesn't show the field, but instead my
>>> > current logged in user...
>>> >
>>> >
>>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;
>>> >
>>> >
>>> > Shouldn't the backticks allow me to reference the field properly?
>>> >
>>> > John
>>>
>>>
>>


Re: Reading and converting Parquet files intended for Impala

2016-05-23 Thread John Omernik
Troubleshooting this is made more difficult by the fact that the file that
gives the error works fine when I select directly from it into a new
table... this makes it very tricky to troubleshoot. Any assistance on this
would be appreciated. I've opened a ticket with MapR as well, but I am
stumped, and this is our primary use case right now, so this is a
blocker. (Note I've tried three different days; two fail, one works.)

John
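
For anyone retracing the two runs described below: the reader toggle is an
ordinary session option, so the runs differ only by this statement (the option
name is quoted verbatim from the message; false is noted there as the default):

ALTER SESSION SET `store.parquet.use_new_reader` = true;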

On Mon, May 23, 2016 at 9:48 AM, John Omernik  wrote:

> I have a largish directory of parquet files generated for use in Impala.
> They were created with the CDH version of apache-parquet-mr (not sure on
> version at this time)
>
> Some settings:
> Compression: snappy
> Use Dictionary: true
> WRITER_VERION: PARQUET_1_0
>
> I can read them as is in Drill, however, the strings all come through as
> binary (see other thread). I can cast all those fields as VARCHAR and read
> them but take a bad performance hit (2 seconds to read directly from raw
> parquet, limit 10, but showing binary.  25 seconds to use a view that CASTS
> all fields into the proper types... data returns accurately, but 10 rows
> taking 25 seconds is too long)
>
> So I want to read from this directory (approx 126GB) and CTAS in a way
> Drill will be happier.
>
> I've tried this two ways. One was just to CTAS directly from the view I
> created. All else being default. The other was to set the reader
> "new_reader" = true. Neither worked, and new_reader actually behaves very
> badly (need to restart drill bits)  At least the other default reader
> errors :)
>
> store.parquet.use_new_reader = false (the default)
> This threw the error below (it's a truncated error, with lots of field names
> and other things). It stored 6 GB of files and died.
>
> store.parquet.use_new_reader = true
>
> 1.4 GB of files created and  everything hangs, need to restart drillbits
> (is this an issue?)
>
>
>
> Error from "non" new_reader:
>
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 107014
>
>
>
> Fragment 1:36
>
>
>
> [Error Id: ab5b202f-94cc-4275-b136-537dfbea6b31 on
> atl1ctuzeta05.ctu-bo.secureworks.net:20001]
>
>
>
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Error in
> parquet record reader.
>
> Message:
>
> Hadoop path: /path/to/files/-m-1.snappy.parquet
>
> Total records read: 393120
>
> Mock records read: 0
>
> Records to read: 32768
>
> Row group index: 0
>
> Records in row group: 536499
>
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message events {
>
> …
>
>
>
>
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():352
>
>
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():454
>
> org.apache.drill.exec.physical.impl.ScanBatch.next():191
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
>
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
>
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
>
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():91
>
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
>
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>
>
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
>
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
>
> java.security.AccessController.doPrivileged():-2
>
> javax.security.auth.Subject.doAs():422
>
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
>
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
>
> org.apache.drill.common.SelfCleaningRunnable.run():38
>
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>
> java.lang.Thread.run():745
>
>   Caused By (java.lang.ArrayIndexOutOfBoundsException) 107014
>
>
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary.decodeToLong():164
>
>
> org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong():122
>
>
> org.apache.drill.exec.store.parquet.columnreaders.ParquetFixedWidthDictionaryReaders$Dictionary

Re: Issue with Queries Hanging

2016-05-23 Thread John Omernik
Distributed.  (MapR FS, but via NFS)

On Mon, May 23, 2016 at 3:26 PM, Abdel Hakim Deneche 
wrote:

> One question about the missing query profile: do you store the query
> profiles in the local file system or the distributed file system ?
>
> On Mon, May 23, 2016 at 9:31 AM, John Omernik  wrote:
>
> > Hey all, this is a separate, yet related, issue to my other posts RE
> > Parquet,
> > however, I thought I'd post this to see if this is normal or should be
> > handled (and/or JIRAed)
> >
> > I am running Drill 1.6, if you've read the other posts, I am trying to
> CTAS
> > a large amount of data (largish) 120 GB from Parquet to better Parquet.
> >
> > As I am running, I sometimes get the Index Out of Bounds (as in the other
> > threads), but depending on source data and/or settings like using the new
> > parquet reader, I get an odd situation.
> >
> > When I refresh the profile in the WebUI I get an error "VALIDATION ERROR:
> > no profile with given query id '' exists"
> >
> > I am running this in sqlline, and at this point, there is no error, but I
> > can't access my query profile.
> >
> > Other notes:
> >
> > 1. The webui is HORRIBLY slow
> > 2. If I cancel the query, it will show me some written parquet, but
> > obviously it wasn't finished
> > 3. There are no errors in any of the drillbits' log files (except the
> > foreman, which starts to get WARN messages: "Message of mode (REQUEST or
> > RESPONSE) of type 8 (or type 1) took longer than 500ms"; the actual
> > duration was a high number of ms, between 1900 and 3500 ms)
> > 4. Like I said, no errors, just everything appears to hang.
> >
> > My memory and such seems good here, I have 96 GB of ram DIRECT per node,
> > and 12 GB of HEAP per node, 5 nodes.
> >
> > The cluster seems really sluggish and out of sorts until I restart drill
> > bits... This seems like a very bad "error state"
> >
> > Has anyone seen this? Any thoughts on this? Should I open a JIRA?
> >
> >
> > Thanks,
> > John
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>


Re: Issue with Queries Hanging

2016-05-23 Thread Abdel Hakim Deneche
One question about the missing query profile: do you store the query
profiles in the local file system or the distributed file system ?

On Mon, May 23, 2016 at 9:31 AM, John Omernik  wrote:

> Hey all, this is a separate, yet related, issue to my other posts RE Parquet;
> however, I thought I'd post this to see if this is normal or should be
> handled (and/or JIRAed)
>
> I am running Drill 1.6, if you've read the other posts, I am trying to CTAS
> a large amount of data (largish) 120 GB from Parquet to better Parquet.
>
> As I am running, I sometimes get the Index Out of Bounds (as in the other
> threads), but depending on source data and/or settings like using the new
> parquet reader, I get an odd situation.
>
> When I refresh the profile in the WebUI I get an error "VALIDATION ERROR: no
> profile with given query id '' exists"
>
> I am running this in sqlline, and at this point, there is no error, but I
> can't access my query profile.
>
> Other notes:
>
> 1. The webui is HORRIBLY slow
> 2. If I cancel the query, it will show me some written parquet, but obviously
> it wasn't finished
> 3. There are no errors in any of the drillbits' log files (except the foreman,
> which starts to get WARN messages: "Message of mode (REQUEST or RESPONSE) of
> type 8 (or type 1) took longer than 500ms"; the actual duration was a high
> number of ms, between 1900 and 3500 ms)
> 4. Like I said, no errors, just everything appears to hang.
>
> My memory and such seems good here, I have 96 GB of ram DIRECT per node,
> and 12 GB of HEAP per node, 5 nodes.
>
> The cluster seems really sluggish and out of sorts until I restart drill
> bits... This seems like a very bad "error state"
>
> Has anyone seen this? Any thoughts on this? Should I open a JIRA?
>
>
> Thanks,
> John
>



-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training



Re: "user" as a reserved word

2016-05-23 Thread John Omernik
Ya, as I am testing, this works. However, the users of the system expect to
be able to use `user`, and while I can provide them with instructions to use a
table alias, I am very worried that they will forget. Since it doesn't
error, but instead puts in a different string, this could lead to bad
downstream results...




On Mon, May 23, 2016 at 12:41 PM, John Omernik  wrote:

> I filed https://issues.apache.org/jira/browse/DRILL-4692
>
> I see an alias would work as a temporary fix, but this should be addressed (I
> wonder if other words may have a problem too?)
>
>
>
> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
>
>> Hmm interesting.
>>
>> As a workaround just use a table alias when referencing the column.
>>
>>
>> Might be good to see if there is a JIRA for this, or file one if not.
>>
>> --Andries
>>
>> > On May 23, 2016, at 10:28 AM, John Omernik  wrote:
>> >
>> > I have data with a field name user.
>> >
>> > When I select, with backticks, it doesn't show the field, but instead my
>> > current logged in user...
>> >
>> >
>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;
>> >
>> >
>> > Shouldn't the backticks allow me to reference the field properly?
>> >
>> > John
>>
>>
>


Re: "user" as a reserved word

2016-05-23 Thread John Omernik
I filed https://issues.apache.org/jira/browse/DRILL-4692

I see an alias would work as a temporary fix, but this should be addressed (I
wonder if other words may have a problem too?)



On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> Hmm interesting.
>
> As a workaround just use a table alias when referencing the column.
>
>
> Might be good to see if there is a JIRA for this, or file one if not.
>
> --Andries
>
> > On May 23, 2016, at 10:28 AM, John Omernik  wrote:
> >
> > I have data with a field name user.
> >
> > When I select, with backticks, it doesn't show the field, but instead my
> > current logged in user...
> >
> >
> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;
> >
> >
> > Shouldn't the backticks allow me to reference the field properly?
> >
> > John
>
>


Re: "user" as a reserved word

2016-05-23 Thread Andries Engelbrecht
Hmm interesting.

As a workaround just use a table alias when referencing the column.


Might be good to see if there is a JIRA for this, or file one if not.

--Andries

> On May 23, 2016, at 10:28 AM, John Omernik  wrote:
> 
> I have data with a field name user.
> 
> When I select, with backticks, it doesn't show the field, but instead my
> current logged in user...
> 
> 
> select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;
> 
> 
> Shouldn't the backticks allow me to reference the field properly?
> 
> John



"user" as a reserved word

2016-05-23 Thread John Omernik
I have data with a field name user.

When I select, with backticks, it doesn't show the field, but instead my
current logged in user...


select CONVERT_FROM(`user`, 'UTF8') as `user` from table limit 10;


Shouldn't the backticks allow me to reference the field properly?

John


Re: Issue with Queries Hanging

2016-05-23 Thread John Omernik
Note: after letting one just hang for a long time, I did see a message
about HEAP space... I was running with 12 GB of Heap and 96 GB of Direct. I
switched to 24 GB of Heap and 84 GB of Direct, and now my queries fail, but
all with the index-out-of-bounds issue, and my drillbits stay
responsive.

Could this hanging issue be related to Heap?  I thought 12 GB would be
quite a bit of Heap, but I guess not? Could we handle this better, provide
better errors? Maybe the unresponsiveness is due to some GC or other
work? (Just spitballing ideas here.)

Thoughts?

John
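
For anyone tuning the same split: assuming a stock drill-env.sh, the heap and
direct sizes John mentions map to two environment variables set in
conf/drill-env.sh on every node, followed by a drillbit restart (the values
below are just the ones from this message, not a recommendation):

export DRILL_HEAP="24G"
export DRILL_MAX_DIRECT_MEMORY="84G"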

On Mon, May 23, 2016 at 11:31 AM, John Omernik  wrote:

> Hey all, this is a separate, yet related, issue to my other posts RE Parquet;
> however, I thought I'd post this to see if this is normal or should be
> handled (and/or JIRAed)
>
> I am running Drill 1.6, if you've read the other posts, I am trying to
> CTAS a large amount of data (largish) 120 GB from Parquet to better
> Parquet.
>
> As I am running, I sometimes get the Index Out of Bounds (as in the other
> threads), but depending on source data and/or settings like using the new
> parquet reader, I get an odd situation.
>
> When I refresh the profile in the WebUI I get an error "VALIDATION ERROR:
> no profile with given query id '' exists"
>
> I am running this in sqlline, and at this point, there is no error, but I
> can't access my query profile.
>
> Other notes:
>
> 1. The webui is HORRIBLY slow
> 2. If I cancel the query, it will show me some written parquet, but
> obviously it wasn't finished
> 3. There are no errors in any of the drillbits' log files (except the
> foreman, which starts to get WARN messages: "Message of mode (REQUEST or
> RESPONSE) of type 8 (or type 1) took longer than 500ms"; the actual duration
> was a high number of ms, between 1900 and 3500 ms)
> 4. Like I said, no errors, just everything appears to hang.
>
> My memory and such seems good here, I have 96 GB of ram DIRECT per node,
> and 12 GB of HEAP per node, 5 nodes.
>
> The cluster seems really sluggish and out of sorts until I restart drill
> bits... This seems like a very bad "error state"
>
> Has anyone seen this? Any thoughts on this? Should I open a JIRA?
>
>
> Thanks,
> John
>
>
>


Issue with Queries Hanging

2016-05-23 Thread John Omernik
Hey all, this is a separate, yet related, issue to my other posts RE Parquet;
however, I thought I'd post this to see if this is normal or should be
handled (and/or JIRAed)

I am running Drill 1.6, if you've read the other posts, I am trying to CTAS
a large amount of data (largish) 120 GB from Parquet to better Parquet.

As I am running, I sometimes get the Index Out of Bounds (as in the other
threads), but depending on source data and/or settings like using the new
parquet reader, I get an odd situation.

When I refresh the profile in the WebUI I get an error "VALIDATION ERROR: no
profile with given query id '' exists"

I am running this in sqlline, and at this point, there is no error, but I
can't access my query profile.

Other notes:

1. The webui is HORRIBLY slow
2. If I cancel the query, it will show me some written parquet, but obviously
it wasn't finished
3. There are no errors in any of the drillbits' log files (except the foreman,
which starts to get WARN messages: "Message of mode (REQUEST or RESPONSE) of
type 8 (or type 1) took longer than 500ms"; the actual duration was a high
number of ms, between 1900 and 3500 ms)
4. Like I said, no errors, just everything appears to hang.

My memory and such seems good here, I have 96 GB of ram DIRECT per node,
and 12 GB of HEAP per node, 5 nodes.

The cluster seems really sluggish and out of sorts until I restart drill
bits... This seems like a very bad "error state"

Has anyone seen this? Any thoughts on this? Should I open a JIRA?


Thanks,
John


DATA_READ ERROR: Error processing input: Cannot use newline character within quoted string

2016-05-23 Thread Wilburn, Scott
Hello Drill community,
I'm seeing this error with Drill 1.5 when querying CSV or TSV data that 
contains quotes within some of the fields. I have a couple questions about 
this. 
  1. Does anyone know why Drill cares about quotes in the data?
  2. Is there a workaround to this problem, besides pre-parsing the data?

Thanks,
Scott Wilburn



Re: query from hbase issue

2016-05-23 Thread Krystal Nguyen
Hi Qiang,

Looks like you might be encountering this issue:
https://issues.apache.org/jira/browse/DRILL-4271

Thanks

On Sun, May 22, 2016 at 8:38 PM, qiang li  wrote:

> I tested it step by step again, and finally I found out that the issue
> happens only if the number of qualifiers is more than 3.
>
> It's weird, but this is the result I got.
>
> I tested about 10 thousand rows of data. The length of the event is 6; the
> code I used to test is like below:
>
> String[] earr = action.getEvent().geteArr();
> for(int i=0;i<6;i++){
> put.addColumn(family, Bytes.toBytes("e"+i),  Bytes.toBytes(earr[i]));
> }
>
> Then I tested step by step like below:
>
> put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
> put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
> put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
> put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
>
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>  group by a.`v`.`e0`;
> +-----+-------+
> |  k  |   p   |
> +-----+-------+
> | e0  | 3856  |
> +-----+-------+
>
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
> +-----+-------+
> |  k  |   p   |
> +-----+-------+
> | e0  | 9094  |
> +-----+-------+
>
>
> If I put the qualifier "e3" into the table (that is, more than 3 qualifiers),
> then the issue is reproduced.
>
>
> 2016-05-23 9:23 GMT+08:00 qiang li :
>
> > Sorry for the late reply.
> >
> > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
> >
> > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen :
> >
> >> Qiang, Can you please let us know the hbase version and hadoop
> >> distribution
> >> version that you are using.
> >>
> >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen 
> >> wrote:
> >>
> >> > Can you please let us know the hbase version and hadoop distribution
> >> > version that you are using.
> >> >
> >> >
> >> > On Fri, May 20, 2016 at 1:35 AM, qiang li 
> wrote:
> >> >
> > >> >> Khurram, I am sending the mail again; the last mail forgot to cc
> > >> >> user@drill.apache.org
> >> >>
> > >> >> The main process is the same, but my rowkey is more complicated.
> > >> >> Here are the details I tested.
> > >> >> The rowkey is like this: [salt 1byte string] + [day 8byte string] +
> > >> >> [event] + [uid long] + [ts long]
> > >> >> I also have other qualifiers; only qualifier v:v is an integer, the
> > >> >> others are strings.
> >> >>
> >> >> example:
> >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> > >> >> ROW                                                      COLUMN+CELL
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e0, timestamp=1461839343076, value=pay
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e1, timestamp=1461839343076, value=bijia
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e2, timestamp=1461839343076, value=browser
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e3, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e4, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:e5, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:h, timestamp=145977120, value=20
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:m, timestamp=145977120, value=0
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
> > >> >>      column=v:v, timestamp=145977120, value=\x00\x00\x00\x17
> > >> >>
> > >> >> 1 row(s) in 0.0410 seconds
> >> >>
> >> >>
> > >> >> Here is an example of how the issue looks:
> >> >>
> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
> > >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00
> > >> >>      column=v:e0, timestamp=1463723029448, value=visit
> > >> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x0
> > >> >>      column=v:e0, timestamp=1463723029217, value=visit

Re: Performance tuning for TPC-H Q1 on a three nodes cluster

2016-05-23 Thread Dechang Gu
Hi Yijie,
This is Dechang at MapR. I work on Drill performance.

From what you described, it looks like the scan took most of the time.
How are the files distributed on the disks? Is there any skew?
How many disks are there?
If possible, can you provide the profile for the run?

Thanks,
Dechang
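
For readers following along: the parallelism Yijie quotes is controlled by the
planner.width.max_per_node option, which can be inspected and changed like any
other option. A sketch; 24 is purely illustrative, and more fragments will not
help if the scan is disk-bound, which is what the questions above are probing:

SELECT name, num_val FROM sys.options WHERE name = 'planner.width.max_per_node';
ALTER SYSTEM SET `planner.width.max_per_node` = 24;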

On Sun, May 22, 2016 at 9:06 AM, Yijie Shen 
wrote:

> Hi all,
>
> I've been trying out Drill on the master branch lately and have deployed a
> cluster on three physical servers.
>
> The input data `lineitem` is in parquet format of total size 150GB, 101MB
> per file and 1516 files in total.
>
> Each server has two Intel(R) Xeon(R) CPU E5645 @2.40GHz CPUs, 24 cores in
> total, and 32GB memory.
>
> While executing Q1 using:
>
>  SELECT
>   L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
> AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
> FROM
>   dfs.tpch.`lineitem`
> WHERE
>   L_SHIPDATE<='1998-09-02'
> GROUP BY L_RETURNFLAG, L_LINESTATUS
> ORDER BY L_RETURNFLAG, L_LINESTATUS
>
> I've noticed the parallelism was 51 (planner.width.max_per_node = 17) in my
> case for Major Fragment 03 (Scan Filter Project HashAgg and Project), and
> each minor fragment lasts about 8 to 9 minutes. One record, for example:
>
> 03-00-xx hw080 7.309s 42.358s 9m35s 118,758,489 14,540 22:31:32 22:31:32
> 33MB FINISHED
>
> Is this a normal speed (more than 10 minutes) for Drill for my current
> cluster? Did I miss something important in the conf to accelerate the
> execution?
>
> Thanks very much!
>
> Yijie
>


Re: Reading Parquet Files Created Elsewhere

2016-05-23 Thread John Omernik
That did work faster; I can now get the 10 rows in 12 seconds as opposed to
25.

In my 25-second query I CAST all items from the parquet, but do I need
to? For the 12-second query I only CONVERT_FROM the string values, and
the view seems happier. So that's nice.

Thanks for the pointer. I am playing around with CTAS-ing this into another
table; I will try the CTAS (see the other thread) with CONVERT_FROM rather
than CAST.

Thanks!
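
The view shape being discussed, as a sketch (the table and column names here
are made up; the point is that only the binary-encoded string columns go
through CONVERT_FROM, while already-typed columns pass straight through):

CREATE OR REPLACE VIEW dfs.tmp.`events_view` AS
SELECT
  CONVERT_FROM(t.`event_name`, 'UTF8') AS event_name,
  t.`event_count` AS event_count
FROM dfs.`/path/to/parquet_dir` t;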

On Mon, May 23, 2016 at 9:45 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> John,
>
> See if convert_from helps in this regard, I believe it is supposed to be
> faster than cast varchar.
>
> This is likely what will work on your data
> CONVERT_FROM(, 'UTF8')
>
> Hopefully someone with more in depth knowledge of the Drill Parquet reader
> can comment.
>
> --Andries
>
>
>
> > On May 23, 2016, at 7:35 AM, John Omernik  wrote:
> >
> > I am learning more about my data here, the data was created in a CDH
> > version of the apache parquet-mr library. (Not sure version yet, getting
> > that soon).  They used snappy and version 1.0 of the Parquet spec due to
> > Impala needing it.  They are also using setEnableDictionary on the write.
> >
> > Trying to figure things out right now
> >
> > If I make a view and cast all string fields to a VARCHAR drill shows the
> > right result, but it's slow.
> >
> > (10 row select from raw = 1.9 seconds, 10 row select with CAST in a view
> =
> > 25 seconds)
> >
> > I've resigned myself to converting the table once for performance, which
> > isn't an issue however I am getting different issues on that front  (I'll
> > open a new thread for that)
> >
> > Other than the cast(field AS VARCHAR) as field  is there any other
> (perhaps
> > more performant) way to handle this situation?
> >
> >
> >
> >
> >
> > On Mon, May 23, 2016 at 8:31 AM, Todd  wrote:
> >
> >>
> >> Looks like Impala encoded string as binary data, I think there is some
> >> configuration in Drill(I know spark has) that helps do the conversion.
> >>
> >>
> >>
> >>
> >>
> >> At 2016-05-23 21:25:17, "John Omernik"  wrote:
> >>> Hey all, I have some Parquet files that I believe were made in a Map
> >> Reduce
> >>> job and work well in Impala, however, when I read them in Drill, the
> >> fields
> >>> that are strings come through as [B@25ddbb etc. The exact string
> >>> represented as regex would be /\[B@[a-f0-9]{8}/  (Pointers maybe?)
> >>>
> >>> Well, I found I  can cast those fields as Varchar... and get the right
> >>> data... is this the right approach?  Why is this happening? Performance
> >>> wise am I hurting something by doing the cast to Varchar?
> >>>
> >>>
> >>> Any thoughts would be helpful...
> >>>
> >>> John
> >>
>
>


Re: Reading Parquet Files Created Elsewhere

2016-05-23 Thread Andries Engelbrecht
John,

See if convert_from helps in this regard; I believe it is supposed to be
faster than casting to VARCHAR.

This is likely what will work on your data:
CONVERT_FROM(, 'UTF8')
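
For example, applied in a query (the field name is hypothetical):

  SELECT CONVERT_FROM(t.`some_string_col`, 'UTF8') AS some_string_col
  FROM dfs.`/path/to/parquet` t
  LIMIT 10;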

Hopefully someone with more in depth knowledge of the Drill Parquet reader can 
comment.

--Andries



> On May 23, 2016, at 7:35 AM, John Omernik  wrote:
> 
> I am learning more about my data here, the data was created in a CDH
> version of the apache parquet-mr library. (Not sure version yet, getting
> that soon).  They used snappy and version 1.0 of the Parquet spec due to
> Impala needing it.  They are also using setEnableDictionary on the write.
> 
> Trying to figure things out right now
> 
> If I make a view and cast all string fields to a VARCHAR drill shows the
> right result, but it's slow.
> 
> (10 row select from raw = 1.9 seconds, 10 row select with CAST in a view =
> 25 seconds)
> 
> I've resigned myself to converting the table once for performance, which
> isn't an issue however I am getting different issues on that front  (I'll
> open a new thread for that)
> 
> Other than the cast(field AS VARCHAR) as field  is there any other (perhaps
> more performant) way to handle this situation?
> 
> 
> 
> 
> 
> On Mon, May 23, 2016 at 8:31 AM, Todd  wrote:
> 
>> 
>> Looks like Impala encoded string as binary data, I think there is some
>> configuration in Drill(I know spark has) that helps do the conversion.
>> 
>> 
>> 
>> 
>> 
>> At 2016-05-23 21:25:17, "John Omernik"  wrote:
>>> Hey all, I have some Parquet files that I believe were made in a Map
>> Reduce
>>> job and work well in Impala, however, when I read them in Drill, the
>> fields
>>> that are strings come through as [B@25ddbb etc. The exact string
>>> represented as regex would be /\[B@[a-f0-9]{8}/  (Pointers maybe?)
>>> 
>>> Well, I found I  can cast those fields as Varchar... and get the right
>>> data... is this the right approach?  Why is this happening? Performance
>>> wise am I hurting something by doing the cast to Varchar?
>>> 
>>> 
>>> Any thoughts would be helpful...
>>> 
>>> John
>> 



Reading and converting Parquet files intended for Impala

2016-05-23 Thread John Omernik
I have a largish directory of Parquet files generated for use in Impala.
They were created with the CDH version of apache-parquet-mr (not sure of the
version at this time).

Some settings:
Compression: snappy
Use Dictionary: true
WRITER_VERSION: PARQUET_1_0

I can read them as-is in Drill; however, the strings all come through as
binary (see the other thread). I can cast all those fields as VARCHAR and
read them, but I take a bad performance hit: 2 seconds to read directly from
the raw Parquet with LIMIT 10, but showing binary; 25 seconds to use a view
that CASTs all fields into the proper types. The data returns accurately,
but 10 rows taking 25 seconds is too long.

So I want to read from this directory (approx 126GB) and CTAS it in a way
that makes Drill happier.
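
A sketch of the kind of CTAS meant here, decoding the binary strings once
during the rewrite (paths and field names are hypothetical):

  CREATE TABLE dfs.tmp.`events_for_drill` AS
  SELECT
    CONVERT_FROM(t.`event_name`, 'UTF8') AS event_name,  -- decode binary string once
    t.`event_ts` AS event_ts                             -- non-strings pass through
  FROM dfs.`/path/to/impala_parquet` t;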

I've tried this two ways. One was just to CTAS directly from the view I
created, all else being default. The other was to set the reader option
"new_reader" = true. Neither worked, and new_reader actually behaves very
badly (I need to restart the drillbits). At least the default reader
errors :)
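
For reference, such options are toggled with standard Drill ALTER SESSION
syntax; a sketch using the option named below:

  ALTER SESSION SET `store.parquet.use_new_reader` = true;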

store.parquet.use_new_reader = false (the default)
This threw the error below (it's a truncated error; lots of field names and
other things). It stored 6GB of files and died.

store.parquet.use_new_reader = true

1.4GB of files were created and then everything hangs; I need to restart the
drillbits (is this an issue?)
(is this an issue?)



Error from "non" new_reader:

rror: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 107014



Fragment 1:36

[Error Id: ab5b202f-94cc-4275-b136-537dfbea6b31 on
atl1ctuzeta05.ctu-bo.secureworks.net:20001]

  (org.apache.drill.common.exceptions.DrillRuntimeException) Error in
parquet record reader.
  Message:
  Hadoop path: /path/to/files/-m-1.snappy.parquet
  Total records read: 393120
  Mock records read: 0
  Records to read: 32768
  Row group index: 0
  Records in row group: 536499
  Parquet Metadata: ParquetMetaData{FileMetaData{schema: message events {
  …
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():352
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():454
    org.apache.drill.exec.physical.impl.ScanBatch.next():191
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():91
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1595
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745
  Caused By (java.lang.ArrayIndexOutOfBoundsException) 107014
    org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary.decodeToLong():164
    org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong():122
    org.apache.drill.exec.store.parquet.columnreaders.ParquetFixedWidthDictionaryReaders$DictionaryBigIntReader.readField():161
    org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readValues():120
    org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPageData():169
    org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize():146
    org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages():107
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields():393
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():439
    org.apache.drill.exec.physical.impl.ScanBatch.next():191
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.Abstra

Re: Reading Parquet Files Created Elsewhere

2016-05-23 Thread John Omernik
I am learning more about my data here: the data was created with a CDH
version of the apache parquet-mr library (not sure which version yet, getting
that soon). They used snappy and version 1.0 of the Parquet spec because
Impala needs it. They are also using setEnableDictionary on the write.

Trying to figure things out right now.

If I make a view and cast all string fields to VARCHAR, Drill shows the
right result, but it's slow.

(10-row select from raw = 1.9 seconds; 10-row select with CAST in a view =
25 seconds)

I've resigned myself to converting the table once for performance, which
isn't an issue; however, I am getting different issues on that front (I'll
open a new thread for that).

Other than cast(field AS VARCHAR) as field, is there any other (perhaps
more performant) way to handle this situation?





On Mon, May 23, 2016 at 8:31 AM, Todd  wrote:

>
> Looks like Impala encoded string as binary data, I think there is some
> configuration in Drill(I know spark has) that helps do the conversion.
>
>
>
>
>
> At 2016-05-23 21:25:17, "John Omernik"  wrote:
> >Hey all, I have some Parquet files that I believe were made in a Map
> Reduce
> >job and work well in Impala, however, when I read them in Drill, the
> fields
> >that are strings come through as [B@25ddbb etc. The exact string
> >represented as regex would be /\[B@[a-f0-9]{8}/  (Pointers maybe?)
> >
> >Well, I found I  can cast those fields as Varchar... and get the right
> >data... is this the right approach?  Why is this happening? Performance
> >wise am I hurting something by doing the cast to Varchar?
> >
> >
> >Any thoughts would be helpful...
> >
> >John
>


Re: Reading Parquet Files Created Elsewhere

2016-05-23 Thread Todd

Looks like Impala encoded the strings as binary data. I think there is some
configuration in Drill (I know Spark has one) that helps do the conversion.





At 2016-05-23 21:25:17, "John Omernik"  wrote:
>Hey all, I have some Parquet files that I believe were made in a Map Reduce
>job and work well in Impala, however, when I read them in Drill, the fields
>that are strings come through as [B@25ddbb etc. The exact string
>represented as regex would be /\[B@[a-f0-9]{8}/  (Pointers maybe?)
>
>Well, I found I  can cast those fields as Varchar... and get the right
>data... is this the right approach?  Why is this happening? Performance
>wise am I hurting something by doing the cast to Varchar?
>
>
>Any thoughts would be helpful...
>
>John


Reading Parquet Files Created Elsewhere

2016-05-23 Thread John Omernik
Hey all, I have some Parquet files that I believe were made in a MapReduce
job and work well in Impala; however, when I read them in Drill, the fields
that are strings come through as [B@25ddbb etc. The exact string,
represented as a regex, would be /\[B@[a-f0-9]{8}/ (pointers, maybe?).

Well, I found I can cast those fields as VARCHAR... and get the right
data... Is this the right approach? Why is this happening? Performance-wise,
am I hurting anything by doing the cast to VARCHAR?
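
A minimal sketch of that workaround (table path and field name are
hypothetical):

  SELECT CAST(t.`some_string_col` AS VARCHAR) AS some_string_col
  FROM dfs.`/path/to/parquet` t
  LIMIT 10;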


Any thoughts would be helpful...

John