Re: Apache Drill 1.10.0 - Issue with combination of Like operator & newline (\n) character in data

2019-07-17 Thread Shankar Mane
Hi, just tested on *Drill 1.16.0 (docker)*:

*--1) result = success*
apache drill> select * from hive.default.withdraw;
+---+
|  id   |
+---+
| withdraw
cash |
+---+
1 row selected (0.189 seconds)


*--2) result = success*
apache drill> select * from hive.default.withdraw where id  like
'%withdraw%';
+---+
|  id   |
+---+
| withdraw
cash |
+---+
1 row selected (0.168 seconds)


*--3) result = wrong*
apache drill> select * from hive.default.withdraw where id  like
'%withdraw%cash';
++
| id |
++
++
No rows selected (0.214 seconds)


*--4) result = success*
apache drill> select * from hive.default.withdraw where id  like '%cash%';
+---+
|  id   |
+---+
| withdraw
cash |
+---+
1 row selected (0.181 seconds)
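A possible workaround sketch for case 3, assuming the cause is that LIKE's
% wildcard is translated to a regex .* which does not match across \n:
REGEXP_MATCHES (available since Drill 1.9) accepts the Java regex DOTALL
flag (?s), so that . also matches newline characters.

select * from hive.default.withdraw
where regexp_matches(id, '(?s).*withdraw.*cash.*');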





On Wed, Jul 17, 2019 at 5:03 PM Arina Yelchiyeva wrote:

> Hi Shankar,
>
> To verify behavior on the latest Drill version, you can download Drill
> from here https://drill.apache.org/download/ <
> https://drill.apache.org/download/>.
>
> Kind regards,
> Arina
>
> > On Jul 17, 2019, at 2:29 PM, Shankar Mane wrote:
> >
> > Hi All,
> >
> > I am facing some issues while using the Like operator & the newline (\n)
> > character.
> > Below is the detailed description:
> >
> > *-- Hive Queries*
> >
> > create table default.withdraw(
> > id string
> > ) stored as parquet;
> >
> >
> > insert into default.withdraw select 'withdraw\ncash';
> >
> >
> > *--1) result = success*
> >
> > hive> select * from default.withdraw where id like '%withdraw%';
> > OK
> > withdraw
> > cash
> > Time taken: 0.078 seconds, Fetched: 1 row(s)
> >
> >
> > *--2) result = wrong*
> >
> > hive> select * from default.withdraw where id like '%withdraw%cash';
> > OK
> > Time taken: 0.066 seconds
> >
> >
> > *--3) result = success*
> >
> > hive> select * from default.withdraw where id like '%cash%';
> > OK
> > withdraw
> > cash
> > Time taken: 0.086 seconds, Fetched: 1 row(s)
> >
> >
> >
> > *-- Drill Queries*
> > FYI - Drill (v1.10.0) is using the above table via the Hive metastore. We
> > tested the above queries on drill too.
> >
> > *Summary* - I am getting unexpected results in this version (1.10.0).
> > *Can someone please confirm what the output is on the latest Drill
> > version?*
> >
> > Here are the Drill queries and their outputs:
> >
> > *--1) result = success*
> >
> > 0: jdbc:drill:> select * from withdraw;
> > ++
> > |   id   |
> > ++
> > | withdraw
> > cash  |
> > ++
> > 1 row selected (0.34 seconds)
> >
> >
> > *--2) result = wrong*
> >
> > 0: jdbc:drill:> select * from withdraw where id like '%withdraw%';
> > +-+
> > | id  |
> > +-+
> > +-+
> > No rows selected (1.219 seconds)
> >
> >
> >
> >
> > *--3) result = wrong*
> >
> > 0: jdbc:drill:> select * from withdraw where id like '%withdraw%cash';
> > +-+
> > | id  |
> > +-+
> > +-+
> > No rows selected (0.188 seconds)
> >
> >
> >
> > *--4) result = wrong*
> >
> > 0: jdbc:drill:> select * from withdraw where id like '%cash%';
> > +-+
> > | id  |
> > +-+
> > +-+
> > No rows selected (0.181 seconds)
> >
> >
> > Please help here in case I am missing anything.
> >
> > regards,
> > shankar
>
>


Apache Drill 1.10.0 - Issue with combination of Like operator & newline (\n) character in data

2019-07-17 Thread Shankar Mane
Hi All,

I am facing some issues while using the Like operator & the newline (\n)
character. Below is the detailed description:

*-- Hive Queries*

create table default.withdraw(
id string
) stored as parquet;


insert into default.withdraw select 'withdraw\ncash';
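To confirm the inserted row really contains the newline, a quick hedged
check in hive:

hive> select length(id) from default.withdraw;

The expected result is 13 for 'withdraw\ncash' (8 + 1 + 4 characters).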


*--1) result = success*

hive> select * from default.withdraw where id like '%withdraw%';
OK
withdraw
cash
Time taken: 0.078 seconds, Fetched: 1 row(s)


*--2) result = wrong*

hive> select * from default.withdraw where id like '%withdraw%cash';
OK
Time taken: 0.066 seconds


*--3) result = success*

hive> select * from default.withdraw where id like '%cash%';
OK
withdraw
cash
Time taken: 0.086 seconds, Fetched: 1 row(s)



*-- Drill Queries*
FYI - Drill (v1.10.0) is using the above table via the Hive metastore. We
tested the above queries on drill too.

*Summary* - I am getting unexpected results in this version (1.10.0). *Can
someone please confirm what the output is on the latest Drill version?*

Here are the Drill queries and their outputs:

*--1) result = success*

0: jdbc:drill:> select * from withdraw;
++
|   id   |
++
| withdraw
cash  |
++
1 row selected (0.34 seconds)


*--2) result = wrong*

0: jdbc:drill:> select * from withdraw where id like '%withdraw%';
+-+
| id  |
+-+
+-+
No rows selected (1.219 seconds)




*--3) result = wrong*

0: jdbc:drill:> select * from withdraw where id like '%withdraw%cash';
+-+
| id  |
+-+
+-+
No rows selected (0.188 seconds)



*--4) result = wrong*

0: jdbc:drill:> select * from withdraw where id like '%cash%';
+-+
| id  |
+-+
+-+
No rows selected (0.181 seconds)


Please help here in case I am missing anything.

regards,
shankar


Re: QUESTION: Drill Configuration to access S3 buckets

2017-06-14 Thread Shankar Mane
New AWS regions use only the Signature Version 4 protocol for S3. Other
regions support both V2 and V4. Drill works very well in regions that
support both signature versions.

By adding endpoints, the same problem persists. Maybe the Drill API doesn't
support the V4 protocol yet.

This V4 problem also exists with native Hadoop versions prior to 2.8.0.



On 15-Jun-2017 9:49 AM, "Jack Ingoldsby" wrote:

> Useful to know, thanks. Also having problems with Ohio. Will try another
> region
>
> On Wed, Jun 14, 2017, 19:46 Сергей Боровик wrote:
>
> > Hi!
> > I have an AWS EC2 instance with Apache Drill 1.10.0 and a configured IAM
> > Role.
> >
> > I am able to access and query an S3 bucket in the US East (N. Virginia)
> > region, but not able to access/query buckets in the US East (Ohio)
> > region; it fails with
> > "error: system error: amazons3exception: status code 400, AWS Service:
> > Amazon S3, AWS Request ID: 9D54A8310F26582B, AWS Error Code: null,
> > AWS Error Message: Bad Request"
> >
> >
> > I've tried setting the conf/core-site.xml property to:
> >
> > <property>
> >   <name>fs.s3a.endpoint</name>
> >   <value>s3.us-east-2.amazonaws.com</value>
> > </property>
> >
> > In this case Ohio fails with the same error,
> > and N. Virginia gets error status code 301, AWS Error Code:
> > PermanentRedirect,
> > AWS Error Message: The bucket you are attempting to access must be
> > addressed using the specified endpoint.
> >
> > 1) Is there any specific configuration that needs to be enabled on Drill
> > for Ohio region?
> > 2) Does Drill not work on aws signature version 4?
> >
> > Thank you in advance.
> > Any advice is much appreciated!
> >
>


Does s3 plugin support AWS S3 signature version 4 ?

2017-04-04 Thread Shankar Mane
Quick question here:

Does the s3 plugin support S3 Signature Version 4?

FYI: the s3 plugin works when the region supports both the v2 and v4
signatures. It seems problematic for regions (e.g. ap-south-1) which only
support the v4 signature version.

regards,
shankar


Re: Hangout starting at 10am PST

2017-04-04 Thread Shankar Mane
Quick question here:

Does the s3 plugin support S3 Signature Version 4?

FYI: the s3 plugin works when the region supports both the v2 and v4
signatures. It seems problematic for new regions (e.g. ap-south-1) which
only support the v4 signature version.

On Tue, Apr 4, 2017 at 10:28 PM, Aman Sinha  wrote:

> Hangout link:
> https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
>


Re: [HANGOUT] Topics for 2017-02-21

2017-02-20 Thread Shankar Mane
Any plans to support the Hive 2.x line?

On 21-Feb-2017 12:40 AM, "Paul Rogers"  wrote:

> Hi All,
>
> Our bi-weekly hangout is tomorrow (2017-02-21, 10 AM PT). Please respond
> with suggested topics. We will also ask for additional topics at the
> beginning of the hangout.
>
> One topic I’d like to suggest: how we can make Drill even more stable than
> it already is? Suggestions for focus areas? Tests? Particular JIRA tickets?
> Other ideas?
>
> Thanks,
>
> - Paul


RE: Query on performance using Drill and Amazon s3.

2017-02-20 Thread Shankar Mane
1. How much memory have you configured for Drill?
2. What is the network bandwidth between your S3 bucket and the cluster?
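If it helps, the configured heap and direct memory per drillbit can be
inspected from sqlline (a sketch; sys.memory is a standard Drill system
table):

select * from sys.memory;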

On 20-Feb-2017 8:14 PM, "Nitin Pawar"  wrote:

> Hi chetan,
>
> Projjwal has the issue. I asked the same question too.
>
> On Feb 20, 2017 7:56 PM, "Chetan Kothari" 
> wrote:
>
> > Hi Nitin
> >
> >
> >
> > Where does the query execute?
> >
> > Does Drill execute query on AWS and fetch results to be displayed?
> >
> >
> >
> > Regards
> >
> > Chetan
> >
> >
> >
> > -Original Message-
> > From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
> > Sent: Monday, February 20, 2017 6:19 PM
> > To: user@drill.apache.org
> > Subject: Re: Query on performance using Drill and Amazon s3.
> >
> >
> >
> > How are you doing select * .. using the Drill UI or sqlline?
> >
> > Where are you running it from?
> >
> > Is Drill hosted in AWS or on your local machine?
> >
> >
> >
> > I think the majority of the time is spent on displaying the result set
> > instead of querying the file, if the Drill server is on AWS.
> >
> > If the Drill server is local, then it might be your network that takes a
> > lot of time, depending on the S3 bucket location and where your Drill
> > server is.
> >
> >
> >
> > On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
> >
> >
> >
> > > Hello all,
> > >
> > > I am using 1GB of data in the form of a .tsv file, stored in Amazon S3,
> > > using Drill 1.8. I am using the default configuration of Drill's S3
> > > storage plugin, out of the box. The drillbits are configured on
> > > a 5 node cluster with 32GB RAM and 4 VCPUs.
> > >
> > > I see that a select * from xxx; query takes 23 mins to fetch 1,040,000
> > > rows.
> > >
> > > Is this the expected behaviour?
> > > I am looking for any quick tuning that can improve the performance, or
> > > any other suggestions.
> > >
> > > Attached is the JSON profile for this query.
> > >
> > > Regards,
> > > Projjwal
> >
> > --
> >
> > Nitin Pawar
> >
> >
> >
>


[Drill-Issues] Drill-1.6.0: CTAS - No table found in case select returns empty set.

2016-08-22 Thread Shankar Mane
Is there any configuration option to force CREATE TABLE AS to create the
table even when the SELECT statement returns an empty set?


0: jdbc:drill:> create table glv_abcookie_min_newsid as
. . . . . . . > select
. . . . . . . > app_sessionid,
. . . . . . . > new_sessionid,
. . . . . . . > min(serverTime) as min_serverTime
. . . . . . . > from
. . . . . . . > base_newsid
. . . . . . . > where
. . . . . . . > STRPOS(cookie, 'Opera') > 0
. . . . . . . > group by
. . . . . . . > app_sessionid,
. . . . . . . > new_sessionid ;
+---++
| Fragment  | Number of records written  |
+---++
| 1_2   | 0  |
| 1_1   | 0  |
| 1_3   | 0  |
| 1_0   | 0  |
| 1_5   | 0  |
| 1_4   | 0  |
| 1_6   | 0  |
| 1_7   | 0  |
| 1_8   | 0  |
| 1_10  | 0  |
| 1_11  | 0  |
| 1_12  | 0  |
| 1_14  | 0  |
| 1_9   | 0  |
| 1_15  | 0  |
| 1_18  | 0  |
| 1_13  | 0  |
| 1_19  | 0  |
| 1_16  | 0  |
| 1_17  | 0  |
| 1_20  | 0  |
| 1_21  | 0  |
| 1_22  | 0  |
+---++
23 rows selected (39.99 seconds)

0: jdbc:drill:> select * from glv_abcookie_min_newsid limit 1 ;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 37: Table
'glv_abcookie_min_newsid' not found

SQL Query null

[Error Id: de854901-99a3-4237-90be-0162f5d995d2 on datanode2:31010]
(state=,code=0)
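A possible workaround sketch until CTAS handles empty results: a view is
created regardless of whether the SELECT currently returns rows, at the
cost of re-running the query on each read (the dfs.tmp workspace and the
_v suffix below are assumptions):

create view dfs.tmp.glv_abcookie_min_newsid_v as
select
app_sessionid,
new_sessionid,
min(serverTime) as min_serverTime
from
base_newsid
where
STRPOS(cookie, 'Opera') > 0
group by
app_sessionid,
new_sessionid ;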


Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-05 Thread Shankar Mane
Yes, I went through the benchmarks and started testing this one.

I have tested this using Hadoop MapReduce, and there BZ2 worked faster
than GZ. As I understand it, GZ is non-splittable and BZ2 is splittable.
Hadoop MR takes advantage of this splittable property and launches
multiple mappers and reducers (multiple CPUs), whereas in the case of GZ
only a single mapper runs (single CPU).

Can't Drill use this splittable property?
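As a workaround sketch (the directory layout below is assumed): splitting
the single large .gz into several smaller .gz files lets Drill parallelize
the scan across files, even though each individual gzip stream remains
non-splittable.

-- assuming /tmp/stest-gz-split/ holds the same data as many smaller .gz files:
select channelid, count(serverTime)
from dfs.`/tmp/stest-gz-split/` group by channelid ;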



On Fri, Aug 5, 2016 at 8:50 PM, Khurram Faraaz <kfar...@maprtech.com> wrote:

> Shankar,
>
> This is expected behavior, bzip2 decompression is four to twelve times
> slower than decompressing gzip compressed files.
> You can look at the comparison benchmark here for numbers -
> http://tukaani.org/lzma/benchmarks.html
>
> On Thu, Aug 4, 2016 at 5:13 PM, Shankar Mane <shankar.m...@games24x7.com>
> wrote:
>
> > Please find the query plan for both queries. FYI: I am not seeing
> > any planning difference between these 2 queries except the cost.
> >
> >
> > /***** Query on GZ *****/
> >
> > 0: jdbc:drill:> explain plan for select channelid, count(serverTime) from
> > dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by channelid ;
> > +--+--+
> > | text | json |
> > +--+--+
> > | 00-00Screen
> > 00-01  Project(channelid=[$0], EXPR$1=[$1])
> > 00-02UnionExchange
> > 01-01  HashAgg(group=[{0}], EXPR$1=[$SUM0($1)])
> > 01-02Project(channelid=[$0], EXPR$1=[$1])
> > 01-03  HashToRandomExchange(dist0=[[$0]])
> > 02-01UnorderedMuxExchange
> > 03-01  Project(channelid=[$0], EXPR$1=[$1],
> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> > 03-02HashAgg(group=[{0}], EXPR$1=[COUNT($1)])
> > 03-03  Scan(groupscan=[EasyGroupScan
> > [selectionRoot=hdfs://namenode:9000/tmp/stest-gz/
> > kafka_3_25-Jul-2016-12a.json.gz,
> > numFiles=1, columns=[`channelid`, `serverTime`],
> > files=[hdfs://namenode:9000/tmp/stest-gz/kafka_3_25-Jul-
> > 2016-12a.json.gz]]])
> >  | {
> >   "head" : {
> > "version" : 1,
> > "generator" : {
> >   "type" : "ExplainHandler",
> >   "info" : ""
> > },
> > "type" : "APACHE_DRILL_PHYSICAL",
> > "options" : [ ],
> > "queue" : 0,
> > "resultMode" : "EXEC"
> >   },
> >   "graph" : [ {
> > "pop" : "fs-scan",
> > "@id" : 196611,
> > "userName" : "hadoop",
> > "files" : [
> > "hdfs://namenode:9000/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz" ],
> > "storage" : {
> >   "type" : "file",
> >   "enabled" : true,
> >   "connection" : "hdfs://namenode:9000",
> >   "config" : null,
> >   "workspaces" : {
> > "root" : {
> >   "location" : "/tmp/",
> >   "writable" : true,
> >   "defaultInputFormat" : null
> > },
> > "tmp" : {
> >   "location" : "/tmp",
> >   "writable" : true,
> >   "defaultInputFormat" : null
> > }
> >   },
> >   "formats" : {
> > "psv" : {
> >   "type" : "text",
> >   "extensions" : [ "tbl" ],
> >   "delimiter" : "|"
> > },
> > "csv" : {
> >   "type" : "text",
> >   "extensions" : [ "csv" ],
> >   "delimiter" : ","
> > },
> > "tsv" : {
> >   "type" : "text",
> >   "extensions" : [ "tsv" ],
> >   "delimiter" : "\t"
> > },
> > "parquet" : {
> >   "type" : "parquet"
> > },
> > "json" : {
> >   "type" : "json",
> >   "extensions" : [ "json" ]
> > },
> > "avro" : {

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Shankar Mane
 ],
  "delimiter" : ","
},
"tsv" : {
  "type" : "text",
  "extensions" : [ "tsv" ],
  "delimiter" : "\t"
},
"parquet" : {
  "type" : "parquet"
},
"json" : {
  "type" : "json",
  "extensions" : [ "json" ]
},
"avro" : {
  "type" : "avro"
}
  }
},
"format" : {
  "type" : "json",
  "extensions" : [ "json" ]
},
"columns" : [ "`channelid`", "`serverTime`" ],
"selectionRoot" :
"hdfs://namenode:9000/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2",
"cost" : 1148224.0
  }, {
"pop" : "hash-aggregate",
"@id" : 196610,
"child" : 196611,
"cardinality" : 1.0,
"initialAllocation" : 100,
"maxAllocation" : 100,
"groupByExprs" : [ {
  "ref" : "`channelid`",
  "expr" : "`channelid`"
} ],
"aggrExprs" : [ {
  "ref" : "`EXPR$1`",
  "expr" : "count(`serverTime`) "
} ],
"cost" : 574112.0
  }, {
"pop" : "project",
"@id" : 196609,
"exprs" : [ {
  "ref" : "`channelid`",
  "expr" : "`channelid`"
}, {
  "ref" : "`EXPR$1`",
  "expr" : "`EXPR$1`"
}, {
  "ref" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`",
  "expr" : "hash32asdouble(`channelid`) "
} ],
"child" : 196610,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 114822.4
  }, {
"pop" : "unordered-mux-exchange",
"@id" : 131073,
"child" : 196609,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 114822.4
  }, {
"pop" : "hash-to-random-exchange",
"@id" : 65539,
"child" : 131073,
"expr" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 114822.4
  }, {
"pop" : "project",
"@id" : 65538,
"exprs" : [ {
  "ref" : "`channelid`",
  "expr" : "`channelid`"
}, {
  "ref" : "`EXPR$1`",
  "expr" : "`EXPR$1`"
} ],
"child" : 65539,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 114822.4
  }, {
"pop" : "hash-aggregate",
"@id" : 65537,
"child" : 65538,
"cardinality" : 1.0,
"initialAllocation" : 100,
"maxAllocation" : 100,
"groupByExprs" : [ {
  "ref" : "`channelid`",
  "expr" : "`channelid`"
} ],
"aggrExprs" : [ {
  "ref" : "`EXPR$1`",
  "expr" : "$sum0(`EXPR$1`) "
} ],
"cost" : 57411.2
  }, {
"pop" : "union-exchange",
"@id" : 2,
"child" : 65537,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 11482.24
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`channelid`",
  "expr" : "`channelid`"
}, {
  "ref" : "`EXPR$1`",
  "expr" : "`EXPR$1`"
} ],
"child" : 2,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 11482.24
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 11482.24
  } ]
} |
+--+--+
1 row selected (0.381 seconds)
0: jdbc:drill:>


On Thu, Aug 4, 2016 at 3:07 PM, Khurram Faraaz <kfar...@maprtech.com> wrote:

> Can you please do an explain plan over the two aggregate queries. That way
> we can know where most of the time is being spent, is it in the query
> planning phase or is it query execution that is taking longer. Please share
> the query plans and the time taken for those explain plan statements.
>
> On Mon, Aug 1, 2016 at 3:46 PM, Shankar Mane <shankar.m...@games24x7.com>
> wrote:
>
> > It is plain json (1 json per line).
> > Each json message size = ~4kb
> > no. of json messages = ~5 Millions.
> >
> > store.parquet.compression = snappy (I don't think this parameter gets
> > used, as I am running select queries only.)
> >
> >
> > On Mon, Aug 1, 2016 at 3:27 PM, Khurram Faraaz <kfar...@maprtech.com>
> > wrote:
> >
> > > What is the data format within those .gz and .bz2 files ? It is parquet
> > or
> > > JSON or plain text (CSV) ?
> > > Also, what was this config parameter `store.parquet.compression` set
> to,
> > > when ypu ran your test ?
> > >
> > > - Khurram
> > >
> > > On Sun, Jul 31, 2016 at 11:17 PM, Shankar Mane <
> > shankar.m...@games24x7.com
> > > >
> > > wrote:
> > >
> > > > Awaiting a response..
> > > >
> > > > On 30-Jul-2016 3:20 PM, "Shankar Mane" <shankar.m...@games24x7.com>
> > > wrote:
> > > >
> > > > >
> > > >
> > > > > I am comparing querying speed between GZ and BZ2.
> > > > >
> > > > > Below are the 2 files and their sizes (these 2 files have the same
> > > > > data):
> > > > > kafka_3_25-Jul-2016-12a.json.gz  = 1.8G
> > > > > kafka_3_25-Jul-2016-12a.json.bz2 = 1.1G
> > > > >
> > > > >
> > > > >
> > > > > Results:
> > > > >
> > > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > > dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by
> channelid
> > ;
> > > > > ++--+
> > > > > | channelid  |  EXPR$1  |
> > > > > ++--+
> > > > > | 3  | 977134   |
> > > > > | 0  | 836850   |
> > > > > | 2  | 3202854  |
> > > > > ++--+
> > > > > 3 rows selected (86.034 seconds)
> > > > >
> > > > >
> > > > >
> > > > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > > > dfs.`/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2` group by
> > channelid
> > > ;
> > > > > ++--+
> > > > > | channelid  |  EXPR$1  |
> > > > > ++--+
> > > > > | 3  | 977134   |
> > > > > | 0  | 836850   |
> > > > > | 2  | 3202854  |
> > > > > ++--+
> > > > > 3 rows selected (459.079 seconds)
> > > > >
> > > > >
> > > > >
> > > > > Questions:
> > > > > 1. As per the above test, GZ is 6x faster than BZ2. Why is that?
> > > > > 2. How can we speed up BZ2?  Is there any configuration to do so?
> > > > > 3. As bz2 is a splittable format, how is drill using it?
> > > > >
> > > > >
> > > > > regards,
> > > > > shankar
> > > >
> > >
> >
>


[Drill-Issues] Drill-1.6.0: Drillbit not starting

2016-08-04 Thread Shankar Mane
I am getting this error infrequently. Most of the time Drill starts
normally, and sometimes it gives the error below. I am running Drill 1.6.0
in cluster mode. ZK is also set up.

Could someone please explain where the issue is?



2016-08-04 03:45:15,870 [main] INFO  o.a.d.e.s.s.PersistentStoreRegistry -
Using the configured PStoreProvider class:
'org.apache.drill.exec.store.sys.store.provider.ZookeeperPersistentStoreProvider'.
2016-08-04 03:45:16,430 [main] INFO  o.apache.drill.exec.server.Drillbit -
Construction completed (1294 ms).
2016-08-04 03:45:28,250 [main] WARN  o.apache.drill.exec.server.Drillbit -
Failure on close()
java.lang.NullPointerException: null
at
org.apache.drill.exec.work.WorkManager.close(WorkManager.java:157)
~[drill-java-exec-1.6.0.jar:1.6.0]
at
org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76)
~[drill-common-1.6.0.jar:1.6.0]
at
org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64)
~[drill-common-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:149)
[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:283)
[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261)
[drill-java-exec-1.6.0.jar:1.6.0]
at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257)
[drill-java-exec-1.6.0.jar:1.6.0]
2016-08-04 03:45:28,250 [main] INFO  o.apache.drill.exec.server.Drillbit -
Shutdown completed (1819 ms).


Re: Partition reading problem (like operator) while using hive partition table in drill

2016-08-03 Thread Shankar Mane
Excellent, and thanks!!

Yes, I will try it and update here.

On 03-Aug-2016 11:08 PM, "rahul challapalli" <challapallira...@gmail.com>
wrote:

DRILL-4665 has been fixed. Can you try it out with the latest master and
see if it works for you now?

- Rahul

On Wed, Aug 3, 2016 at 10:28 AM, Shankar Mane <shankar.m...@games24x7.com>
wrote:

> Has anyone started working on this?
>
> On Wed, Jun 1, 2016 at 8:27 PM, Zelaine Fong <zf...@maprtech.com> wrote:
>
> > Shankar,
> >
> > Work on this issue has not yet started.  Hopefully, the engineer
> > assigned to the issue will be able to take a look in a week or so.
> >
> > -- Zelaine
> >
> > On Tue, May 31, 2016 at 10:33 PM, Shankar Mane <
> shankar.m...@games24x7.com
> > >
> > wrote:
> >
> > > I didn't get any response or updates on this JIRA ticket (DRILL-4665).
> > >
> > > Is anyone looking into this?
> > > On 11 May 2016 03:31, "Aman Sinha" <amansi...@apache.org> wrote:
> > >
> > > > The Drill test team was able to repro this and is now filed as:
> > > > https://issues.apache.org/jira/browse/DRILL-4665
> > > >
> > > > On Tue, May 10, 2016 at 8:16 AM, Aman Sinha <amansi...@apache.org>
> > > wrote:
> > > >
> > > > > This is supposed to work, especially since the LIKE predicate is
> > > > > not even on the partitioning column (it should work either way).
> > > > > I did a quick test with file system tables and it works for LIKE
> > > > > conditions.  Not sure yet about Hive tables.  Could you pls file a
> > > > > JIRA and we'll follow up.  Thanks.
> > > > >
> > > > > -Aman
> > > > >
> > > > > On Tue, May 10, 2016 at 1:09 AM, Shankar Mane <
> > > > shankar.m...@games24x7.com>
> > > > > wrote:
> > > > >
> > > > >> Problem:
> > > > >>
> > > > >> 1. In drill, we are using hive partition table. But explain plan
> > (same
> > > > >> query) for like and = operator differs and used all partitions in
> > case
> > > > of
> > > > >> like operator.
> > > > >> 2. If you see below drill explain plans: Like operator uses *all*
> > > > >> partitions where
> > > > >> = operator uses *only* partition filtered by log_date condition.
> > > > >>
> > > > >> FYI- We are storing our logs in hive partition table (parquet,
> > > > >> gz-compressed). Each partition is having ~15 GB data. Below is
the
> > > > >> describe
> > > > >> statement output from hive:
> > > > >>
> > > > >>
> > > > >> /***** Hive *****/
> > > > >> hive> desc hive_kafkalogs_daily ;
> > > > >> OK
> > > > >> col_name data_type comment
> > > > >> sessionid   string
> > > > >> ajaxurl string
> > > > >>
> > > > >> log_date string
> > > > >>
> > > > >> # Partition Information
> > > > >> # col_name data_type   comment
> > > > >>
> > > > >> log_date string
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> /***** Drill Plan (query with LIKE) *****/
> > > > >>
> > > > >> explain plan for select sessionid, servertime, ajaxUrl from
> > > > >> hive.hive_kafkalogs_daily where log_date = '2016-05-09' and
> ajaxUrl
> > > like
> > > > >> '%utm_source%' limit 1 ;
> > > > >>
> > > > >> +--+--+
> > > > >> | text | json |
> > > > >> +--+--+
> > > > >> | 00-00Screen
> >

Re: Partition reading problem (like operator) while using hive partition table in drill

2016-08-03 Thread Shankar Mane
Has anyone started working on this?

On Wed, Jun 1, 2016 at 8:27 PM, Zelaine Fong <zf...@maprtech.com> wrote:

> Shankar,
>
> Work on this issue has not yet started.  Hopefully, the engineer assigned
> to the issue will be able to take a look in a week or so.
>
> -- Zelaine
>
> On Tue, May 31, 2016 at 10:33 PM, Shankar Mane <shankar.m...@games24x7.com
> >
> wrote:
>
> > I didn't get any response or updates on this JIRA ticket (DRILL-4665).
> >
> > Is anyone looking into this?
> > On 11 May 2016 03:31, "Aman Sinha" <amansi...@apache.org> wrote:
> >
> > > The Drill test team was able to repro this and is now filed as:
> > > https://issues.apache.org/jira/browse/DRILL-4665
> > >
> > > On Tue, May 10, 2016 at 8:16 AM, Aman Sinha <amansi...@apache.org>
> > wrote:
> > >
> > > > This is supposed to work, especially since LIKE predicate is not even
> > on
> > > > the partitioning column (it should work either way).  I did a quick
> > test
> > > > with file system tables and it works for LIKE conditions.  Not sure
> yet
> > > > about Hive tables.  Could you pls file a JIRA and we'll follow up.
> > > > Thanks.
> > > >
> > > > -Aman
> > > >
> > > > On Tue, May 10, 2016 at 1:09 AM, Shankar Mane <
> > > shankar.m...@games24x7.com>
> > > > wrote:
> > > >
> > > >> Problem:
> > > >>
> > > >> 1. In drill, we are using hive partition table. But explain plan
> (same
> > > >> query) for like and = operator differs and used all partitions in
> case
> > > of
> > > >> like operator.
> > > >> 2. If you see below drill explain plans: Like operator uses *all*
> > > >> partitions where
> > > >> = operator uses *only* partition filtered by log_date condition.
> > > >>
> > > >> FYI- We are storing our logs in hive partition table (parquet,
> > > >> gz-compressed). Each partition is having ~15 GB data. Below is the
> > > >> describe
> > > >> statement output from hive:
> > > >>
> > > >>
> > > >> /***** Hive *****/
> > > >> hive> desc hive_kafkalogs_daily ;
> > > >> OK
> > > >> col_name data_type comment
> > > >> sessionid   string
> > > >> ajaxurl string
> > > >>
> > > >> log_date string
> > > >>
> > > >> # Partition Information
> > > >> # col_name data_type   comment
> > > >>
> > > >> log_date string
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> /***** Drill Plan (query with LIKE) *****/
> > > >>
> > > >> explain plan for select sessionid, servertime, ajaxUrl from
> > > >> hive.hive_kafkalogs_daily where log_date = '2016-05-09' and ajaxUrl
> > like
> > > >> '%utm_source%' limit 1 ;
> > > >>
> > > >> +--+--+
> > > >> | text | json |
> > > >> +--+--+
> > > >> | 00-00Screen
> > > >> 00-01  Project(sessionid=[$0], servertime=[$1], ajaxUrl=[$2])
> > > >> 00-02SelectionVectorRemover
> > > >> 00-03  Limit(fetch=[1])
> > > >> 00-04UnionExchange
> > > >> 01-01  SelectionVectorRemover
> > > >> 01-02Limit(fetch=[1])
> > > >> 01-03  Project(sessionid=[$0], servertime=[$1],
> > > >> ajaxUrl=[$2])
> > > >> 01-04SelectionVectorRemover
> > > >> 01-05  Filter(condition=[AND(=($3,
> '2016-05-09'),
> > > >> LIKE($2, '%utm_source%'))])
> > > >> 01-06Scan(groupscan=[HiveScan
> > > >> [table=Table(dbName:default, ta

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-01 Thread Shankar Mane
It is plain json (1 json per line).
Each json message size = ~4kb
no. of json messages = ~5 Millions.

store.parquet.compression = snappy (I don't think this parameter gets used,
as I am running select queries only.)


On Mon, Aug 1, 2016 at 3:27 PM, Khurram Faraaz <kfar...@maprtech.com> wrote:

> What is the data format within those .gz and .bz2 files ? It is parquet or
> JSON or plain text (CSV) ?
> Also, what was this config parameter `store.parquet.compression` set to,
> when ypu ran your test ?
>
> - Khurram
>
> On Sun, Jul 31, 2016 at 11:17 PM, Shankar Mane <shankar.m...@games24x7.com
> >
> wrote:
>
> > Awaiting a response..
> >
> > On 30-Jul-2016 3:20 PM, "Shankar Mane" <shankar.m...@games24x7.com>
> wrote:
> >
> > >
> >
> > > I am comparing querying speed between GZ and BZ2.
> > >
> > > Below are the 2 files and their sizes (these 2 files have the same data):
> > > kafka_3_25-Jul-2016-12a.json.gz  = 1.8G
> > > kafka_3_25-Jul-2016-12a.json.bz2 = 1.1G
> > >
> > >
> > >
> > > Results:
> > >
> > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by channelid ;
> > > ++--+
> > > | channelid  |  EXPR$1  |
> > > ++--+
> > > | 3  | 977134   |
> > > | 0  | 836850   |
> > > | 2  | 3202854  |
> > > ++--+
> > > 3 rows selected (86.034 seconds)
> > >
> > >
> > >
> > > 0: jdbc:drill:> select channelid, count(serverTime) from
> > dfs.`/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2` group by channelid
> ;
> > > ++--+
> > > | channelid  |  EXPR$1  |
> > > ++--+
> > > | 3  | 977134   |
> > > | 0  | 836850   |
> > > | 2  | 3202854  |
> > > ++--+
> > > 3 rows selected (459.079 seconds)
> > >
> > >
> > >
> > > Questions:
> > > 1. As per the above test, GZ is 6x faster than BZ2. Why is that?
> > > 2. How can we speed up BZ2?  Is there any configuration to do so?
> > > 3. As bz2 is a splittable format, how is drill using it?
> > >
> > >
> > > regards,
> > > shankar
> >
>


Re: CHAR data type

2016-07-04 Thread Shankar Mane
It has been supported since 1.7.0. Please check this link:
https://drill.apache.org/docs/apache-drill-1-7-0-release-notes/

On 04-Jul-2016 8:07 PM, "Santosh Kulkarni" wrote:

Hello,

While running another simple query for select count(*) from table_name,
Drill gave an error for Unsupported Hive data type CHAR.

The column is of CHAR(6) data type. Drill documentation shows CHAR as
supported data type.

This is on Drill version 1.6

Thanks,

Santosh


Re: Partition reading problem (like operator) while using hive partition table in drill

2016-05-31 Thread Shankar Mane
I didn't get any response or updates on this JIRA ticket (DRILL-4665).

Is anyone looking into this?
On 11 May 2016 03:31, "Aman Sinha" <amansi...@apache.org> wrote:

> The Drill test team was able to repro this and is now filed as:
> https://issues.apache.org/jira/browse/DRILL-4665
>
> On Tue, May 10, 2016 at 8:16 AM, Aman Sinha <amansi...@apache.org> wrote:
>
> > This is supposed to work, especially since LIKE predicate is not even on
> > the partitioning column (it should work either way).  I did a quick test
> > with file system tables and it works for LIKE conditions.  Not sure yet
> > about Hive tables.  Could you pls file a JIRA and we'll follow up.
> > Thanks.
> >
> > -Aman
> >
> > On Tue, May 10, 2016 at 1:09 AM, Shankar Mane <
> shankar.m...@games24x7.com>
> > wrote:
> >
> >> Problem:
> >>
> >> 1. In drill, we are using hive partition table. But explain plan (same
> >> query) for like and = operator differs and used all partitions in case
> of
> >> like operator.
> >> 2. If you see below drill explain plans: Like operator uses *all*
> >> partitions where
> >> = operator uses *only* partition filtered by log_date condition.
> >>
> >> FYI- We are storing our logs in hive partition table (parquet,
> >> gz-compressed). Each partition is having ~15 GB data. Below is the
> >> describe
> >> statement output from hive:
> >>
> >>
> >> /***** Hive *****/
> >> hive> desc hive_kafkalogs_daily ;
> >> OK
> >> col_name data_type comment
> >> sessionid   string
> >> ajaxurl string
> >>
> >> log_date string
> >>
> >> # Partition Information
> >> # col_name data_type   comment
> >>
> >> log_date string
> >>
> >>
> >>
> >>
> >> /***** Drill Plan (query with LIKE) *****/
> >>
> >> explain plan for select sessionid, servertime, ajaxUrl from
> >> hive.hive_kafkalogs_daily where log_date = '2016-05-09' and ajaxUrl like
> >> '%utm_source%' limit 1 ;
> >>
> >> +--+--+
> >> | text | json |
> >> +--+--+
> >> | 00-00Screen
> >> 00-01  Project(sessionid=[$0], servertime=[$1], ajaxUrl=[$2])
> >> 00-02SelectionVectorRemover
> >> 00-03  Limit(fetch=[1])
> >> 00-04UnionExchange
> >> 01-01  SelectionVectorRemover
> >> 01-02Limit(fetch=[1])
> >> 01-03  Project(sessionid=[$0], servertime=[$1],
> >> ajaxUrl=[$2])
> >> 01-04SelectionVectorRemover
> >> 01-05  Filter(condition=[AND(=($3, '2016-05-09'),
> >> LIKE($2, '%utm_source%'))])
> >> 01-06Scan(groupscan=[HiveScan
> >> [table=Table(dbName:default, tableName:hive_kafkalogs_daily),
> >> columns=[`sessionid`, `servertime`, `ajaxurl`, `log_date`],
> >> numPartitions=29, partitions= [Partition(values:[2016-04-11]),
> >> Partition(values:[2016-04-12]), Partition(values:[2016-04-13]),
> >> Partition(values:[2016-04-14]), Partition(values:[2016-04-15]),
> >> Partition(values:[2016-04-16]), Partition(values:[2016-04-17]),
> >> Partition(values:[2016-04-18]), Partition(values:[2016-04-19]),
> >> Partition(values:[2016-04-20]), Partition(values:[2016-04-21]),
> >> Partition(values:[2016-04-22]), Partition(values:[2016-04-23]),
> >> Partition(values:[2016-04-24]), Partition(values:[2016-04-25]),
> >> Partition(values:[2016-04-26]), Partition(values:[2016-04-27]),
> >> Partition(values:[2016-04-28]), Partition(values:[2016-04-29]),
> >> Partition(values:[2016-04-30]), Partition(values:[2016-05-01]),
> >> Partition(values:[2016-05-02]), Partition(values:[2016-05-03]),
> >> Partition(values:[2016-05-04]), Partition(values:[2016-05-05]),
> >> Partition(values:[2016-05-06]), Partition(values:[2016-05-07]),
> >> Partition(values:[2016-05-08]), Partition(values:[2016-05-09])],
> >>
> >>
> inputDirectories=[hdfs://namenode:9000/usr/hive/warehouse/hive_k

Partition reading problem (like operator) while using hive partition table in drill

2016-05-10 Thread Shankar Mane
Problem:

1. In Drill, we are using a Hive partition table. But the explain plan (for
the same query) differs between the LIKE and = operators, and all partitions
are used in the case of the LIKE operator.
2. If you see the Drill explain plans below: the LIKE operator uses *all*
partitions, whereas the = operator uses *only* the partition filtered by the
log_date condition.
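For reference, the equality form of the same query (a sketch; '/some/url'
below is only a placeholder, since the exact = query text was not preserved
in this thread) prunes down to the single 2016-05-09 partition:

explain plan for select sessionid, servertime, ajaxUrl from
hive.hive_kafkalogs_daily where log_date = '2016-05-09' and ajaxUrl =
'/some/url' limit 1 ;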

FYI - We are storing our logs in a Hive partition table (parquet,
gz-compressed). Each partition has ~15 GB of data. Below is the describe
statement output from hive:


/***** Hive *****/
hive> desc hive_kafkalogs_daily ;
OK
col_name data_type comment
sessionid   string
ajaxurl string

log_date string

# Partition Information
# col_name data_type   comment

log_date string




/***** Drill Plan (query with LIKE) *****/

explain plan for select sessionid, servertime, ajaxUrl from
hive.hive_kafkalogs_daily where log_date = '2016-05-09' and ajaxUrl like
'%utm_source%' limit 1 ;

+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(sessionid=[$0], servertime=[$1], ajaxUrl=[$2])
00-02SelectionVectorRemover
00-03  Limit(fetch=[1])
00-04UnionExchange
01-01  SelectionVectorRemover
01-02Limit(fetch=[1])
01-03  Project(sessionid=[$0], servertime=[$1],
ajaxUrl=[$2])
01-04SelectionVectorRemover
01-05  Filter(condition=[AND(=($3, '2016-05-09'),
LIKE($2, '%utm_source%'))])
01-06Scan(groupscan=[HiveScan
[table=Table(dbName:default, tableName:hive_kafkalogs_daily),
columns=[`sessionid`, `servertime`, `ajaxurl`, `log_date`],
numPartitions=29, partitions= [Partition(values:[2016-04-11]),
Partition(values:[2016-04-12]), Partition(values:[2016-04-13]),
Partition(values:[2016-04-14]), Partition(values:[2016-04-15]),
Partition(values:[2016-04-16]), Partition(values:[2016-04-17]),
Partition(values:[2016-04-18]), Partition(values:[2016-04-19]),
Partition(values:[2016-04-20]), Partition(values:[2016-04-21]),
Partition(values:[2016-04-22]), Partition(values:[2016-04-23]),
Partition(values:[2016-04-24]), Partition(values:[2016-04-25]),
Partition(values:[2016-04-26]), Partition(values:[2016-04-27]),
Partition(values:[2016-04-28]), Partition(values:[2016-04-29]),
Partition(values:[2016-04-30]), Partition(values:[2016-05-01]),
Partition(values:[2016-05-02]), Partition(values:[2016-05-03]),
Partition(values:[2016-05-04]), Partition(values:[2016-05-05]),
Partition(values:[2016-05-06]), Partition(values:[2016-05-07]),
Partition(values:[2016-05-08]), Partition(values:[2016-05-09])],
inputDirectories=[hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160411,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160412,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160413,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160414,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160415,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160416,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160417,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160418,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160419,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160420,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160421,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160422,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160423,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160424,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160425,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160426,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160427,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160428,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160429,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160430,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160501,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160502,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160503,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160504,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160505,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160506,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160507,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160508,
hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160509]]])
 | {
 

Re: Drill (CTAS) Default hadoop Replication factor on HDFS ?

2016-05-10 Thread Shankar Mane
Thanks Abhishek Girish. Copying hdfs-site.xml into the Drill conf directory
(on all nodes) works for me.

I also tried setting the config options. They do get applied at the storage
plugin level, but have no effect.

On Sat, May 7, 2016 at 11:29 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> My suggestion would be to use Drill's capability to have config options in
> the storage plugin rather than copying the hdfs-site.xml everywhere. Keeps
> it in one place and allows you to tune per system you are interacting with
> (instead of globally). See here for more detail:
>
> https://issues.apache.org/jira/browse/DRILL-4383
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, May 6, 2016 at 10:17 AM, Abhishek Girish <
> abhishek.gir...@gmail.com>
> wrote:
>
> > Hello,
> >
> > Assuming you have defined your replication factor setting inside your
> > cluster hdfs-site.xml, it might be worth a try to copy this config file
> > into your Drill conf directory (on all nodes). While I haven't tried this
> > myself, i'm hoping this could help.
> >
> > -Abhishek
> >
> > On Fri, May 6, 2016 at 12:50 AM, Shankar Mane <
> shankar.m...@games24x7.com>
> > wrote:
> >
> > > We have a Hadoop cluster where the default replication factor
> > > (dfs.replication) is set to 1 (this cluster is just plug and play,
> > > hence we don't need to store more than 1 copy).
> > >
> > > When we used Drill *CTAS*, it created the table on *HDFS* with its own
> > > *replication factor of 3*.
> > >
> > > *Questions are* -
> > > 1. Why can't it use the Hadoop default replication factor?
> > > 2. Is there any setting in Drill to change the Hadoop replication
> > > factor at runtime?
> > >
> >
>


Drill (CTAS) Default hadoop Replication factor on HDFS ?

2016-05-06 Thread Shankar Mane
We have a Hadoop cluster where the default replication factor
(dfs.replication) is set to 1 (this cluster is just plug and play, hence we
don't need to store more than 1 copy).

When we used Drill *CTAS*, it created the table on *HDFS* with its own
*replication factor of 3*.

*Questions are* -
1. Why can't it use the Hadoop default replication factor?
2. Is there any setting in Drill to change the Hadoop replication factor at
runtime?


Questions on Drill custom UDF (getBrowserDetails) when using external xml files

2016-04-27 Thread Shankar Mane
*Info -*
-- We have developed a custom UDF named *getBrowserDetails* which returns a
JSON string.

-- We used this UDF to extract *User-Agent* Info like Browser Name, Browser
version, OS name, OS version etc.

-- This UDF uses the *Wurfl* Java API, which can be found @
http://www.scientiamobile.com/downloads . While developing the Drill UDF we
used the wurfl XML file (of size ~*33 MB*).

-- We are using Drill 1.6.0, HDFS and cluster of  3 aws instances of type
m3.2xlarge.

-- Dataset we are operating is of ~60 millions rows  of ~20GB GZ-Compressed
(json data).



*Problem -*
This UDF works fine on a SMALL dataset. But when the dataset is *LARGE*, it
takes too much time. In our case, we haven't gotten any output yet.



*Questions -*
1. Please suggest the correct way to use external files (xml, csv,
properties, etc.) or other libraries in a UDF.
2. Has anyone developed a working function of this kind in Drill?

regards,
shankar


NullPointerException while reading Hive table data in drill 1.6.0

2016-04-24 Thread Shankar Mane
Need help to solve the issue below:

*I am using :*
hive 1.2.1
drilll 1.6.0

*What I am doing:*
1. The table is created in hive and data is stored as parquet (this table is
partitioned on column "log_date"). Data size is around 15GB (parquet-gzip).
2. The hive storage plugin is used in drill to read the data.

/***** hive shell *****/

hive> select sessionid from hive_daily where log_date = '2016-04-23'  limit
10;
OK
sessionid
64B06F4BF0260082F3A2D203E29C1
D0AC39A72265D1B946F6AE693D1D1
F0521251EE19C63AF63CF50919113
C7E83A9FFEEA8143B5666D0C27ED4
EADA5EADFFD17D6EE3B65BF8F26C1
3A33C47190942D4A9C5EAEC5D5735
C7E83A9FF9148143B5666D0C27ED4
1B88F36623F9BC08B085E8FC43931
A001DA690262B097C45540D184C41
59576D244F6FFA5474F1606B09413
Time taken: 0.122 seconds, Fetched: 10 row(s)




/***** drill shell *****/


0: jdbc:drill:> select sessionid from hive.hive_daily where log_date =
'2016-04-23'  limit 10;
Error: SYSTEM ERROR: NullPointerException

Fragment 1:28

[Error Id: a1bf4e7c-303b-424b-8f00-479f696601ca on namenode:31010]

  (org.apache.drill.common.exceptions.ExecutionSetupException)
java.lang.reflect.UndeclaredThrowableException

org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30
org.apache.drill.exec.store.hive.HiveRecordReader.setup():275
org.apache.drill.exec.physical.impl.ScanBatch.<init>():108

org.apache.drill.exec.store.hive.HiveDrillNativeScanBatchCreator.getBatch():151

org.apache.drill.exec.store.hive.HiveDrillNativeScanBatchCreator.getBatch():58
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():146
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():100
org.apache.drill.exec.physical.impl.ImplCreator.getExec():78
org.apache.drill.exec.work.fragment.FragmentExecutor.run():231
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (java.util.concurrent.ExecutionException)
java.lang.reflect.UndeclaredThrowableException
java.util.concurrent.FutureTask.report():122
java.util.concurrent.FutureTask.get():192
org.apache.drill.exec.store.hive.HiveRecordReader.setup():268
org.apache.drill.exec.physical.impl.ScanBatch.<init>():108

org.apache.drill.exec.store.hive.HiveDrillNativeScanBatchCreator.getBatch():151

org.apache.drill.exec.store.hive.HiveDrillNativeScanBatchCreator.getBatch():58
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():146
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():126
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():169
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():100
org.apache.drill.exec.physical.impl.ImplCreator.getExec():78
org.apache.drill.exec.work.fragment.FragmentExecutor.run():231
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (java.lang.reflect.UndeclaredThrowableException) null
org.apache.hadoop.security.UserGroupInformation.doAs():1672
org.apache.drill.exec.ops.OperatorContextImpl$1.call():156
java.util.concurrent.FutureTask.run():266
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (org.apache.drill.common.exceptions.ExecutionSetupException)
Failure while initializing HiveRecordReader: null

Re: Error parsing JSON (drill 1.6.0)

2016-03-25 Thread Shankar Mane
I have opened a JIRA for the same. You can track it here:
https://issues.apache.org/jira/browse/DRILL-4520

For the time being I am using the older approach, which is *hive*.

*Below are the steps I am following:*
1. Drill - creating a flattened table from the HDFS json logs. But as I
mentioned in the trail mail, there are bugs/exceptions, so I am SKIPPING
this step.
1. Repeating step 1 using hive instead. Here I am using hive 1.2.1 and
json-serde-1.3.7.jar.

   - a) Defining schema for input json by creating hive external table
   (using json-serde).
   - b) Defining hive parquet table schema which can be filled through hive
   external table. Here all fields in parquet table marked as STRING. This
   step is similar to creating parquet table in Drill.
   - c) hive will create table and stored all data in HDFS and in parquet
   format.
   - d) And later use this HDFS path in drill.

2. Here now onward Using *drill* to process further queries which works
fine.

*Here my question is:*
1. Is this the right approach, to create the parquet table using hive and
use it from drill?
2. Will a parquet table created by hive rather than drill make any
difference or cause inconsistency?


We know that drill discovers the schema on the fly where hive does not; in
hive, we need an explicitly defined schema. So I can say here that -
1. Hive explicitly converts data into predefined data types where drill
doesn't, unless we cast.
Say for example: a column with different data types would be explicitly
converted into a predefined data type in hive, but in drill it doesn't work
either normally or by casting.

Please provide us any alternative way or any suggestion.
regards,
shankar



For CTAS, I also hit a NPE when storage format was Parquet (default).

With storage format as JSON, i hit this error:
Error: SYSTEM ERROR: IllegalArgumentException: You tried to read a
[readText()] type when you are using a field reader of type
[UnionListReader].

Since this is still an experimental feature, I'm not sure if someone tried
out CTAS previously. Could you open a JIRA for this? Or let me know if you
want me to open one instead.

And since you mention queries without CTAS works fine, can you create views
instead and query that (I tried this and it works fine)?

On Fri, Mar 18, 2016 at 1:29 PM, Shankar Mane <shankar.m...@games24x7.com>
wrote:

> @Abhishek:
>
> Some events in the 150 GB json file are like this, where they differ in
> structure. I could say only 0.1% (per 150 GB json file) are such events.
>
> And yes, union works perfectly, but only when we use select statements.
>
> Could you please change your select query to CTAS? I am getting
> null pointer exceptions.
> On 19 Mar 2016 01:35, "Abhishek Girish" <abhishek.gir...@gmail.com> wrote:
>
> > Hello Shankar,
> >
> > From the sample data you shared, it looks like you have JSON documents
> > which differ considerably in the schema / structure. This isn't
> > supported by default.
> >
> > You could try turning on UNION type (an experimental feature).
> >
> > > set `exec.enable_union_type` = true;
> > +---+--+
> > |  ok   | summary  |
> > +---+--+
> > | true  | exec.enable_union_type updated.  |
> > +---+--+
> > 1 row selected (0.193 seconds)
> >
> > > select
> > > `timestamp`,
> > > sessionid,
> > > gameid,
> > > ajaxUrl,
> > > ajaxData
> > > from dfs.`/tmp/test1.json` t;
> >
> >
>
> > | timestamp     | sessionid                     | gameid                                                                          | ajaxUrl              | ajaxData |
> > | 1457658600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null                                                                            | /player/updatebonus1 | null     |
> > | 1457771458873 | D18104E8CA3071C7A8F4E141B127  | https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043 | []                   | null     |
> > | 1457958600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null                                                                            | /player/updatebonus2 | null     |

Re: unable to start Drill 1.6.0

2016-03-20 Thread Shankar Mane


1. Drill in the cluster is *working fine* when the *customized*
drill-module.conf file is *not present* in dir
"apache-drill-1.6.0/conf/drill-module.conf".





2. The custom UDF is not working, as described below:

I have copied my custom UDF into dir "apache-drill-1.6.0/jars/3rdparty" on
all nodes and restarted all drillbits.


udf filename=udfutil-0.0.1-SNAPSHOT.jar
jar *structure* -
/*
META-INF/
META-INF/MANIFEST.MF
com/
com/companyname/
com/companyname/drill/
drill-module.conf
com/companyname/drill/channeltest.class
com/companyname/drill/DateFunc.class
com/companyname/drill/DateExtract.class
com/companyname/drill/DecodeURI.class
com/companyname/drill/ChannelID.class
com/companyname/drill/BrowserFuncNew.class
com/companyname/drill/ToDate.class
META-INF/maven/
META-INF/maven/com.companyname.drill.udf/
META-INF/maven/com.companyname.drill.udf/udfutil/
META-INF/maven/com.companyname.drill.udf/udfutil/pom.xml
META-INF/maven/com.companyname.drill.udf/udfutil/pom.properties
*/


-- And login to drill to check whether function is working or not
/*
0: jdbc:drill:> select DateFunc(1458228298) from (values(1)) ;
*Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 26: No
match found for function signature DateFunc()*
*/

*IT FAILED*





3. Now, as described on the website, I edited the file "*drill-module.conf*",
copied it to all nodes in the cluster, and restarted all drillbits.

vi apache-drill-1.6.0/conf/drill-module.conf

/*
drill: {
classpath.scanning: {
packages: [
"com.companyname.drill.*"
]
}
}
*/

*But DRILL GET SHUTDOWN on all nodes.*




*Please help me to resolve this issue, or suggest any other way to invoke
my custom UDFs.*





On Thu, Mar 17, 2016 at 6:50 PM, Abdel Hakim Deneche <adene...@maprtech.com>
wrote:

> Easiest fix when Drill fails to load a storage plugin is to delete the
> existing configurations. Deleting /tmp/drill/ should resolve this.
>
> I know this may not be practical in some cases, and other developers may
> give you a better solution.
>
> On Thu, Mar 17, 2016 at 2:13 PM, Shankar Mane <shankar.m...@games24x7.com>
> wrote:
>
> > *drillbit.out =>*
> >
> >
> > Exception in thread "main"
> > org.apache.drill.exec.exception.DrillbitStartupException: Failure during
> > initial startup of Drillbit.
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:284)
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261)
> > at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257)
> > Caused by: java.lang.IllegalStateException:
> > com.fasterxml.jackson.databind.JsonMappingException: Could not resolve
> type
> > id 'kudu' into a subtype of [simple type, class
> > org.apache.drill.common.logical.StoragePluginConfig]: known type ids =
> > [InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file,
> > jdbc, mock, named]
> >  at [Source: {
> >   "storage":{
> > kudu : {
> >   type:"kudu",
> >   masterAddresses: "1.2.3.4",
> >   enabled: false
> > }
> >   }
> > }
> > ; line: 4, column: 12] (through reference chain:
> >
> >
> org.apache.drill.exec.planner.logical.StoragePlugins["storage"]->java.util.LinkedHashMap["kudu"])
> > at
> >
> >
> org.apache.drill.exec.store.StoragePluginRegistryImpl.createPlugins(StoragePluginRegistryImpl.java:182)
> > at
> >
> >
> org.apache.drill.exec.store.StoragePluginRegistryImpl.init(StoragePluginRegistryImpl.java:126)
> > at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:113)
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:281)
> > ... 2 more
> > Caused by: com.fasterxml.jackson.databind.JsonMappingException: Could not
> > resolve type id 'kudu' into a subtype of [simple type, class
> > org.apache.drill.common.logical.StoragePluginConfig]: known type ids =
> > [InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file,
> > jdbc, mock, named]
> >  at [Source: {
> >   "storage":{
> > kudu : {
> >   type:"kudu",
> >   masterAddresses: "1.2.3.4",
> >   enabled: false
> > }
> >   }
> > }
> > ; line: 4, column: 12] (through reference chain:
> >
> >
> org.apache.drill.exec.planner.logical.StoragePlugins["storage"]->java

Error parsing JSON (drill 1.6.0)

2016-03-20 Thread Shankar Mane
Guys,


   1. I am stuck in the middle of this. Could you please help me to
   resolve the error below.
   2. I am running the query on drill 1.6.0 in a cluster, on json log data
   (150GB log file) (1 json / line).


I have just extracted 3 lines from the logs for test purposes. Please find
those lines below.


----- *test.json* -----


{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
{"gameId":"
https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"2507001010530109","id":56312439,"a":0,"st":145777140,"e":"0.0","j":0,"n":"Loot
Qualifier
1","tc":94,"et":0,"syst":1457771456,"rc":14577,"s":5,"t":1,"tk":false,"prnId":56311896,"jc":1,"tp":"10.0","ro":14540,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}

----- *Query* -----



select
`timestamp`,
sessionid,
gameid,
ajaxUrl,
ajaxData
from dfs.`/tmp/test.json` t
;



Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.

File  /tmp/test.json
Record  2
Fragment 0:0
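
A small aside for anyone debugging failures like the DATA_READ error above:
Drill can surface the full stack trace and context behind such errors via a
standard session option (this does not fix the error, it only reports more
detail):

alter session set `exec.errors.verbose` = true;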


Re: unable to start Drill 1.6.0

2016-03-19 Thread Shankar Mane
@Jacques Nadeau

Earlier, in 1.2.0, we shipped a blank drill-module.conf file as part of the
UDF jar itself; the file was not added to Drill's "conf" directory, and the
functions worked fine.

We did the same in 1.6.0, but Drill is not able to find the functions.

By the way, I will make the changes you suggested and will update here once done.

Regards,
Shankar
On 17 Mar 2016 22:31, "Jacques Nadeau" <jacq...@dremio.com> wrote:

> I think your problem here is that you override all classpath scanning
> packages (and also are using a wildcard in the name).
>
> You should try to only append your special packages like this:
>
> drill.classpath.scanning.packages += "com.companyname.drill"
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
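
Spelled out, the drill-module.conf bundled inside the UDF jar would then
contain just this single append (a minimal sketch of the suggestion above,
using the package name from the jar listing further down; not a verified
config):

drill.classpath.scanning.packages += "com.companyname.drill"

Note that `+=` appends to the existing scan list instead of replacing it,
and that the package name carries no `.*` wildcard.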
>
> On Thu, Mar 17, 2016 at 8:25 AM, Shankar Mane <shankar.m...@games24x7.com>
> wrote:
>
> >
> >
> 
> >
> > 1. Drill in the cluster is *working fine* when a *customized* drill-module.conf
> > file is *not present* at "apache-drill-1.6.0/conf/drill-module.conf".
> >
> >
> >
> >
> 
> >
> >
> > 2. The custom UDF is not working, as described below:
> >
> > I have copied my custom UDF jar into "apache-drill-1.6.0/jars/3rdparty" on
> > all nodes and restarted all drillbits.
> >
> >
> > udf filename=udfutil-0.0.1-SNAPSHOT.jar
> > jar *structure* -
> > /*
> > META-INF/
> > META-INF/MANIFEST.MF
> > com/
> > com/companyname/
> > com/companyname/drill/
> > drill-module.conf
> > com/companyname/drill/channeltest.class
> > com/companyname/drill/DateFunc.class
> > com/companyname/drill/DateExtract.class
> > com/companyname/drill/DecodeURI.class
> > com/companyname/drill/ChannelID.class
> > com/companyname/drill/BrowserFuncNew.class
> > com/companyname/drill/ToDate.class
> > META-INF/maven/
> > META-INF/maven/com.companyname.drill.udf/
> > META-INF/maven/com.companyname.drill.udf/udfutil/
> > META-INF/maven/com.companyname.drill.udf/udfutil/pom.xml
> > META-INF/maven/com.companyname.drill.udf/udfutil/pom.properties
> > */
> >
> >
> > -- And log in to Drill to check whether the function is working or not
> > /*
> > 0: jdbc:drill:> select DateFunc(1458228298) from (values(1)) ;
> > *Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 26: No
> > match found for function signature DateFunc()*
> > */
> >
> > *IT FAILED*
> >
> >
> >
> >
> >
> 
> >
> > 3. Now, as described on the website, I edited the file "*drill-module.conf*",
> > copied it to all nodes in the cluster, and restarted all drillbits.
> >
> > vi apache-drill-1.6.0/conf/drill-module.conf
> >
> > /*
> > drill: {
> > classpath.scanning: {
> > packages: [
> > "com.companyname.drill.*"
> > ]
> > }
> > }
> > */
> >
> > *But DRILL SHUTS DOWN on all nodes.*
> >
> >
> >
> >
> > *Please help me resolve this issue, or suggest any other way to invoke
> > my custom UDFs.*
> >
> >
> >
> >
> >
> > On Thu, Mar 17, 2016 at 6:50 PM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> >
> > > The easiest fix when Drill fails to load a storage plugin is to delete the
> > > existing configurations. Deleting /tmp/drill/ should resolve this.
> > >
> > > I know this may not be practical in some cases, and other developers
> may
> > > give you a better solution.
> > >
> > > On Thu, Mar 17, 2016 at 2:13 PM, Shankar Mane <shankar.m...@games24x7.com> wrote:
> > >
> > > > *drillbit.out =>*
> > > >
> > > >
> > > > Exception in thread "main"
> > > > org.apache.drill.exec.exception.DrillbitStartupException: Failure
> > during
> > > > initial startup of Drillbit.
> > > > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:284)
> > > > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261)
> > > > at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257)
> > > > Caused by: java.lang.IllegalStateException:
> > > > com.fasterxml.jackson.databind.JsonMappingException: Could not resolve
> > > > type id 'kudu' into a subtype of [simple type, class
> > > > org.apache.drill.common.logical.StoragePluginConfig]
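
For context on the UDF side of this thread: the DateFunc source is never
shown, but a Drill simple function generally has to look roughly like the
following skeleton for Drill to register it (hypothetical names, types, and
body; only the annotation and holder wiring follows Drill's documented UDF
interface):

package com.companyname.drill;

import io.netty.buffer.DrillBuf;
import javax.inject.Inject;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.BigIntHolder;
import org.apache.drill.exec.expr.holders.VarCharHolder;

@FunctionTemplate(name = "DateFunc",
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class DateFunc implements DrillSimpleFunc {

  @Param  BigIntHolder epochSeconds;  // input: epoch timestamp as BIGINT
  @Output VarCharHolder out;          // output: formatted date string
  @Inject DrillBuf buffer;            // scratch buffer for the VARCHAR output

  public void setup() { }

  public void eval() {
    // The body is copied into Drill's generated code at run time, so use
    // fully qualified class names throughout. The date logic here is
    // illustrative only.
    String formatted = new java.text.SimpleDateFormat("yyyy-MM-dd")
        .format(new java.util.Date(epochSeconds.value * 1000L));
    byte[] bytes = formatted.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    buffer = buffer.reallocIfNeeded(bytes.length);
    buffer.setBytes(0, bytes);
    out.buffer = buffer;
    out.start = 0;
    out.end = bytes.length;
  }
}

With a class shaped like this in the jar, the one-line
drill.classpath.scanning.packages append shown earlier is what lets Drill's
classpath scanner find and register it.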

unable to start Drill 1.6.0

2016-03-19 Thread Shankar Mane
I am not able to start Drill 1.6.0. Please find the attached file for more
details.


Re: unable to start Drill 1.6.0

2016-03-19 Thread Shankar Mane
*drillbit.out =>*


Exception in thread "main"
org.apache.drill.exec.exception.DrillbitStartupException: Failure during
initial startup of Drillbit.
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:284)
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261)
at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257)
Caused by: java.lang.IllegalStateException:
com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type
id 'kudu' into a subtype of [simple type, class
org.apache.drill.common.logical.StoragePluginConfig]: known type ids =
[InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file,
jdbc, mock, named]
 at [Source: {
  "storage":{
kudu : {
  type:"kudu",
  masterAddresses: "1.2.3.4",
  enabled: false
}
  }
}
; line: 4, column: 12] (through reference chain:
org.apache.drill.exec.planner.logical.StoragePlugins["storage"]->java.util.LinkedHashMap["kudu"])
at org.apache.drill.exec.store.StoragePluginRegistryImpl.createPlugins(StoragePluginRegistryImpl.java:182)
at org.apache.drill.exec.store.StoragePluginRegistryImpl.init(StoragePluginRegistryImpl.java:126)
at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:113)
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:281)
... 2 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Could not
resolve type id 'kudu' into a subtype of [simple type, class
org.apache.drill.common.logical.StoragePluginConfig]: known type ids =
[InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file,
jdbc, mock, named]
 at [Source: {
  "storage":{
kudu : {
  type:"kudu",
  masterAddresses: "1.2.3.4",
  enabled: false
}
  }
}
; line: 4, column: 12] (through reference chain:
org.apache.drill.exec.planner.logical.StoragePlugins["storage"]->java.util.LinkedHashMap["kudu"])
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:216)
at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:983)
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._handleUnknownTypeId(TypeDeserializerBase.java:281)
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:163)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:106)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:91)
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:142)
at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringMap(MapDeserializer.java:497)
at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:341)
at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:26)
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:490)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:465)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:380)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1123)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:298)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:133)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3788)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2779)
at org.apache.drill.exec.store.StoragePluginRegistryImpl.createPlugins(StoragePluginRegistryImpl.java:144)
... 5 more
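
As Abdel Hakim Deneche suggested earlier in the thread, the quickest way out
when a stored storage-plugin configuration (here the 'kudu' entry) can no
longer be deserialized is to delete the cached configurations and restart.
A sketch, assuming the default local pstore location /tmp/drill from his
reply and the stock drillbit.sh script (paths may differ if
sys.store.provider is customized):

# run on every node
bin/drillbit.sh stop
rm -rf /tmp/drill          # drops the cached storage plugin configurations
bin/drillbit.sh start

Any storage plugins configured through the Web UI will need to be re-created
afterwards, since Drill re-bootstraps only the default set.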





On Thu, Mar 17, 2016 at 6:38 PM, Shankar Mane <shankar.m...@games24x7.com>
wrote:

> *drillbit.out =>*
>
>
> Exception in thread "main"
> org.apache.drill.exec.exception.DrillbitStartupException: Failure during
> initial startup of Drillbit.
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:284)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261)
> at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257)
> Caused by: java.lang.IllegalStateException:
> com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type
> id 'kudu' into a subtype of [simple type, class
> org.apache.drill.common.logical.StoragePluginConfig]: known type ids =
> [InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file,
> jdbc, mock, named]
>  at [Source: {
>   "storage":{
> kudu : {
&

Re: Error parsing JSON (drill 1.6.0)

2016-03-19 Thread Shankar Mane
@Abhishek:

Some events in the 150 GB JSON file are like this, differing in structure.
I would say only about 0.1% of the events (per 150 GB file) have such a
shape.

And yes, UNION type works perfectly, but only when we use SELECT statements.

Could you please change your SELECT query to a CTAS? I am getting
NullPointerExceptions.
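
For reference, the CTAS being asked about would look roughly like this (a
sketch with a hypothetical target table name; assumes dfs.tmp is a writable
workspace):

use dfs.tmp;

create table test_json_ctas as
select
`timestamp`,
sessionid,
gameid,
ajaxUrl,
ajaxData
from dfs.`/tmp/test.json` t;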
On 19 Mar 2016 01:35, "Abhishek Girish" <abhishek.gir...@gmail.com> wrote:

> Hello Shankar,
>
> From the sample data you shared, it looks like you have JSON documents
> which differ considerably in the schema / structure. This isn't supported
> by default.
>
> You could try turning on UNION type (an experimental feature).
>
> > set `exec.enable_union_type` = true;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | exec.enable_union_type updated.  |
> +---+--+
> 1 row selected (0.193 seconds)
>
> > select
> > `timestamp`,
> > sessionid,
> > gameid,
> > ajaxUrl,
> > ajaxData
> > from dfs.`/tmp/test1.json` t;
>
> +----------------+--------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
> |   timestamp    |           sessionid            |                                      gameid                                      |        ajaxUrl        | ajaxData  |
> +----------------+--------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
> | 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                                                                             | /player/updatebonus1  | null      |
> | 1457771458873  | D18104E8CA3071C7A8F4E141B127   | https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043  | []                    | null      |
> | 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                                                                             | /player/updatebonus2  | null      |
> +----------------+--------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
>
> 3 rows selected (0.36 seconds)
>
>
> Regards,
> Abhishek
>
> On Fri, Mar 18, 2016 at 12:02 PM, Shankar Mane <shankar.m...@games24x7.com> wrote:
>
> > Guys,
> >
> >
> >    1. I am stuck and could use some help resolving the error below.
> >    2. I am running a query on Drill 1.6.0, in a cluster, over JSON log data
> >    (a 150 GB log file, one JSON document per line).
> >
> >
> > I have extracted 3 lines from the logs for testing purposes; please find
> > them below.
> >
> >
> > -- *test.json* --
> >
> >
> >
> >
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> > {"gameId":"
> >
> >
> https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043
> >
> ","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"2507001010530109","id":56312439,"a":0,"st":145777140,"e":"0.0","j":0,"n":"Loot
> > Qualifier
> >
> >
> 1","tc":94,"et":0,"syst":1457771456,"rc":14577,"s":5,"t":1,"tk":false,"prnId":56311896,"jc":1,"tp":"10.0","ro":14540,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> >
> >
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> >
> > -- *Query* --
> >
> >
> > select
> > `timestamp`,
> > sessionid,
> > gameid,
> > ajaxUrl,
> > ajaxData
> > from dfs.`/tmp/test.json` t
> > ;
> >
> >
> >
> > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> > are using a ValueWriter of type NullableVarCharWriterImpl.
> >
> > File  /tmp/test.json
> > Record  2
> > Fragment 0:0
> >
>
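
Since only about 0.1% of records carry the divergent shape, one way to size
the problem before picking a workaround is to group on the runtime type of
the offending column (a sketch; assumes typeof() is available in this Drill
build and union type is enabled as above):

select typeof(ajaxUrl) as ajaxUrl_type, count(*) as cnt
from dfs.`/tmp/test.json`
group by typeof(ajaxUrl);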