Re: Update planner/configuration keys

2019-03-12 Thread Paul Rogers
Hi Kunal and Praveen,

ALTER SYSTEM sets an option value persistently in Zookeeper, which makes the 
value permanent within that one cluster.
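
As a rough sketch (not something tested in this thread), the options Praveen listed could be set once from the Drill prompt as shown below. The option names and values are taken from his message; the query against sys.options is only an assumed way to check afterwards what got stored.

```sql
-- Set each option at SYSTEM scope; on a server-mode Drillbit these are
-- persisted in Zookeeper, as described above.
ALTER SYSTEM SET `planner.cpu_load_average` = 0.7;
ALTER SYSTEM SET `exec.queue.enable` = true;
ALTER SYSTEM SET `exec.queue.large` = 2;
ALTER SYSTEM SET `exec.queue.memory_ratio` = 10.0;
ALTER SYSTEM SET `exec.queue.memory_reserve_ratio` = 0.2;
ALTER SYSTEM SET `exec.queue.small` = 4;

-- Assumed verification step: list the stored values.
SELECT *
FROM sys.options
WHERE name LIKE 'exec.queue%'
   OR name = 'planner.cpu_load_average';
```

Running the SELECT again after a restart would be one way to find out whether the values survive in embedded mode.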

Sounds like you want to persist values in an embedded Drillbit. In this case, 
there is no Zookeeper. System options get written to the file system, but I 
don't know if they are persistent. Any embedded Drill users know how this works?

One solution is to start Drill as a server, even if it runs on only one node. I 
don't know, however, if the Drill server mode is supported on Windows. Any 
Windows users know this?

Finally, Kunal is correct. Since about 18 months ago, the values for 
system/session options have been defined in Drill's config system. See [1].

Although Drill does not encourage this usage, you can customize these values in 
your drill-override.conf as Kunal suggests. You can try this as a workaround 
for the fact that you are 1) using Embedded Drill, 2) on Windows.

But, Drill is really a distributed tool, so you'd really want to run Drill on 
Linux as a server, then allow the options to persist in Zookeeper.

Thanks,
- Paul

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/resources/drill-module.conf#L454


 

On Tuesday, March 12, 2019, 11:31:59 AM PDT, Kunal Khatua 
 wrote:  
 
 That is correct... you'll need to run these on the command prompt for the
very first time. Like profiles, I believe, the ALTER SYSTEM commands also
have a permanent effect.

I'm not sure if there is a way for a user to provide a pre-configured set
of parameters to use during startup. You could see if setting these values
in DRILL_HOME/conf/drill-override.conf helps. A lot of the values you want
are originally set as defaults from an embedded file, "drill-module.conf",
and drill-override.conf, well... overrides some of those values.

If this doesn't work, you can file an improvement/newFeature JIRA for this,
considering this is a nice feature to have.

~ Kunal


On Tue, Mar 12, 2019 at 3:18 AM PRAVEEN DEVERACHETTY 
wrote:

> Hi Kunal, but where should I keep those statements? Is there any Drill
> startup script in which I can run these ALTER statements? I think these
> statements only run on the Drill command prompt, right?
>
> Thanks,
> Praveen
>
> On Tue, Mar 12, 2019 at 1:02 PM Kunal Khatua  wrote:
>
> > Executing an "alter system set param=value" usually persists the value.
> > Not sure if that works for an embedded mode.
> > Could you try and let us know if that works?
> > On 3/11/2019 11:19:13 AM, PRAVEEN DEVERACHETTY 
> > wrote:
> > I am using Apache Drill on the Windows platform. My requirement is to update
> > the following parameters during Apache Drill startup. These parameter
> > values may differ on each Apache Drill node. Can you share an example
> > of how to update these on Windows? I know another option is to run them from
> > the web UI or Postman, but I don't want to do it after installing Apache
> > Drill. I want these changes to take effect during Apache Drill startup.
> >
> > *planner.cpu_load_average - 0.7*
> > *exec.queue.enable - true*
> > *exec.queue.large - 2*
> > *exec.queue.memory_ratio - 10.0*
> > *exec.queue.memory_reserve_ratio - 0.2*
> > *exec.queue.small - 4*
> >
> > Thanks,
> > praveen
> >
>
  

Re: RESOURCE ERROR: External Sort encountered an error while spilling to disk

2019-03-12 Thread Khurram Faraaz
Can you also share the stack trace from drillbit.log for the below error
RESOURCE ERROR: External Sort encountered an error while spilling to disk

And which version of Drill are you running?
Please share the table definition and the number of rows in the table
being queried.

Thanks,
Khurram

On Tue, Mar 12, 2019 at 12:17 PM Boaz Ben-Zvi  wrote:

>   Hi Giovanni,
>
>  The error given by the External-Sort indicates a problem while
> spilling the excess memory into disk.
>
> When you enlarged the memory (from the default 2GB) to 8GB, the LAG query
> may have succeeded without spilling, hence circumventing the issue.
>
> Yes, you can keep enlarging the memory and run w/o spilling, but better
> check and fix the root issue.
>
> How is your spilling configured - check the filesystem -
> "drill.exec.spill.fs" and the directories - "drill.exec.spill.directories"
>
> The default is the local filesystem, and into /tmp .  It is possible
> that very little disk space is available for /tmp .
>
>   Thanks,
>
>   -- Boaz
>
> On 3/12/19 6:03 AM, Giovanni Conte wrote:
> > Hello,
> > I am running a LAG and a SUM query over a 300 MB PCAP dataset.
> > I get this error:
> > RESOURCE ERROR: External Sort encountered an error while spilling to disk
> >
> > Then I changed
> > planner.memory.max_query_memory_per_node ---> 8589934592
> > and with this I am able to perform the LAG but not the SUM.
> >
> > I have no memory constraints, since I am working on a server with 72 cores
> > and 256 GB of RAM.
> > Which other parameter should I change to avoid the RESOURCE ERROR?
> > Can I go above 8589934592?
> > Thank you very much,
> >
> > Giovanni
> >
>


Json Complex array issue only on HDF, local storage is fine

2019-03-12 Thread Robert Vantol
This is our JSON; as you can see, the imp node is an array of nested JSON
objects:

{
  "id": "21D66BC4F2BA7E24",
  "imp": [
    {
      "id": "1",
      "banner": {
        "w": 160,
        "h": 600,
        "pos": 1,
        "topframe": 1
      },
      "sessiondepth": 22
    }
  ],
  "site": {
    "id": "288491",
    "cat": [
      "IAB5"
    ],
    "page": "http://www.history101.com/discovered-photo-album-reveals-the-real-reason-why-the-titanic-sank/19/",
    "ref": "http://www.history101.com/discovered-photo-album-reveals-the-real-reason-why-the-titanic-sank/18/",
    "publisher": {
      "id": "184621"
    },
    "content": {
      "keywords": "2"
    }
  }
}



When we query locally on a Windows machine, with the JSON sitting on the
file system, using:



Select t.id, t.imp[0].banner.w, t.imp[0].banner.h  from rv.`bid.json` t;



It returns the information correctly.

When we take that same JSON and place it on our HDF, the imp node is
returned as null. We are able to pull out other nodes, and even the other
simple array (site.cat[0]), just not the imp node. We have tried to flatten
it and have tried: ALTER SESSION SET `store.json.all_text_mode` = true;



But nothing is working. We are using a newly downloaded Drill 1.15.
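
For reference, a minimal sketch of what the flatten attempt mentioned above might look like. The rv workspace and bid.json file name are the ones from the local query; the workspace that points at the remote file system would have its own name, so treat rv here as a placeholder. The aliases imp_item, f, w and h are made up for illustration.

```sql
-- Unnest the imp array into one row per element, then read the nested
-- banner fields from each element.
SELECT f.id,
       f.imp_item.banner.w AS w,
       f.imp_item.banner.h AS h
FROM (
  SELECT t.id, FLATTEN(t.imp) AS imp_item
  FROM rv.`bid.json` t
) f;
```

Comparing the output of the inner SELECT alone on both storage locations may help show whether imp is already null before any field access happens.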



Is there any help you can give me? I am at a loss…



Rob










*  Robert VanTol*
 Digital Consultant & Information Architect


*r...@eqworks.com * | *T * 416 260 4759

1235 Bay Street, Suite 401, Toronto ON, M5R 2A9

EQ Digital  | www.EQworks.com 




Re: RESOURCE ERROR: External Sort encountered an error while spilling to disk

2019-03-12 Thread Boaz Ben-Zvi

 Hi Giovanni,

    The error given by the External-Sort indicates a problem while 
spilling the excess memory into disk.


When you enlarged the memory (from the default 2GB) to 8GB, the LAG query 
may have succeeded without spilling, hence circumventing the issue.


Yes, you can keep enlarging the memory and run w/o spilling, but better 
check and fix the root issue.


How is your spilling configured - check the filesystem - 
"drill.exec.spill.fs" and the directories - "drill.exec.spill.directories"


The default is the local filesystem, and into /tmp .  It is possible 
that very little disk space is available for /tmp .
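
A hedged sketch of how these settings might be inspected from the Drill prompt. It assumes the sys.boot and sys.options system tables expose the option names in a name column; the exact columns can differ between Drill versions, hence the SELECT *.

```sql
-- Boot-time spill settings (drill.exec.spill.fs, drill.exec.spill.directories).
-- These are read-only here; they are changed in drill-override.conf.
SELECT * FROM sys.boot WHERE name LIKE '%spill%';

-- Current per-node query memory limit.
SELECT *
FROM sys.options
WHERE name = 'planner.memory.max_query_memory_per_node';

-- Raise the limit for the current session only (8 GB, the value tried
-- in this thread).
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 8589934592;
```

If the spill directory turns out to be /tmp, checking the free space on that mount is the next step.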


 Thanks,

 -- Boaz

On 3/12/19 6:03 AM, Giovanni Conte wrote:

Hello,
I am running a LAG and a SUM query over a 300 MB PCAP dataset.
I get this error:
RESOURCE ERROR: External Sort encountered an error while spilling to disk

Then I changed
planner.memory.max_query_memory_per_node ---> 8589934592
and with this I am able to perform the LAG but not the SUM.

I have no memory constraints, since I am working on a server with 72 cores
and 256 GB of RAM.
Which other parameter should I change to avoid the RESOURCE ERROR?
Can I go above 8589934592?
Thank you very much,

Giovanni



Re: Update planner/configuration keys

2019-03-12 Thread Kunal Khatua
That is correct... you'll need to run these on the command prompt for the
very first time. Like profiles, I believe, the ALTER SYSTEM commands also
have a permanent effect.

I'm not sure if there is a way for a user to provide a pre-configured set
of parameters to use during startup. You could see if setting these values
in DRILL_HOME/conf/drill-override.conf helps. A lot of the values you want
are originally set as defaults from an embedded file, "drill-module.conf",
and drill-override.conf, well... overrides some of those values.

If this doesn't work, you can file an improvement/newFeature JIRA for this,
considering this is a nice feature to have.

~ Kunal


On Tue, Mar 12, 2019 at 3:18 AM PRAVEEN DEVERACHETTY 
wrote:

> Hi Kunal, but where should I keep those statements? Is there any Drill
> startup script in which I can run these ALTER statements? I think these
> statements only run on the Drill command prompt, right?
>
> Thanks,
> Praveen
>
> On Tue, Mar 12, 2019 at 1:02 PM Kunal Khatua  wrote:
>
> > Executing an "alter system set param=value" usually persists the value.
> > Not sure if that works for an embedded mode.
> > Could you try and let us know if that works?
> > On 3/11/2019 11:19:13 AM, PRAVEEN DEVERACHETTY 
> > wrote:
> > I am using Apache Drill on the Windows platform. My requirement is to update
> > the following parameters during Apache Drill startup. These parameter
> > values may differ on each Apache Drill node. Can you share an example
> > of how to update these on Windows? I know another option is to run them from
> > the web UI or Postman, but I don't want to do it after installing Apache
> > Drill. I want these changes to take effect during Apache Drill startup.
> >
> > *planner.cpu_load_average - 0.7*
> > *exec.queue.enable - true*
> > *exec.queue.large - 2*
> > *exec.queue.memory_ratio - 10.0*
> > *exec.queue.memory_reserve_ratio - 0.2*
> > *exec.queue.small - 4*
> >
> > Thanks,
> > praveen
> >
>


RESOURCE ERROR: External Sort encountered an error while spilling to disk

2019-03-12 Thread Giovanni Conte
Hello,
I am running a LAG and a SUM query over a 300 MB PCAP dataset.
I get this error:
RESOURCE ERROR: External Sort encountered an error while spilling to disk

Then I changed
planner.memory.max_query_memory_per_node ---> 8589934592
and with this I am able to perform the LAG but not the SUM.

I have no memory constraints, since I am working on a server with 72 cores
and 256 GB of RAM.
Which other parameter should I change to avoid the RESOURCE ERROR?
Can I go above 8589934592?
Thank you very much,

Giovanni


Re: Update planner/configuration keys

2019-03-12 Thread PRAVEEN DEVERACHETTY
Hi Kunal, but where should I keep those statements? Is there any Drill
startup script in which I can run these ALTER statements? I think these
statements only run on the Drill command prompt, right?

Thanks,
Praveen

On Tue, Mar 12, 2019 at 1:02 PM Kunal Khatua  wrote:

> Executing an "alter system set param=value" usually persists the value.
> Not sure if that works for an embedded mode.
> Could you try and let us know if that works?
> On 3/11/2019 11:19:13 AM, PRAVEEN DEVERACHETTY 
> wrote:
> I am using Apache Drill on the Windows platform. My requirement is to update
> the following parameters during Apache Drill startup. These parameter
> values may differ on each Apache Drill node. Can you share an example
> of how to update these on Windows? I know another option is to run them from
> the web UI or Postman, but I don't want to do it after installing Apache
> Drill. I want these changes to take effect during Apache Drill startup.
>
> *planner.cpu_load_average - 0.7*
> *exec.queue.enable - true*
> *exec.queue.large - 2*
> *exec.queue.memory_ratio - 10.0*
> *exec.queue.memory_reserve_ratio - 0.2*
> *exec.queue.small - 4*
>
> Thanks,
> praveen
>


Re: Drill 1.15.0 fails with error while quering Parquet 2.0 file

2019-03-12 Thread Kunal Khatua
Hi Denis

You seem to be trying to read a Parquet 2.0 format file with the Parquet 1.10 
reader that comes with Drill. Is there a specific reason you are using version 
2.0?

~ Kunal
On 3/11/2019 10:13:39 AM, Denis Dudinski  wrote:
Hello,

I have a Parquet 2.0 file which contains serialised Avro records. The records' 
Avro schema is plain but contains a couple of optional string fields:

{
  "namespace" : "proto.avro.v1",
  "type" : "record",
  "name" : "FactEntity",
  "fields" : [
    {"name" : "sensorName", "type" : "string"},
    {"name" : "sensorDesc", "type" : "string"},
    {"name" : "firstDeployed", "type" : "long"},
    {"name" : "lastRenewed", "type" : "long"},
    {"name" : "errMsg", "type" : ["null", "string"]},
    {"name" : "errDetails", "type" : ["null", "string"]}
  ]
}

When I try to query entities in this file with

SELECT
t1.sensorName,
t1.sensorDesc,
t1.lastRenewed,
t1.errMsg
FROM dfs.`/path/to/file` t1
LIMIT 10;

I get this error:

2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG o.a.d.e.w.fragment.FragmentExecutor - Starting fragment 0:0 on xxx:31010
2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG o.a.d.e.s.p.DrillParquetReader - Requesting schema message proto.avro.v1.FactEntity {
  required binary sensorName (UTF8);
  required binary sensorDesc (UTF8);
  required int64 firstDeployed;
  required int64 lastRenewed;
  optional binary errMsg (UTF8);
  optional binary errDetails (UTF8);
}

2019-03-07 12:07:30,615 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] INFO o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null (Error in drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null)
org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: Error in drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null


Please, refer to logs for more information.

[Error Id: 2b5a06a0-fa8e-497b-848d-01aae15874ee ]
at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.jar:1.15.0]
at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:101) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.jar:1.15.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.jar:1.15.0]
at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_161]
at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_161]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) [hadoop-common-2.7.4.jar:n

Re: Update planner/configuration keys

2019-03-12 Thread Kunal Khatua
Executing an "alter system set param=value" usually persists the value. Not 
sure if that works for an embedded mode. 
Could you try and let us know if that works?
On 3/11/2019 11:19:13 AM, PRAVEEN DEVERACHETTY  wrote:
I am using Apache Drill on the Windows platform. My requirement is to update
the following parameters during Apache Drill startup. These parameter
values may differ on each Apache Drill node. Can you share an example
of how to update these on Windows? I know another option is to run them from
the web UI or Postman, but I don't want to do it after installing Apache
Drill. I want these changes to take effect during Apache Drill startup.

*planner.cpu_load_average - 0.7*
*exec.queue.enable - true*
*exec.queue.large - 2*
*exec.queue.memory_ratio - 10.0*
*exec.queue.memory_reserve_ratio - 0.2*
*exec.queue.small - 4*

Thanks,
praveen