Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0

2018-03-20 Thread Francis McGregor-Macdonald
Hi Kunal,

I did find:
https://github.com/baztian/jaydebeapi/commit/a1f8d3c3b4621570065d968b4b734bae3f0eaf79
which suggests that this may already be being worked on. I'm not sure
whether it would cover this use case, though. I posted a question on the
commit, which will hopefully get an answer.
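A possible interim workaround, pending a fix: jaydebeapi appears to expose the raw java.sql.Connection as conn.jconn (an assumption — check your installed version), so a query can be routed through a plain Statement rather than a PreparedStatement, which should keep Drill from seeing the wrapped SELECT ... LIMIT 0 form. A sketch; the helper name and all attribute access are mine, not from this thread:

```python
def execute_direct(conn, sql):
    """Run sql through a plain java.sql.Statement instead of a
    PreparedStatement, so Drill receives the query text unchanged.
    `conn` is a jaydebeapi connection; `conn.jconn` is assumed to be
    the underlying JDBC connection object."""
    stmt = conn.jconn.createStatement()
    rows = []
    if stmt.execute(sql):                 # True when a ResultSet is produced
        rs = stmt.getResultSet()
        ncols = rs.getMetaData().getColumnCount()
        while rs.next():
            # getString is a simplification; real code would switch on type
            rows.append(tuple(rs.getString(i) for i in range(1, ncols + 1)))
        rs.close()
    stmt.close()
    return rows
```

e.g. execute_direct(conn, "SHOW DATABASES") in place of curs.execute(...).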



On Tue, Mar 20, 2018 at 5:45 PM, Kunal Khatua <kunalkha...@gmail.com> wrote:

> Francis
>
> I'm certain this is the result of JayDeBeApi using the prepareStatement
> API. (DRILL-5316. See the comments in the JIRA.)
>
> I was thinking of creating a fork and using the standard
> Connection.createStatement() API instead, before compiling. However, I'm
> currently on a time crunch and my Python skills are a bit rusty. Hoping
> someone in the community can step forward and take a crack at this.
>
> ~ KK
>
>
> On 3/19/2018 7:30:49 PM, Francis McGregor-Macdonald <fran...@mc-mac.com>
> wrote:
> Thanks Kunal and Charles,
>
> I rebuilt the script / environment inside a container to see if I could
> replicate and I have the same result.
>
> The container is running on an EC2 "next to" the cluster.
>
> Charles was there any additional configuration you had done?
>
> I have in the Dockerfile:
> ...
> conda install -c conda-forge jpype1 -q && \
> conda install pip -q && \
> pip install jaydebeapi -q && \
> ...
>
> I am only loading the single jar into the container; I additionally get:
> "SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"." ...
> when
> running the script.
>
> I can see suggestions this might be related to "Prepared Statements" but
> can't find anything definitive.
>
>
>
> On Mon, Mar 19, 2018 at 6:16 PM, Kunal Khatua wrote:
>
> > This error looks familiar and might be because of the Python library
> > wrapping a select * around the original query.
> >
> > Using the JDBC driver directly doesn’t seem to show this problem. Drill
> 1.13.0 is out now. Could you give it a try and confirm whether the
> > behavior is the same?
> >
> > -----Original Message-----
> > From: Charles Givre
> > Sent: Sunday, March 18, 2018 9:10 PM
> > To: user@drill.apache.org
> > Subject: Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ...
> > LIMIT 0
> >
> > Hi Francis,
> >
> >
> > The code below worked for me. Also, I don’t know if it matters, but did
> > you mean to create two cursors?
> > — C
> >
> > import jaydebeapi
> > import pandas as pd
> >
> > #Create the connection object
> > conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
> > "jdbc:drill:drillbit=localhost:31010",
> > ["admin", "password"],
> > "/usr/local/share/drill/jars/jdbc-driver/drill-jdbc-all-1.12.0.jar",)
> >
> > #Create the Cursor Object
> > curs = conn.cursor()
> >
> > #Execute the query
> > curs.execute("SELECT * FROM cp.`employee.json` LIMIT 20")
> >
> > #Get the results
> > curs.fetchall()
> >
> > #Read query results into a Pandas DataFrame
> > df = pd.read_sql("SELECT * FROM cp.`employee.json` LIMIT 20", conn)
> >
> >
> >
> >
> >
> > > On Mar 18, 2018, at 23:41, Francis McGregor-Macdonald <
> > fran...@mc-mac.com> wrote:
> > >
> > > Hi all,
> > >
> > > I am attempting to send a query from python3 via JayDeBeApi and am
> > > encountering the issue that the SQL is enclosed in a SELECT * FROM
> > > $myquery LIMIT 0
> > >
> > > With:
> > > conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
> > > "jdbc:drill:drillbit=$mycluster:$myport",
> > > ["$username", "$password"],
> > > "/tmp/drill-jdbc-all-1.12.0.jar")
> > > curs = conn.cursor()
> > > curs = conn.cursor()
> > > curs.execute('SHOW DATABASES')
> > >
> > > ... the query hits Drill as:
> > > SELECT * FROM (SHOW DATABASES) LIMIT 0
> > >
> > > A select * from mytable limit 100 also has the same issue.
> > >
> > > Drill is version 1.12
> > >
> > > This also occurs with other queries. I found
> > > https://issues.apache.org/jira/browse/DRILL-5136
> > > which looks similar and lists "Client - ODBC" (not JDBC)
> > >
> > > Has anyone else encountered this?
> >
> >
>


Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0

2018-03-19 Thread Francis McGregor-Macdonald
Thanks Kunal and Charles,

I rebuilt the script / environment inside a container to see if I could
replicate and I have the same result.

The container is running on an EC2 "next to" the cluster.

Charles was there any additional configuration you had done?

I have in the Dockerfile:
...
conda install -c conda-forge jpype1 -q && \
conda install pip -q && \
pip install jaydebeapi -q && \
...

I am only loading the single jar into the container; I additionally get:
"SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"." ... when
running the script.

I can see suggestions this might be related to "Prepared Statements" but
can't find anything definitive.
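On the SLF4J warning: jaydebeapi.connect() accepts a list of jar paths, so an SLF4J binding jar can be supplied alongside the Drill driver to silence the StaticLoggerBinder message. A hedged sketch — the binding choice, version, and paths below are placeholders of mine, not from this thread:

```python
# Supply an SLF4J binding (e.g. slf4j-nop) next to the Drill JDBC driver;
# any binding on the classpath stops the StaticLoggerBinder warning.
jars = [
    "/tmp/drill-jdbc-all-1.12.0.jar",
    "/tmp/slf4j-nop-1.7.25.jar",  # no-op logger binding (placeholder version)
]
# conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
#                           "jdbc:drill:drillbit=localhost:31010",
#                           ["admin", "password"], jars)
```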



On Mon, Mar 19, 2018 at 6:16 PM, Kunal Khatua <kkha...@mapr.com> wrote:

> This error looks familiar and might be because of the Python library
> wrapping a select * around the original query.
>
> Using the JDBC driver directly doesn’t seem to show this problem. Drill
> 1.13.0 is out now. Could you give it a try and confirm whether the
> behavior is the same?
>
> -----Original Message-----
> From: Charles Givre <cgi...@gmail.com>
> Sent: Sunday, March 18, 2018 9:10 PM
> To: user@drill.apache.org
> Subject: Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ...
> LIMIT 0
>
> Hi Francis,
>
>
> The code below worked for me.  Also, I don’t know if it matters, but did
> you mean to create two cursors?
> — C
>
> import jaydebeapi
> import pandas as pd
>
> #Create the connection object
> conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
> "jdbc:drill:drillbit=localhost:31010",
>   ["admin", "password"],
>   "/usr/local/share/drill/jars/jdbc-driver/drill-jdbc-all-1.12.0.jar",)
>
> #Create the Cursor Object
> curs = conn.cursor()
>
> #Execute the query
> curs.execute("SELECT * FROM cp.`employee.json` LIMIT 20")
>
> #Get the results
> curs.fetchall()
>
> #Read query results into a Pandas DataFrame
> df = pd.read_sql("SELECT * FROM cp.`employee.json` LIMIT 20", conn)
>
>
>
>
>
> > On Mar 18, 2018, at 23:41, Francis McGregor-Macdonald <
> fran...@mc-mac.com> wrote:
> >
> > Hi all,
> >
> > I am attempting to send a query from python3 via JayDeBeApi and am
> > encountering the issue that the SQL is enclosed in a SELECT * FROM
> > $myquery LIMIT 0
> >
> > With:
> > conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
> >  "jdbc:drill:drillbit=$mycluster:$myport",
> >  ["$username", "$password"],
> > "/tmp/drill-jdbc-all-1.12.0.jar")
> > curs = conn.cursor()
> > curs = conn.cursor()
> > curs.execute('SHOW DATABASES')
> >
> > ... the query hits Drill as:
> > SELECT * FROM (SHOW DATABASES) LIMIT 0
> >
> > A select * from mytable limit 100 also has the same issue.
> >
> > Drill is version 1.12
> >
> > This also occurs with other queries. I found
> > https://issues.apache.org/jira/browse/DRILL-5136
> > which looks similar and lists "Client - ODBC" (not JDBC)
> >
> > Has anyone else encountered this?
>
>


QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0

2018-03-18 Thread Francis McGregor-Macdonald
Hi all,

I am attempting to send a query from python3 via JayDeBeApi and am
encountering the issue that the SQL is enclosed in a SELECT * FROM $myquery
LIMIT 0

With:
conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
  "jdbc:drill:drillbit=$mycluster:$myport",
  ["$username", "$password"],
"/tmp/drill-jdbc-all-1.12.0.jar")
curs = conn.cursor()
curs = conn.cursor()
curs.execute('SHOW DATABASES')

... the query hits Drill as:
SELECT * FROM (SHOW DATABASES) LIMIT 0

A select * from mytable limit 100 also has the same issue.

Drill is version 1.12

This also occurs with other queries. I found
https://issues.apache.org/jira/browse/DRILL-5136 which looks similar and
lists "Client - ODBC" (not JDBC)

Has anyone else encountered this?


Re: Drill on AWS

2018-02-12 Thread Francis McGregor-Macdonald
Hi Brandon,

I created a quick gist, hopefully it comes across clearly.

https://gist.github.com/fmcmac/a35738376d111fdca45057bd0fb4c79e

Any improvements greatly appreciated and happy to update it based on
feedback.

Regards,
Francis

On Tue, Feb 13, 2018 at 11:11 AM, Brandon Gmail <brantaylor...@gmail.com>
wrote:

> Hey Francis,
>
> Yes please! That option will work. I would appreciate any documentation
> you might have.
>
> > On Feb 12, 2018, at 2:51 PM, Francis McGregor-Macdonald <
> fran...@mc-mac.com> wrote:
> >
> > Hi Brandon,
> >
> > I recently went through this challenge and found it easier in the end to
> > install SSM with a bootstrap and then use ssm.send_command
> > with AWS-RunRemoteScript once the cluster was in place.
> >
> > This way I can use the AWS installed Zookeeper. The remote script is
> pretty
> > much a facsimile of the Drill Install instructions.
> >
> > If that option would work, happy to share.
> >
> > On Tue, Feb 13, 2018 at 9:22 AM, Brandon Gmail <brantaylor...@gmail.com>
> > wrote:
> >
> >> Does anyone have an updated bootstrap script for drill on AWS EMR? Their
> >> repository is over 3 years old, and I’m flailing at figuring this out
> on my
> >> own. Any help would be appreciated.
> >>
> >> Thanks,
> >> Brandon
> >>
>
>


Re: Drill on AWS

2018-02-12 Thread Francis McGregor-Macdonald
Hi Brandon,

I recently went through this challenge and found it easier in the end to
install SSM with a bootstrap and then use ssm.send_command
with AWS-RunRemoteScript once the cluster was in place.

This way I can use the AWS installed Zookeeper. The remote script is pretty
much a facsimile of the Drill Install instructions.

If that option would work, happy to share.
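The SSM step described above can be sketched in Python. The helper below only builds the Parameters payload for the AWS-RunRemoteScript document (parameter names per that document); the bucket path, script name, and instance IDs are placeholders of mine, not values from this thread:

```python
import json

def run_remote_script_params(script_url, command):
    """Build the Parameters payload for SSM's AWS-RunRemoteScript document.
    script_url and command are caller-supplied placeholders."""
    return {
        "sourceType": ["S3"],
        "sourceInfo": [json.dumps({"path": script_url})],
        "commandLine": [command],
    }

# With boto3 (not shown here), the payload would be used roughly as:
#   ssm = boto3.client("ssm")
#   ssm.send_command(InstanceIds=["i-0123456789abcdef0"],
#                    DocumentName="AWS-RunRemoteScript",
#                    Parameters=run_remote_script_params(
#                        "https://s3.amazonaws.com/my-bucket/install-drill.sh",
#                        "bash install-drill.sh"))
```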

On Tue, Feb 13, 2018 at 9:22 AM, Brandon Gmail 
wrote:

> Does anyone have an updated bootstrap script for drill on AWS EMR? Their
> repository is over 3 years old, and I’m flailing at figuring this out on my
> own. Any help would be appreciated.
>
> Thanks,
> Brandon
>


Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

2018-01-28 Thread Francis McGregor-Macdonald
Hi all,

A physical plan is attached ... all memory appears to be 0.0, which seems odd?

Thanks

On Sun, Jan 28, 2018 at 10:37 PM, Francis McGregor-Macdonald <
fran...@mc-mac.com> wrote:

> And with logs as attachments.
>
> On Sun, Jan 28, 2018 at 9:40 PM, Francis McGregor-Macdonald <
> fran...@mc-mac.com> wrote:
>
>> Thanks Paul and Kunal,
>> I think I have the right information now. With Paul's changes (and fixing
>> up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still
>> blowing past memory limits. It does the same thing when re-running the
>> query from the web console so presumably it's not actually Tableau related
>> despite me first generating it that way.
>>
>> Thanks.
>>
>> On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald <
>> fran...@mc-mac.com> wrote:
>>
>>> Thanks Paul,
>>>
>>> I will update with your suggested memory allocations also and retry.
>>>
>>> Zookeeper crashed too which might explain more? I have attached the logs
>>> from Zookeeper too.
>>>
>>> Thanks
>>>
>>> On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:
>>>
>>>> Hi Francis,
>>>>
>>>> Thanks much for the log. The log shows running a query, then
>>>> immediately shows entries that occur when starting Drill. I'm guessing that
>>>> Drill literally crashed at this point? This is more severe than the usual
>>>> error in which a query exhausts memory.
>>>>
>>>> Some general observations. The Drill memory is 60 GB, but system memory
>>>> is 61 GB. Perhaps try dropping total Drill memory some to give the OS and
>>>> other tasks more headroom. For a SELECT * query, Drill needs far less than
>>>> what you have, so maybe try giving Drill 48 GB total.
>>>>
>>>> Then, Drill needs direct memory much more than heap. So, maybe give
>>>> Drill 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These
>>>> settings are in drill-env.sh.
>>>>
>>>> Kunal, you have more experience with these issues. Can you make
>>>> additional suggestions by looking at the log?
>>>>
>>>> Thanks,
>>>>
>>>> - Paul
>>>>
>>>>
>>>>
>>>> On Thursday, January 25, 2018, 10:20:29 PM PST, Francis
>>>> McGregor-Macdonald <fran...@mc-mac.com> wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I am guessing that each of your EMR nodes are quite large? EMR nodes
>>>> are: r4.2xlarge ('vcpu': 8, 'memory': 61)
>>>>
>>>> Property "planner.width.max_per_node" is set to 6
>>>>
>>>> What is the system memory and what are the allocations for heap and
>>>> direct?
>>>> System Memory: 61GB (EMR nodes above)
>>>> drill_mem_heap: 12G
>>>> drill_mem_max: 48G
>>>>
>>>> The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
>>>>
>>>> planner.memory.max_query_memory_per_node = 10479720202
>>>>
>>>> Drillbit.log attached (I think I have the correct selection included).
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
>>>>
>>>> What is the system memory and what are the allocations for heap and
>>>> direct? The memory crash might be occurring due to insufficient heap. The
>>>> limits parameter applies to the direct memory and not Heap.
>>>>
>>>> Can you share details in the logs from the crash?
>>>>
>>>> -----Original Message-----
>>>> From: Timothy Farkas [mailto:tfar...@mapr.com]
>>>> Sent: Thursday, January 25, 2018 2:58 PM
>>>> To: user@drill.apache.org
>>>> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited
>>>> memory
>>>>
>>>> Hi Francis,
>>>>
>>>> I am guessing that each of your EMR nodes are quite large (32 or 64
>>>> vcpus). On large machines Drill's planner over parallelizes and over
>>>> allocates memory. There is a property "planner.width.max_per_node" which
>>>> limits the number of operators that can simultaneously execute on a
>>>> Drillbit for a query. If you configure the width per node to something like
>>>> 5 or 10 (you may have to play around

Fwd: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

2018-01-28 Thread Francis McGregor-Macdonald
And with logs as attachments.

On Sun, Jan 28, 2018 at 9:40 PM, Francis McGregor-Macdonald <
fran...@mc-mac.com> wrote:

> Thanks Paul and Kunal,
> I think I have the right information now. With Paul's changes (and fixing
> up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still
> blowing past memory limits. It does the same thing when re-running the
> query from the web console so presumably it's not actually Tableau related
> despite me first generating it that way.
>
> Thanks.
>
> On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald <
> fran...@mc-mac.com> wrote:
>
>> Thanks Paul,
>>
>> I will update with your suggested memory allocations also and retry.
>>
>> Zookeeper crashed too which might explain more? I have attached the logs
>> from Zookeeper too.
>>
>> Thanks
>>
>> On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:
>>
>>> Hi Francis,
>>>
>>> Thanks much for the log. The log shows running a query, then immediately
>>> shows entries that occur when starting Drill. I'm guessing that Drill
>>> literally crashed at this point? This is more severe than the usual error
>>> in which a query exhausts memory.
>>>
>>> Some general observations. The Drill memory is 60 GB, but system memory
>>> is 61 GB. Perhaps try dropping total Drill memory some to give the OS and
>>> other tasks more headroom. For a SELECT * query, Drill needs far less than
>>> what you have, so maybe try giving Drill 48 GB total.
>>>
>>> Then, Drill needs direct memory much more than heap. So, maybe give
>>> Drill 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These
>>> settings are in drill-env.sh.
>>>
>>> Kunal, you have more experience with these issues. Can you make
>>> additional suggestions by looking at the log?
>>>
>>> Thanks,
>>>
>>> - Paul
>>>
>>>
>>>
>>> On Thursday, January 25, 2018, 10:20:29 PM PST, Francis
>>> McGregor-Macdonald <fran...@mc-mac.com> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> I am guessing that each of your EMR nodes are quite large? EMR nodes
>>> are: r4.2xlarge ('vcpu': 8, 'memory': 61)
>>>
>>> Property "planner.width.max_per_node" is set to 6
>>>
>>> What is the system memory and what are the allocations for heap and
>>> direct?
>>> System Memory: 61GB (EMR nodes above)
>>> drill_mem_heap: 12G
>>> drill_mem_max: 48G
>>>
>>> The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
>>>
>>> planner.memory.max_query_memory_per_node = 10479720202
>>>
>>> Drillbit.log attached (I think I have the correct selection included).
>>>
>>> Thanks
>>>
>>> On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
>>>
>>> What is the system memory and what are the allocations for heap and
>>> direct? The memory crash might be occurring due to insufficient heap. The
>>> limits parameter applies to the direct memory and not Heap.
>>>
>>> Can you share details in the logs from the crash?
>>>
>>> -----Original Message-----
>>> From: Timothy Farkas [mailto:tfar...@mapr.com]
>>> Sent: Thursday, January 25, 2018 2:58 PM
>>> To: user@drill.apache.org
>>> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited
>>> memory
>>>
>>> Hi Francis,
>>>
>>> I am guessing that each of your EMR nodes are quite large (32 or 64
>>> vcpus). On large machines Drill's planner over parallelizes and over
>>> allocates memory. There is a property "planner.width.max_per_node" which
>>> limits the number of operators that can simultaneously execute on a
>>> Drillbit for a query. If you configure the width per node to something like
>>> 5 or 10 (you may have to play around with it) things should start working.
>>>
>>> Thanks,
>>> Tim
>>>
>>> ________________________________
>>> From: Francis McGregor-Macdonald <fran...@mc-mac.com>
>>> Sent: Thursday, January 25, 2018 1:58:22 PM
>>> To: user@drill.apache.org
>>> Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited
>>> memory
>>>
>>> When creating a Tableau (with 10.3, 10.5 desktop) extract from a Drill
>>> (1.12 on EMR) cluster, memory appears not to adhere to the limits set by
>>> planner.memory.max_query_memory_per_node.
>>>
>>> The extract query consumes all memory and then crashes drill.
>>>
>>> Running the same query as a CREATE TABLE, memory behaves as expected.
>>>
>>> The query complexity is trivial: SELECT * from a view over a single
>>> parquet file, with no calculated fields.
>>>
>>> Has anyone else observed this behavior?
>>>
>>>
>>>
>>>
>>
>


Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

2018-01-28 Thread Francis McGregor-Macdonald
Thanks Paul and Kunal,
I think I have the right information now. With Paul's changes (and fixing
up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still
blowing past memory limits. It does the same thing when re-running the
query from the web console so presumably it's not actually Tableau related
despite me first generating it that way.

Thanks.

On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald <
fran...@mc-mac.com> wrote:

> Thanks Paul,
>
> I will update with your suggested memory allocations also and retry.
>
> Zookeeper crashed too which might explain more? I have attached the logs
> from Zookeeper too.
>
> Thanks
>
> On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:
>
>> Hi Francis,
>>
>> Thanks much for the log. The log shows running a query, then immediately
>> shows entries that occur when starting Drill. I'm guessing that Drill
>> literally crashed at this point? This is more severe than the usual error
>> in which a query exhausts memory.
>>
>> Some general observations. The Drill memory is 60 GB, but system memory
>> is 61 GB. Perhaps try dropping total Drill memory some to give the OS and
>> other tasks more headroom. For a SELECT * query, Drill needs far less than
>> what you have, so maybe try giving Drill 48 GB total.
>>
>> Then, Drill needs direct memory much more than heap. So, maybe give Drill
>> 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These
>> settings are in drill-env.sh.
>>
>> Kunal, you have more experience with these issues. Can you make
>> additional suggestions by looking at the log?
>>
>> Thanks,
>>
>> - Paul
>>
>>
>>
>> On Thursday, January 25, 2018, 10:20:29 PM PST, Francis
>> McGregor-Macdonald <fran...@mc-mac.com> wrote:
>>
>>
>> Hi all,
>>
>> I am guessing that each of your EMR nodes are quite large? EMR nodes are:
>> r4.2xlarge ('vcpu': 8, 'memory': 61)
>>
>> Property "planner.width.max_per_node" is set to 6
>>
>> What is the system memory and what are the allocations for heap and
>> direct?
>> System Memory: 61GB (EMR nodes above)
>> drill_mem_heap: 12G
>> drill_mem_max: 48G
>>
>> The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
>>
>> planner.memory.max_query_memory_per_node = 10479720202
>>
>> Drillbit.log attached (I think I have the correct selection included).
>>
>> Thanks
>>
>> On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
>>
>> What is the system memory and what are the allocations for heap and
>> direct? The memory crash might be occurring due to insufficient heap. The
>> limits parameter applies to the direct memory and not Heap.
>>
>> Can you share details in the logs from the crash?
>>
>> -----Original Message-----
>> From: Timothy Farkas [mailto:tfar...@mapr.com]
>> Sent: Thursday, January 25, 2018 2:58 PM
>> To: user@drill.apache.org
>> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited
>> memory
>>
>> Hi Francis,
>>
>> I am guessing that each of your EMR nodes are quite large (32 or 64
>> vcpus). On large machines Drill's planner over parallelizes and over
>> allocates memory. There is a property "planner.width.max_per_node" which
>> limits the number of operators that can simultaneously execute on a
>> Drillbit for a query. If you configure the width per node to something like
>> 5 or 10 (you may have to play around with it) things should start working.
>>
>> Thanks,
>> Tim
>>
>> ________________________________
>> From: Francis McGregor-Macdonald <fran...@mc-mac.com>
>> Sent: Thursday, January 25, 2018 1:58:22 PM
>> To: user@drill.apache.org
>> Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
>>
>> When creating a Tableau (with 10.3, 10.5 desktop) extract from a Drill
>> (1.12 on EMR) cluster, memory appears not to adhere to the limits set by
>> planner.memory.max_query_memory_per_node.
>>
>> The extract query consumes all memory and then crashes drill.
>>
>> Running the same query as a CREATE TABLE, memory behaves as expected.
>>
>> The query complexity is trivial: SELECT * from a view over a single
>> parquet file, with no calculated fields.
>>
>> Has anyone else observed this behavior?
>>
>>
>>
>>
>


Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

2018-01-26 Thread Francis McGregor-Macdonald
Thanks Paul,

I will update with your suggested memory allocations also and retry.

Zookeeper crashed too which might explain more? I have attached the logs
from Zookeeper too.

Thanks

On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:

> Hi Francis,
>
> Thanks much for the log. The log shows running a query, then immediately
> shows entries that occur when starting Drill. I'm guessing that Drill
> literally crashed at this point? This is more severe than the usual error
> in which a query exhausts memory.
>
> Some general observations. The Drill memory is 60 GB, but system memory is
> 61 GB. Perhaps try dropping total Drill memory some to give the OS and
> other tasks more headroom. For a SELECT * query, Drill needs far less than
> what you have, so maybe try giving Drill 48 GB total.
>
> Then, Drill needs direct memory much more than heap. So, maybe give Drill
> 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These
> settings are in drill-env.sh.
>
> Kunal, you have more experience with these issues. Can you make additional
> suggestions by looking at the log?
>
> Thanks,
>
> - Paul
>
>
>
> On Thursday, January 25, 2018, 10:20:29 PM PST, Francis McGregor-Macdonald
> <fran...@mc-mac.com> wrote:
>
>
> Hi all,
>
> I am guessing that each of your EMR nodes are quite large? EMR nodes are:
> r4.2xlarge ('vcpu': 8, 'memory': 61)
>
> Property "planner.width.max_per_node" is set to 6
>
> What is the system memory and what are the allocations for heap and direct?
> System Memory: 61GB (EMR nodes above)
> drill_mem_heap: 12G
> drill_mem_max: 48G
>
> The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
>
> planner.memory.max_query_memory_per_node = 10479720202
>
> Drillbit.log attached (I think I have the correct selection included).
>
> Thanks
>
> On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
>
> What is the system memory and what are the allocations for heap and
> direct? The memory crash might be occurring due to insufficient heap. The
> limits parameter applies to the direct memory and not Heap.
>
> Can you share details in the logs from the crash?
>
> -----Original Message-----
> From: Timothy Farkas [mailto:tfar...@mapr.com]
> Sent: Thursday, January 25, 2018 2:58 PM
> To: user@drill.apache.org
> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited
> memory
>
> Hi Francis,
>
> I am guessing that each of your EMR nodes are quite large (32 or 64
> vcpus). On large machines Drill's planner over parallelizes and over
> allocates memory. There is a property "planner.width.max_per_node" which
> limits the number of operators that can simultaneously execute on a
> Drillbit for a query. If you configure the width per node to something like
> 5 or 10 (you may have to play around with it) things should start working.
>
> Thanks,
> Tim
>
> ________________________________
> From: Francis McGregor-Macdonald <fran...@mc-mac.com>
> Sent: Thursday, January 25, 2018 1:58:22 PM
> To: user@drill.apache.org
> Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
>
> When creating a Tableau (with 10.3, 10.5 desktop) extract from a Drill
> (1.12 on EMR) cluster, memory appears not to adhere to the limits set by
> planner.memory.max_query_memory_per_node.
>
> The extract query consumes all memory and then crashes drill.
>
> Running the same query as a CREATE TABLE, memory behaves as expected.
>
> The query complexity is trivial: SELECT * from a view over a single
> parquet file, with no calculated fields.
>
> Has anyone else observed this behavior?
>
>
>
>
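For reference, Paul's suggested 48 GB split (39 GB direct, 8 GB heap, 1 GB code cache) would look roughly like this in conf/drill-env.sh. The variable names below are those of the stock Drill 1.12 drill-env.sh template — double-check them against your own copy before relying on this sketch:

```shell
# conf/drill-env.sh -- memory split suggested in this thread
# (variable names assumed from the Drill 1.12 drill-env.sh template)
export DRILL_HEAP="8G"
export DRILL_MAX_DIRECT_MEMORY="39G"
export DRILLBIT_CODE_CACHE_SIZE="1G"
```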


Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

2018-01-25 Thread Francis McGregor-Macdonald
Hi all,

I am guessing that each of your EMR nodes are quite large? EMR nodes are:
r4.2xlarge ('vcpu': 8, 'memory': 61)

Property "planner.width.max_per_node" is set to 6

What is the system memory and what are the allocations for heap and direct?
System Memory: 61GB (EMR nodes above)
drill_mem_heap: 12G
drill_mem_max: 48G

The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)

planner.memory.max_query_memory_per_node = 10479720202

Drillbit.log attached (I think I have the correct selection included).

Thanks
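As a side note, both options mentioned above can be inspected and changed from any Drill client; a sketch using standard Drill option syntax:

```sql
-- inspect current values
SELECT name, num_val
FROM sys.options
WHERE name IN ('planner.width.max_per_node',
               'planner.memory.max_query_memory_per_node');

-- set the per-node width discussed in this thread, cluster-wide
ALTER SYSTEM SET `planner.width.max_per_node` = 6;
```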

On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:

> What is the system memory and what are the allocations for heap and
> direct? The memory crash might be occurring due to insufficient heap. The
> limits parameter applies to the direct memory and not Heap.
>
> Can you share details in the logs from the crash?
>
> -----Original Message-----
> From: Timothy Farkas [mailto:tfar...@mapr.com]
> Sent: Thursday, January 25, 2018 2:58 PM
> To: user@drill.apache.org
> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited
> memory
>
> Hi Francis,
>
> I am guessing that each of your EMR nodes are quite large (32 or 64
> vcpus). On large machines Drill's planner over parallelizes and over
> allocates memory. There is a property "planner.width.max_per_node" which
> limits the number of operators that can simultaneously execute on a
> Drillbit for a query. If you configure the width per node to something like
> 5 or 10 (you may have to play around with it) things should start working.
>
> Thanks,
> Tim
>
> 
> From: Francis McGregor-Macdonald <fran...@mc-mac.com>
> Sent: Thursday, January 25, 2018 1:58:22 PM
> To: user@drill.apache.org
> Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
>
> When creating a Tableau (with 10.3, 10.5 desktop) extract from a Drill
> (1.12 on EMR) cluster, memory appears not to adhere to the limits set by
> planner.memory.max_query_memory_per_node.
>
> The extract query consumes all memory and then crashes drill.
>
> Running the same query as a CREATE TABLE, memory behaves as expected.
>
> The query complexity is trivial: SELECT * from a view over a single
> parquet file, with no calculated fields.
>
> Has anyone else observed this behavior?
>
>
2018-01-26 05:58:37,904 [25953c71-9933-e3ad-680b-d65363d46f0d:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
25953c71-9933-e3ad-680b-d65363d46f0d: SELECT 1 AS `Number of Records`,
  `nielsen_cl_1`.`dat_fact` AS `dat_fact`,
  `nielsen_cl_1`.`dat_market` AS `dat_market`,
  `nielsen_cl_1`.`dat_period` AS `dat_period`,
  `nielsen_cl_1`.`dat_product` AS `dat_product`,
  `nielsen_cl_1`.`dat_value` AS `dat_value`,
  `nielsen_cl_1`.`dataset` AS `dataset`,
  `nielsen_cl_1`.`mar_ccy` AS `mar_ccy`,
  `nielsen_cl_1`.`mar_country` AS `mar_country`,
  `nielsen_cl_1`.`mar_long_desc` AS `mar_long_desc`,
  `nielsen_cl_1`.`mar_sequence` AS `mar_sequence`,
  `nielsen_cl_1`.`mar_short_desc` AS `mar_short_desc`,
  `nielsen_cl_1`.`mar_tag` AS `mar_tag`,
  `nielsen_cl_1`.`per_list` AS `per_list`,
  `nielsen_cl_1`.`per_long_desc` AS `per_long_desc`,
  `nielsen_cl_1`.`per_sequence` AS `per_sequence`,
  `nielsen_cl_1`.`per_short_desc` AS `per_short_desc`,
  `nielsen_cl_1`.`per_tag` AS `per_tag`,
  `nielsen_cl_1`.`pro_aroma/fragancia/sabor` AS `pro_aroma/fragancia/sabor`,
  `nielsen_cl_1`.`pro_atributo` AS `pro_atributo`,
  `nielsen_cl_1`.`pro_barcode` AS `pro_barcode`,
  `nielsen_cl_1`.`pro_categoria` AS `pro_categoria`,
  `nielsen_cl_1`.`pro_category` AS `pro_category`,
  `nielsen_cl_1`.`pro_char08` AS `pro_char08`,
  `nielsen_cl_1`.`pro_char10` AS `pro_char10`,
  `nielsen_cl_1`.`pro_char16` AS `pro_char16`,
  `nielsen_cl_1`.`pro_char17` AS `pro_char17`,
  `nielsen_cl_1`.`pro_color` AS `pro_color`,
  `nielsen_cl_1`.`pro_consistencia` AS `pro_consistencia`,
  `nielsen_cl_1`.`pro_empaque/envase` AS `pro_empaque/envase`,
  `nielsen_cl_1`.`pro_empaque` AS `pro_empaque`,
  `nielsen_cl_1`.`pro_envasado/granel` AS `pro_envasado/granel`,
  `nielsen_cl_1`.`pro_envase` AS `pro_envase`,
  `nielsen_cl_1`.`pro_fabricante` AS `pro_fabricante`,
  `nielsen_cl_1`.`pro_formato` AS `pro_formato`,
  `nielsen_cl_1`.`pro_gasificacion` AS `pro_gasificacion`,
  `nielsen_cl_1`.`pro_item` AS `pro_item`,
  `nielsen_cl_1`.`pro_level` AS `pro_level`,
  `nielsen_cl_1`.`pro_long_desc` AS `pro_long_desc`,
  `nielsen_cl_1`.`pro_marca` AS `pro_marca`,
  `nielsen_cl_1`.`pro_marcas` AS `pro_marcas`,
  `nielsen_cl_1`.`pro_natural/sabor` AS `pro_natural/sabor`,
  `nielsen_cl_1`.`pro_rangos` AS `pro_rangos`,
  `nielsen_cl_1`.`pro_reg/diet` AS `pro_reg/diet`,
  `nielsen_cl_1`.`pro_regular/light` AS `pro_regular/light`,
  `nielsen_cl_1`.`pro_regular_/_light_-_diet` AS `pro_regular_/_light_-_diet`,
  

Creating a Tableau extracts with Drill 1.12 uses unlimited memory

2018-01-25 Thread Francis McGregor-Macdonald
When creating a Tableau (with 10.3, 10.5 desktop) extract from a Drill
(1.12 on EMR) cluster, memory appears not to adhere to the limits set by
planner.memory.max_query_memory_per_node.

The extract query consumes all memory and then crashes drill.

Running the same query as a CREATE TABLE, memory behaves as expected.

The query complexity is trivial: SELECT * from a view over a single
parquet file, with no calculated fields.

Has anyone else observed this behavior?


Visibility of Workspaces or Views

2017-07-13 Thread Francis McGregor-Macdonald
Hi,

I have a situation where I would like to restrict access to workspaces
based on the user. I have an instance where I would like to allow some
third-party access to a subset of views. I can't find a standard method
here.

The only similar issue I could find was this:
https://issues.apache.org/jira/browse/DRILL-3467

Is there a standard practice here to limit workspaces for users?

Thanks,
Francis
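One pattern that may apply here (an assumption on my part, not something confirmed in this thread): with user impersonation enabled in drill-override.conf, Drill consults filesystem permissions on stored view files, so access to a view can be narrowed with ordinary ownership and mode bits. A sketch with placeholder paths and group names:

```shell
# Placeholder paths/groups; assumes impersonation is enabled and the view
# is stored as a .view.drill file in a distributed-filesystem workspace.
hadoop fs -chown drilladmin:thirdparty /views/shared/report.view.drill
hadoop fs -chmod 640 /views/shared/report.view.drill   # group may read/query
```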