Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0
Hi Kunal,

I did find: https://github.com/baztian/jaydebeapi/commit/a1f8d3c3b4621570065d968b4b734bae3f0eaf79 which suggests that this may already be being worked on. I'm not sure if it would cover this use case though. I posted a question on the commit which hopefully gets an answer.

On Tue, Mar 20, 2018 at 5:45 PM, Kunal Khatua <kunalkha...@gmail.com> wrote:
> Francis
>
> I'm certain this is the result of JayDeBeApi using the preparedStatement command. (DRILL-5316. See the comments in the JIRA.)
>
> I was thinking of creating a fork and using the standard Connection.getStatement() API instead, before compiling. However, I'm currently on a time crunch and my Python skills are a bit rusty. Hoping someone in the community can step forward and take a crack at this.
>
> ~ KK
>
> On 3/19/2018 7:30:49 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> Thanks Kunal and Charles,
>
> I rebuilt the script / environment inside a container to see if I could replicate, and I have the same result.
>
> The container is running on an EC2 "next to" the cluster.
>
> Charles, was there any additional configuration you had done?
>
> I have in the Dockerfile:
> ...
> conda install -c conda-forge jpype1 -q && \
> conda install pip -q && \
> pip install jaydebeapi -q && \
> ...
>
> I am only loading the single jar into the container. I additionally get "SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"." when running the script.
>
> I can see suggestions this might be related to "Prepared Statements" but can't find anything definitive.
>
> On Mon, Mar 19, 2018 at 6:16 PM, Kunal Khatua wrote:
> > This error looks familiar and might be because of the Python library wrapping a select * around the original query.
> >
> > Using the JDBC driver directly doesn't seem to show this problem. Drill 1.13.0 is out now. Could you give it a try with that and confirm if the behavior is the same?
> >
> > -----Original Message-----
> > From: Charles Givre
> > Sent: Sunday, March 18, 2018 9:10 PM
> > To: user@drill.apache.org
> > Subject: Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0
> >
> > Hi Francis,
> >
> > The code below worked for me. Also, I don’t know if it matters, but did you mean to create two cursors?
> > — C
> >
> > import jaydebeapi
> > import pandas as pd
> >
> > # Create the connection object
> > conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
> >                           "jdbc:drill:drillbit=localhost:31010",
> >                           ["admin", "password"],
> >                           "/usr/local/share/drill/jars/jdbc-driver/drill-jdbc-all-1.12.0.jar",)
> >
> > # Create the cursor object
> > curs = conn.cursor()
> >
> > # Execute the query
> > curs.execute("SELECT * FROM cp.`employee.json` LIMIT 20")
> >
> > # Get the results
> > curs.fetchall()
> >
> > # Read query results into a Pandas DataFrame
> > df = pd.read_sql("SELECT * FROM cp.`employee.json` LIMIT 20", conn)
> >
> > On Mar 18, 2018, at 23:41, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> > >
> > > Hi all,
> > >
> > > I am attempting to send a query from python3 via JayDeBeApi and am encountering the issue that the SQL is enclosed in a SELECT * FROM $myquery LIMIT 0
> > >
> > > With:
> > > conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
> > >                           "jdbc:drill:drillbit=$mycluster:$myport",
> > >                           ["$username", "$password"],
> > >                           "/tmp/drill-jdbc-all-1.12.0.jar")
> > > curs = conn.cursor()
> > > curs = conn.cursor()
> > > curs.execute('SHOW DATABASES')
> > >
> > > ... the query hits Drill as:
> > > SELECT * FROM (SHOW DATABASES) LIMIT 0
> > >
> > > A select * from mytable limit 100 also has the same issue.
> > >
> > > Drill is version 1.12
> > >
> > > This also occurs with other queries.
> > > I found https://issues.apache.org/jira/browse/DRILL-5136 which looks similar and lists "Client - ODBC" (not JDBC)
> > >
> > > Has anyone else encountered this?
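Kunal's suggestion in this thread — using a plain JDBC Statement instead of the PreparedStatement path — can be tried without forking JayDeBeApi, because a jaydebeapi connection exposes the underlying java.sql.Connection as `conn.jconn`. A hedged sketch, not tested against Drill here: `execute_direct` is a hypothetical helper (not part of either library), it assumes the `jconn` attribute and a reachable Drillbit, and it returns every column as a string for simplicity.

```python
def execute_direct(conn, sql):
    """Run `sql` through a plain java.sql.Statement, bypassing jaydebeapi's
    prepareStatement() call (the apparent source of the
    SELECT * FROM (...) LIMIT 0 wrapping discussed in this thread).

    `conn` is a jaydebeapi.Connection; `conn.jconn` is the underlying
    java.sql.Connection exposed by JayDeBeApi/JPype.
    """
    stmt = conn.jconn.createStatement()
    try:
        rs = stmt.executeQuery(sql)
        ncols = rs.getMetaData().getColumnCount()
        rows = []
        while rs.next():
            # JDBC column indices are 1-based.
            rows.append(tuple(rs.getString(i) for i in range(1, ncols + 1)))
        return rows
    finally:
        stmt.close()

# e.g. rows = execute_direct(conn, "SHOW DATABASES")
```

This loses the DB-API cursor semantics (and type mapping) that jaydebeapi provides, but the SQL string should reach Drill unwrapped.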
Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0
Thanks Kunal and Charles,

I rebuilt the script / environment inside a container to see if I could replicate, and I have the same result.

The container is running on an EC2 "next to" the cluster.

Charles, was there any additional configuration you had done?

I have in the Dockerfile:
...
conda install -c conda-forge jpype1 -q && \
conda install pip -q && \
pip install jaydebeapi -q && \
...

I am only loading the single jar into the container. I additionally get "SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"." when running the script.

I can see suggestions this might be related to "Prepared Statements" but can't find anything definitive.

On Mon, Mar 19, 2018 at 6:16 PM, Kunal Khatua <kkha...@mapr.com> wrote:
> This error looks familiar and might be because of the Python library wrapping a select * around the original query.
>
> Using the JDBC driver directly doesn’t seem to show this problem. Drill 1.13.0 is out now. Could you give it a try with that and confirm if the behavior is the same?
>
> -----Original Message-----
> From: Charles Givre <cgi...@gmail.com>
> Sent: Sunday, March 18, 2018 9:10 PM
> To: user@drill.apache.org
> Subject: Re: QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0
>
> Hi Francis,
>
> The code below worked for me. Also, I don’t know if it matters, but did you mean to create two cursors?
> — C
>
> import jaydebeapi
> import pandas as pd
>
> # Create the connection object
> conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
>                           "jdbc:drill:drillbit=localhost:31010",
>                           ["admin", "password"],
>                           "/usr/local/share/drill/jars/jdbc-driver/drill-jdbc-all-1.12.0.jar",)
>
> # Create the cursor object
> curs = conn.cursor()
>
> # Execute the query
> curs.execute("SELECT * FROM cp.`employee.json` LIMIT 20")
>
> # Get the results
> curs.fetchall()
>
> # Read query results into a Pandas DataFrame
> df = pd.read_sql("SELECT * FROM cp.`employee.json` LIMIT 20", conn)
>
> On Mar 18, 2018, at 23:41, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> >
> > Hi all,
> >
> > I am attempting to send a query from python3 via JayDeBeApi and am encountering the issue that the SQL is enclosed in a SELECT * FROM $myquery LIMIT 0
> >
> > With:
> > conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
> >                           "jdbc:drill:drillbit=$mycluster:$myport",
> >                           ["$username", "$password"],
> >                           "/tmp/drill-jdbc-all-1.12.0.jar")
> > curs = conn.cursor()
> > curs = conn.cursor()
> > curs.execute('SHOW DATABASES')
> >
> > ... the query hits Drill as:
> > SELECT * FROM (SHOW DATABASES) LIMIT 0
> >
> > A select * from mytable limit 100 also has the same issue.
> >
> > Drill is version 1.12
> >
> > This also occurs with other queries. I found https://issues.apache.org/jira/browse/DRILL-5136 which looks similar and lists "Client - ODBC" (not JDBC)
> >
> > Has anyone else encountered this?
QSQL via jdbc (python3 and JayDeBeApi) wraps with SELECT ... LIMIT 0
Hi all,

I am attempting to send a query from python3 via JayDeBeApi and am encountering the issue that the SQL is enclosed in a SELECT * FROM $myquery LIMIT 0

With:

conn = jaydebeapi.connect("org.apache.drill.jdbc.Driver",
                          "jdbc:drill:drillbit=$mycluster:$myport",
                          ["$username", "$password"],
                          "/tmp/drill-jdbc-all-1.12.0.jar")
curs = conn.cursor()
curs = conn.cursor()
curs.execute('SHOW DATABASES')

... the query hits Drill as:

SELECT * FROM (SHOW DATABASES) LIMIT 0

A select * from mytable limit 100 also has the same issue.

Drill is version 1.12.

This also occurs with other queries. I found https://issues.apache.org/jira/browse/DRILL-5136 which looks similar and lists "Client - ODBC" (not JDBC).

Has anyone else encountered this?
Re: Drill on AWS
Hi Brandon,

I created a quick gist, hopefully it comes across clearly.

https://gist.github.com/fmcmac/a35738376d111fdca45057bd0fb4c79e

Any improvements greatly appreciated and happy to update it based on feedback.

Regards,
Francis

On Tue, Feb 13, 2018 at 11:11 AM, Brandon Gmail <brantaylor...@gmail.com> wrote:
> Hey Francis,
>
> Yes please! That option will work. I would appreciate any documentation you might have.
>
> On Feb 12, 2018, at 2:51 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> > Hi Brandon,
> >
> > I recently went through this challenge and found it easier in the end to install SSM with a bootstrap and then use ssm.send_command with AWS-RunRemoteScript once the cluster was in place.
> >
> > This way I can use the AWS-installed Zookeeper. The remote script is pretty much a facsimile of the Drill install instructions.
> >
> > If that option would work, happy to share.
> >
> > On Tue, Feb 13, 2018 at 9:22 AM, Brandon Gmail <brantaylor...@gmail.com> wrote:
> > > Does anyone have an updated bootstrap script for drill on AWS EMR? Their repository is over 3 years old, and I’m flailing at figuring this out on my own. Any help would be appreciated.
> > >
> > > Thanks,
> > > Brandon
Re: Drill on AWS
Hi Brandon,

I recently went through this challenge and found it easier in the end to install SSM with a bootstrap and then use ssm.send_command with AWS-RunRemoteScript once the cluster was in place.

This way I can use the AWS-installed Zookeeper. The remote script is pretty much a facsimile of the Drill install instructions.

If that option would work, happy to share.

On Tue, Feb 13, 2018 at 9:22 AM, Brandon Gmail wrote:
> Does anyone have an updated bootstrap script for drill on AWS EMR? Their repository is over 3 years old, and I’m flailing at figuring this out on my own. Any help would be appreciated.
>
> Thanks,
> Brandon
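The SSM approach described in this thread can be scripted with boto3 once the EMR nodes are running the SSM agent. A hedged sketch under assumptions: the instance IDs, bucket, and script name below are placeholders, `build_drill_install_command` is a hypothetical helper (not an AWS API), and the install script itself is whatever facsimile of the Drill install instructions you maintain.

```python
import json


def build_drill_install_command(instance_ids, bucket, script="install_drill.sh"):
    """Build the kwargs for ssm.send_command using the AWS-RunRemoteScript
    managed document, which downloads a script from S3 and executes it
    on each target instance."""
    return {
        "InstanceIds": instance_ids,
        "DocumentName": "AWS-RunRemoteScript",
        "Parameters": {
            "sourceType": ["S3"],
            "sourceInfo": [json.dumps(
                {"path": f"https://s3.amazonaws.com/{bucket}/{script}"}
            )],
            "commandLine": [f"bash {script}"],
        },
    }


# To actually dispatch (requires AWS credentials and SSM-managed instances):
# import boto3
# ssm = boto3.client("ssm")
# response = ssm.send_command(
#     **build_drill_install_command(["i-0123456789abcdef0"],
#                                   "my-bootstrap-bucket"))
```

Separating command construction from dispatch keeps the interesting part (the document parameters) testable without an AWS account.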
Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
Hi all,

A physical plan attached ... all memory appears to be 0.0 which seems odd?

Thanks

On Sun, Jan 28, 2018 at 10:37 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> And with logs as attachments.
>
> On Sun, Jan 28, 2018 at 9:40 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> > Thanks Paul and Kunal,
> > I think I have the right information now. With Paul's changes (and fixing up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still blowing past memory limits. It does the same thing when re-running the query from the web console, so presumably it's not actually Tableau related despite me first generating it that way.
> >
> > Thanks.
> >
> > On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> > > Thanks Paul,
> > >
> > > I will update with your suggested memory allocations also and retry.
> > >
> > > Zookeeper crashed too, which might explain more? I have attached the logs from Zookeeper too.
> > >
> > > Thanks
> > >
> > > On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:
> > > > Hi Francis,
> > > >
> > > > Thanks much for the log. The log shows running a query, then immediately shows entries that occur when starting Drill. I'm guessing that Drill literally crashed at this point? This is more severe than the usual error in which a query exhausts memory.
> > > >
> > > > Some general observations. The Drill memory is 60 GB, but system memory is 61 GB. Perhaps try dropping total Drill memory some to give the OS and other tasks more headroom. For a SELECT * query, Drill needs far less than what you have, so maybe try giving Drill 48 GB total.
> > > >
> > > > Then, Drill needs direct memory much more than heap. So, maybe give Drill 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These settings are in drill-env.sh.
> > > >
> > > > Kunal, you have more experience with these issues. Can you make additional suggestions by looking at the log?
> > > >
> > > > Thanks,
> > > >
> > > > - Paul
> > > >
> > > > On Thursday, January 25, 2018, 10:20:29 PM PST, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I am guessing that each of your EMR nodes are quite large? EMR nodes are: r4.2xlarge ('vcpu': 8, 'memory': 61)
> > > >
> > > > Property "planner.width.max_per_node" is set to = 6
> > > >
> > > > What is the system memory and what are the allocations for heap and direct?
> > > > System Memory: 61GB (EMR nodes above)
> > > > drill_mem_heap: 12G
> > > > drill_mem_max: 48G
> > > >
> > > > The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
> > > >
> > > > planner.memory.max_query_memory_per_node = 10479720202
> > > >
> > > > Drillbit.log attached (I think I have the correct selection included).
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
> > > >
> > > > What is the system memory and what are the allocations for heap and direct? The memory crash might be occurring due to insufficient heap. The limits parameter applies to the direct memory and not heap.
> > > >
> > > > Can you share details in the logs from the crash?
> > > >
> > > > -----Original Message-----
> > > > From: Timothy Farkas [mailto:tfar...@mapr.com]
> > > > Sent: Thursday, January 25, 2018 2:58 PM
> > > > To: user@drill.apache.org
> > > > Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
> > > >
> > > > Hi Francis,
> > > >
> > > > I am guessing that each of your EMR nodes are quite large (32 or 64 vcpus). On large machines Drill's planner over-parallelizes and over-allocates memory. There is a property "planner.width.max_per_node" which limits the number of operators that can simultaneously execute on a Drillbit for a query. If you configure the width per node to something like 5 or 10 (you may have to play around with it) things should start working.
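Paul's numbers in this thread (48 GB total on a 61 GB node, split as 39 GB direct, 8 GB heap, 1 GB code cache) follow a simple rule of thumb: leave the OS some headroom, then give most of Drill's share to direct memory. A hedged sketch of that arithmetic — the headroom and heap figures are this thread's suggestions, not Drill defaults, and `suggest_drill_memory` is an illustrative helper, not a Drill tool:

```python
def suggest_drill_memory(system_gb, os_headroom_gb=13, heap_gb=8, code_cache_gb=1):
    """Split a node's memory the way suggested in this thread:
    total Drill memory = system minus OS headroom; direct memory =
    whatever remains after heap and code cache. All values in GB."""
    total = system_gb - os_headroom_gb
    direct = total - heap_gb - code_cache_gb
    return {"total": total, "direct": direct,
            "heap": heap_gb, "code_cache": code_cache_gb}

# On the r4.2xlarge nodes in this thread (61 GB):
# suggest_drill_memory(61)
# -> {'total': 48, 'direct': 39, 'heap': 8, 'code_cache': 1}
```

The resulting heap and direct values would then go into drill-env.sh, per Paul's note.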
Fwd: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
And with logs as attachments.

On Sun, Jan 28, 2018 at 9:40 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> Thanks Paul and Kunal,
> I think I have the right information now. With Paul's changes (and fixing up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still blowing past memory limits. It does the same thing when re-running the query from the web console, so presumably it's not actually Tableau related despite me first generating it that way.
>
> Thanks.
>
> On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> > Thanks Paul,
> >
> > I will update with your suggested memory allocations also and retry.
> >
> > Zookeeper crashed too, which might explain more? I have attached the logs from Zookeeper too.
> >
> > Thanks
> >
> > On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:
> > > Hi Francis,
> > >
> > > Thanks much for the log. The log shows running a query, then immediately shows entries that occur when starting Drill. I'm guessing that Drill literally crashed at this point? This is more severe than the usual error in which a query exhausts memory.
> > >
> > > Some general observations. The Drill memory is 60 GB, but system memory is 61 GB. Perhaps try dropping total Drill memory some to give the OS and other tasks more headroom. For a SELECT * query, Drill needs far less than what you have, so maybe try giving Drill 48 GB total.
> > >
> > > Then, Drill needs direct memory much more than heap. So, maybe give Drill 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These settings are in drill-env.sh.
> > >
> > > Kunal, you have more experience with these issues. Can you make additional suggestions by looking at the log?
> > >
> > > Thanks,
> > >
> > > - Paul
> > >
> > > On Thursday, January 25, 2018, 10:20:29 PM PST, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> > >
> > > Hi all,
> > >
> > > I am guessing that each of your EMR nodes are quite large? EMR nodes are: r4.2xlarge ('vcpu': 8, 'memory': 61)
> > >
> > > Property "planner.width.max_per_node" is set to = 6
> > >
> > > What is the system memory and what are the allocations for heap and direct?
> > > System Memory: 61GB (EMR nodes above)
> > > drill_mem_heap: 12G
> > > drill_mem_max: 48G
> > >
> > > The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
> > >
> > > planner.memory.max_query_memory_per_node = 10479720202
> > >
> > > Drillbit.log attached (I think I have the correct selection included).
> > >
> > > Thanks
> > >
> > > On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
> > >
> > > What is the system memory and what are the allocations for heap and direct? The memory crash might be occurring due to insufficient heap. The limits parameter applies to the direct memory and not heap.
> > >
> > > Can you share details in the logs from the crash?
> > >
> > > -----Original Message-----
> > > From: Timothy Farkas [mailto:tfar...@mapr.com]
> > > Sent: Thursday, January 25, 2018 2:58 PM
> > > To: user@drill.apache.org
> > > Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
> > >
> > > Hi Francis,
> > >
> > > I am guessing that each of your EMR nodes are quite large (32 or 64 vcpus). On large machines Drill's planner over-parallelizes and over-allocates memory. There is a property "planner.width.max_per_node" which limits the number of operators that can simultaneously execute on a Drillbit for a query. If you configure the width per node to something like 5 or 10 (you may have to play around with it) things should start working.
> > >
> > > Thanks,
> > > Tim
> > >
> > > From: Francis McGregor-Macdonald <fran...@mc-mac.com>
> > > Sent: Thursday, January 25, 2018 1:58:22 PM
> > > To: user@drill.apache.org
> > > Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
> > >
> > > When creating a Tableau (10.3, 10.5 desktop) extract from a Drill (1.12 on EMR) cluster, memory appears not to adhere to the limits set by planner.memory.max_query_memory_per_node.
> > >
> > > The extract query consumes all memory and then crashes Drill.
> > >
> > > Running the same query as a create table, memory behaves as expected.
> > >
> > > The query complexity is trivial: select * from a view over a single parquet file with no calculated fields.
> > >
> > > Has anyone else observed this behavior?
Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
Thanks Paul and Kunal,

I think I have the right information now. With Paul's changes (and fixing up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still blowing past memory limits. It does the same thing when re-running the query from the web console, so presumably it's not actually Tableau related despite me first generating it that way.

Thanks.

On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> Thanks Paul,
>
> I will update with your suggested memory allocations also and retry.
>
> Zookeeper crashed too, which might explain more? I have attached the logs from Zookeeper too.
>
> Thanks
>
> On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:
> > Hi Francis,
> >
> > Thanks much for the log. The log shows running a query, then immediately shows entries that occur when starting Drill. I'm guessing that Drill literally crashed at this point? This is more severe than the usual error in which a query exhausts memory.
> >
> > Some general observations. The Drill memory is 60 GB, but system memory is 61 GB. Perhaps try dropping total Drill memory some to give the OS and other tasks more headroom. For a SELECT * query, Drill needs far less than what you have, so maybe try giving Drill 48 GB total.
> >
> > Then, Drill needs direct memory much more than heap. So, maybe give Drill 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These settings are in drill-env.sh.
> >
> > Kunal, you have more experience with these issues. Can you make additional suggestions by looking at the log?
> >
> > Thanks,
> >
> > - Paul
> >
> > On Thursday, January 25, 2018, 10:20:29 PM PST, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
> >
> > Hi all,
> >
> > I am guessing that each of your EMR nodes are quite large? EMR nodes are: r4.2xlarge ('vcpu': 8, 'memory': 61)
> >
> > Property "planner.width.max_per_node" is set to = 6
> >
> > What is the system memory and what are the allocations for heap and direct?
> > System Memory: 61GB (EMR nodes above)
> > drill_mem_heap: 12G
> > drill_mem_max: 48G
> >
> > The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
> >
> > planner.memory.max_query_memory_per_node = 10479720202
> >
> > Drillbit.log attached (I think I have the correct selection included).
> >
> > Thanks
> >
> > On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
> >
> > What is the system memory and what are the allocations for heap and direct? The memory crash might be occurring due to insufficient heap. The limits parameter applies to the direct memory and not heap.
> >
> > Can you share details in the logs from the crash?
> >
> > -----Original Message-----
> > From: Timothy Farkas [mailto:tfar...@mapr.com]
> > Sent: Thursday, January 25, 2018 2:58 PM
> > To: user@drill.apache.org
> > Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
> >
> > Hi Francis,
> >
> > I am guessing that each of your EMR nodes are quite large (32 or 64 vcpus). On large machines Drill's planner over-parallelizes and over-allocates memory. There is a property "planner.width.max_per_node" which limits the number of operators that can simultaneously execute on a Drillbit for a query. If you configure the width per node to something like 5 or 10 (you may have to play around with it) things should start working.
> >
> > Thanks,
> > Tim
> >
> > From: Francis McGregor-Macdonald <fran...@mc-mac.com>
> > Sent: Thursday, January 25, 2018 1:58:22 PM
> > To: user@drill.apache.org
> > Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
> >
> > When creating a Tableau (10.3, 10.5 desktop) extract from a Drill (1.12 on EMR) cluster, memory appears not to adhere to the limits set by planner.memory.max_query_memory_per_node.
> >
> > The extract query consumes all memory and then crashes Drill.
> >
> > Running the same query as a create table, memory behaves as expected.
> >
> > The query complexity is trivial: select * from a view over a single parquet file with no calculated fields.
> >
> > Has anyone else observed this behavior?
Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
Thanks Paul,

I will update with your suggested memory allocations also and retry.

Zookeeper crashed too, which might explain more? I have attached the logs from Zookeeper too.

Thanks

On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <par0...@yahoo.com> wrote:
> Hi Francis,
>
> Thanks much for the log. The log shows running a query, then immediately shows entries that occur when starting Drill. I'm guessing that Drill literally crashed at this point? This is more severe than the usual error in which a query exhausts memory.
>
> Some general observations. The Drill memory is 60 GB, but system memory is 61 GB. Perhaps try dropping total Drill memory some to give the OS and other tasks more headroom. For a SELECT * query, Drill needs far less than what you have, so maybe try giving Drill 48 GB total.
>
> Then, Drill needs direct memory much more than heap. So, maybe give Drill 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These settings are in drill-env.sh.
>
> Kunal, you have more experience with these issues. Can you make additional suggestions by looking at the log?
>
> Thanks,
>
> - Paul
>
> On Thursday, January 25, 2018, 10:20:29 PM PST, Francis McGregor-Macdonald <fran...@mc-mac.com> wrote:
>
> Hi all,
>
> I am guessing that each of your EMR nodes are quite large? EMR nodes are: r4.2xlarge ('vcpu': 8, 'memory': 61)
>
> Property "planner.width.max_per_node" is set to = 6
>
> What is the system memory and what are the allocations for heap and direct?
> System Memory: 61GB (EMR nodes above)
> drill_mem_heap: 12G
> drill_mem_max: 48G
>
> The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
>
> planner.memory.max_query_memory_per_node = 10479720202
>
> Drillbit.log attached (I think I have the correct selection included).
>
> Thanks
>
> On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
>
> What is the system memory and what are the allocations for heap and direct? The memory crash might be occurring due to insufficient heap. The limits parameter applies to the direct memory and not heap.
>
> Can you share details in the logs from the crash?
>
> -----Original Message-----
> From: Timothy Farkas [mailto:tfar...@mapr.com]
> Sent: Thursday, January 25, 2018 2:58 PM
> To: user@drill.apache.org
> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
>
> Hi Francis,
>
> I am guessing that each of your EMR nodes are quite large (32 or 64 vcpus). On large machines Drill's planner over-parallelizes and over-allocates memory. There is a property "planner.width.max_per_node" which limits the number of operators that can simultaneously execute on a Drillbit for a query. If you configure the width per node to something like 5 or 10 (you may have to play around with it) things should start working.
>
> Thanks,
> Tim
>
> From: Francis McGregor-Macdonald <fran...@mc-mac.com>
> Sent: Thursday, January 25, 2018 1:58:22 PM
> To: user@drill.apache.org
> Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
>
> When creating a Tableau (10.3, 10.5 desktop) extract from a Drill (1.12 on EMR) cluster, memory appears not to adhere to the limits set by planner.memory.max_query_memory_per_node.
>
> The extract query consumes all memory and then crashes Drill.
>
> Running the same query as a create table, memory behaves as expected.
>
> The query complexity is trivial: select * from a view over a single parquet file with no calculated fields.
>
> Has anyone else observed this behavior?
Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
Hi all,

I am guessing that each of your EMR nodes are quite large? EMR nodes are: r4.2xlarge ('vcpu': 8, 'memory': 61)

Property "planner.width.max_per_node" is set to = 6

What is the system memory and what are the allocations for heap and direct?
System Memory: 61GB (EMR nodes above)
drill_mem_heap: 12G
drill_mem_max: 48G

The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)

planner.memory.max_query_memory_per_node = 10479720202

Drillbit.log attached (I think I have the correct selection included).

Thanks

On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <kkha...@mapr.com> wrote:
> What is the system memory and what are the allocations for heap and direct? The memory crash might be occurring due to insufficient heap. The limits parameter applies to the direct memory and not heap.
>
> Can you share details in the logs from the crash?
>
> -----Original Message-----
> From: Timothy Farkas [mailto:tfar...@mapr.com]
> Sent: Thursday, January 25, 2018 2:58 PM
> To: user@drill.apache.org
> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
>
> Hi Francis,
>
> I am guessing that each of your EMR nodes are quite large (32 or 64 vcpus). On large machines Drill's planner over-parallelizes and over-allocates memory. There is a property "planner.width.max_per_node" which limits the number of operators that can simultaneously execute on a Drillbit for a query. If you configure the width per node to something like 5 or 10 (you may have to play around with it) things should start working.
>
> Thanks,
> Tim
>
> From: Francis McGregor-Macdonald <fran...@mc-mac.com>
> Sent: Thursday, January 25, 2018 1:58:22 PM
> To: user@drill.apache.org
> Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
>
> When creating a Tableau (10.3, 10.5 desktop) extract from a Drill (1.12 on EMR) cluster, memory appears not to adhere to the limits set by planner.memory.max_query_memory_per_node.
> > The extract query consumes all memory and then crashes drill. > > Running the same query as a create table memory behaves as expected. > > The query complexity is trivial: > select * from view only a single parquet with no calculated fields. > > Has anyone else observed this behavior? > > 2018-01-26 05:58:37,904 [25953c71-9933-e3ad-680b-d65363d46f0d:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 25953c71-9933-e3ad-680b-d65363d46f0d: SELECT 1 AS `Number of Records`, `nielsen_cl_1`.`dat_fact` AS `dat_fact`, `nielsen_cl_1`.`dat_market` AS `dat_market`, `nielsen_cl_1`.`dat_period` AS `dat_period`, `nielsen_cl_1`.`dat_product` AS `dat_product`, `nielsen_cl_1`.`dat_value` AS `dat_value`, `nielsen_cl_1`.`dataset` AS `dataset`, `nielsen_cl_1`.`mar_ccy` AS `mar_ccy`, `nielsen_cl_1`.`mar_country` AS `mar_country`, `nielsen_cl_1`.`mar_long_desc` AS `mar_long_desc`, `nielsen_cl_1`.`mar_sequence` AS `mar_sequence`, `nielsen_cl_1`.`mar_short_desc` AS `mar_short_desc`, `nielsen_cl_1`.`mar_tag` AS `mar_tag`, `nielsen_cl_1`.`per_list` AS `per_list`, `nielsen_cl_1`.`per_long_desc` AS `per_long_desc`, `nielsen_cl_1`.`per_sequence` AS `per_sequence`, `nielsen_cl_1`.`per_short_desc` AS `per_short_desc`, `nielsen_cl_1`.`per_tag` AS `per_tag`, `nielsen_cl_1`.`pro_aroma/fragancia/sabor` AS `pro_aroma/fragancia/sabor`, `nielsen_cl_1`.`pro_atributo` AS `pro_atributo`, `nielsen_cl_1`.`pro_barcode` AS `pro_barcode`, `nielsen_cl_1`.`pro_categoria` AS `pro_categoria`, `nielsen_cl_1`.`pro_category` AS `pro_category`, `nielsen_cl_1`.`pro_char08` AS `pro_char08`, `nielsen_cl_1`.`pro_char10` AS `pro_char10`, `nielsen_cl_1`.`pro_char16` AS `pro_char16`, `nielsen_cl_1`.`pro_char17` AS `pro_char17`, `nielsen_cl_1`.`pro_color` AS `pro_color`, `nielsen_cl_1`.`pro_consistencia` AS `pro_consistencia`, `nielsen_cl_1`.`pro_empaque/envase` AS `pro_empaque/envase`, `nielsen_cl_1`.`pro_empaque` AS `pro_empaque`, `nielsen_cl_1`.`pro_envasado/granel` AS `pro_envasado/granel`, 
`nielsen_cl_1`.`pro_envase` AS `pro_envase`, `nielsen_cl_1`.`pro_fabricante` AS `pro_fabricante`, `nielsen_cl_1`.`pro_formato` AS `pro_formato`, `nielsen_cl_1`.`pro_gasificacion` AS `pro_gasificacion`, `nielsen_cl_1`.`pro_item` AS `pro_item`, `nielsen_cl_1`.`pro_level` AS `pro_level`, `nielsen_cl_1`.`pro_long_desc` AS `pro_long_desc`, `nielsen_cl_1`.`pro_marca` AS `pro_marca`, `nielsen_cl_1`.`pro_marcas` AS `pro_marcas`, `nielsen_cl_1`.`pro_natural/sabor` AS `pro_natural/sabor`, `nielsen_cl_1`.`pro_rangos` AS `pro_rangos`, `nielsen_cl_1`.`pro_reg/diet` AS `pro_reg/diet`, `nielsen_cl_1`.`pro_regular/light` AS `pro_regular/light`, `nielsen_cl_1`.`pro_regular_/_light_-_diet` AS `pro_regular_/_light_-_diet`,
Creating a Tableau extracts with Drill 1.12 uses unlimited memory
When creating a Tableau (10.3, 10.5 desktop) extract from a Drill (1.12 on EMR) cluster, memory appears not to adhere to the limits set by planner.memory.max_query_memory_per_node.

The extract query consumes all memory and then crashes Drill.

Running the same query as a create table, memory behaves as expected.

The query complexity is trivial: select * from a view over a single parquet file with no calculated fields.

Has anyone else observed this behavior?
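As a sanity check on the numbers quoted in this thread, the configured planner.memory.max_query_memory_per_node (10479720202 bytes) works out to just under 10 GiB — well inside the 48 GB given to Drill — which is why the extract blowing past it suggests something is ignoring the limit rather than the limit being set too high. A small sketch of the conversion (the values are this thread's, and the helper is illustrative):

```python
def bytes_to_gib(n):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n / 2**30

# Values reported in this thread.
per_query_limit_bytes = 10_479_720_202  # planner.memory.max_query_memory_per_node
drill_total_gb = 48                     # drill_mem_max

limit_gib = bytes_to_gib(per_query_limit_bytes)
print(f"per-query limit ≈ {limit_gib:.2f} GiB, "
      f"about {limit_gib / drill_total_gb:.0%} of Drill's total memory")
```

If the extract path honored this setting, one query should never be able to consume the node's full 48 GB, let alone all system memory.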
Visibility of Workspaces or Views
Hi,

I have a situation where I would like to restrict access to workspaces based on the user. I have an instance where I would like to allow some third-party access to a subset of views. I can't find a standard method here.

The only similar issue I could find was this: https://issues.apache.org/jira/browse/DRILL-3467

Is there a standard practice here to limit workspaces for users?

Thanks,
Francis