[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query
[ https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902684#comment-16902684 ]

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-519365759

@agozhiy I've made most of the changes you suggested and commented on the rest.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Visualize estimated and actual row counts for a query
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Affects Versions: 1.16.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
> With statistics in place, it would be useful to have the *estimated* rowcount
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Commented] (DRILL-7341) Vector reAlloc may fails after exchange.
[ https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902625#comment-16902625 ]

ASF GitHub Bot commented on DRILL-7341:
---

weijietong commented on pull request #1838: DRILL-7341: Vector reAlloc may fails after exchange
URL: https://github.com/apache/drill/pull/1838#discussion_r311841988

## File path: exec/vector/src/main/codegen/templates/FixedValueVectors.java ##
@@ -210,10 +210,16 @@ public void reAlloc() {
     // a zero-length buffer. Instead, just allocate a 256 byte
     // buffer if we start at 0.
-    final long newAllocationSize = allocationSizeInBytes == 0
+    long newAllocationSize = allocationSizeInBytes == 0
         ? 256
         : allocationSizeInBytes * 2L;
+    // Some operations, such as Value Vector#exchange, can be change DrillBuf data field without corresponding allocation size changes.
+    // Check that the size of the allocation is sufficient to copy the old buffer.
+    while (newAllocationSize < data.capacity()) {

Review comment:
Please also change line 232 below: the `data.setZero(halfNewCapacity, halfNewCapacity)` logic, which initializes the freshly allocated buffer content to zero.

> Vector reAlloc may fails after exchange.
>
> Key: DRILL-7341
> URL: https://issues.apache.org/jira/browse/DRILL-7341
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Oleg Zinoviev
> Priority: Major
> Attachments: stacktrace.log
>
> There are several methods that modify the BaseDataValueVector#data field.
> Some of them, such as BaseDataValueVector#exchange, do not change
> allocationSizeInBytes.
> Therefore, if BaseDataValueVector#exchange was executed for vectors of
> different sizes, *ValueVector#reAlloc may create a buffer of insufficient size.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
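The sizing logic under review can be sketched in isolation. This is only an illustration: the quoted hunk is cut off at the `while` line, so the doubling step inside the loop is an assumption, and `ReallocSizing` / `nextAllocationSize` are hypothetical names that are not part of Drill. Plain `long` arguments stand in for `allocationSizeInBytes` and `data.capacity()`.

```java
class ReallocSizing {
    // Hypothetical helper, not Drill's API: choose the size of the next allocation.
    // allocationSizeInBytes: the size the vector believes it allocated last time.
    // currentCapacity: the real capacity of the buffer it holds now, which can
    //                  differ after an exchange with a vector of another size.
    static long nextAllocationSize(long allocationSizeInBytes, long currentCapacity) {
        // Start from 256 bytes for an empty vector, otherwise double the recorded size.
        long newAllocationSize = allocationSizeInBytes == 0 ? 256 : allocationSizeInBytes * 2L;
        // Assumed loop body: keep doubling while the candidate is smaller than
        // the buffer whose contents must be copied into the new allocation.
        while (newAllocationSize < currentCapacity) {
            newAllocationSize *= 2L;
        }
        return newAllocationSize;
    }

    public static void main(String[] args) {
        System.out.println(nextAllocationSize(0, 0));       // 256
        System.out.println(nextAllocationSize(1024, 1024)); // 2048
        System.out.println(nextAllocationSize(256, 4096));  // 4096
    }
}
```

The third call models the reported case: the recorded size (256) lags behind the real buffer (4096), so a single doubling to 512 would be too small to hold the old contents. Note that with the `<` comparison taken from the diff, the result may equal the current capacity.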
[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query
[ https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902481#comment-16902481 ] ASF GitHub Bot commented on DRILL-7222: --- kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and actual row counts for a query URL: https://github.com/apache/drill/pull/1779#discussion_r311766304 ## File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl ## @@ -587,6 +622,49 @@ if (e.target.form) <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit(); }); + +// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186] +function scientificToDecimal(num) { + //if the number is in scientific notation remove it + if(/\d+\.?\d*e[\+\-]*\d+/i.test(num)) { +var zero = '0', +parts = String(num).toLowerCase().split('e'), //split into coeff and exponent +e = parts.pop(),//store the exponential part +l = Math.abs(e), //get the number of zeros +sign = e/l, +coeff_array = parts[0].split('.'); +if(sign === -1) { +coeff_array[0] = Math.abs(coeff_array[0]); +num = '-'+zero + '.' + new Array(l).join(zero) + coeff_array.join(''); +} +else { +var dec = coeff_array[1]; +if(dec) l = l - dec.length; +num = coeff_array.join('') + new Array(l+1).join(zero); +} + } + return num; +} + +// Extract estimated rowcount map +var opRowCountMap = {}; +// Get OpId-Rowocunt Map +function buildRowCountMap() { + var phyText = $('#query-physical').find('pre').text(); + var opLines = phyText.split("\n"); + for (var l in opLines) { +var line = opLines[l]; +if (line.trim().length > 0) { + var opId = line.match(/\d+-\d+/g)[0]; + var opRowCount = line.match(/rowcount = \S+/g)[0].split(' ')[2].replace(',','').trim(); + if (opRowCount.includes("E")) { +opRowCountMap[opId] = parseInt(scientificToDecimal(opRowCount)).toLocaleString('en'); Review comment: Tested with ##E18 using the Number() function. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Visualize estimated and actual row counts for a query > - > > Key: DRILL-7222 > URL: https://issues.apache.org/jira/browse/DRILL-7222 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting, user-experience > Fix For: 1.17.0 > > > With statistics in place, it would be useful to have the *estimated* rowcount > along side the *actual* rowcount query profile's operator overview. > We can extract this from the Physical Plan section of the profile. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query
[ https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902466#comment-16902466 ] ASF GitHub Bot commented on DRILL-7222: --- kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and actual row counts for a query URL: https://github.com/apache/drill/pull/1779#discussion_r311756831 ## File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl ## @@ -587,6 +622,49 @@ if (e.target.form) <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit(); }); + +// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186] +function scientificToDecimal(num) { + //if the number is in scientific notation remove it + if(/\d+\.?\d*e[\+\-]*\d+/i.test(num)) { +var zero = '0', +parts = String(num).toLowerCase().split('e'), //split into coeff and exponent +e = parts.pop(),//store the exponential part +l = Math.abs(e), //get the number of zeros +sign = e/l, +coeff_array = parts[0].split('.'); +if(sign === -1) { +coeff_array[0] = Math.abs(coeff_array[0]); +num = '-'+zero + '.' + new Array(l).join(zero) + coeff_array.join(''); +} +else { +var dec = coeff_array[1]; +if(dec) l = l - dec.length; +num = coeff_array.join('') + new Array(l+1).join(zero); +} + } + return num; +} + +// Extract estimated rowcount map +var opRowCountMap = {}; +// Get OpId-Rowocunt Map +function buildRowCountMap() { + var phyText = $('#query-physical').find('pre').text(); + var opLines = phyText.split("\n"); + for (var l in opLines) { +var line = opLines[l]; +if (line.trim().length > 0) { + var opId = line.match(/\d+-\d+/g)[0]; + var opRowCount = line.match(/rowcount = \S+/g)[0].split(' ')[2].replace(',','').trim(); + if (opRowCount.includes("E")) { +opRowCountMap[opId] = parseInt(scientificToDecimal(opRowCount)).toLocaleString('en'); Review comment: Good point. 
I'll check for larger values and apply the parseLong() method instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Visualize estimated and actual row counts for a query > - > > Key: DRILL-7222 > URL: https://issues.apache.org/jira/browse/DRILL-7222 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting, user-experience > Fix For: 1.17.0 > > > With statistics in place, it would be useful to have the *estimated* rowcount > along side the *actual* rowcount query profile's operator overview. > We can extract this from the Physical Plan section of the profile. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query
[ https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902465#comment-16902465 ] ASF GitHub Bot commented on DRILL-7222: --- kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and actual row counts for a query URL: https://github.com/apache/drill/pull/1779#discussion_r311756570 ## File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl ## @@ -587,6 +622,49 @@ if (e.target.form) <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit(); }); + +// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186] +function scientificToDecimal(num) { + //if the number is in scientific notation remove it + if(/\d+\.?\d*e[\+\-]*\d+/i.test(num)) { +var zero = '0', +parts = String(num).toLowerCase().split('e'), //split into coeff and exponent +e = parts.pop(),//store the exponential part +l = Math.abs(e), //get the number of zeros +sign = e/l, +coeff_array = parts[0].split('.'); +if(sign === -1) { +coeff_array[0] = Math.abs(coeff_array[0]); +num = '-'+zero + '.' + new Array(l).join(zero) + coeff_array.join(''); +} +else { +var dec = coeff_array[1]; +if(dec) l = l - dec.length; +num = coeff_array.join('') + new Array(l+1).join(zero); +} + } + return num; +} + +// Extract estimated rowcount map +var opRowCountMap = {}; +// Get OpId-Rowocunt Map +function buildRowCountMap() { + var phyText = $('#query-physical').find('pre').text(); + var opLines = phyText.split("\n"); + for (var l in opLines) { +var line = opLines[l]; +if (line.trim().length > 0) { + var opId = line.match(/\d+-\d+/g)[0]; + var opRowCount = line.match(/rowcount = \S+/g)[0].split(' ')[2].replace(',','').trim(); Review comment: I can't use this because the `rowcount` might be in the scientific format, resulting in a partial extraction. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Visualize estimated and actual row counts for a query > - > > Key: DRILL-7222 > URL: https://issues.apache.org/jira/browse/DRILL-7222 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting, user-experience > Fix For: 1.17.0 > > > With statistics in place, it would be useful to have the *estimated* rowcount > along side the *actual* rowcount query profile's operator overview. > We can extract this from the Physical Plan section of the profile. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query
[ https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902460#comment-16902460 ] ASF GitHub Bot commented on DRILL-7222: --- kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and actual row counts for a query URL: https://github.com/apache/drill/pull/1779#discussion_r311755362 ## File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl ## @@ -587,6 +622,49 @@ if (e.target.form) <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit(); }); + +// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186] +function scientificToDecimal(num) { Review comment: I don't believe that works. The parseInt function will parse until the 'E' symbol. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Visualize estimated and actual row counts for a query > - > > Key: DRILL-7222 > URL: https://issues.apache.org/jira/browse/DRILL-7222 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.16.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting, user-experience > Fix For: 1.17.0 > > > With statistics in place, it would be useful to have the *estimated* rowcount > along side the *actual* rowcount query profile's operator overview. > We can extract this from the Physical Plan section of the profile. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (DRILL-7338) REST API calls to Drill fail due to insufficient heap memory
[ https://issues.apache.org/jira/browse/DRILL-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902449#comment-16902449 ]

ASF GitHub Bot commented on DRILL-7338:
---

kkhatua commented on issue #1837: DRILL-7338: REST API calls to Drill fail due to insufficient heap memory
URL: https://github.com/apache/drill/pull/1837#issuecomment-519256074

@arina-ielchiieva applied the recommendations.

> REST API calls to Drill fail due to insufficient heap memory
>
> Key: DRILL-7338
> URL: https://issues.apache.org/jira/browse/DRILL-7338
> Project: Apache Drill
> Issue Type: Bug
> Components: Web Server
> Affects Versions: 1.15.0
> Reporter: Aditya Allamraju
> Assignee: Kunal Khatua
> Priority: Major
> Fix For: 1.17.0
>
> Drill queries that use REST API calls have started failing (error given below)
> after recent changes.
> {code:java}
> RESOURCE ERROR: There is not enough heap memory to run this query using the web interface.
> Please try a query with fewer columns or with a filter or limit condition to limit the data returned.
> You can also try an ODBC/JDBC client.{code}
> They were running fine earlier, as the ResultSet returned was just a few rows.
> These queries now fail even for very small result sets (< 10 rows).
> Investigating the issue revealed that we introduced a check to limit the heap
> usage.
> The wrapper code from
> *_exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java_*
> that throws this error has certain issues. It does seem we use a
> threshold of *85%* of heap usage before throwing that warning and exiting the
> query.
>
> {code:java}
> public class QueryWrapper {
>   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(QueryWrapper.class);
>   // Heap usage threshold/trigger to provide resiliency on web server for queries submitted via HTTP
>   private static final double HEAP_MEMORY_FAILURE_THRESHOLD = 0.85;
>   ...
>   private static MemoryMXBean memMXBean = ManagementFactory.getMemoryMXBean();
>   ...
>   // Wait until the query execution is complete or there is error submitting the query
>   logger.debug("Wait until the query execution is complete or there is error submitting the query");
>   do {
>     try {
>       isComplete = webUserConnection.await(TimeUnit.SECONDS.toMillis(1)); //periodically timeout 1 sec to check heap
>     } catch (InterruptedException e) {}
>     usagePercent = getHeapUsage();
>     if (usagePercent > HEAP_MEMORY_FAILURE_THRESHOLD) {
>       nearlyOutOfHeapSpace = true;
>     }
>   } while (!isComplete && !nearlyOutOfHeapSpace);
> {code}
> By using the above check, we unintentionally invited all those issues that
> happen with Java's heap usage. The JVM tries to make maximum use of the heap
> until a minor or major GC kicks in, i.e. GC kicks in only after there is no
> more space left in the heap (eden or young gen).
> The workarounds I can think of in order to resolve this issue are:
> # Remove this check altogether so we know why it is filling up the heap.
> # Advise the users to stop using REST for querying data. (We did this
> already.) *But not all users may be happy with this suggestion.* There
> could be a few dynamic applications (dashboards, monitoring, etc.).
> # Make the threshold high enough that GC gets a chance to kick in.
> If none of the above options is taken, we have to tune the heap sizes of the
> drillbit. A quick fix would be to increase the threshold from 85% to 100%
> (option 3 above).

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
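To make the check described in the issue concrete, here is a minimal sketch of a heap-usage ratio compared against a threshold. The excerpt calls `getHeapUsage()` without showing it, so the computation below (used heap divided by maximum heap, read from `MemoryMXBean`) is an assumption about what such a method might do, and `HeapUsageCheck` with its methods are hypothetical names rather than Drill code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapUsageCheck {
    // Fraction of the heap limit that is in use. getMax() may report -1 when
    // no maximum is defined, so fall back to the committed size in that case.
    static double heapUsage(long usedBytes, long maxBytes, long committedBytes) {
        long limit = maxBytes > 0 ? maxBytes : committedBytes;
        return limit > 0 ? (double) usedBytes / limit : 0.0;
    }

    // Snapshot of the running JVM's heap usage ratio.
    static double currentHeapUsage() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heapUsage(heap.getUsed(), heap.getMax(), heap.getCommitted());
    }

    // Same comparison as in the quoted loop: strictly above the threshold trips the check.
    static boolean nearlyOutOfHeapSpace(double usage, double threshold) {
        return usage > threshold;
    }

    public static void main(String[] args) {
        double usage = currentHeapUsage();
        System.out.printf("heap usage: %.2f, over 0.85 threshold: %b%n",
            usage, nearlyOutOfHeapSpace(usage, 0.85));
    }
}
```

The "used" figure reported by the JVM includes objects that are already unreachable but not yet collected, which is the weakness the reporter points out: a one-off snapshot can exceed 85% shortly before a GC cycle would have freed most of that memory, regardless of how small the query's result set is.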
[jira] [Commented] (DRILL-7338) REST API calls to Drill fail due to insufficient heap memory
[ https://issues.apache.org/jira/browse/DRILL-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902450#comment-16902450 ] ASF GitHub Bot commented on DRILL-7338: --- kkhatua commented on issue #1837: DRILL-7338: REST API calls to Drill fail due to insufficient heap memory URL: https://github.com/apache/drill/pull/1837#issuecomment-519256074 @arina-ielchiieva thanks for the review. I've applied the recommendations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > REST API calls to Drill fail due to insufficient heap memory > > > Key: DRILL-7338 > URL: https://issues.apache.org/jira/browse/DRILL-7338 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.15.0 >Reporter: Aditya Allamraju >Assignee: Kunal Khatua >Priority: Major > Fix For: 1.17.0 > > > Drill queries that use REST API calls have started failing(given below) after > recent changes. > {code:java} > RESOURCE ERROR: There is not enough heap memory to run this query using the > web interface. > Please try a query with fewer columns or with a filter or limit condition to > limit the data returned. > You can also try an ODBC/JDBC client.{code} > They were running fine earlier as the ResultSet returned was just few rows. > These queries now fail for even very small resultSets( < 10rows). > Investigating the issue revealed that we introduced a check to limit the Heap > usage. > The Wrapper code from > *_exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java_* > that throws this error, i see certain issues. It does seem we use a > threshold of *85%* of heap usage before throwing that warning and exiting the > query. 
> > {code:java} > public class QueryWrapper { > private static final org.slf4j.Logger logger = > org.slf4j.LoggerFactory.getLogger(QueryWrapper.class); > // Heap usage threshold/trigger to provide resiliency on web server for > queries submitted via HTTP > private static final double HEAP_MEMORY_FAILURE_THRESHOLD = 0.85; > ... > private static MemoryMXBean memMXBean = ManagementFactory.getMemoryMXBean(); > ... > // Wait until the query execution is complete or there is error submitting > the query > logger.debug("Wait until the query execution is complete or there is > error submitting the query"); > do { > try { > isComplete = webUserConnection.await(TimeUnit.SECONDS.toMillis(1)); > //periodically timeout 1 sec to check heap > } catch (InterruptedException e) {} > usagePercent = getHeapUsage(); > if (usagePercent > HEAP_MEMORY_FAILURE_THRESHOLD) { > nearlyOutOfHeapSpace = true; > } > } while (!isComplete && !nearlyOutOfHeapSpace); > {code} > By using above check, we unintentionally invited all those issues that happen > with Java’s Heap usage. JVM does try to make maximum usage of HEAP until > Minor or Major GC kicks in i.e GC kicks after there is no more space left in > heap(eden or young gen). > The workarounds i can think of in order to resolve this issue are: > # Remove this check altogether so we know why it is filling up Heap. > # Advise the users to stop using REST for querying data.(We did this > already). *But not all users may not be happy with this suggestion.* There > could be few dynamic applications(dashboard, monitoring etc). > # Make the threshold high enough so that GC kicks in much better. > If not above options, we have to tune the Heap sizes of drillbit. A quick fix > would be to increase the threshold from 85% to 100%(option-3 above). > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (DRILL-7338) REST API calls to Drill fail due to insufficient heap memory
[ https://issues.apache.org/jira/browse/DRILL-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902440#comment-16902440 ] ASF GitHub Bot commented on DRILL-7338: --- kkhatua commented on pull request #1837: DRILL-7338: REST API calls to Drill fail due to insufficient heap memory URL: https://github.com/apache/drill/pull/1837#discussion_r311743001 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java ## @@ -44,7 +44,7 @@ public class QueryWrapper { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(QueryWrapper.class); // Heap usage threshold/trigger to provide resiliency on web server for queries submitted via HTTP - private static final double HEAP_MEMORY_FAILURE_THRESHOLD = 0.85; + private double memoryFailureThreshold = 0.85; Review comment: +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > REST API calls to Drill fail due to insufficient heap memory > > > Key: DRILL-7338 > URL: https://issues.apache.org/jira/browse/DRILL-7338 > Project: Apache Drill > Issue Type: Bug > Components: Web Server >Affects Versions: 1.15.0 >Reporter: Aditya Allamraju >Assignee: Kunal Khatua >Priority: Major > Fix For: 1.17.0 > > > Drill queries that use REST API calls have started failing(given below) after > recent changes. > {code:java} > RESOURCE ERROR: There is not enough heap memory to run this query using the > web interface. > Please try a query with fewer columns or with a filter or limit condition to > limit the data returned. > You can also try an ODBC/JDBC client.{code} > They were running fine earlier as the ResultSet returned was just few rows. > These queries now fail for even very small resultSets( < 10rows). 
> Investigating the issue revealed that we introduced a check to limit the Heap > usage. > The Wrapper code from > *_exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java_* > that throws this error, i see certain issues. It does seem we use a > threshold of *85%* of heap usage before throwing that warning and exiting the > query. > > {code:java} > public class QueryWrapper { > private static final org.slf4j.Logger logger = > org.slf4j.LoggerFactory.getLogger(QueryWrapper.class); > // Heap usage threshold/trigger to provide resiliency on web server for > queries submitted via HTTP > private static final double HEAP_MEMORY_FAILURE_THRESHOLD = 0.85; > ... > private static MemoryMXBean memMXBean = ManagementFactory.getMemoryMXBean(); > ... > // Wait until the query execution is complete or there is error submitting > the query > logger.debug("Wait until the query execution is complete or there is > error submitting the query"); > do { > try { > isComplete = webUserConnection.await(TimeUnit.SECONDS.toMillis(1)); > //periodically timeout 1 sec to check heap > } catch (InterruptedException e) {} > usagePercent = getHeapUsage(); > if (usagePercent > HEAP_MEMORY_FAILURE_THRESHOLD) { > nearlyOutOfHeapSpace = true; > } > } while (!isComplete && !nearlyOutOfHeapSpace); > {code} > By using above check, we unintentionally invited all those issues that happen > with Java’s Heap usage. JVM does try to make maximum usage of HEAP until > Minor or Major GC kicks in i.e GC kicks after there is no more space left in > heap(eden or young gen). > The workarounds i can think of in order to resolve this issue are: > # Remove this check altogether so we know why it is filling up Heap. > # Advise the users to stop using REST for querying data.(We did this > already). *But not all users may not be happy with this suggestion.* There > could be few dynamic applications(dashboard, monitoring etc). > # Make the threshold high enough so that GC kicks in much better. 
> If not above options, we have to tune the Heap sizes of drillbit. A quick fix > would be to increase the threshold from 85% to 100%(option-3 above). > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (DRILL-7341) Vector reAlloc may fails after exchange.
[ https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902200#comment-16902200 ]

Paul Rogers edited comment on DRILL-7341 at 8/7/19 4:17 PM:

As it turns out, vector accounting is very fragile. Vector sizes and counts are often wrong. Code seems to have been tweaked to work around these errors. They are tricky because there is no good way to visualize what is happening.

Tried to fix some of the issues, but fixing one thing tends to break something else that depended on the wrong state. Created a "batch validator" to identify the issues, but can't check it in because it produces hundreds of errors.

Glad you were able to find & fix one of the issues!

was (Author: paul.rogers):
As it turns out, vector accounting is very fragile. Vector sizes and counts are often wrong. Code seems to have been tweaed to work around these errors. They are tricky because there is no good way to visualize what is happening.

Tried to fix some of the issues, but fixing one thing tends to break something else that depended on the wrong state. Created a "batch validator" to identify the issues, but can't check it in because it produces hundreds of errors.

Glad you were able to find & fix one of the issues!

> Vector reAlloc may fails after exchange.
>
> Key: DRILL-7341
> URL: https://issues.apache.org/jira/browse/DRILL-7341
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Oleg Zinoviev
> Priority: Major
> Attachments: stacktrace.log
>
> There are several methods that modify the BaseDataValueVector#data field.
> Some of them, such as BaseDataValueVector#exchange, do not change
> allocationSizeInBytes.
> Therefore, if BaseDataValueVector#exchange was executed for vectors with
> different size, *ValueVector#reAlloc may create a buffer of insufficient size.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Commented] (DRILL-7341) Vector reAlloc may fails after exchange.
[ https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902200#comment-16902200 ] Paul Rogers commented on DRILL-7341: As it turns out, vector accounting is very fragile. Vector sizes and counts are often wrong. Code seems to have been tweaked to work around these errors. They are tricky because there is no good way to visualize what is happening. Tried to fix some of the issues, but fixing one thing tends to break something else that depended on the wrong state. Created a "batch validator" to identify the issues, but can't check it in because it produces hundreds of errors. Glad you were able to find & fix one of the issues! > Vector reAlloc may fails after exchange. > > > Key: DRILL-7341 > URL: https://issues.apache.org/jira/browse/DRILL-7341 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Oleg Zinoviev >Priority: Major > Attachments: stacktrace.log > > > There are several methods that modify the BaseDataValueVector#data field. > Some of them, such as BaseDataValueVector#exchange, do not change > allocationSizeInBytes. > Therefore, if BaseDataValueVector#exchange was executed for vectors with > different size, *ValueVector#reAlloc may create a buffer of insufficient size.
[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files
[ https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902152#comment-16902152 ] ASF GitHub Bot commented on DRILL-7177: --- oleg-zinovev commented on issue #1749: DRILL-7177: Format Plugin for Excel Files URL: https://github.com/apache/drill/pull/1749#issuecomment-519141613 Since I have the same plugin for internal use (https://github.com/idvp-project/drill-storage-excel): 1) XSSFWorkbook has awful memory usage. http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html. Reading a 10-15 MB XLSX file can easily lead to OutOfMemory. So we used the com.monitorjbl:xlsx-streamer project (no support for formula evaluation). 2) Excel does not guarantee that the column value type remains static. IMHO, it is better to read everything as VARCHAR. > Format Plugin for Excel Files > - > > Key: DRILL-7177 > URL: https://issues.apache.org/jira/browse/DRILL-7177 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > This pull request adds the functionality which enables Drill to query > Microsoft Excel files.
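The second point in the comment above (read every cell as VARCHAR because a column's type can change from row to row) can be illustrated with a minimal sketch. This is not the plugin's code: the class name AllVarcharColumn and the method toVarchar are hypothetical, and cells are modelled as plain Java objects rather than through any real spreadsheet API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Drill or of the Excel plugin.
final class AllVarcharColumn {

    // Converts one spreadsheet column to strings regardless of cell type.
    // Cells are modelled as plain Java objects (an assumption standing in
    // for whatever cell API the reader exposes); null marks an empty cell.
    static List<String> toVarchar(List<Object> cells) {
        List<String> out = new ArrayList<>();
        for (Object cell : cells) {
            out.add(cell == null ? null : String.valueOf(cell));
        }
        return out;
    }

    public static void main(String[] args) {
        // A single column whose cells hold a number, text, a boolean and a blank.
        List<Object> column = Arrays.asList(42.0, "n/a", true, null);
        System.out.println(toVarchar(column));
    }
}
```

With this approach the reader never has to guess a column type from the first few rows; any casting to numeric or date types is left to the query.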
[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files
[ https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902145#comment-16902145 ] ASF GitHub Bot commented on DRILL-7177: --- oleg-zinovev commented on issue #1749: DRILL-7177: Format Plugin for Excel Files URL: https://github.com/apache/drill/pull/1749#issuecomment-519141613 Since I have the same plugin for internal use (https://github.com/idvp-project/drill-storage-excel): 1) XSSFWorkbook has awful memory usage. http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html. Reading a 10-15 MB XLSX file can easily lead to OutOfMemory. 2) Excel does not guarantee that the column value type remains static. IMHO, it is better to read everything as lines. > Format Plugin for Excel Files > - > > Key: DRILL-7177 > URL: https://issues.apache.org/jira/browse/DRILL-7177 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > This pull request adds the functionality which enables Drill to query > Microsoft Excel files.
[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files
[ https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902146#comment-16902146 ] ASF GitHub Bot commented on DRILL-7177: --- oleg-zinovev commented on issue #1749: DRILL-7177: Format Plugin for Excel Files URL: https://github.com/apache/drill/pull/1749#issuecomment-519141613 Since I have the same plugin for internal use (https://github.com/idvp-project/drill-storage-excel): 1) XSSFWorkbook has awful memory usage. http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html. Reading a 10-15 MB XLSX file can easily lead to OutOfMemory. 2) Excel does not guarantee that the column value type remains static. IMHO, it is better to read everything as VARCHAR. > Format Plugin for Excel Files > - > > Key: DRILL-7177 > URL: https://issues.apache.org/jira/browse/DRILL-7177 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > This pull request adds the functionality which enables Drill to query > Microsoft Excel files.
[jira] [Commented] (DRILL-7341) Vector reAlloc may fails after exchange.
[ https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902073#comment-16902073 ] ASF GitHub Bot commented on DRILL-7341: --- oleg-zinovev commented on pull request #1838: DRILL-7341: Vector reAlloc may fails after exchange URL: https://github.com/apache/drill/pull/1838 Fixes a reAlloc issue after direct assignment of the vector's data field. > Vector reAlloc may fails after exchange. > > > Key: DRILL-7341 > URL: https://issues.apache.org/jira/browse/DRILL-7341 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Oleg Zinoviev >Priority: Major > Attachments: stacktrace.log > > > There are several methods that modify the BaseDataValueVector#data field. > Some of them, such as BaseDataValueVector#exchange, do not change > allocationSizeInBytes. > Therefore, if BaseDataValueVector#exchange was executed for vectors with > different size, *ValueVector#reAlloc may create a buffer of insufficient size.
[jira] [Updated] (DRILL-7341) Vector reAlloc may fails after exchange.
[ https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleg Zinoviev updated DRILL-7341: - Summary: Vector reAlloc may fails after exchange. (was: Vector reAllocation may fails after exchange.) > Vector reAlloc may fails after exchange. > > > Key: DRILL-7341 > URL: https://issues.apache.org/jira/browse/DRILL-7341 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Oleg Zinoviev >Priority: Major > Attachments: stacktrace.log > > > There are several methods that modify the BaseDataValueVector#data field. > Some of them, such as BaseDataValueVector#exchange, do not change > allocationSizeInBytes. > Therefore, if BaseDataValueVector#exchange was executed for vectors with > different size, *ValueVector#reAlloc may create a buffer of insufficient size.
[jira] [Created] (DRILL-7341) Vector reAllocation may fails after exchange.
Oleg Zinoviev created DRILL-7341: Summary: Vector reAllocation may fails after exchange. Key: DRILL-7341 URL: https://issues.apache.org/jira/browse/DRILL-7341 Project: Apache Drill Issue Type: Bug Affects Versions: 1.16.0 Reporter: Oleg Zinoviev Attachments: stacktrace.log There are several methods that modify the BaseDataValueVector#data field. Some of them, such as BaseDataValueVector#exchange, do not change allocationSizeInBytes. Therefore, if BaseDataValueVector#exchange was executed for vectors with different size, *ValueVector#reAlloc may create a buffer of insufficient size.
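The failure mode described in the issue can be reproduced with a toy model. The sketch below is not Drill code: ToyVector and ReAllocDemo are hypothetical names, a byte[] stands in for the DrillBuf data field, and doubling inside the guard loop is an assumption about how the size is grown. Only the idea of comparing the new size against the current buffer capacity comes from the pull request under review.

```java
// Simplified stand-in for a fixed-width value vector; NOT Drill's real class.
final class ToyVector {
    byte[] data;                 // plays the role of the `data` buffer field
    int allocationSizeInBytes;   // size bookkeeping consulted by reAlloc

    ToyVector(int size) {
        this.data = new byte[size];
        this.allocationSizeInBytes = size;
    }

    // Swaps only the buffers and leaves allocationSizeInBytes stale,
    // which is the behaviour the issue describes for exchange.
    void exchange(ToyVector other) {
        byte[] tmp = data;
        data = other.data;
        other.data = tmp;
    }

    // Size picked without the guard: double the recorded size.
    long naiveNewSize() {
        return allocationSizeInBytes == 0 ? 256 : allocationSizeInBytes * 2L;
    }

    // Size picked with a guard: keep growing until the existing buffer fits.
    // Doubling inside the loop is an assumption made for this sketch.
    long guardedNewSize() {
        long newSize = naiveNewSize();
        while (newSize < data.length) {
            newSize *= 2;
        }
        return newSize;
    }

    // Allocates a new buffer and copies the old contents into it.
    void reAlloc(long newSize) {
        byte[] newBuf = new byte[(int) newSize];
        // Throws IndexOutOfBoundsException if newBuf is smaller than data.
        System.arraycopy(data, 0, newBuf, 0, data.length);
        data = newBuf;
        allocationSizeInBytes = (int) newSize;
    }
}

final class ReAllocDemo {
    public static void main(String[] args) {
        ToyVector small = new ToyVector(256);
        ToyVector large = new ToyVector(4096);
        small.exchange(large);   // small now holds the 4096-byte buffer

        System.out.println("naive size:   " + small.naiveNewSize());
        System.out.println("guarded size: " + small.guardedNewSize());

        // The naive size is smaller than the buffer being copied, so the
        // copy would fail; the guarded size is large enough.
        small.reAlloc(small.guardedNewSize());
    }
}
```

After the exchange, the recorded size (256) no longer matches the buffer actually held (4096 bytes), so doubling the recorded size gives a target that is too small for the copy. Checking the target against the real buffer capacity avoids relying on the stale bookkeeping.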