[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902684#comment-16902684
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual 
row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-519365759
 
 
   @agozhiy I've made most of the changes you suggested and commented on the rest.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Visualize estimated and actual row counts for a query
> -
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount 
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.
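The extraction described above can be sketched in Java; the PR itself does it in JavaScript inside `profile.ftl`. This is a minimal sketch under assumptions: each operator line of the physical plan text begins with an operator ID such as `00-01` and contains `rowcount = <number>,`, where the number may use scientific notation. `RowCountExtractor` and the sample plan lines are hypothetical.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowCountExtractor {
  // Operator ID such as "00-03" (assumed to be the first digit-hyphen-digit run on the line).
  private static final Pattern OP_ID = Pattern.compile("\\d+-\\d+");
  // Whole rowcount token, including any exponent, up to the next comma or whitespace.
  private static final Pattern ROW_COUNT = Pattern.compile("rowcount = ([^,\\s]+)");

  /** Maps operator ID to estimated row count, truncated to a whole number. */
  public static Map<String, Long> buildRowCountMap(String physicalPlan) {
    Map<String, Long> result = new LinkedHashMap<>();
    for (String line : physicalPlan.split("\n")) {
      Matcher id = OP_ID.matcher(line);
      Matcher rc = ROW_COUNT.matcher(line);
      if (!id.find() || !rc.find()) {
        continue; // skip blank or non-operator lines
      }
      // BigDecimal parses both "42.0" and "1.5E7" exactly.
      result.put(id.group(), new BigDecimal(rc.group(1)).longValue());
    }
    return result;
  }

  public static void main(String[] args) {
    String plan = String.join("\n",
        "00-00    Screen : rowType = RecordType(ANY a): rowcount = 1.5E7, cumulative cost = {...}, id = 101",
        "00-01      Project(a=[$0]) : rowType = RecordType(ANY a): rowcount = 1.5E7, cumulative cost = {...}, id = 100",
        "00-02        Scan(...) : rowType = RecordType(ANY a): rowcount = 42.0, cumulative cost = {...}, id = 99");
    System.out.println(buildRowCountMap(plan)); // {00-00=15000000, 00-01=15000000, 00-02=42}
  }
}
```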



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (DRILL-7341) Vector reAlloc may fails after exchange.

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902625#comment-16902625
 ] 

ASF GitHub Bot commented on DRILL-7341:
---

weijietong commented on pull request #1838: DRILL-7341: Vector reAlloc may 
fails after exchange
URL: https://github.com/apache/drill/pull/1838#discussion_r311841988
 
 

 ##
 File path: exec/vector/src/main/codegen/templates/FixedValueVectors.java
 ##
 @@ -210,10 +210,16 @@ public void reAlloc() {
 // a zero-length buffer. Instead, just allocate a 256 byte
 // buffer if we start at 0.
 
-final long newAllocationSize = allocationSizeInBytes == 0
+long newAllocationSize = allocationSizeInBytes == 0
 ? 256
 : allocationSizeInBytes * 2L;
 
+// Some operations, such as ValueVector#exchange, can change the DrillBuf data field without a corresponding change in allocation size.
+// Check that the size of the allocation is sufficient to copy the old buffer.
+while (newAllocationSize < data.capacity()) {
 
 Review comment:
   Please also change the `data.setZero(halfNewCapacity, halfNewCapacity)` logic on line 232 below, 
which initializes the freshly allocated part of the buffer to zero.
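A minimal Java sketch of a `reAlloc` that addresses both points, with a plain `byte[]` standing in for `DrillBuf` (`ToyFixedVector` is hypothetical, not a Drill class). Because the loop may double the size more than once, the region to zero starts at the old capacity rather than at half the new capacity. The loop here uses `<=` so the buffer always grows; that is a choice made for this sketch, not something taken from the PR.

```java
import java.util.Arrays;

/** Toy stand-in for a fixed-width value vector; byte[] replaces DrillBuf (hypothetical). */
public class ToyFixedVector {
  byte[] data;
  long allocationSizeInBytes;

  ToyFixedVector(int size) {
    data = new byte[size];
    allocationSizeInBytes = size;
  }

  void reAlloc() {
    long newAllocationSize = allocationSizeInBytes == 0 ? 256 : allocationSizeInBytes * 2L;
    // allocationSizeInBytes can be stale (e.g. after an exchange), so keep doubling
    // until the new buffer is strictly larger than the data it must hold.
    while (newAllocationSize <= data.length) {
      newAllocationSize *= 2L;
    }
    if (newAllocationSize > Integer.MAX_VALUE) {
      throw new IllegalStateException("Allocation too large: " + newAllocationSize);
    }
    int oldCapacity = data.length;
    byte[] newBuf = new byte[(int) newAllocationSize];
    System.arraycopy(data, 0, newBuf, 0, oldCapacity);
    // Zero from the old capacity to the end. Starting at newAllocationSize / 2 would leave
    // part of the new space uninitialized, e.g. when the loop doubled more than once.
    // (Java arrays start zeroed; the fill mirrors DrillBuf, whose fresh memory is not
    // guaranteed to be zero.)
    Arrays.fill(newBuf, oldCapacity, newBuf.length, (byte) 0);
    data = newBuf;
    allocationSizeInBytes = newAllocationSize;
  }

  public static void main(String[] args) {
    ToyFixedVector v = new ToyFixedVector(8);
    v.data[7] = 5;
    v.allocationSizeInBytes = 2; // simulate a stale size left behind by an exchange
    v.reAlloc();
    System.out.println(v.data.length + " " + v.data[7]); // 16 5
  }
}
```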
 



> Vector reAlloc may fails after exchange.
> 
>
> Key: DRILL-7341
> URL: https://issues.apache.org/jira/browse/DRILL-7341
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Oleg Zinoviev
>Priority: Major
> Attachments: stacktrace.log
>
>
> There are several methods that modify the BaseDataValueVector#data field. 
> Some of them, such as BaseDataValueVector#exchange, do not change 
> allocationSizeInBytes. 
> Therefore, if BaseDataValueVector#exchange is executed for vectors of 
> different sizes, *ValueVector#reAlloc may create a buffer of insufficient size.
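A toy Java reproduction of the failure described above, again with `byte[]` standing in for `DrillBuf` (all names hypothetical): `exchange` swaps buffers but not `allocationSizeInBytes`, so a size-doubling `reAlloc` can request fewer bytes than it must copy. In this toy the failure surfaces as an `ArrayIndexOutOfBoundsException`; the failure mode inside Drill may differ.

```java
public class ExchangeReAllocDemo {
  /** Toy vector: a byte[] stands in for DrillBuf (hypothetical, not Drill's API). */
  static class ToyVector {
    byte[] data;
    long allocationSizeInBytes;

    ToyVector(int size) {
      data = new byte[size];
      allocationSizeInBytes = size;
    }

    /** Mirrors the issue: swaps buffers but leaves allocationSizeInBytes untouched. */
    void exchange(ToyVector other) {
      byte[] tmp = data;
      data = other.data;
      other.data = tmp;
    }

    /** Pre-fix sizing: doubles the recorded size, ignoring the real buffer capacity. */
    void naiveReAlloc() {
      long newSize = allocationSizeInBytes == 0 ? 256 : allocationSizeInBytes * 2L;
      byte[] newBuf = new byte[(int) newSize];
      System.arraycopy(data, 0, newBuf, 0, data.length); // throws if newSize < data.length
      data = newBuf;
      allocationSizeInBytes = newSize;
    }
  }

  /** Returns true if reAlloc fails after exchanging a small vector with a large one. */
  static boolean reAllocFailsAfterExchange(int smallSize, int largeSize) {
    ToyVector small = new ToyVector(smallSize);
    ToyVector large = new ToyVector(largeSize);
    small.exchange(large); // small now holds largeSize bytes but still records smallSize
    try {
      small.naiveReAlloc();
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(reAllocFailsAfterExchange(256, 4096)); // true: 512 < 4096
    System.out.println(reAllocFailsAfterExchange(256, 512));  // false: 512 >= 512
  }
}
```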





[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902481#comment-16902481
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#discussion_r311766304
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl
 ##
 @@ -587,6 +622,49 @@
   if (e.target.form) 
 <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit();
 });
+
+// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186]
+function scientificToDecimal(num) {
+  //if the number is in scientific notation remove it
+  if(/\d+\.?\d*e[\+\-]*\d+/i.test(num)) {
+    var zero = '0',
+        parts = String(num).toLowerCase().split('e'), //split into coeff and exponent
+        e = parts.pop(),//store the exponential part
+        l = Math.abs(e), //get the number of zeros
+        sign = e/l,
+        coeff_array = parts[0].split('.');
+    if(sign === -1) {
+      coeff_array[0] = Math.abs(coeff_array[0]);
+      num = '-'+zero + '.' + new Array(l).join(zero) + coeff_array.join('');
+    }
+    else {
+      var dec = coeff_array[1];
+      if(dec) l = l - dec.length;
+      num = coeff_array.join('') + new Array(l+1).join(zero);
+    }
+  }
+  return num;
+}
+
+// Extract estimated rowcount map
+var opRowCountMap = {};
+// Get OpId-Rowcount Map
+function buildRowCountMap() {
+  var phyText = $('#query-physical').find('pre').text();
+  var opLines = phyText.split("\n");
+  for (var l in opLines) {
+    var line = opLines[l];
+    if (line.trim().length > 0) {
+      var opId = line.match(/\d+-\d+/g)[0];
+      var opRowCount = line.match(/rowcount = \S+/g)[0].split(' ')[2].replace(',','').trim();
+      if (opRowCount.includes("E")) {
+        opRowCountMap[opId] = parseInt(scientificToDecimal(opRowCount)).toLocaleString('en');
 
 Review comment:
   Tested with ##E18 values using the `Number()` function.
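JavaScript's `Number` is an IEEE 754 double, the same format as Java's `double`, so a Java sketch shows the same behavior: `##E18` values with few significant digits convert exactly, but integers above 2^53 may be rounded. `DoublePrecisionDemo` is a hypothetical name.

```java
import java.math.BigDecimal;

public class DoublePrecisionDemo {
  /** Converts via a double, as JavaScript's Number() would, then prints the exact integer. */
  static String viaDouble(String rowCount) {
    return new BigDecimal(Double.parseDouble(rowCount)).toBigInteger().toString();
  }

  /** Converts the same string exactly, with no floating-point step. */
  static String exact(String rowCount) {
    return new BigDecimal(rowCount).toBigInteger().toString();
  }

  public static void main(String[] args) {
    System.out.println(viaDouble("1.5E18"));           // 1500000000000000000
    // 2^53 + 1 is the smallest positive integer a double cannot represent exactly.
    System.out.println(viaDouble("9007199254740993")); // 9007199254740992
    System.out.println(exact("9007199254740993"));     // 9007199254740993
  }
}
```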
 



[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902466#comment-16902466
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#discussion_r311756831
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl
 ##
 @@ -587,6 +622,49 @@
   if (e.target.form) 
 <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit();
 });
+
+// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186]
+function scientificToDecimal(num) {
+  //if the number is in scientific notation remove it
+  if(/\d+\.?\d*e[\+\-]*\d+/i.test(num)) {
+    var zero = '0',
+        parts = String(num).toLowerCase().split('e'), //split into coeff and exponent
+        e = parts.pop(),//store the exponential part
+        l = Math.abs(e), //get the number of zeros
+        sign = e/l,
+        coeff_array = parts[0].split('.');
+    if(sign === -1) {
+      coeff_array[0] = Math.abs(coeff_array[0]);
+      num = '-'+zero + '.' + new Array(l).join(zero) + coeff_array.join('');
+    }
+    else {
+      var dec = coeff_array[1];
+      if(dec) l = l - dec.length;
+      num = coeff_array.join('') + new Array(l+1).join(zero);
+    }
+  }
+  return num;
+}
+
+// Extract estimated rowcount map
+var opRowCountMap = {};
+// Get OpId-Rowcount Map
+function buildRowCountMap() {
+  var phyText = $('#query-physical').find('pre').text();
+  var opLines = phyText.split("\n");
+  for (var l in opLines) {
+    var line = opLines[l];
+    if (line.trim().length > 0) {
+      var opId = line.match(/\d+-\d+/g)[0];
+      var opRowCount = line.match(/rowcount = \S+/g)[0].split(' ')[2].replace(',','').trim();
+      if (opRowCount.includes("E")) {
+        opRowCountMap[opId] = parseInt(scientificToDecimal(opRowCount)).toLocaleString('en');
 
 Review comment:
   Good point. I'll check for larger values and apply the parseLong() method 
instead.
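For comparison in Java (JavaScript has no built-in `parseLong()`): `Long.parseLong` rejects exponent notation and values above `Long.MAX_VALUE` (about 9.22E18), whereas `BigDecimal` followed by `toBigInteger()` handles both. A minimal sketch; `RowCountFormatter` is hypothetical, and `%,d` with `Locale.ENGLISH` plays the role of `toLocaleString('en')`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Locale;

public class RowCountFormatter {
  /** Returns true if Long.parseLong accepts the string. */
  static boolean parsesAsLong(String s) {
    try {
      Long.parseLong(s);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  /** Formats a row count (plain or scientific notation) with English digit grouping. */
  static String format(String rowCount) {
    // BigDecimal understands exponents; BigInteger has no upper bound, so 12E18 cannot overflow.
    BigInteger whole = new BigDecimal(rowCount).toBigInteger();
    return String.format(Locale.ENGLISH, "%,d", whole);
  }

  public static void main(String[] args) {
    System.out.println(parsesAsLong("1.5E7"));                // false: no exponent support
    System.out.println(parsesAsLong("12000000000000000000")); // false: exceeds Long.MAX_VALUE
    System.out.println(format("1.5E7"));  // 15,000,000
    System.out.println(format("12E18"));  // 12,000,000,000,000,000,000
    System.out.println(format("42.0"));   // 42
  }
}
```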
 



[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902465#comment-16902465
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#discussion_r311756570
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl
 ##
 @@ -587,6 +622,49 @@
   if (e.target.form) 
 <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit();
 });
+
+// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186]
+function scientificToDecimal(num) {
+  //if the number is in scientific notation remove it
+  if(/\d+\.?\d*e[\+\-]*\d+/i.test(num)) {
+    var zero = '0',
+        parts = String(num).toLowerCase().split('e'), //split into coeff and exponent
+        e = parts.pop(),//store the exponential part
+        l = Math.abs(e), //get the number of zeros
+        sign = e/l,
+        coeff_array = parts[0].split('.');
+    if(sign === -1) {
+      coeff_array[0] = Math.abs(coeff_array[0]);
+      num = '-'+zero + '.' + new Array(l).join(zero) + coeff_array.join('');
+    }
+    else {
+      var dec = coeff_array[1];
+      if(dec) l = l - dec.length;
+      num = coeff_array.join('') + new Array(l+1).join(zero);
+    }
+  }
+  return num;
+}
+
+// Extract estimated rowcount map
+var opRowCountMap = {};
+// Get OpId-Rowcount Map
+function buildRowCountMap() {
+  var phyText = $('#query-physical').find('pre').text();
+  var opLines = phyText.split("\n");
+  for (var l in opLines) {
+    var line = opLines[l];
+    if (line.trim().length > 0) {
+      var opId = line.match(/\d+-\d+/g)[0];
+      var opRowCount = line.match(/rowcount = \S+/g)[0].split(' ')[2].replace(',','').trim();
 
 Review comment:
   I can't use this because the `rowcount` might be in scientific notation, which would result 
in only a partial extraction.
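The partial-extraction concern can be illustrated with two hypothetical Java patterns: one that matches only digits and an optional fraction, and one that takes the whole token up to the next comma or whitespace (similar in effect to `\S+` plus the comma stripping in the diff). The sample plan line is an assumed format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowCountTokenDemo {
  // Digits and an optional fraction only: stops before the exponent.
  static final Pattern DECIMAL_ONLY = Pattern.compile("rowcount = (\\d+(?:\\.\\d+)?)");
  // Everything up to the next comma or whitespace: keeps "E7" attached.
  static final Pattern WHOLE_TOKEN = Pattern.compile("rowcount = ([^,\\s]+)");

  /** Returns the first captured row count, or null if the line has none. */
  static String extract(Pattern p, String line) {
    Matcher m = p.matcher(line);
    return m.find() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    String line = "00-01  Project : rowcount = 1.5E7, cumulative cost = {...}, id = 7";
    System.out.println(extract(DECIMAL_ONLY, line)); // 1.5 (partial)
    System.out.println(extract(WHOLE_TOKEN, line));  // 1.5E7
  }
}
```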
 



[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902460#comment-16902460
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#discussion_r311755362
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl
 ##
 @@ -587,6 +622,49 @@
   if (e.target.form) 
 <#if model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit();
 });
+
+// Convert scientific to Decimal [Ref: https://gist.github.com/jiggzson/b5f489af9ad931e3d186]
+function scientificToDecimal(num) {
 
 Review comment:
   I don't believe that works. The `parseInt` function parses only up to the 'E' symbol.
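To make the `parseInt` behavior concrete, here is a hypothetical Java emulation of what `parseInt(s, 10)` keeps from a non-negative string: the leading run of digits, discarding everything from the first non-digit on (the 'E' in `4E18`, or the '.' in `1.5E7`). It is a sketch, not a full reimplementation; signs and more than 18 leading digits are not handled.

```java
public class LeadingDigitsDemo {
  /**
   * Hypothetical emulation of parseInt(s, 10) for non-negative input: keep the leading
   * ASCII digits and ignore the rest. Assumes at most 18 leading digits (fits in a long).
   */
  static Long leadingDigits(String s) {
    String t = s.trim();
    int end = 0;
    while (end < t.length() && t.charAt(end) >= '0' && t.charAt(end) <= '9') {
      end++;
    }
    // JavaScript's parseInt returns NaN when there are no leading digits; null stands in for that.
    return end == 0 ? null : Long.valueOf(t.substring(0, end));
  }

  public static void main(String[] args) {
    System.out.println(leadingDigits("4E18"));     // 4
    System.out.println(leadingDigits("1.5E7"));    // 1
    System.out.println(leadingDigits("15000000")); // 15000000
  }
}
```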
 



[jira] [Commented] (DRILL-7338) REST API calls to Drill fail due to insufficient heap memory

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902449#comment-16902449
 ] 

ASF GitHub Bot commented on DRILL-7338:
---

kkhatua commented on issue #1837: DRILL-7338: REST API calls to Drill fail due 
to insufficient heap memory
URL: https://github.com/apache/drill/pull/1837#issuecomment-519256074
 
 
   @arina-ielchiieva applied the recommendations.
 



> REST API calls to Drill fail due to insufficient heap memory
> 
>
> Key: DRILL-7338
> URL: https://issues.apache.org/jira/browse/DRILL-7338
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.15.0
>Reporter: Aditya Allamraju
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
>
> Drill queries submitted through the REST API have started failing (error given below) 
> after recent changes.
> {code:java}
> RESOURCE ERROR: There is not enough heap memory to run this query using the 
> web interface.
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned.
> You can also try an ODBC/JDBC client.{code}
> They ran fine earlier, as each ResultSet returned just a few rows. These queries now 
> fail even for very small result sets (< 10 rows).
> Investigating the issue revealed that we introduced a check to limit heap usage.
> I see certain issues in the wrapper code that throws this error, 
> *_exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java_*.
> It uses a threshold of *85%* heap usage before throwing that error and exiting the 
> query.
>  
> {code:java}
> public class QueryWrapper {
>   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(QueryWrapper.class);
>   // Heap usage threshold/trigger to provide resiliency on web server for queries submitted via HTTP
>   private static final double HEAP_MEMORY_FAILURE_THRESHOLD = 0.85;
> ...
>   private static MemoryMXBean memMXBean = ManagementFactory.getMemoryMXBean();
> ...
>     // Wait until the query execution is complete or there is error submitting the query
>     logger.debug("Wait until the query execution is complete or there is error submitting the query");
>     do {
>       try {
>         isComplete = webUserConnection.await(TimeUnit.SECONDS.toMillis(1)); //periodically timeout 1 sec to check heap
>       } catch (InterruptedException e) {}
>       usagePercent = getHeapUsage();
>       if (usagePercent >  HEAP_MEMORY_FAILURE_THRESHOLD) {
>         nearlyOutOfHeapSpace = true;
>       }
>     } while (!isComplete && !nearlyOutOfHeapSpace);
> {code}
> With the check above, we unintentionally invited all the issues that come with Java’s 
> heap usage. The JVM tends to fill as much of the heap as it can until a minor or major 
> GC kicks in, i.e. GC runs only once there is no more space left in the heap (eden or 
> young gen).
> The workarounds I can think of to resolve this issue are:
>  # Remove this check altogether so we know why it is filling up the heap.
>  # Advise users to stop using REST for querying data (we did this already). 
> *But not all users may be happy with this suggestion.* There could be a few dynamic 
> applications (dashboards, monitoring, etc.).
>  # Make the threshold high enough that GC has a chance to kick in first.
> If none of the above options work, we have to tune the heap size of the drillbit. A quick 
> fix would be to increase the threshold from 85% to 100% (option 3 above).
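A runnable, simplified Java sketch of the polling loop quoted above, with the heap reading injected so a transient spike can be simulated (`HeapGuardedWait` and `awaitWithHeapGuard` are hypothetical names). It illustrates the reporter's point: one sample above the threshold abandons the query even if the next GC would have reclaimed most of that memory. `getHeapUsage()` here uses the standard `MemoryMXBean` API; Drill's own implementation may differ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

public class HeapGuardedWait {
  /** Fraction of the maximum heap in use right now, including garbage not yet collected. */
  static double getHeapUsage() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    // getMax() returns -1 when the maximum is undefined; fall back to committed memory.
    long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
    return (double) heap.getUsed() / max;
  }

  /**
   * Waits for completion, sampling heap usage every pollMillis. Returns true if the work
   * completed, false if it was abandoned because a sample exceeded the threshold
   * (or the waiting thread was interrupted).
   */
  static boolean awaitWithHeapGuard(CountDownLatch done, DoubleSupplier heapUsage,
                                    double threshold, long pollMillis) {
    try {
      while (!done.await(pollMillis, TimeUnit.MILLISECONDS)) {
        // One sample above the threshold is enough to give up, even if the next GC
        // would have reclaimed most of that memory.
        if (heapUsage.getAsDouble() > threshold) {
          return false;
        }
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.printf("current heap usage: %.2f%n", getHeapUsage());
    // Simulated readings: a transient spike to 0.90 that a GC would have brought back down.
    double[] samples = {0.40, 0.90, 0.30};
    int[] next = {0};
    DoubleSupplier spiky = () -> samples[Math.min(next[0]++, samples.length - 1)];
    System.out.println(awaitWithHeapGuard(new CountDownLatch(1), spiky, 0.85, 10)); // false
  }
}
```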
>  





[jira] [Commented] (DRILL-7338) REST API calls to Drill fail due to insufficient heap memory

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902450#comment-16902450
 ] 

ASF GitHub Bot commented on DRILL-7338:
---

kkhatua commented on issue #1837: DRILL-7338: REST API calls to Drill fail due 
to insufficient heap memory
URL: https://github.com/apache/drill/pull/1837#issuecomment-519256074
 
 
   @arina-ielchiieva thanks for the review. I've applied the recommendations.
 



[jira] [Commented] (DRILL-7338) REST API calls to Drill fail due to insufficient heap memory

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902440#comment-16902440
 ] 

ASF GitHub Bot commented on DRILL-7338:
---

kkhatua commented on pull request #1837: DRILL-7338: REST API calls to Drill 
fail due to insufficient heap memory
URL: https://github.com/apache/drill/pull/1837#discussion_r311743001
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java
 ##
 @@ -44,7 +44,7 @@
 public class QueryWrapper {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(QueryWrapper.class);
   // Heap usage threshold/trigger to provide resiliency on web server for queries submitted via HTTP
-  private static final double HEAP_MEMORY_FAILURE_THRESHOLD = 0.85;
+  private double memoryFailureThreshold = 0.85;
 
 Review comment:
   +1
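With the threshold turned into an instance field, it can be supplied from configuration. A minimal Java sketch of validating such a value; the property name and class are hypothetical, not actual Drill options. Note that a threshold of 1.0 (the quick fix suggested in the issue) effectively disables the check, since used heap normally never exceeds the maximum.

```java
public class ConfigurableHeapThreshold {
  static final double DEFAULT_THRESHOLD = 0.85;
  // Hypothetical property name, for illustration only; not an actual Drill option.
  static final String PROPERTY = "example.http.heap.failure.threshold";

  private final double memoryFailureThreshold;

  ConfigurableHeapThreshold(String configured) {
    this.memoryFailureThreshold = parseThreshold(configured);
  }

  /** Accepts a fraction in (0, 1]; falls back to the default when absent or invalid. */
  static double parseThreshold(String configured) {
    if (configured == null) {
      return DEFAULT_THRESHOLD;
    }
    try {
      double value = Double.parseDouble(configured.trim());
      // NaN fails both comparisons, so it also falls back to the default.
      return (value > 0.0 && value <= 1.0) ? value : DEFAULT_THRESHOLD;
    } catch (NumberFormatException e) {
      return DEFAULT_THRESHOLD;
    }
  }

  /** True when the given heap usage fraction is above the configured threshold. */
  boolean isNearlyOutOfHeap(double heapUsage) {
    return heapUsage > memoryFailureThreshold;
  }

  public static void main(String[] args) {
    ConfigurableHeapThreshold guard = new ConfigurableHeapThreshold(System.getProperty(PROPERTY));
    System.out.println(guard.isNearlyOutOfHeap(0.90)); // true with the default 0.85
  }
}
```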
 



[jira] [Commented] (DRILL-7338) REST API calls to Drill fail due to insufficient heap memory

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902441#comment-16902441
 ] 

ASF GitHub Bot commented on DRILL-7338:
---

kkhatua commented on pull request #1837: DRILL-7338: REST API calls to Drill 
fail due to insufficient heap memory
URL: https://github.com/apache/drill/pull/1837#discussion_r311743001
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java
 ##
 @@ -44,7 +44,7 @@
 public class QueryWrapper {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(QueryWrapper.class);
   // Heap usage threshold/trigger to provide resiliency on web server for queries submitted via HTTP
-  private static final double HEAP_MEMORY_FAILURE_THRESHOLD = 0.85;
+  private double memoryFailureThreshold = 0.85;
 
 Review comment:
   +1
 



[jira] [Comment Edited] (DRILL-7341) Vector reAlloc may fails after exchange.

2019-08-07 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902200#comment-16902200
 ] 

Paul Rogers edited comment on DRILL-7341 at 8/7/19 4:17 PM:


As it turns out, vector accounting is very fragile. Vector sizes and counts are 
often wrong. Code seems to have been tweaked to work around these errors. They 
are tricky because there is no good way to visualize what is happening.

Tried to fix some of the issues, but fixing one thing tends to break something 
else that depended on the wrong state. Created a "batch validator" to identify 
the issues, but can't check it in because it produces hundreds of errors.

Glad you were able to find & fix one of the issues!



> Vector reAlloc may fails after exchange.
> 
>
> Key: DRILL-7341
> URL: https://issues.apache.org/jira/browse/DRILL-7341
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Oleg Zinoviev
>Priority: Major
> Attachments: stacktrace.log
>
>
> There are several methods that modify the BaseDataValueVector#data field. 
> Some of them, such as BaseDataValueVector#exchange, do not change 
> allocationSizeInBytes. 
> Therefore, if BaseDataValueVector#exchange was executed for vectors of 
> different sizes, ValueVector#reAlloc may create a buffer of insufficient size.





[jira] [Commented] (DRILL-7341) Vector reAlloc may fails after exchange.

2019-08-07 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902200#comment-16902200
 ] 

Paul Rogers commented on DRILL-7341:


As it turns out, vector accounting is very fragile. Vector sizes and counts are 
often wrong. Code seems to have been tweaked to work around these errors. They 
are tricky because there is no good way to visualize what is happening.

Tried to fix some of the issues, but fixing one thing tends to break something 
else that depended on the wrong state. Created a "batch validator" to identify 
the issues, but can't check it in because it produces hundreds of errors.

Glad you were able to find & fix one of the issues!



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902152#comment-16902152
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

oleg-zinovev commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-519141613
 
 
   Since I have a similar plugin for internal use 
(https://github.com/idvp-project/drill-storage-excel), a couple of notes:
   1) XSSFWorkbook has awful memory usage: 
http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html.
 Reading a 10-15 MB XLSX file can easily lead to an OutOfMemoryError, so we used 
the com.monitorjbl:xlsx-streamer project (which does not support formula evaluation).
   2) Excel does not guarantee that a column's value type stays the same from row to row. IMHO, 
it is better to read everything as VARCHAR.
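Point 2 can be illustrated with a small stdlib-only sketch. It assumes cells have already been decoded into Java objects (`Double` for numbers, `String` for text, `Boolean`, or `null` for blanks); the class and method names are hypothetical and come from neither plugin. It shows how a reader that fixes a column's type from its first value trips over a later cell, while reading every cell as VARCHAR does not.

```java
import java.util.Arrays;
import java.util.List;

public class ExcelColumnTypes {
  // Naive strategy: take the type of the first non-blank cell and assume the
  // rest of the column matches. Returns the index of the first cell that
  // breaks that assumption, or -1 if the column is consistent.
  public static int firstTypeConflict(List<Object> column) {
    Class<?> expected = null;
    for (int i = 0; i < column.size(); i++) {
      Object cell = column.get(i);
      if (cell == null) {
        continue; // blank cells carry no type information
      }
      if (expected == null) {
        expected = cell.getClass();
      } else if (cell.getClass() != expected) {
        return i;
      }
    }
    return -1;
  }

  // Strategy from the comment: render every cell as a string (VARCHAR).
  public static String asVarchar(Object cell) {
    if (cell == null) {
      return null;
    }
    if (cell instanceof Double) {
      double d = (Double) cell;
      // Print whole numbers without a trailing ".0" (guarding against overflow).
      if (d == Math.rint(d) && Math.abs(d) < 1e15) {
        return Long.toString((long) d);
      }
    }
    return cell.toString();
  }

  public static void main(String[] args) {
    List<Object> column = Arrays.asList(42.0, 17.5, "n/a", null, 3.0);
    System.out.println(firstTypeConflict(column)); // 2
    StringBuilder sb = new StringBuilder();
    for (Object cell : column) {
      sb.append(asVarchar(cell)).append(' ');
    }
    System.out.println(sb.toString().trim()); // 42 17.5 n/a null 3
  }
}
```

The trade-off of the VARCHAR approach is that numeric columns have to be cast back to a numeric type in the query.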
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902145#comment-16902145
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

oleg-zinovev commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-519141613
 
 
   Since I have a similar plugin for internal use 
(https://github.com/idvp-project/drill-storage-excel), a couple of notes:
   1) XSSFWorkbook has awful memory usage: 
http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html.
 Reading a 10-15 MB XLSX file can easily lead to an OutOfMemoryError. 
   2) Excel does not guarantee that a column's value type stays the same from row to row. IMHO, 
it is better to read everything as lines.
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902146#comment-16902146
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

oleg-zinovev commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-519141613
 
 
   Since I have a similar plugin for internal use 
(https://github.com/idvp-project/drill-storage-excel), a couple of notes:
   1) XSSFWorkbook has awful memory usage: 
http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html.
 Reading a 10-15 MB XLSX file can easily lead to an OutOfMemoryError. 
   2) Excel does not guarantee that a column's value type stays the same from row to row. IMHO, 
it is better to read everything as VARCHAR.
 



[jira] [Commented] (DRILL-7341) Vector reAlloc may fails after exchange.

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902073#comment-16902073
 ] 

ASF GitHub Bot commented on DRILL-7341:
---

oleg-zinovev commented on pull request #1838: DRILL-7341: Vector reAlloc may 
fails after exchange
URL: https://github.com/apache/drill/pull/1838
 
 
   Fixes reAlloc issues after direct assignment of the vector data field.
 



[jira] [Updated] (DRILL-7341) Vector reAlloc may fails after exchange.

2019-08-07 Thread Oleg Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Zinoviev updated DRILL-7341:
-
Summary: Vector reAlloc may fails after exchange.  (was: Vector 
reAllocation may fails after exchange.)



[jira] [Created] (DRILL-7341) Vector reAllocation may fails after exchange.

2019-08-07 Thread Oleg Zinoviev (JIRA)
Oleg Zinoviev created DRILL-7341:


 Summary: Vector reAllocation may fails after exchange.
 Key: DRILL-7341
 URL: https://issues.apache.org/jira/browse/DRILL-7341
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Oleg Zinoviev
 Attachments: stacktrace.log

There are several methods that modify the BaseDataValueVector#data field. 
Some of them, such as BaseDataValueVector#exchange, do not change 
allocationSizeInBytes. 
Therefore, if BaseDataValueVector#exchange was executed for vectors of 
different sizes, ValueVector#reAlloc may create a buffer of insufficient size.
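The failure mode described above can be reproduced with a toy model. This is a minimal sketch, not Drill's BaseDataValueVector: `ToyVector` is hypothetical, it assumes reAlloc() sizes the new buffer by doubling the recorded allocationSizeInBytes, and the branch that swaps allocationSizeInBytes along with the buffer is an assumed way of keeping the two fields in sync, not necessarily the change made in PR #1838. Because `Arrays.copyOf` would silently truncate, the toy throws when the new buffer would be smaller than the data it must hold.

```java
import java.util.Arrays;

public class ExchangeRealloc {
  // Hypothetical toy vector; NOT Drill's BaseDataValueVector.
  static final class ToyVector {
    byte[] data;               // stands in for the vector's "data" buffer
    int allocationSizeInBytes; // recorded size that reAlloc() grows from

    ToyVector(int sizeInBytes) {
      data = new byte[sizeInBytes];
      allocationSizeInBytes = sizeInBytes;
    }

    // Swap buffers with another vector. With swapAllocationSize == false this
    // mimics the reported problem: data moves, allocationSizeInBytes stays.
    void exchange(ToyVector other, boolean swapAllocationSize) {
      byte[] tmpData = data;
      data = other.data;
      other.data = tmpData;
      if (swapAllocationSize) {
        int tmpSize = allocationSizeInBytes;
        allocationSizeInBytes = other.allocationSizeInBytes;
        other.allocationSizeInBytes = tmpSize;
      }
    }

    // Double the recorded allocation size and copy the existing bytes over.
    void reAlloc() {
      int newSize = allocationSizeInBytes * 2;
      if (newSize < data.length) {
        // Arrays.copyOf would silently truncate, so fail loudly instead.
        throw new IllegalStateException("new buffer (" + newSize
            + " bytes) smaller than current data (" + data.length + " bytes)");
      }
      data = Arrays.copyOf(data, newSize);
      allocationSizeInBytes = newSize;
    }
  }

  // Exchange an 8-byte vector with a 64-byte one, then try to grow the first.
  public static boolean reAllocSucceeds(boolean swapAllocationSize) {
    ToyVector small = new ToyVector(8);
    ToyVector large = new ToyVector(64);
    small.exchange(large, swapAllocationSize);
    try {
      small.reAlloc();
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("without syncing allocationSizeInBytes: " + reAllocSucceeds(false)); // false
    System.out.println("with syncing allocationSizeInBytes:    " + reAllocSucceeds(true));  // true
  }
}
```

After the unsynced exchange, the small vector holds a 64-byte buffer but still records 8 bytes, so reAlloc() asks for only 16 bytes.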


