[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2019-01-30 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756486#comment-16756486
 ] 

Robert Hou commented on DRILL-6709:
---

I have verified this.

> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-09-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608301#comment-16608301
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

Ben-Zvi closed pull request #1444: DRILL-6709: Extended the batch stats utility 
to other operators
URL: https://github.com/apache/drill/pull/1444
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
index cda98fc7a51..cd24a4ca592 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
@@ -48,6 +48,7 @@
 import org.apache.drill.exec.testing.ControlsInjectorFactory;
 import org.apache.drill.exec.util.CallBack;
 import org.apache.drill.exec.util.record.RecordBatchStats;
+import org.apache.drill.exec.util.record.RecordBatchStats.RecordBatchIOType;
 import 
org.apache.drill.exec.util.record.RecordBatchStats.RecordBatchStatsContext;
 import org.apache.drill.exec.vector.AllocationHelper;
 import org.apache.drill.exec.vector.NullableVarCharVector;
@@ -302,7 +303,7 @@ private void logRecordBatchStats() {
   return; // NOOP
 }
 
-RecordBatchStats.logRecordBatchStats(getFQNForLogging(MAX_FQN_LENGTH), 
this, batchStatsContext);
+RecordBatchStats.logRecordBatchStats(RecordBatchIOType.OUTPUT, 
getFQNForLogging(MAX_FQN_LENGTH), this, batchStatsContext);
   }
 
   /** Might truncate the FQN if too long */
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
index 28f8263d1f3..9de9aae06b3 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
@@ -60,6 +60,8 @@
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.util.record.RecordBatchStats;
+import org.apache.drill.exec.util.record.RecordBatchStats.RecordBatchIOType;
 import org.apache.drill.exec.vector.AllocationHelper;
 import org.apache.drill.exec.vector.FixedWidthVector;
 import org.apache.drill.exec.vector.ValueVector;
@@ -159,9 +161,7 @@ public void update(RecordBatch incomingRecordBatch) {
   }
 
   updateIncomingStats();
-  if (logger.isDebugEnabled()) {
-logger.debug("BATCH_STATS, incoming: {}", getRecordBatchSizer());
-  }
+  RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, 
getRecordBatchSizer(), getRecordBatchStatsContext());
 }
   }
 
@@ -204,7 +204,9 @@ public HashAggBatch(HashAggregate popConfig, RecordBatch 
incoming, FragmentConte
 }
 
 hashAggMemoryManager = new HashAggMemoryManager(configuredBatchSize);
-logger.debug("BATCH_STATS, configured output batch size: {}", 
configuredBatchSize);
+
+  RecordBatchStats.logRecordBatchStats(getRecordBatchStatsContext(),
+"configured output batch size: %d", configuredBatchSize);
 
 columnMapping = CaseInsensitiveMap.newHashMap();
   }
@@ -474,15 +476,15 @@ private void updateStats() {
 stats.setLongStat(HashAggTemplate.Metric.AVG_OUTPUT_ROW_BYTES, 
hashAggMemoryManager.getAvgOutputRowWidth());
 stats.setLongStat(HashAggTemplate.Metric.OUTPUT_RECORD_COUNT, 
hashAggMemoryManager.getTotalOutputRecords());
 
-if (logger.isDebugEnabled()) {
-  logger.debug("BATCH_STATS, incoming aggregate: count : {}, avg bytes : 
{},  avg row bytes : {}, record count : {}",
-hashAggMemoryManager.getNumIncomingBatches(), 
hashAggMemoryManager.getAvgInputBatchSize(),
-hashAggMemoryManager.getAvgInputRowWidth(), 
hashAggMemoryManager.getTotalInputRecords());
+RecordBatchStats.logRecordBatchStats(getRecordBatchStatsContext(),
+  "incoming aggregate: count : %d, avg bytes : %d,  avg row bytes : %d, 
record count : %d",
+  hashAggMemoryManager.getNumIncomingBatches(), 
hashAggMemoryManager.getAvgInputBatchSize(),
+  hashAggMemoryManager.getAvgInputRowWidth(), 
hashAggMemoryManager.getTotalInputRecords());
 
-  logger.debug("BATCH_STATS, outgoing aggregate: count : {}, avg bytes : 
{},  avg row bytes : {}, record count : {}",
-hashAggMemoryManager.getNumOutgoingBatches(), 
hashAggMemoryManager.getAvgOutputBatchSize(),
-hashAggMemoryManager.getAvgOutputRowWidth(), 

[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594279#comment-16594279
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on issue #1444: DRILL-6709: Extended the batch stats 
utility to other operators
URL: https://github.com/apache/drill/pull/1444#issuecomment-416384384
 
 
   Thank you @bitblender and @ilooner for performing the review! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593873#comment-16593873
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on issue #1444: DRILL-6709: Extended the batch stats 
utility to other operators
URL: https://github.com/apache/drill/pull/1444#issuecomment-416276516
 
 
   @ilooner can you please review this PR?
   thanks!
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593077#comment-16593077
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on issue #1444: DRILL-6709: Extended the batch stats 
utility to other operators
URL: https://github.com/apache/drill/pull/1444#issuecomment-416082895
 
 
   @bitblender  thank you for your review! I have implemented all the feedback 
that you requested.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593075#comment-16593075
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on a change in pull request #1444: DRILL-6709: Extended the 
batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212842963
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ##
 @@ -204,7 +204,12 @@ public HashAggBatch(HashAggregate popConfig, RecordBatch 
incoming, FragmentConte
 }
 
 hashAggMemoryManager = new HashAggMemoryManager(configuredBatchSize);
-logger.debug("BATCH_STATS, configured output batch size: {}", 
configuredBatchSize);
+
+if (isRecordBatchStatsLoggingEnabled()) {
 
 Review comment:
   I agree with your feedback; I was anyway going to make this improvement as I 
didn't like the extra condition. I added another method which takes a string 
format and variable arguments; modified most of the previous calls to use this 
method. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593073#comment-16593073
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on a change in pull request #1444: DRILL-6709: Extended the 
batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212843079
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -134,25 +135,66 @@ private boolean 
isBatchStatsEnabledForOperator(FragmentContext context, Operator
 }
   }
 
+  /** Indicates whether a record batch is Input or Output */
+  public enum BatchIOType {
 
 Review comment:
   Renamed the enum to RecordBatchIOType; I want to include the IO keyword to 
avoid conflating this enum type with data type. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593072#comment-16593072
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on a change in pull request #1444: DRILL-6709: Extended the 
batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212843653
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -279,4 +324,19 @@ private static void 
logBatchStatsMsg(RecordBatchStatsContext batchStatsContext,
 }
   }
 
+  private static String toString(BatchIOType ioType) {
+Preconditions.checkNotNull(ioType, "The record batch IO type cannot be 
null");
+
+switch (ioType) {
 
 Review comment:
   I have seen both types of indentation being used; I have implemented your 
suggestion.
   
   and yes, I am aware that Java enums are more powerful; for now didn't have a 
need to enhance it. I have a added a private variable IOTypeString in case 
caller wants to get the string representation. Didn't use this in the code as I 
wanted to avoid expensive string concatenation (especially now that the caller 
is not using the isStatsLoggingEnabled() check).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593074#comment-16593074
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on a change in pull request #1444: DRILL-6709: Extended the 
batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212842996
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -134,25 +135,66 @@ private boolean 
isBatchStatsEnabledForOperator(FragmentContext context, Operator
 }
   }
 
+  /** Indicates whether a record batch is Input or Output */
+  public enum BatchIOType {
+IS_INPUT,
+IS_INPUT_RIGHT,
+IS_INPUT_LEFT,
+IS_OUTPUT,
+IS_PASSTHROUGH
+  }
 
 Review comment:
   Made the changes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592440#comment-16592440
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

bitblender commented on a change in pull request #1444: DRILL-6709: Extended 
the batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212788123
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -279,4 +324,19 @@ private static void 
logBatchStatsMsg(RecordBatchStatsContext batchStatsContext,
 }
   }
 
+  private static String toString(BatchIOType ioType) {
+Preconditions.checkNotNull(ioType, "The record batch IO type cannot be 
null");
+
+switch (ioType) {
 
 Review comment:
   'case' is generally indented one level into the 'switch'. 
   
   Also, Java allows Enums to have constructors and methods that  can be used 
to associate a string with an enum value. This makes modification of enum names 
and attributes easier. See 
https://docs.oracle.com/javase/tutorial/java/javaOO/enum.html


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592439#comment-16592439
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

bitblender commented on a change in pull request #1444: DRILL-6709: Extended 
the batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212788029
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -134,25 +135,66 @@ private boolean 
isBatchStatsEnabledForOperator(FragmentContext context, Operator
 }
   }
 
+  /** Indicates whether a record batch is Input or Output */
+  public enum BatchIOType {
 
 Review comment:
   I feel "RecordBatchType" describes the enum better then "BatchIOType" 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592438#comment-16592438
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

bitblender commented on a change in pull request #1444: DRILL-6709: Extended 
the batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212699960
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -134,25 +135,66 @@ private boolean 
isBatchStatsEnabledForOperator(FragmentContext context, Operator
 }
   }
 
+  /** Indicates whether a record batch is Input or Output */
+  public enum BatchIOType {
+IS_INPUT,
+IS_INPUT_RIGHT,
+IS_INPUT_LEFT,
+IS_OUTPUT,
+IS_PASSTHROUGH
+  }
 
 Review comment:
   Please remove the 'IS_' prefix in the enum values. isXXX is typically used 
for functions that return a boolean value.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592437#comment-16592437
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

bitblender commented on a change in pull request #1444: DRILL-6709: Extended 
the batch stats utility to other operators
URL: https://github.com/apache/drill/pull/1444#discussion_r212787951
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ##
 @@ -204,7 +204,12 @@ public HashAggBatch(HashAggregate popConfig, RecordBatch 
incoming, FragmentConte
 }
 
 hashAggMemoryManager = new HashAggMemoryManager(configuredBatchSize);
-logger.debug("BATCH_STATS, configured output batch size: {}", 
configuredBatchSize);
+
+if (isRecordBatchStatsLoggingEnabled()) {
 
 Review comment:
   This outer check is unnecessary because logRecordBatchStats does the same 
check. I guess you have it here just to avoid the cost of String.format(). 
Maybe we should add a var args version which does the string.format internally? 
When reading code, I personally like it when I can read as much of the real 
function logic in one page, large bodies of logging code make this harder.  I 
realize it is just 2 extra lines per logging location but it makes it more 
verbose. 
   
   That said, if you are using an outer check, it allows you to use the "+" 
string builder and not use String.format(). String.format() is prone to 
mismatch errors between the order in the format string and the order in the 
variable list.
   
   This is more of a style comment. I will be ok if you don't change it now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591851#comment-16591851
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche commented on issue #1444: DRILL-6709: Extended the batch stats 
utility to other operators
URL: https://github.com/apache/drill/pull/1444#issuecomment-415803914
 
 
   @ilooner and @bitblender, can you please review this PR?
   
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591848#comment-16591848
 ] 

ASF GitHub Bot commented on DRILL-6709:
---

sachouche opened a new pull request #1444: DRILL-6709: Extended the batch stats 
utility to other operators
URL: https://github.com/apache/drill/pull/1444
 
 
   - Enhanced the Batch Stats utility to handle input / output operators as it 
was being used only by the Parquet reader
   - Modified all batch stats logging (from other operators which support batch 
sizing) to use the Batch Sizing utility 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)