[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756486#comment-16756486 ] Robert Hou commented on DRILL-6709: --- I have verified this. > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608301#comment-16608301 ] ASF GitHub Bot commented on DRILL-6709: --- Ben-Zvi closed pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java index cda98fc7a51..cd24a4ca592 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java @@ -48,6 +48,7 @@ import org.apache.drill.exec.testing.ControlsInjectorFactory; import org.apache.drill.exec.util.CallBack; import org.apache.drill.exec.util.record.RecordBatchStats; +import org.apache.drill.exec.util.record.RecordBatchStats.RecordBatchIOType; import org.apache.drill.exec.util.record.RecordBatchStats.RecordBatchStatsContext; import org.apache.drill.exec.vector.AllocationHelper; import org.apache.drill.exec.vector.NullableVarCharVector; @@ -302,7 +303,7 @@ private void logRecordBatchStats() { return; // NOOP } -RecordBatchStats.logRecordBatchStats(getFQNForLogging(MAX_FQN_LENGTH), this, batchStatsContext); +RecordBatchStats.logRecordBatchStats(RecordBatchIOType.OUTPUT, getFQNForLogging(MAX_FQN_LENGTH), this, batchStatsContext); } /** Might truncate the FQN if too long */ diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java index 28f8263d1f3..9de9aae06b3 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java @@ -60,6 +60,8 @@ import org.apache.drill.exec.record.VectorWrapper; import org.apache.drill.exec.record.selection.SelectionVector2; import org.apache.drill.exec.record.selection.SelectionVector4; +import org.apache.drill.exec.util.record.RecordBatchStats; +import org.apache.drill.exec.util.record.RecordBatchStats.RecordBatchIOType; import org.apache.drill.exec.vector.AllocationHelper; import org.apache.drill.exec.vector.FixedWidthVector; import org.apache.drill.exec.vector.ValueVector; @@ -159,9 +161,7 @@ public void update(RecordBatch incomingRecordBatch) { } updateIncomingStats(); - if (logger.isDebugEnabled()) { -logger.debug("BATCH_STATS, incoming: {}", getRecordBatchSizer()); - } + RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), getRecordBatchStatsContext()); } } @@ -204,7 +204,9 @@ public HashAggBatch(HashAggregate popConfig, RecordBatch incoming, FragmentConte } hashAggMemoryManager = new HashAggMemoryManager(configuredBatchSize); -logger.debug("BATCH_STATS, configured output batch size: {}", configuredBatchSize); + + RecordBatchStats.logRecordBatchStats(getRecordBatchStatsContext(), +"configured output batch size: %d", configuredBatchSize); columnMapping = CaseInsensitiveMap.newHashMap(); } @@ -474,15 +476,15 @@ private void updateStats() { stats.setLongStat(HashAggTemplate.Metric.AVG_OUTPUT_ROW_BYTES, hashAggMemoryManager.getAvgOutputRowWidth()); stats.setLongStat(HashAggTemplate.Metric.OUTPUT_RECORD_COUNT, hashAggMemoryManager.getTotalOutputRecords()); -if (logger.isDebugEnabled()) { - logger.debug("BATCH_STATS, incoming aggregate: count : {}, avg bytes : {}, avg row bytes : {}, record count : {}", -hashAggMemoryManager.getNumIncomingBatches(), hashAggMemoryManager.getAvgInputBatchSize(), -hashAggMemoryManager.getAvgInputRowWidth(), hashAggMemoryManager.getTotalInputRecords()); +RecordBatchStats.logRecordBatchStats(getRecordBatchStatsContext(), + "incoming aggregate: count : %d, avg bytes : %d, avg row bytes : %d, record count : %d", + hashAggMemoryManager.getNumIncomingBatches(), hashAggMemoryManager.getAvgInputBatchSize(), + hashAggMemoryManager.getAvgInputRowWidth(), hashAggMemoryManager.getTotalInputRecords()); - logger.debug("BATCH_STATS, outgoing aggregate: count : {}, avg bytes : {}, avg row bytes : {}, record count : {}", -hashAggMemoryManager.getNumOutgoingBatches(), hashAggMemoryManager.getAvgOutputBatchSize(), -hashAggMemoryManager.getAvgOutputRowWidth(),
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594279#comment-16594279 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on issue #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#issuecomment-416384384 Thank you @bitblender and @ilooner for performing the review! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593873#comment-16593873 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on issue #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#issuecomment-416276516 @ilooner can you please review this PR? thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593077#comment-16593077 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on issue #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#issuecomment-416082895 @bitblender thank you for your review! I have implemented all the feedback that you requested. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593075#comment-16593075 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212842963 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ## @@ -204,7 +204,12 @@ public HashAggBatch(HashAggregate popConfig, RecordBatch incoming, FragmentConte } hashAggMemoryManager = new HashAggMemoryManager(configuredBatchSize); -logger.debug("BATCH_STATS, configured output batch size: {}", configuredBatchSize); + +if (isRecordBatchStatsLoggingEnabled()) { Review comment: I agree with your feedback; I was anyway going to make this improvement as I didn't like the extra condition. I added another method which takes a string format and variable arguments; modified most of the previous calls to use this method. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593073#comment-16593073 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212843079 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -134,25 +135,66 @@ private boolean isBatchStatsEnabledForOperator(FragmentContext context, Operator } } + /** Indicates whether a record batch is Input or Output */ + public enum BatchIOType { Review comment: Renamed the enum to RecordBatchIOType; I want to include the IO keyword to avoid conflating this enum type with data type. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593072#comment-16593072 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212843653 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -279,4 +324,19 @@ private static void logBatchStatsMsg(RecordBatchStatsContext batchStatsContext, } } + private static String toString(BatchIOType ioType) { +Preconditions.checkNotNull(ioType, "The record batch IO type cannot be null"); + +switch (ioType) { Review comment: I have seen both types of indentation being used; I have implemented your suggestion. and yes, I am aware that Java enums are more powerful; for now didn't have a need to enhance it. I have a added a private variable IOTypeString in case caller wants to get the string representation. Didn't use this in the code as I wanted to avoid expensive string concatenation (especially now that the caller is not using the isStatsLoggingEnabled() check). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593074#comment-16593074 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212842996 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -134,25 +135,66 @@ private boolean isBatchStatsEnabledForOperator(FragmentContext context, Operator } } + /** Indicates whether a record batch is Input or Output */ + public enum BatchIOType { +IS_INPUT, +IS_INPUT_RIGHT, +IS_INPUT_LEFT, +IS_OUTPUT, +IS_PASSTHROUGH + } Review comment: Made the changes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592440#comment-16592440 ] ASF GitHub Bot commented on DRILL-6709: --- bitblender commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212788123 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -279,4 +324,19 @@ private static void logBatchStatsMsg(RecordBatchStatsContext batchStatsContext, } } + private static String toString(BatchIOType ioType) { +Preconditions.checkNotNull(ioType, "The record batch IO type cannot be null"); + +switch (ioType) { Review comment: 'case' is generally indented one level into the 'switch'. Also, Java allows Enums to have constructors and methods that can be used to associate a string with an enum value. This makes modification of enum names and attributes easier. See https://docs.oracle.com/javase/tutorial/java/javaOO/enum.html This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592439#comment-16592439 ] ASF GitHub Bot commented on DRILL-6709: --- bitblender commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212788029 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -134,25 +135,66 @@ private boolean isBatchStatsEnabledForOperator(FragmentContext context, Operator } } + /** Indicates whether a record batch is Input or Output */ + public enum BatchIOType { Review comment: I feel "RecordBatchType" describes the enum better then "BatchIOType" This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592438#comment-16592438 ] ASF GitHub Bot commented on DRILL-6709: --- bitblender commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212699960 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -134,25 +135,66 @@ private boolean isBatchStatsEnabledForOperator(FragmentContext context, Operator } } + /** Indicates whether a record batch is Input or Output */ + public enum BatchIOType { +IS_INPUT, +IS_INPUT_RIGHT, +IS_INPUT_LEFT, +IS_OUTPUT, +IS_PASSTHROUGH + } Review comment: Please remove the 'IS_' prefix in the enum values. isXXX is typically used for functions that return a boolean value. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592437#comment-16592437 ] ASF GitHub Bot commented on DRILL-6709: --- bitblender commented on a change in pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#discussion_r212787951 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ## @@ -204,7 +204,12 @@ public HashAggBatch(HashAggregate popConfig, RecordBatch incoming, FragmentConte } hashAggMemoryManager = new HashAggMemoryManager(configuredBatchSize); -logger.debug("BATCH_STATS, configured output batch size: {}", configuredBatchSize); + +if (isRecordBatchStatsLoggingEnabled()) { Review comment: This outer check is unnecessary because logRecordBatchStats does the same check. I guess you have it here just to avoid the cost of String.format(). Maybe we should add a var args version which does the string.format internally? When reading code, I personally like it when I can read as much of the real function logic in one page, large bodies of logging code make this harder. I realize it is just 2 extra lines per logging location but it makes it more verbose. That said, if you are using an outer check, it allows you to use the "+" string builder and not use String.format(). String.format() is prone to mismatch errors between the order in the format string and the order in the variable list. This is more of a style comment. I will be ok if you don't change it now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591851#comment-16591851 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche commented on issue #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444#issuecomment-415803914 @ilooner and @bitblender, can you please review this PR? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591848#comment-16591848 ] ASF GitHub Bot commented on DRILL-6709: --- sachouche opened a new pull request #1444: DRILL-6709: Extended the batch stats utility to other operators URL: https://github.com/apache/drill/pull/1444 - Enhanced the Batch Stats utility to handle input / output operators as it was being used only by the Parquet reader - Modified all batch stats logging (from other operators which support batch sizing) to use the Batch Sizing utility This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)