[
https://issues.apache.org/jira/browse/BEAM-12754?focusedWorklogId=637548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637548
]
ASF GitHub Bot logged work on BEAM-12754:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Aug/21 21:11
Start Date: 12/Aug/21 21:11
Worklog Time Spent: 10m
Work Description: steveniemitz commented on a change in pull request #15327:
URL: https://github.com/apache/beam/pull/15327#discussion_r688087440
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
##########
@@ -316,27 +318,43 @@ static void encodeDelegate(
// Encode the field count. This allows us to handle compatible schema changes.
VAR_INT_CODER.encode(value.getFieldCount(), outputStream);
- // Encode a bitmap for the null fields to save having to encode a bunch of nulls.
- NULL_LIST_CODER.encode(scanNullFields(value, hasNullableFields), outputStream);
- for (int encodingPos = 0; encodingPos < value.getFieldCount(); ++encodingPos) {
- @Nullable Object fieldValue = value.getValue(encodingPosToIndex[encodingPos]);
- if (fieldValue != null) {
- coders[encodingPos].encode(fieldValue, outputStream);
+
+ if (hasNullableFields) {
+ // If the row has null fields, extract the values out once so that both scanNullFields and
+ // the encoding can share it and avoid having to extract them twice.
+
+ List<Object> fieldValues = value.getValues();
+ // Encode a bitmap for the null fields to save having to encode a bunch of nulls.
+ NULL_LIST_CODER.encode(scanNullFields(fieldValues), outputStream);
+ for (int encodingPos = 0; encodingPos < fieldValues.size(); ++encodingPos) {
+ @Nullable Object fieldValue = fieldValues.get(encodingPosToIndex[encodingPos]);
Review comment:
Ugh, well, that's very confusing. I'll change this to copy the values into an
array using `getValue` instead.
Issue Time Tracking
-------------------
Worklog Id: (was: 637548)
Time Spent: 0.5h (was: 20m)
> RowCoderGenerator calls getValue multiple times
> -----------------------------------------------
>
> Key: BEAM-12754
> URL: https://issues.apache.org/jira/browse/BEAM-12754
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: 2.31.0
> Reporter: Steve Niemitz
> Assignee: Steve Niemitz
> Priority: P2
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> RowCoderGenerator.encodeDelegate calls getValue twice for each field of a
> row: once in scanNullFields to check whether it is null, and once to actually
> get the value to be encoded.
> If getValue is expensive (for example, it has to recursively adapt a type to
> a Beam Row), this causes unneeded extra work.
> Instead, we could call value.getValues to get all values once, then pass them
> to scanNullFields and reuse them when encoding the values.
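
To make the cost difference concrete, here is a minimal sketch of the two access patterns. It is not the Beam code: `EncodeOnceSketch`, `CountingRow`, `encodeTwice`, and `encodeOnce` are hypothetical names, the "output stream" is just a list, and encoding positions are ignored. It copies the values into an array via `getValue` once, which is one way to share them between the null scan and the encoding loop.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class EncodeOnceSketch {
  /** Hypothetical stand-in for a row whose per-field access may be expensive. */
  static final class CountingRow {
    private final Object[] fields;
    int getValueCalls = 0; // counts how often getValue is invoked

    CountingRow(Object... fields) {
      this.fields = fields;
    }

    int getFieldCount() {
      return fields.length;
    }

    Object getValue(int index) {
      getValueCalls++;
      return fields[index];
    }
  }

  /** Old pattern: getValue is called in the null scan and again when encoding. */
  static List<Object> encodeTwice(CountingRow row) {
    BitSet nulls = new BitSet();
    for (int i = 0; i < row.getFieldCount(); i++) {
      if (row.getValue(i) == null) {
        nulls.set(i);
      }
    }
    List<Object> out = new ArrayList<>();
    out.add(nulls); // null bitmap first
    for (int i = 0; i < row.getFieldCount(); i++) {
      Object fieldValue = row.getValue(i);
      if (fieldValue != null) {
        out.add(fieldValue); // only non-null values are "encoded"
      }
    }
    return out;
  }

  /** Proposed pattern: extract every value once, then reuse the array. */
  static List<Object> encodeOnce(CountingRow row) {
    Object[] fieldValues = new Object[row.getFieldCount()];
    for (int i = 0; i < fieldValues.length; i++) {
      fieldValues[i] = row.getValue(i);
    }
    BitSet nulls = new BitSet();
    for (int i = 0; i < fieldValues.length; i++) {
      if (fieldValues[i] == null) {
        nulls.set(i);
      }
    }
    List<Object> out = new ArrayList<>();
    out.add(nulls);
    for (Object fieldValue : fieldValues) {
      if (fieldValue != null) {
        out.add(fieldValue);
      }
    }
    return out;
  }
}
```

Both methods produce the same output for the same row; the difference is only in how many times `getValue` runs, which is what matters when each call has to adapt a nested type.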