Kris Mok created SPARK-26061:
--------------------------------

             Summary: Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
                 Key: SPARK-26061
                 URL: https://issues.apache.org/jira/browse/SPARK-26061
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0
            Reporter: Kris Mok


Reduce the number of unused UnsafeRowWriters created in whole-stage generated 
code.
They come from CodegenSupport.consume() calling prepareRowVar(), which uses
GenerateUnsafeProjection.createCode() and registers an UnsafeRowWriter as
mutable state, regardless of whether the downstream (parent) operator will use
the rowVar.
Even when the downstream doConsume function doesn't use the rowVar (i.e., it
doesn't include row.code in this operator's codegen template), the registered
UnsafeRowWriter remains, which needlessly bloats the init function of the
generated code.

This ticket doesn't address the root issue, but makes it slightly less painful:
when the doConsume function is split out, prepareRowVar() is called twice, so
the number of unused UnsafeRowWriters doubles. This fix simply moves the
original call to prepareRowVar() down into the doConsume split/no-split branch,
so that prepareRowVar() is called only once on either path.
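
A simplified model of the before/after control flow, under the same caveats:
CountingContext, parentDoConsume and the one-argument prepareRowVar and
constructDoConsumeFunction are hypothetical stand-ins, not Spark's real
signatures. The model only counts how many writers get registered.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy context: counts the mutable-state slots that would be
// declared and initialized in the generated class's init().
final class CountingContext {
  val states = ArrayBuffer.empty[String]
  def addMutableState(decl: String): String = {
    states += decl
    s"writer_${states.size - 1}"
  }
}

// Stand-in for prepareRowVar(): registers one UnsafeRowWriter per call.
def prepareRowVar(ctx: CountingContext): String =
  ctx.addMutableState("UnsafeRowWriter") + ".reset();"

// Stand-in for the split path: the separate doConsume function prepares its own row.
def constructDoConsumeFunction(ctx: CountingContext): String =
  s"void doConsume_0(...) { ${prepareRowVar(ctx)} }"

def parentDoConsume(rowCode: String): String = s"{ $rowCode }"

// Before the fix: the row variable is prepared up front, then again on the split path.
def consumeBefore(ctx: CountingContext, split: Boolean): String = {
  val rowVar = prepareRowVar(ctx)
  if (split) constructDoConsumeFunction(ctx) else parentDoConsume(rowVar)
}

// After the fix: the call is moved into the branch that actually needs it.
def consumeAfter(ctx: CountingContext, split: Boolean): String =
  if (split) constructDoConsumeFunction(ctx) else parentDoConsume(prepareRowVar(ctx))

val before = new CountingContext
consumeBefore(before, split = true)
val after = new CountingContext
consumeAfter(after, split = true)
println(s"${before.states.size} vs ${after.states.size}") // 2 vs 1
```

On the no-split path both versions register exactly one writer in this model;
only the split path changes.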

Fixing the root issue would require a way for CodegenSupport operators to
indicate whether they are going to use the rowVar. That's a much more
elaborate change, so I'd like to make this minor fix first.
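
One possible shape for such a root fix, sketched under the assumption of a
hypothetical usesRowVar flag on the parent operator. No such method exists in
Spark; the ticket only says that some signal of this kind would be needed.
StateContext, ToyParent, IgnoresRow and NeedsRow are likewise illustrative.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy context that records registered mutable state.
final class StateContext {
  val states = ArrayBuffer.empty[String]
  def addMutableState(decl: String): String = {
    states += decl
    s"writer_${states.size - 1}"
  }
}

// Hypothetical operator interface: `usesRowVar` is an assumed signal,
// NOT an existing Spark method.
trait ToyParent {
  def usesRowVar: Boolean
  def doConsume(rowCode: Option[String]): String
}

// Only generate (and register state for) the row when the parent asks for it.
def consume(ctx: StateContext, parent: ToyParent): String = {
  val rowCode =
    if (parent.usesRowVar) Some(ctx.addMutableState("UnsafeRowWriter") + ".reset();")
    else None
  parent.doConsume(rowCode)
}

object IgnoresRow extends ToyParent {
  val usesRowVar: Boolean = false
  def doConsume(rowCode: Option[String]): String = "/* uses input vars only */"
}

object NeedsRow extends ToyParent {
  val usesRowVar: Boolean = true
  def doConsume(rowCode: Option[String]): String = "append(" + rowCode.getOrElse("") + ");"
}

val ctxUnused = new StateContext
consume(ctxUnused, IgnoresRow)
val ctxUsed = new StateContext
consume(ctxUsed, NeedsRow)
println(s"${ctxUnused.states.size} ${ctxUsed.states.size}") // 0 1
```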



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
