[ 
https://issues.apache.org/jira/browse/SPARK-38542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mcdull_zhang updated SPARK-38542:
---------------------------------
    Description: 
Currently, UnsafeHashedRelation does not write numKeys during serialization, 
so a deserialized UnsafeHashedRelation has numKeys equal to 0. As a result, 
every UnsafeRow returned by UnsafeHashedRelation.keys() has numFields equal 
to 0, which can lead to missing or incorrect data.
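
The failure mode described above can be reproduced outside Spark. The following is a minimal sketch, not Spark's code: BuggyRelation and FixedRelation are hypothetical stand-ins that assume the relation is serialized through java.io.Externalizable, and numEntries is an invented placeholder for the rest of the relation's state. It only illustrates how a field that is never written comes back with its default value of 0 after a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class NumKeysRoundTrip {

    // Hypothetical stand-in for a relation; this is NOT Spark's UnsafeHashedRelation.
    public static class BuggyRelation implements Externalizable {
        protected int numKeys;      // number of key fields
        protected long numEntries;  // invented placeholder for the remaining state

        public BuggyRelation() {}   // Externalizable requires a public no-arg constructor

        public BuggyRelation(int numKeys, long numEntries) {
            this.numKeys = numKeys;
            this.numEntries = numEntries;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(numEntries);  // numKeys is never written
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            numEntries = in.readLong(); // numKeys keeps its default value, 0
        }

        public int numKeys() { return numKeys; }
        public long numEntries() { return numEntries; }
    }

    // Same stand-in, but numKeys is written and read back symmetrically.
    public static class FixedRelation extends BuggyRelation {
        public FixedRelation() {}

        public FixedRelation(int numKeys, long numEntries) {
            super(numKeys, numEntries);
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(numKeys);
            super.writeExternal(out);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            numKeys = in.readInt();
            super.readExternal(in);
        }
    }

    // Serializes the value to a byte array and reads it back.
    @SuppressWarnings("unchecked")
    public static <T extends Externalizable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new BuggyRelation(2, 10L)).numKeys()); // prints 0
        System.out.println(roundTrip(new FixedRelation(2, 10L)).numKeys()); // prints 2
    }
}
```

In this sketch the only difference between the two classes is whether numKeys takes part in the write/read pair, which mirrors the change the issue title asks for.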

 

For example, SubqueryBroadcastExec calls HashedRelation.keys():
{code:java}
val broadcastRelation = child.executeBroadcast[HashedRelation]().value
val (iter, expr) = if (broadcastRelation.isInstanceOf[LongHashedRelation]) {
  (broadcastRelation.keys(), HashJoin.extractKeyExprAt(buildKeys, index))
} else {
  (broadcastRelation.keys(),
    BoundReference(index, buildKeys(index).dataType, buildKeys(index).nullable))
}{code}


> UnsafeHashedRelation should serialize numKeys out
> -------------------------------------------------
>
>                 Key: SPARK-38542
>                 URL: https://issues.apache.org/jira/browse/SPARK-38542
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: mcdull_zhang
>            Priority: Critical
>
> Currently, UnsafeHashedRelation does not write numKeys during serialization, 
> so a deserialized UnsafeHashedRelation has numKeys equal to 0. As a result, 
> every UnsafeRow returned by UnsafeHashedRelation.keys() has numFields equal 
> to 0, which can lead to missing or incorrect data.
>  
> For example, SubqueryBroadcastExec calls HashedRelation.keys():
> {code:java}
> val broadcastRelation = child.executeBroadcast[HashedRelation]().value
> val (iter, expr) = if (broadcastRelation.isInstanceOf[LongHashedRelation]) {
>   (broadcastRelation.keys(), HashJoin.extractKeyExprAt(buildKeys, index))
> } else {
>   (broadcastRelation.keys(),
>     BoundReference(index, buildKeys(index).dataType, buildKeys(index).nullable))
> }{code}


