[ https://issues.apache.org/jira/browse/SPARK-38542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mcdull_zhang updated SPARK-38542:
---------------------------------
    Description:
At present, UnsafeHashedRelation does not write out numKeys during serialization, so an UnsafeHashedRelation obtained by deserialization has numKeys equal to 0. As a result, every UnsafeRow returned by UnsafeHashedRelation.keys() has numFields equal to 0. UnsafeRow derives the width of its null-tracking bitset, and therefore every field offset, from numFields, so reading a key column from such a row reads the wrong bytes. This can lead to missing or incorrect data.

For example, SubqueryBroadcastExec calls HashedRelation.keys():
{code:java}
val broadcastRelation = child.executeBroadcast[HashedRelation]().value
val (iter, expr) = if (broadcastRelation.isInstanceOf[LongHashedRelation]) {
  (broadcastRelation.keys(), HashJoin.extractKeyExprAt(buildKeys, index))
} else {
  (broadcastRelation.keys(),
    BoundReference(index, buildKeys(index).dataType, buildKeys(index).nullable))
}{code}
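A minimal, self-contained sketch of the failure mode, in plain Scala with no Spark dependency. `ToyRelation`, `NumKeysDemo`, and all their methods are hypothetical names for illustration, not Spark APIs, and this is not Spark's actual serialization code. It assumes a simplified UnsafeRow-style layout (null bitset rounded up to 8-byte words, then one 8-byte slot per field). The `fixed = true` path follows the direction in the issue title (also serialize numKeys); the real patch may differ.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.ByteBuffer

// Hypothetical stand-in for a hashed relation, NOT Spark's UnsafeHashedRelation.
// It only tracks the two counts relevant to this issue.
class ToyRelation(var numKeys: Int, var numFields: Int) {
  // Deserialization path: a fresh instance starts with both counts at 0,
  // like a no-arg constructor used before fields are read back in.
  def this() = this(0, 0)

  // `fixed = false` mimics the reported bug: numKeys is never written.
  def writeTo(out: DataOutputStream, fixed: Boolean): Unit = {
    out.writeInt(numFields)
    if (fixed) out.writeInt(numKeys)
  }

  // Reads back exactly what writeTo wrote; an unwritten numKeys stays 0.
  def readFrom(in: DataInputStream, fixed: Boolean): Unit = {
    numFields = in.readInt()
    if (fixed) numKeys = in.readInt()
  }
}

object NumKeysDemo {
  // Serializes and deserializes a relation through an in-memory byte stream.
  def roundTrip(rel: ToyRelation, fixed: Boolean): ToyRelation = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    rel.writeTo(out, fixed)
    out.flush()
    val copy = new ToyRelation()
    copy.readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)), fixed)
    copy
  }

  // Simplified UnsafeRow-style layout (assumption): the null bitset is sized
  // from numFields and rounded up to whole 8-byte words.
  def bitSetWidthInBytes(numFields: Int): Int = ((numFields + 63) / 64) * 8

  // Reads field `ordinal` as a Long, trusting the caller-supplied numFields.
  def readLongField(row: ByteBuffer, numFields: Int, ordinal: Int): Long =
    row.getLong(bitSetWidthInBytes(numFields) + ordinal * 8)

  // Builds a one-key row: an all-zero null bitset word, then the key value.
  def oneKeyRow(key: Long): ByteBuffer = {
    val row = ByteBuffer.allocate(16)
    row.putLong(0, 0L)  // null bitset: the key is not null
    row.putLong(8, key) // field 0
    row
  }
}
```

With `fixed = false`, `NumKeysDemo.roundTrip(new ToyRelation(1, 3), fixed = false)` comes back with `numKeys == 0`, and reading field 0 of `oneKeyRow(42L)` with that count returns 0 (the bitset word) instead of 42. With `fixed = true`, both counts survive and the key reads back correctly.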
> UnsafeHashedRelation should serialize numKeys out
> -------------------------------------------------
>
>                 Key: SPARK-38542
>                 URL: https://issues.apache.org/jira/browse/SPARK-38542
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: mcdull_zhang
>            Priority: Critical

--
This message was sent by Atlassian Jira
(v8.20.1#820001)