zkaoudi commented on issue #423:
URL: https://github.com/apache/incubator-wayang/issues/423#issuecomment-2034387564

   Hi again,
   
   I would suggest two things to check:
   
   1) The type of join that Spark SQL uses. Wayang's current join operator maps 
to the corresponding join in RDDs, which, if I'm not mistaken, is implemented as 
a hash join. Maybe Spark SQL uses a broadcast join instead, which would explain 
the difference in the data transferred?
   
   2) I'm not very familiar with views in Spark, but when one registers the 
temporary views, are they materialized in memory? If so, the timer you have 
would be measuring data accessed from memory. But again, I'm not sure how temp 
views in Spark work. Maybe you could time the registerviews method to check this.
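
   To illustrate why the join type in point 1 could change the amount of data transferred, here is a toy model in plain Python. It is only a sketch under stated assumptions, not how Spark or Wayang implement joins: it counts rows rather than bytes, assumes integer keys, assumes row i initially sits on worker i % num_workers, and assumes a broadcast ships one full copy of the smaller input to every worker. All function names are hypothetical.

```python
def shuffled_rows(keys, num_workers):
    """Count rows that change worker when the input is repartitioned by key.

    Assumed layout: row i starts on worker i % num_workers and must end up
    on worker key % num_workers so that matching keys meet on one worker.
    """
    return sum(
        1 for i, key in enumerate(keys)
        if i % num_workers != key % num_workers
    )


def shuffle_hash_join_transfer(left_keys, right_keys, num_workers):
    """Rows moved when BOTH inputs are repartitioned by the join key."""
    return (shuffled_rows(left_keys, num_workers)
            + shuffled_rows(right_keys, num_workers))


def broadcast_join_transfer(left_keys, right_keys, num_workers):
    """Rows moved when the smaller input is copied to every worker.

    The larger input stays where it is, so only the small side is counted.
    """
    smaller = min(len(left_keys), len(right_keys))
    return smaller * num_workers


# Example inputs: a large fact-like side and a small dimension-like side.
workers = 4
large_side = [(i * 7) % 1000 for i in range(100_000)]
small_side = list(range(100))

shuffle_cost = shuffle_hash_join_transfer(large_side, small_side, workers)
broadcast_cost = broadcast_join_transfer(large_side, small_side, workers)
print(f"shuffle join moves {shuffle_cost} rows, broadcast join moves {broadcast_cost} rows")
```

   In this model the gap grows with the size of the larger input, which is the kind of difference I had in mind. On the Spark SQL side, the physical plan printed by `explain()` should show which join was actually chosen, and as far as I know the `spark.sql.autoBroadcastJoinThreshold` setting controls when a broadcast join is picked automatically.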
   
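
   For point 2, the idea of timing the view registration separately from the query can be sketched as follows. This is a generic plain-Python illustration, not Spark code: `LazyView`, `MaterializedView`, and `slow_loader` are hypothetical stand-ins for a view that only stores a recipe, a view that reads its data at registration time, and a slow table read. My understanding is that Spark temp views behave like the lazy case unless the underlying data is explicitly cached, but that is exactly what the measurement should confirm.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block in timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


class LazyView:
    """Registering only stores the recipe; data is read on every query."""

    def __init__(self, loader):
        self._loader = loader
        self.loads = 0

    def rows(self):
        self.loads += 1
        return self._loader()


class MaterializedView:
    """Registering reads the data once; queries are served from memory."""

    def __init__(self, loader):
        self._rows = loader()
        self.loads = 1

    def rows(self):
        return self._rows


def slow_loader():
    """Hypothetical stand-in for reading a table from disk."""
    time.sleep(0.05)
    return list(range(1000))


timings = {}
with timed("register_lazy", timings):
    lazy = LazyView(slow_loader)
with timed("query_lazy", timings):
    lazy.rows()
with timed("register_materialized", timings):
    materialized = MaterializedView(slow_loader)
with timed("query_materialized", timings):
    materialized.rows()

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
```

   If the registration step turns out to be the expensive one, the query timer is measuring in-memory access; if registration is nearly free, the read cost is inside the timed query.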


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.