Re: [PR] [#1750] feat(remote merge): Support Spark. [uniffle]

via GitHub Tue, 18 Mar 2025 08:24:07 -0700


zhengchenyu commented on PR #2405:
URL: https://github.com/apache/uniffle/pull/2405#issuecomment-2732093224


   > > > From my perspective, this feature can't currently be used in Spark SQL. Maybe RDD could use it.
   > > 
   > > 
   > > This test is based on the draft PR [apache/spark#50248](https://github.com/apache/spark/pull/50248).
   > 
   > This will break Spark's implementation. It would be better to insert a new logical plan node that represents the distribution and partitioning after shuffling. Then you only need to implement some optimization rules.
   
   Are you talking about the changes to Spark? My initial idea was also to see whether I could add a new rule. For the map side, I could probably add new rules. But for the reduce side, whether a new SortExec is added depends on whether the distribution and partitioning match, which is not easy to do by adding a new rule.
   As for the draft PR with the Spark changes, it is only a draft to verify the feasibility of this proposal. Some of the code architecture still needs to be refactored. For example, some of the in-memory partial aggregation logic needs rework, and some logic should be added to the rule.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

