milenkovicm opened a new pull request, #1338:
URL: https://github.com/apache/datafusion-ballista/pull/1338

   # Which issue does this PR close?
   
   Closes #1142
   
   # Rationale for this change
   
   For quite some time, we have wanted to provide a Ballista Python interface 
and make it an extension of DataFusion Python. For the reasons described in 
#1142, we have not been able to do so. The main issue was that we could not use 
the DataFusion `DataFrame` because of a class mismatch. For example, code like 
   
   ```python
   from pyballista import BallistaBuilder
   from datafusion import SessionContext
   from datafusion import functions as f
   
   # %%
   ctx: SessionContext = BallistaBuilder()\
       .config("ballista.job.name", "example ballista")\
       .config("ballista.shuffle.partitions", "16")\
       .standalone()
       
   df = ctx.sql("SELECT 1 as r").aggregate(
       [f.col("r")], [f.count_star()]
   )
   df.show()
   ``` 
   
   was not possible because of the FFI boundary between DataFusion Python and 
Ballista Python. 
   
   # What changes are included in this PR?
   
   This PR relies on Python duck typing to "fake" the `DataFrame` interface, 
replacing it with a `DistributedDataFrame` extension that executes queries on a 
Ballista cluster.
   
   ```python
   from ballista import BallistaSessionContext
   from datafusion import col, lit, DataFrame
   from datafusion import functions as f
   
   # we replace 
   # ctx = SessionContext()
   # with
   ctx = BallistaSessionContext(address="df://127.0.0.1:50050")
   
   df: DataFrame = ctx.table("t")
   df.filter(col("id") > lit(4)).show()
   
   df0 = ctx.sql("SELECT 1 as r")
   
   # aggregate() returns a new DataFrame, so keep the result
   df0 = df0.aggregate(
       [f.col("r")], [f.count_star()]
   )
   
   df0.show()
   ```
   
   There is a slight inefficiency: the original logical plan is serialised in 
DataFusion Python and deserialised in Ballista Python in order to cross the FFI 
boundary, and the BallistaContext has to be re-created.
   
   Also, we would need to override a few methods to make it work with Ballista.
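
The approach described above can be illustrated with a minimal, library-free sketch. It is not the code in this PR: `LocalDataFrame`, `run_on_cluster`, and `DistributedFrameSketch` are hypothetical stand-ins, and the JSON-encoded list of steps only imitates a serialised logical plan. The wrapper delegates plan-building methods to the wrapped frame and overrides only the method that executes the query, so the plan is serialised on one side and deserialised on the other.

```python
import json
from typing import Any, List


class LocalDataFrame:
    """Hypothetical stand-in for a local DataFrame: builds a plan, runs it locally."""

    def __init__(self, plan: List[str]) -> None:
        self.plan = plan

    def filter(self, predicate: str) -> "LocalDataFrame":
        # Plan-building methods return a new frame and leave the original untouched.
        return LocalDataFrame(self.plan + [f"filter({predicate})"])

    def serialized_plan(self) -> bytes:
        # Imitates serialising the logical plan before it crosses a boundary.
        return json.dumps(self.plan).encode("utf-8")

    def collect(self) -> str:
        return "local: " + " -> ".join(self.plan)


def run_on_cluster(plan_bytes: bytes, address: str) -> str:
    """Hypothetical remote side: deserialise the plan and pretend to execute it."""
    plan = json.loads(plan_bytes.decode("utf-8"))
    return f"{address}: " + " -> ".join(plan)


class DistributedFrameSketch:
    """Duck-typed wrapper: same methods as LocalDataFrame, different execution."""

    def __init__(self, inner: LocalDataFrame, address: str) -> None:
        self._inner = inner
        self._address = address

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes not defined on the wrapper itself,
        # so everything that is not overridden is delegated to the inner frame.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = attr(*args, **kwargs)
            # Re-wrap frames so that chained calls stay "distributed".
            if isinstance(result, LocalDataFrame):
                return DistributedFrameSketch(result, self._address)
            return result

        return wrapper

    def collect(self) -> str:
        # Overridden execution method: ship the serialised plan elsewhere.
        return run_on_cluster(self._inner.serialized_plan(), self._address)


df = DistributedFrameSketch(LocalDataFrame(["scan(t)"]), "df://127.0.0.1:50050")
out = df.filter("id > 4").collect()
print(out)
```

Callers written against the local interface keep working unchanged, which is the point of duck typing here; the cost is the extra serialise and deserialise step on every execution.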
   
   # Are there any user-facing changes?
   
   The interface will change, but it is too early to say exactly how. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

