Hi,

I am not sure whether Spark DataFrames apply to your use case. If they do, please try creating a UDF in Python and check whether you can call it from Scala using select and expr.

Regards,
Gourav Sengupta

On Mon, Jul 16, 2018 at 5:32 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Hello Jayant,
>
> Thanks for the great OSS contribution :)
>
> On Thu, Jul 12, 2018 at 1:36 PM, Jayant Shekhar <jayantbaya...@gmail.com> wrote:
>
>> Hello Chetan,
>>
>> Sorry, I missed replying earlier. You can find some sample code here:
>>
>> http://sparkflows.readthedocs.io/en/latest/user-guide/python/pipe-python.html
>>
>> We will continue adding more there.
>>
>> Feel free to ping me directly in case of questions.
>>
>> Thanks,
>> Jayant
>>
>> On Mon, Jul 9, 2018 at 9:56 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>
>>> Hello Jayant,
>>>
>>> Thank you so much for the suggestion. My idea was to use a Python
>>> function as a transformation that takes a couple of column names and
>>> returns an object, as you explained. Would it be possible to point me
>>> to a similar codebase example?
>>>
>>> Thanks.
>>>
>>> On Fri, Jul 6, 2018 at 2:56 AM, Jayant Shekhar <jayantbaya...@gmail.com> wrote:
>>>
>>>> Hello Chetan,
>>>>
>>>> We have currently done it with .pipe(.py) as Prem suggested.
>>>>
>>>> That passes the RDD as CSV strings to the Python script. The Python
>>>> script can either process it line by line, create the result, and
>>>> return it, or build something like a Pandas DataFrame for processing
>>>> and finally write the results back.
>>>>
>>>> In the Spark/Scala/Java code, you get an RDD of strings, which we
>>>> convert back to a DataFrame.
>>>>
>>>> Feel free to ping me directly in case of questions.
>>>>
>>>> Thanks,
>>>> Jayant
>>>>
>>>> On Thu, Jul 5, 2018 at 3:39 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>
>>>>> Prem, sure. Thanks for the suggestion.
>>>>>
>>>>> On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure <sparksure...@gmail.com> wrote:
>>>>>
>>>>>> try .pipe(.py) on RDD
>>>>>>
>>>>>> Thanks,
>>>>>> Prem
>>>>>>
>>>>>> On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Can someone please advise? Thanks.
>>>>>>>
>>>>>>> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, <chetan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Dear Spark User / Dev,
>>>>>>>>
>>>>>>>> I would like to pass a Python user-defined function to a Spark
>>>>>>>> job developed in Scala, with the return value of that function
>>>>>>>> returned to the DF / Dataset API.
>>>>>>>>
>>>>>>>> Can someone please guide me on the best approach to do this? The
>>>>>>>> Python function would mostly be a transformation function. I
>>>>>>>> would also like to pass a Java function as a String to a
>>>>>>>> Spark / Scala job, where it is applied to an RDD / DataFrame and
>>>>>>>> should return an RDD / DataFrame.
>>>>>>>>
>>>>>>>> Thank you.
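
For anyone reading this thread later: the Python side of the .pipe() approach described above might look roughly like the sketch below. It is only an illustration, not the code from the linked page. The column layout (id, price, quantity), the derived "total" column, and the names transform_row and process are all assumptions made up for this example. The script treats each incoming line as one CSV row, applies a row-level transformation, and writes the result back as CSV.

```python
import csv
import io
import sys


def transform_row(row):
    """Hypothetical transformation: append price * quantity as a new column.

    Assumes each row is [id, price, quantity]; this layout is an
    assumption for the example, not something from the thread.
    """
    total = float(row[1]) * int(row[2])
    return row + [str(total)]


def process(lines, out):
    """Read CSV rows from `lines`, transform each one, write CSV to `out`."""
    reader = csv.reader(lines)
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(transform_row(row))


if __name__ == "__main__":
    # Demo with in-memory data so the sketch runs on its own.
    # Under rdd.pipe() the call would be process(sys.stdin, sys.stdout),
    # because Spark feeds each partition's rows to the script's stdin.
    demo_out = io.StringIO()
    process(io.StringIO("1,2.5,4\n2,1.0,3\n"), demo_out)
    print(demo_out.getvalue(), end="")
```

On the Scala side, assuming the script is saved under a hypothetical name such as transform.py and is available on every executor, something like rdd.pipe("python transform.py") would return an RDD of strings that can then be parsed back into a DataFrame, as Jayant describes.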