Hello Chetan,

We are currently doing this with .pipe() on the RDD, calling a Python
script, as Prem suggested.

This passes the RDD rows to the Python script as CSV strings. The script can
either process the input line by line and write the results back, or load it
into something like a pandas DataFrame, do the processing there, and write
the results back at the end.
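
For reference, here is a minimal sketch of what such a script could look
like (a hypothetical transform.py; the column names and the doubling step
are made up for illustration, any program that reads stdin and writes
stdout will do):

    #!/usr/bin/env python
    # Hypothetical transform.py: reads CSV lines from stdin,
    # writes CSV lines to stdout.
    import sys
    from io import StringIO

    import pandas as pd

    # Load all piped lines into a pandas DataFrame.
    # The column names (id, value) are illustrative only.
    df = pd.read_csv(StringIO(sys.stdin.read()),
                     header=None, names=["id", "value"])

    # Example transformation: double the value column.
    df["value"] = df["value"] * 2

    # Write the results back as CSV lines for Spark to read.
    df.to_csv(sys.stdout, header=False, index=False)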

On the Spark/Scala/Java side, you get back an RDD of strings, which we
convert into a DataFrame.
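
Roughly, the Scala side looks like this (a sketch only: it assumes a
SparkSession named spark, an input DataFrame named inputDf, and a script
path of your own; the two-column output schema is illustrative):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Serialize each input row as a CSV line before piping.
    val inputRdd = inputDf.rdd.map(_.mkString(","))

    // Stream each partition's lines through the external Python script.
    val pipedRdd = inputRdd.pipe("/path/to/transform.py")

    // The script emits CSV lines; split them and rebuild a DataFrame.
    // This two-column string schema is illustrative only.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("value", StringType)))
    val rows = pipedRdd.map(line => Row.fromSeq(line.split(",", -1).toSeq))
    val resultDf = spark.createDataFrame(rows, schema)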

Feel free to ping me directly if you have any questions.

Thanks,
Jayant


On Thu, Jul 5, 2018 at 3:39 AM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Prem sure, Thanks for suggestion.
>
> On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure <sparksure...@gmail.com> wrote:
>
>> try .pipe(.py) on RDD
>>
>> Thanks,
>> Prem
>>
>> On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Can someone please suggest an approach? Thanks.
>>>
>>> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, <chetan.opensou...@gmail.com>
>>> wrote:
>>>
>>>> Hello Dear Spark User / Dev,
>>>>
>>>> I would like to pass a Python user-defined function to a Spark job
>>>> developed in Scala, and have the return value of that function surfaced
>>>> through the DF / Dataset API.
>>>>
>>>> Can someone please guide me on the best approach to do this? The Python
>>>> function would mostly be a transformation function. I would also like to
>>>> pass a Java function as a String to the Spark / Scala job, to be applied
>>>> to an RDD / DataFrame and return an RDD / DataFrame.
>>>>
>>>> Thank you.
>>>>
>>
>
