Re: Calling Python code from Scala

2016-04-18 Thread Didier Marin
Hi,

Thank you for the quick answers!

Ndjido Ardo: actually, the Python part is not legacy code. I'm not familiar
with UDFs in Spark; do you have some examples of Python UDFs and how to use
them from Scala code?

Holden Karau: the pipe interface seems like a good solution, but I'm a bit
concerned about performance. Will that work well if I need to pipe a lot of
data?

Cheers,
Didier

2016-04-18 19:29 GMT+02:00 Holden Karau:

> So if there are just a few Python functions you're interested in accessing,
> you can also use the pipe interface (you'll have to manually serialize your
> data on both ends in ways that Python and Scala can respectively parse) -
> but it's a very generic approach and can work with many different languages.
>
> On Mon, Apr 18, 2016 at 10:23 AM, Ndjido Ardo BAR 
> wrote:
>
>> Hi Didier,
>>
>> I think with PySpark you can wrap your legacy Python functions into UDFs
>> and use them in your DataFrames. But you have to use DataFrames instead of
>> RDDs.
>>
>> cheers,
>> Ardo
>>
>> On Mon, Apr 18, 2016 at 7:13 PM, didmar  wrote:
>>
>>> Hi,
>>>
>>> I have a Spark project in Scala and I would like to call some Python
>>> functions from within the program.
>>> Both parts are quite big, so re-coding everything in one language is not
>>> really an option.
>>>
>>> The workflow would be:
>>> - Creating an RDD with Scala code
>>> - Mapping a Python function over this RDD
>>> - Using the result directly in Scala
>>>
>>> I've read about PySpark internals, but that didn't help much.
>>> Is it possible to do so, and preferably in an efficient manner?
>>>
>>> Cheers,
>>> Didier
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Python-code-from-Scala-tp26798.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Re: Calling Python code from Scala

2016-04-18 Thread Mohit Jaggi
When faced with this issue I followed the approach taken by PySpark and used
py4j. You have to:
- ensure your code is Java-compatible
- use py4j to call the Java (Scala) code from Python
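A rough sketch of the Python side of that py4j route (the `addOne` entry-point method and the Scala snippet are hypothetical, and the gateway connection only works once the JVM side has started a py4j `GatewayServer`):

```python
# Sketch of calling Scala from Python via py4j. It assumes the Scala/JVM
# side has started a GatewayServer exposing a Java-compatible entry point;
# the class and method names below are hypothetical.
#
# Scala side (kept Java-compatible), roughly:
#   class EntryPoint { def addOne(x: Int): Int = x + 1 }
#   new py4j.GatewayServer(new EntryPoint()).start()
#
# Python side, roughly (needs the gateway running, so shown as comments):
#   from py4j.java_gateway import JavaGateway
#   gateway = JavaGateway()  # connects to the JVM gateway on the default port

def add_one_via_jvm(gateway, x):
    # py4j proxies JVM objects, so invoking the Scala method reads like
    # a normal Python method call on the gateway's entry point
    return gateway.entry_point.addOne(x)
```

The same call pattern applies to whatever methods your own entry point exposes, as long as their signatures stick to Java-compatible types.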


> On Apr 18, 2016, at 10:29 AM, Holden Karau  wrote:
> 
> So if there are just a few Python functions you're interested in accessing, you
> can also use the pipe interface (you'll have to manually serialize your data
> on both ends in ways that Python and Scala can respectively parse) - but it's
> a very generic approach and can work with many different languages.
> 
> On Mon, Apr 18, 2016 at 10:23 AM, Ndjido Ardo BAR wrote:
> Hi Didier,
> 
> I think with PySpark you can wrap your legacy Python functions into UDFs and
> use them in your DataFrames. But you have to use DataFrames instead of RDDs.
> 
> cheers,
> Ardo
> 
> On Mon, Apr 18, 2016 at 7:13 PM, didmar wrote:
> Hi,
> 
> I have a Spark project in Scala and I would like to call some Python
> functions from within the program.
> Both parts are quite big, so re-coding everything in one language is not
> really an option.
> 
> The workflow would be:
> - Creating an RDD with Scala code
> - Mapping a Python function over this RDD
> - Using the result directly in Scala
> 
> I've read about PySpark internals, but that didn't help much.
> Is it possible to do so, and preferably in an efficient manner?
> 
> Cheers,
> Didier
> 
> 
> 
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Python-code-from-Scala-tp26798.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> -- 
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau 


Re: Calling Python code from Scala

2016-04-18 Thread Holden Karau
So if there are just a few Python functions you're interested in accessing, you
can also use the pipe interface (you'll have to manually serialize your
data on both ends in ways that Python and Scala can respectively parse) -
but it's a very generic approach and can work with many different languages.
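A minimal sketch of the Python end of that pipe approach, assuming a comma-separated line format (the format itself is an assumption; both sides only need to agree on some line-based serialization):

```python
import sys

def transform(line):
    # Stand-in for the real Python logic: parse one serialized record,
    # transform it, and re-serialize it as a single output line.
    fields = line.rstrip("\n").split(",")
    return ",".join(f.upper() for f in fields)

def run_worker():
    # RDD.pipe() feeds each partition's records to this process on stdin,
    # one per line, and reads the transformed records back from stdout.
    # Call run_worker() at module scope when this file is used as the
    # piped script.
    for line in sys.stdin:
        print(transform(line))
```

On the Scala side this would be invoked with something like `rdd.pipe("python worker.py")`, which yields an `RDD[String]` to parse back into your own types.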

On Mon, Apr 18, 2016 at 10:23 AM, Ndjido Ardo BAR  wrote:

> Hi Didier,
>
> I think with PySpark you can wrap your legacy Python functions into UDFs
> and use them in your DataFrames. But you have to use DataFrames instead of
> RDDs.
>
> cheers,
> Ardo
>
> On Mon, Apr 18, 2016 at 7:13 PM, didmar  wrote:
>
>> Hi,
>>
>> I have a Spark project in Scala and I would like to call some Python
>> functions from within the program.
>> Both parts are quite big, so re-coding everything in one language is not
>> really an option.
>>
>> The workflow would be:
>> - Creating an RDD with Scala code
>> - Mapping a Python function over this RDD
>> - Using the result directly in Scala
>>
>> I've read about PySpark internals, but that didn't help much.
>> Is it possible to do so, and preferably in an efficient manner?
>>
>> Cheers,
>> Didier
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Python-code-from-Scala-tp26798.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: Calling Python code from Scala

2016-04-18 Thread Ndjido Ardo BAR
Hi Didier,

I think with PySpark you can wrap your legacy Python functions into UDFs
and use them in your DataFrames. But you have to use DataFrames instead of
RDDs.
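As a minimal sketch of that UDF route: `clean` below is a hypothetical stand-in for the legacy function, and the column name is made up. The wrapping and registration need a live SQLContext/SparkSession, so that part is shown as comments:

```python
# A plain Python function standing in for the legacy code to wrap.
def clean(s):
    # Normalize a string value, passing nulls through untouched
    return s.strip().lower() if s is not None else None

# With PySpark, wrapping and applying it would look roughly like this
# (commented out so the sketch stays self-contained):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   clean_udf = udf(clean, StringType())
#   df = sqlContext.createDataFrame([(" Foo ",), ("BAR",)], ["name"])
#   df.select(clean_udf(df["name"]).alias("name_clean")).show()
```

Note this keeps the driver program in Python; it doesn't by itself make the UDF callable from a separate Scala program.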

cheers,
Ardo

On Mon, Apr 18, 2016 at 7:13 PM, didmar  wrote:

> Hi,
>
> I have a Spark project in Scala and I would like to call some Python
> functions from within the program.
> Both parts are quite big, so re-coding everything in one language is not
> really an option.
>
> The workflow would be:
> - Creating an RDD with Scala code
> - Mapping a Python function over this RDD
> - Using the result directly in Scala
>
> I've read about PySpark internals, but that didn't help much.
> Is it possible to do so, and preferably in an efficient manner?
>
> Cheers,
> Didier
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Python-code-from-Scala-tp26798.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>