Invoking Hive UDF programmatically

2015-05-27 Thread Punyashloka Biswal
Dear Spark users, Given a DataFrame df with a column named "foo bar", I can call a Spark SQL built-in function on it like so: df.select(functions.max(df("foo bar"))) However, if I want to apply a Hive UDF named myCustomFunction, I need to write df.selectExpr("myCustomFunction(`foo bar`)") which

Recommended Scala version

2015-05-26 Thread Punyashloka Biswal
Dear Spark developers and users, Am I correct in believing that the recommended version of Scala to use with Spark is currently 2.10? Is there any plan to switch to 2.11 in future? Are there any advantages to using 2.11 today? Regards, Punya

Re: gridsearch - python

2015-04-23 Thread Punyashloka Biswal
https://issues.apache.org/jira/browse/SPARK-7022. Punya On Thu, Apr 23, 2015 at 5:47 PM Pagliari, Roberto rpagli...@appcomsci.com wrote: Can anybody point me to an example, if available, about gridsearch with python? Thank you,

Re: Map-Side Join in Spark

2015-04-20 Thread Punyashloka Biswal
Could you do it using flatMap? Punya On Tue, Apr 21, 2015 at 12:19 AM ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: The reason I am asking is that I do not understand how to do a skip. 1) Broadcast small table-1 as a map. 2) I just do .map() on large table-2. When you do .map()
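To illustrate the flatMap suggestion, here is a minimal sketch in plain Python rather than Spark. It assumes the small table fits in memory as a dict (standing in for the broadcast map) and that the large table is a list of (key, value) pairs. The names map_side_join, join_row, small, and large are hypothetical and do not come from the thread. The point is that a flatMap-style function can return an empty list for a row with no match, which skips that row, whereas .map() must emit exactly one output per input.

```python
from itertools import chain


def map_side_join(large_rows, small_table):
    """Join (key, value) rows against an in-memory lookup table.

    small_table plays the role of the broadcast map (an assumption for
    this sketch). Rows whose key is absent produce no output at all.
    """
    def join_row(row):
        key, value = row
        if key in small_table:
            # One joined record for a matching key.
            return [(key, value, small_table[key])]
        # Empty list: flattening drops this row, which is the "skip".
        return []

    # chain.from_iterable flattens the per-row lists, like flatMap.
    return list(chain.from_iterable(join_row(r) for r in large_rows))


# Hypothetical sample data: key 3 has no entry in the small table.
small = {1: "a", 2: "b"}
large = [(1, "x"), (3, "y"), (2, "z")]
joined = map_side_join(large, small)
print(joined)
```

In Spark the same shape would presumably apply, with the per-row function passed to flatMap on the large RDD and the lookup done against the broadcast value.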