Yes. SomeUDF is Contains, and shape is a UDT that maps a custom geometry type
to the SQL binary type.
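A UDT that maps a geometry to the SQL binary type has to turn each shape into a byte[] and back. The following is a minimal stdlib-only sketch of that round trip. It assumes, hypothetically, that a shape is a simple polygon stored as x/y vertex arrays. The class name, method names, and byte layout are illustrative only. They are not the actual geometry class and not Spark's UserDefinedType API.

```java
import java.nio.ByteBuffer;

// Hypothetical encoding of a polygon as bytes: the kind of work a UDT that
// maps a geometry class to the binary type does when serializing/deserializing.
final class PolygonBytes {
    private PolygonBytes() {}

    // Layout (assumed): [int vertexCount][double x0][double y0][double x1][double y1]...
    static byte[] serialize(double[] xs, double[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + 2 * Double.BYTES * xs.length);
        buf.putInt(xs.length);
        for (int i = 0; i < xs.length; i++) {
            buf.putDouble(xs[i]);
            buf.putDouble(ys[i]);
        }
        return buf.array();
    }

    // Returns {xs, ys}. Every call allocates fresh arrays, so a UDF that
    // deserializes its argument pays this cost each time it is invoked.
    static double[][] deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        double[] xs = new double[n];
        double[] ys = new double[n];
        for (int i = 0; i < n; i++) {
            xs[i] = buf.getDouble();
            ys[i] = buf.getDouble();
        }
        return new double[][] {xs, ys};
    }
}
```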

The custom geometry type is a Java class. Please let me know if you need further
info.
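Because SomeUDF is Contains, its core is a point-in-polygon test. The following is a minimal sketch using the even-odd ray-casting rule. It assumes, hypothetically, that the geometry is a simple polygon and that the location is an (x, y) point. The real geometry class may use a library or a different algorithm.

```java
// Hypothetical stand-in for the custom geometry class: a simple polygon.
final class SimplePolygon {
    private final double[] xs;
    private final double[] ys;

    SimplePolygon(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 3) {
            throw new IllegalArgumentException("need at least 3 vertices with matching x/y arrays");
        }
        this.xs = xs.clone();
        this.ys = ys.clone();
    }

    // Even-odd ray casting: cast a ray from (px, py) toward +x and count the
    // polygon edges it crosses; an odd count means the point is inside.
    // Points lying exactly on an edge may be classified either way.
    boolean contains(double px, double py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles) {
                // x coordinate where edge (j -> i) crosses the horizontal line y = py
                double xCross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (px < xCross) {
                    inside = !inside;
                }
            }
        }
        return inside;
    }
}
```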

Regards
Raghu

> On Jan 26, 2016, at 17:13, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> What's the type of the shape column?
> 
> Can you disclose what SomeUDF does (by showing the code)?
> 
> Cheers
> 
>> On Tue, Jan 26, 2016 at 12:41 PM, raghukiran <raghuki...@gmail.com> wrote:
>> Hi,
>> 
>> I create two tables: counties, with just one row (it actually has 2k
>> rows, but I used only one), and hospitals, which has 6k rows. The join
>> below takes far too long to run and has never finished successfully
>> (even after nearly 10 minutes). Here is what I have:
>> 
>> DataFrame df1 = ...
>> df1.registerTempTable("hospitals");
>> DataFrame df2 = ...
>> df2.registerTempTable("counties"); // has only one row right now
>> DataFrame joinDf = sqlCtx.sql("SELECT h.name, c.name FROM hospitals h "
>>     + "JOIN counties c ON SomeUDF(c.shape, h.location)");
>> long count = joinDf.count(); // this takes too long!
>> 
>> // whereas the following, which should be equivalent to the above (the
>> // single county's shape is passed in as a string literal), finishes very quickly!
>> DataFrame filterDf = sqlCtx.sql("SELECT h.name FROM hospitals h "
>>     + "WHERE SomeUDF('c.shape as string', h.location)");
>> long filterCount = filterDf.count(); // gives me the correct answer of 8
>> 
>> Any suggestions on how I can optimize and debug this code?
>> 
>> Regards,
>> Raghu
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-joins-taking-too-long-tp26078.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
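One plausible reason for the gap between the two queries is the join condition. A join whose only condition is an opaque UDF gives Spark no equality key, so it cannot hash or sort on a key. Instead it has to pair up rows (a Cartesian product or nested-loop join) and call the UDF on every pair. The filter version scans hospitals once with a constant shape. The following plain-Java sketch illustrates the nested-loop strategy. The class and its names are hypothetical, and the sketch does not show what Spark's planner did here. Running joinDf.explain() would show the physical plan Spark actually chose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration of a nested-loop join: with no equality key to
// hash on, every (left, right) pair is tested with the join predicate.
final class NestedLoopJoin {
    private NestedLoopJoin() {}

    // Matched pairs are stored positionally: lefts.get(k) pairs with rights.get(k).
    static final class Result<L, R> {
        final List<L> lefts = new ArrayList<>();
        final List<R> rights = new ArrayList<>();
        long predicateCalls = 0;
    }

    static <L, R> Result<L, R> join(List<L> left, List<R> right, BiPredicate<L, R> condition) {
        Result<L, R> out = new Result<>();
        for (L l : left) {
            for (R r : right) {
                out.predicateCalls++; // the predicate (e.g. a UDF) runs once per pair
                if (condition.test(l, r)) {
                    out.lefts.add(l);
                    out.rights.add(r);
                }
            }
        }
        return out;
    }
}
```

With 1 county and 6k hospitals, this strategy makes only about 6k predicate calls. If the real join still stalls, the next things to check are the per-call cost (for example, deserializing the binary shape on every call) and the plan Spark picked.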
