Have you seen the thread 'Filter on a column having multiple values' where Michael gave this example?
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1075277772969592/2840265927289860/2388bac36e.html

FYI

On Wed, Mar 2, 2016 at 9:33 PM, Angel Angel <areyouange...@gmail.com> wrote:

> Hello Sir/Madam,
>
> I am writing an application using Spark SQL.
>
> I built a very big table using the following command:
>
>     val dfCustomers1 = sc.textFile("/root/Desktop/database.txt")
>       .map(_.split(","))
>       .map(p => Customer1(p(0), p(1).trim.toInt, p(2).trim.toInt, p(3)))
>       .toDF()
>
> Now I want to search the Address field for many addresses and build a
> new table from the matching rows:
>
>     var k = dfCustomers1.filter(dfCustomers1("Address").equalTo(lines(0)))
>
>     for (a <- 1 until 1500) {
>       var temp = dfCustomers1.filter(dfCustomers1("Address").equalTo(lines(a)))
>       k = temp.unionAll(k)
>     }
>
>     k.show
>
> But this is taking a very long time. Can you suggest an optimized
> approach so I can reduce the execution time?
>
> My cluster has 3 slaves and 1 master.
>
> Thanks.
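For reference, the usual fix for this pattern is to replace the 1500 separate filters and `unionAll` calls with a single `isin` filter, which Spark can evaluate in one pass over the data. A minimal sketch, assuming `lines` is an indexed collection of address strings as in the original code:

    // Sketch, not a tested drop-in: collect the target addresses once,
    // then filter the DataFrame a single time with isin (available since
    // Spark 1.5), instead of unioning ~1500 filtered DataFrames.
    val addresses: Seq[String] = (0 until 1500).map(lines(_))

    val k = dfCustomers1.filter(dfCustomers1("Address").isin(addresses: _*))
    k.show()

This also keeps the query plan small; the loop version builds a plan with ~1500 union nodes, which is a large part of why it is so slow to optimize and execute.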