Thanks,
but I don't think this is a case of multiple Spark contexts (nevertheless, I
tried your suggestion, and it didn't work). The problem is joining two datasets on
the values of array items: attribute.value in my case. Does anyone have ideas?
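
One possible direction, as a rough sketch rather than a confirmed fix: as far as I can tell, `attributes.value` on an array-of-struct column resolves to the whole array of values, so the join condition compares entire arrays rather than individual emails. Flattening the array with `explode` first gives one row per attribute, which can then be filtered and joined on a plain string column. The object name `JoinOnArrayElement`, the helper `emails`, and the output column names are made up for illustration, and the code assumes a Spark version where `org.apache.spark.sql.functions.explode` can be used inside `select` next to other columns. The email strings are left exactly as they appear (elided) in the original post.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.explode
import org.apache.spark.{SparkConf, SparkContext}

case class Attribute(name: String, value: String, weight: Float)
case class Profile(name: String, attributes: Seq[Attribute])

// Hypothetical object name, used only for this sketch.
object JoinOnArrayElement {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("JoinOnArrayElement"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical helper: one output row per (profile, email attribute).
    // explode turns each array element into its own row; the filter then
    // keeps only the elements whose name is "email".
    def emails(profiles: DataFrame, prefix: String): DataFrame =
      profiles
        .select($"name", explode($"attributes").as("attr"))
        .where($"attr.name" === "email")
        .select(
          $"name".as(s"${prefix}_name"),
          $"attr.value".as(s"${prefix}_email"))

    val a = sc.parallelize(Seq(
      Profile("Alice", Seq(
        Attribute("email", "al...@mail.com", 1.0f),
        Attribute("email", "a.jo...@mail.com", 1.0f)))
    )).toDF()

    val b = sc.parallelize(Seq(
      Profile("Alice", Seq(
        Attribute("email", "al...@mail.com", 1.0f),
        Attribute("age", "29", 0.2f)))
    )).toDF()

    // Join on a single email value per row instead of on whole arrays.
    // Distinct column names on each side avoid ambiguous references.
    val joined = emails(a, "a").join(emails(b, "b"), $"a_email" === $"b_email")
    joined.show()

    sc.stop()
  }
}
```

This produces one row per matching email pair, not one row carrying all of a profile's emails. If the full profile is needed, the matched names could be joined back to the original DataFrames in a further step.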


> On 24 Aug 2015, at 15:01, satish chandra j <jsatishchan...@gmail.com>
> wrote:
> 
> Hi,
> If your join logic is correct, this seems similar to an issue I faced
> recently.
> 
> Can you try setting the option on the SparkConf before creating the context:
> new SparkContext(conf.set("spark.driver.allowMultipleContexts", "true"))
> 
> Regards,
> Satish Chandra
> 
> On Mon, Aug 24, 2015 at 2:51 PM, Ilya Karpov <i.kar...@cleverdata.ru> wrote:
> Hi, guys
> I'm confused about joining on columns in SparkSQL and need your advice.
> I want to join 2 datasets of profiles. Each profile has a name and an array of
> attributes (age, gender, email, etc.).
> There can be multiple instances of an attribute with the same name, e.g. a profile
> with 2 emails has 2 attributes with name = 'email' in its
> array. Now I want to join the 2 datasets on the 'email' attribute. I can't find a
> way to do it :(
> 
> The code is below. Currently the result of the join is empty, while I expect to
> see 1 row with all of Alice's emails.
> 
> import org.apache.spark.sql.{DataFrame, SQLContext}
> import org.apache.spark.{SparkConf, SparkContext}
> 
> case class Attribute(name: String, value: String, weight: Float)
> case class Profile(name: String, attributes: Seq[Attribute])
> 
> object SparkJoinArrayColumn {
>   def main(args: Array[String]) {
>     val sc: SparkContext = new SparkContext(
>       new SparkConf().setMaster("local").setAppName(getClass.getSimpleName))
>     val sqlContext: SQLContext = new SQLContext(sc)
> 
>     import sqlContext.implicits._
> 
>     val a: DataFrame = sc.parallelize(Seq(
>       Profile("Alice", Seq(Attribute("email", "al...@mail.com", 1.0f),
>         Attribute("email", "a.jo...@mail.com", 1.0f)))
>     )).toDF.as("a")
> 
>     val b: DataFrame = sc.parallelize(Seq(
>       Profile("Alice", Seq(Attribute("email", "al...@mail.com", 1.0f),
>         Attribute("age", "29", 0.2f)))
>     )).toDF.as("b")
> 
> 
>     a.where($"a.attributes.name" === "email")
>       .join(
>         b.where($"b.attributes.name" === "email"),
>         $"a.attributes.value" === $"b.attributes.value"
>       )
>     .show()
>   }
> }
> 
> Thanks in advance!