Hi,

Can you try

    select birthday from customer left join profile on customer.account_id = profile.account_id

to see whether the problem remains on your entire data?
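As a reference point for checking the output, here is a rough sketch of what the LEFT JOIN should return on the sampled rows quoted in this thread. It is plain Python, not Spark, and the `left_join` helper is a hypothetical name made up for this illustration. The rows are cut down to the two columns needed, which is also an assumption made only to keep the example short.

```python
# Sampled rows from the thread, reduced to the columns needed here.
# customer: (account_id, birthday); profile: (account_id, card_type)
customer = [
    ("50660", "1975-06-05 00:00:00.000"),
    ("50666", "1984-02-23 00:00:00.000"),
    ("50680", "1976-11-25 00:00:00.000"),
    ("85", "1971-03-27 00:00:00.000"),
]
profile = [
    ("1144681", "3"),
    ("50666", "2"),
    ("3930657", "1"),
    ("1056365", "2"),
]


def left_join(left, right):
    """Left outer join on the first field (hypothetical helper).

    Every left row is kept; a left row without a match is paired with None.
    """
    by_key = {}
    for row in right:
        by_key.setdefault(row[0], []).append(row)

    joined = []
    for row in left:
        matches = by_key.get(row[0])
        if matches:
            joined.extend((row, match) for match in matches)
        else:
            joined.append((row, None))
    return joined


rows = left_join(customer, profile)

# Equivalent of "select birthday ...": one birthday per joined row.
birthdays = [cust[1] for cust, _ in rows]

for cust, prof in rows:
    print(cust[0], cust[1], prof[1] if prof else None)
```

On these sampled rows only account 50666 has a matching profile row, so the other three customers should come back with empty profile columns. If the query on the full data returns one birthday per customer row (plus any duplicates caused by repeated account_id values in profile), the join itself is behaving as expected.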
Thanks,
Liquan

On Fri, Oct 10, 2014 at 8:20 AM, invkrh <inv...@gmail.com> wrote:

> Hi,
>
> I am exploring SparkSQL 1.1.0 and I have a problem with LEFT JOIN.
>
> Here is the query:
>
> select * from customer left join profile on customer.account_id =
> profile.account_id
>
> The schemas of the two tables are as follows:
>
> // Table: customer
> root
>  |-- account_id: string (nullable = false)
>  |-- birthday: string (nullable = true)
>  |-- preferstore: string (nullable = true)
>  |-- registstore: string (nullable = true)
>  |-- gender: string (nullable = true)
>  |-- city_name_en: string (nullable = true)
>  |-- register_date: string (nullable = true)
>  |-- zip: string (nullable = true)
>
> // Table: profile
> root
>  |-- account_id: string (nullable = false)
>  |-- card_type: string (nullable = true)
>  |-- card_upgrade_time_black: string (nullable = true)
>  |-- card_upgrade_time_gold: string (nullable = true)
>
> However, I always get an exception:
>
> Exception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: *, tree:
> Project [*]
>  Join LeftOuter, Some(('customer.account_id = 'profile.account_id))
>   Subquery customer
>    SparkLogicalPlan (ExistingRdd [account_id#0,birthday#1,preferstore#2,registstore#3,gender#4,city_name_en#5,register_date#6,zip#7], MappedRDD[5] at map at SQLFetcher.scala:43)
>   Subquery profile
>    SparkLogicalPlan (ExistingRdd [account_id#8,card_type#9,card_upgrade_time_black#10,card_upgrade_time_gold#11], MappedRDD[12] at map at SQLFetcher.scala:43)
>
> I was not sure where the problem was, so I created two simple tables to
> isolate it.
>
> // table 1
> a b c
> 4 8 9
> 1 3 4
> 3 4 5
>
> // table 2
> a b c
> 1 2 3
> 4 5 6
>
> This time, it works.
>
> So the problem might be in the data. I then sampled some lines of the
> input tables to create new ones. This also works.
>
> I am so confused. The problem is in the data, but the error messages are
> not enough to find it (if I am not missing anything).
>
> Some lines of the sampled tables:
>
> // Table: customer
>
> [50660,1975-06-05 00:00:00.000,13,12,male,ningboshi,2006-12-14 00:00:00.000,]
> [50666,1984-02-23 00:00:00.000,72,5,Female,beijingshi,2006-12-14 00:00:00.000,100086]
> [50680,1976-11-25 00:00:00.000,59,5,Female,beijingshi,2006-12-14 00:00:00.000,100022]
> [85,1971-03-27 00:00:00.000,2,2,Female,shanghaishi,2005-09-20 00:00:00.000,200336]
>
> // Table: profile
>
> [1144681,3,2010-02-18 00:00:00.000,2013-02-28 00:00:00.000]
> [50666,2,2010-10-31 00:00:00.000,]
> [3930657,1,,]
> [1056365,2,2009-12-29 00:00:00.000,]
>
> Any help? =)
>
> Hao
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-LEFT-JOIN-problem-tp16152.html

--
Liquan Pei
Department of Physics
University of Massachusetts Amherst