You have 2 options: Option 1: Use lateral view explode, as you did below. But if you want to remove the duplicate, then use distinct after that. For example: col1, col2, ArrayOf(Struct) After explode: col1, col2, employee0col1, col2, employee1col1, col2, employee0 Then select distinct col1, col2 from ... where emp.name='employee0' Option 2: Implement your own UDF, to do the logic you want to do. In fact, in the Hive, there is already one called array_contains(), which check if the array contain the data you want. But in your case, your data in the array is a struct, and you only want to compare name of the struct, instead of whole struct. You need to override the equals() logic of array_contains() in the Hive, so you have to implement that by custom UDF. See the hive function of array_contains here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CollectionFunctions Yong From: tridib.sama...@live.com To: java8...@hotmail.com; user@spark.apache.org Subject: RE: nested collection object query Date: Mon, 28 Sep 2015 23:02:41 -0700
Well I figure out a way to use explode. But it returns two rows if there is two match in nested array objects. select id from department LATERAL VIEW explode(employee) dummy_table as emp where emp.name = 'employee0' I was looking for an operator that loops through the array and return true if it matches the condition and returns the parent object. From: tridib.sama...@live.com To: java8...@hotmail.com; user@spark.apache.org Subject: RE: nested collection object query Date: Mon, 28 Sep 2015 22:26:46 -0700 Thanks for you response Yong! Array syntax works fine. But I am not sure how to use explode. Should I use as follows? select id from department where explode(employee).name = 'employee0 This query gives me java.lang.UnsupportedOperationException . I am using HiveContext. From: java8...@hotmail.com To: tridib.sama...@live.com; user@spark.apache.org Subject: RE: nested collection object query Date: Mon, 28 Sep 2015 20:42:11 -0400 Your employee in fact is an array of struct, not just struct. If you are using HiveSQLContext, then you can refer it like following: select id from member where employee[0].name = 'employee0' The employee[0] is pointing to the 1st element of the array. If you want to query all the elements in the array, then you have to use "explode" in the Hive. See Hive document for this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode Yong > Date: Mon, 28 Sep 2015 16:37:23 -0700 > From: tridib.sama...@live.com > To: user@spark.apache.org > Subject: nested collection object query > > Hi Friends, > What is the right syntax to query on collection of nested object? I have a > following schema and SQL. But it does not return anything. Is the syntax > correct? > > root > |-- id: string (nullable = false) > |-- employee: array (nullable = false) > | |-- element: struct (containsNull = true) > | | |-- id: string (nullable = false) > | | |-- name: string (nullable = false) > | | |-- speciality: string (nullable = false) > > > select id from member where employee.name = 'employee0' > > Uploaded a test if some one want to try it out. NestedObjectTest.java > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24853/NestedObjectTest.java> > > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/nested-collection-object-query-tp24853.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org >