Have you tryed to narrow down the problem so that we can be 100% sure that it lies on the array types ? Just exclude them for sake of testing. If we know 100% that it is on this array stuff try to explode that columns into simple types.
Jorge Machado > On 4 Jun 2018, at 11:09, Pranav Agrawal <pranav.mn...@gmail.com> wrote: > > I am ordering the columns before doing union, so I think that should not be > an issue, > > String[] columns_original_order = baseDs.columns(); > String[] columns = baseDs.columns(); > Arrays.sort(columns); > baseDs=baseDs.selectExpr(columns); > incDsForPartition=incDsForPartition.selectExpr(columns); > > if (baseDs.count() > 0) { > return > baseDs.union(incDsForPartition).selectExpr(columns_original_order); > } else { > return incDsForPartition.selectExpr(columns_original_order); > > > On Mon, Jun 4, 2018 at 2:31 PM, Jorge Machado <jom...@me.com > <mailto:jom...@me.com>> wrote: > Try the same union with a dataframe without the arrays types. Could be > something strange there like ordering or so. > > Jorge Machado > > > > > >> On 4 Jun 2018, at 10:17, Pranav Agrawal <pranav.mn...@gmail.com >> <mailto:pranav.mn...@gmail.com>> wrote: >> >> schema is exactly the same, not sure why it is failing though. >> >> root >> |-- booking_id: integer (nullable = true) >> |-- booking_rooms_room_category_id: integer (nullable = true) >> |-- booking_rooms_room_id: integer (nullable = true) >> |-- booking_source: integer (nullable = true) >> |-- booking_status: integer (nullable = true) >> |-- cancellation_reason: integer (nullable = true) >> |-- checkin: string (nullable = true) >> |-- checkout: string (nullable = true) >> |-- city_id: integer (nullable = true) >> |-- cluster_id: integer (nullable = true) >> |-- company_id: integer (nullable = true) >> |-- created_at: string (nullable = true) >> |-- discount: integer (nullable = true) >> |-- feedback_created_at: string (nullable = true) >> |-- feedback_id: integer (nullable = true) >> |-- hotel_id: integer (nullable = true) >> |-- hub_id: integer (nullable = true) >> |-- month: integer (nullable = true) >> |-- no_show_reason: integer (nullable = true) >> |-- oyo_rooms: integer (nullable = true) >> |-- selling_amount: integer (nullable = true) >> |-- shifting: array (nullable = true) >> | |-- element: struct (containsNull = true) >> | | |-- id: integer (nullable = true) >> | | |-- booking_id: integer (nullable = true) >> | | |-- shifting_status: integer (nullable = true) >> | | |-- shifting_reason: integer (nullable = true) >> | | |-- shifting_metadata: integer (nullable = true) >> |-- suggest_oyo: integer (nullable = true) >> |-- tickets: array (nullable = true) >> | |-- element: struct (containsNull = true) >> | | |-- ticket_source: integer (nullable = true) >> | | |-- ticket_status: string (nullable = true) >> | | |-- ticket_instance_source: integer (nullable = true) >> | | |-- ticket_category: string (nullable = true) >> |-- updated_at: timestamp (nullable = true) >> |-- year: integer (nullable = true) >> |-- zone_id: integer (nullable = true) >> >> root >> |-- booking_id: integer (nullable = true) >> |-- booking_rooms_room_category_id: integer (nullable = true) >> |-- booking_rooms_room_id: integer (nullable = true) >> |-- booking_source: integer (nullable = true) >> |-- booking_status: integer (nullable = true) >> |-- cancellation_reason: integer (nullable = true) >> |-- checkin: string (nullable = true) >> |-- checkout: string (nullable = true) >> |-- city_id: integer (nullable = true) >> |-- cluster_id: integer (nullable = true) >> |-- company_id: integer (nullable = true) >> |-- created_at: string (nullable = true) >> |-- discount: integer (nullable = true) >> |-- feedback_created_at: string (nullable = true) >> |-- feedback_id: integer (nullable = true) >> |-- hotel_id: integer (nullable = true) >> |-- hub_id: integer (nullable = true) >> |-- month: integer (nullable = true) >> |-- no_show_reason: integer (nullable = true) >> |-- oyo_rooms: integer (nullable = true) >> |-- selling_amount: integer (nullable = true) >> |-- shifting: array (nullable = true) >> | |-- element: struct (containsNull = true) >> | | |-- id: integer (nullable = true) >> | | |-- booking_id: integer (nullable = true) >> | | |-- shifting_status: integer (nullable = true) >> | | |-- shifting_reason: integer (nullable = true) >> | | |-- shifting_metadata: integer (nullable = true) >> |-- suggest_oyo: integer (nullable = true) >> |-- tickets: array (nullable = true) >> | |-- element: struct (containsNull = true) >> | | |-- ticket_source: integer (nullable = true) >> | | |-- ticket_status: string (nullable = true) >> | | |-- ticket_instance_source: integer (nullable = true) >> | | |-- ticket_category: string (nullable = true) >> |-- updated_at: timestamp (nullable = false) >> |-- year: integer (nullable = true) >> |-- zone_id: integer (nullable = true) >> >> On Sun, Jun 3, 2018 at 8:05 PM, Alessandro Solimando >> <alessandro.solima...@gmail.com <mailto:alessandro.solima...@gmail.com>> >> wrote: >> Hi Pranav, >> I don´t have an answer to your issue, but what I generally do in this cases >> is to first try to simplify it to a point where it is easier to check what´s >> going on, and then adding back ¨pieces¨ one by one until I spot the error. >> >> In your case I can suggest to: >> >> 1) project the dataset to the problematic column only (column 21 from your >> log) >> 2) use explode function to have one element of the array per line >> 3) flatten the struct >> >> At each step use printSchema() to double check if the types are as you >> expect them to be, and if they are the same for both datasets. >> >> Best regards, >> Alessandro >> >> On 2 June 2018 at 19:48, Pranav Agrawal <pranav.mn...@gmail.com >> <mailto:pranav.mn...@gmail.com>> wrote: >> can't get around this error when performing union of two datasets >> (ds1.union(ds2)) having complex data type (struct, list), >> >> 18/06/02 15:12:00 INFO ApplicationMaster: Final app status: FAILED, >> exitCode: 15, (reason: User class threw exception: >> org.apache.spark.sql.AnalysisException: Union can only be performed on >> tables with the compatible column types. >> array<struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:string>> >> <> >> array<struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:string>> >> at the 21th column of the second table;; >> >> As far as I can tell, they are the same. What am I doing wrong? Any help / >> workaround appreciated! >> >> spark version: 2.2.1 >> >> Thanks, >> Pranav >> >> > >