Have you tried to narrow down the problem so that we can be 100% sure that it 
lies in the array types? Just exclude them for the sake of testing. 
If we know for certain that the array columns are the cause, try to explode 
those columns into simple types.
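
One cheap check before touching the data at all: dump printSchema() for both 
datasets to text and compare the two dumps line by line, since small 
differences are easy to miss by eye. Below is a minimal sketch in plain Java 
(no Spark dependency). The class name SchemaDiff and the method diffLines are 
made up for illustration, and the sample strings are a shortened excerpt of 
the two schemas quoted further down.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaDiff {

    /**
     * Compares two schema dumps line by line and returns one entry for
     * every line position where they differ (hypothetical helper).
     */
    static List<String> diffLines(String left, String right) {
        String[] a = left.split("\n");
        String[] b = right.split("\n");
        List<String> out = new ArrayList<>();
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // A dump that runs out of lines early is reported as <missing>.
            String l = i < a.length ? a[i].trim() : "<missing>";
            String r = i < b.length ? b[i].trim() : "<missing>";
            if (!l.equals(r)) {
                out.add("line " + (i + 1) + ": " + l + "  <>  " + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened excerpts of the two printSchema() dumps from this thread.
        String base = "root\n"
                + " |-- tickets: array (nullable = true)\n"
                + " |-- updated_at: timestamp (nullable = true)\n"
                + " |-- year: integer (nullable = true)";
        String inc = "root\n"
                + " |-- tickets: array (nullable = true)\n"
                + " |-- updated_at: timestamp (nullable = false)\n"
                + " |-- year: integer (nullable = true)";

        for (String d : diffLines(base, inc)) {
            System.out.println(d);
        }
    }
}
```

On this excerpt the check flags the updated_at line, where nullable differs 
between the two dumps. It only shows where the dumps are not identical; 
whether that difference is what trips the union still has to be confirmed 
separately.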

Jorge Machado






> On 4 Jun 2018, at 11:09, Pranav Agrawal <pranav.mn...@gmail.com> wrote:
> 
> I am ordering the columns before doing union, so I think that should not be 
> an issue,
>         
>         String[] columns_original_order = baseDs.columns();
>         String[] columns = baseDs.columns();
>         Arrays.sort(columns);
>         baseDs=baseDs.selectExpr(columns);
>         incDsForPartition=incDsForPartition.selectExpr(columns);
> 
>         if (baseDs.count() > 0) {
>             return baseDs.union(incDsForPartition).selectExpr(columns_original_order);
>         } else {
>             return incDsForPartition.selectExpr(columns_original_order);
>         }
> 
> 
> On Mon, Jun 4, 2018 at 2:31 PM, Jorge Machado <jom...@me.com> wrote:
> Try the same union with a dataframe without the array types. There could be 
> something strange there, such as the column ordering.
> 
> Jorge Machado
> 
> 
> 
> 
> 
>> On 4 Jun 2018, at 10:17, Pranav Agrawal <pranav.mn...@gmail.com> wrote:
>> 
>> schema is exactly the same, not sure why it is failing though.
>> 
>> root
>>  |-- booking_id: integer (nullable = true)
>>  |-- booking_rooms_room_category_id: integer (nullable = true)
>>  |-- booking_rooms_room_id: integer (nullable = true)
>>  |-- booking_source: integer (nullable = true)
>>  |-- booking_status: integer (nullable = true)
>>  |-- cancellation_reason: integer (nullable = true)
>>  |-- checkin: string (nullable = true)
>>  |-- checkout: string (nullable = true)
>>  |-- city_id: integer (nullable = true)
>>  |-- cluster_id: integer (nullable = true)
>>  |-- company_id: integer (nullable = true)
>>  |-- created_at: string (nullable = true)
>>  |-- discount: integer (nullable = true)
>>  |-- feedback_created_at: string (nullable = true)
>>  |-- feedback_id: integer (nullable = true)
>>  |-- hotel_id: integer (nullable = true)
>>  |-- hub_id: integer (nullable = true)
>>  |-- month: integer (nullable = true)
>>  |-- no_show_reason: integer (nullable = true)
>>  |-- oyo_rooms: integer (nullable = true)
>>  |-- selling_amount: integer (nullable = true)
>>  |-- shifting: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- id: integer (nullable = true)
>>  |    |    |-- booking_id: integer (nullable = true)
>>  |    |    |-- shifting_status: integer (nullable = true)
>>  |    |    |-- shifting_reason: integer (nullable = true)
>>  |    |    |-- shifting_metadata: integer (nullable = true)
>>  |-- suggest_oyo: integer (nullable = true)
>>  |-- tickets: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- ticket_source: integer (nullable = true)
>>  |    |    |-- ticket_status: string (nullable = true)
>>  |    |    |-- ticket_instance_source: integer (nullable = true)
>>  |    |    |-- ticket_category: string (nullable = true)
>>  |-- updated_at: timestamp (nullable = true)
>>  |-- year: integer (nullable = true)
>>  |-- zone_id: integer (nullable = true)
>> 
>> root
>>  |-- booking_id: integer (nullable = true)
>>  |-- booking_rooms_room_category_id: integer (nullable = true)
>>  |-- booking_rooms_room_id: integer (nullable = true)
>>  |-- booking_source: integer (nullable = true)
>>  |-- booking_status: integer (nullable = true)
>>  |-- cancellation_reason: integer (nullable = true)
>>  |-- checkin: string (nullable = true)
>>  |-- checkout: string (nullable = true)
>>  |-- city_id: integer (nullable = true)
>>  |-- cluster_id: integer (nullable = true)
>>  |-- company_id: integer (nullable = true)
>>  |-- created_at: string (nullable = true)
>>  |-- discount: integer (nullable = true)
>>  |-- feedback_created_at: string (nullable = true)
>>  |-- feedback_id: integer (nullable = true)
>>  |-- hotel_id: integer (nullable = true)
>>  |-- hub_id: integer (nullable = true)
>>  |-- month: integer (nullable = true)
>>  |-- no_show_reason: integer (nullable = true)
>>  |-- oyo_rooms: integer (nullable = true)
>>  |-- selling_amount: integer (nullable = true)
>>  |-- shifting: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- id: integer (nullable = true)
>>  |    |    |-- booking_id: integer (nullable = true)
>>  |    |    |-- shifting_status: integer (nullable = true)
>>  |    |    |-- shifting_reason: integer (nullable = true)
>>  |    |    |-- shifting_metadata: integer (nullable = true)
>>  |-- suggest_oyo: integer (nullable = true)
>>  |-- tickets: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- ticket_source: integer (nullable = true)
>>  |    |    |-- ticket_status: string (nullable = true)
>>  |    |    |-- ticket_instance_source: integer (nullable = true)
>>  |    |    |-- ticket_category: string (nullable = true)
>>  |-- updated_at: timestamp (nullable = false)
>>  |-- year: integer (nullable = true)
>>  |-- zone_id: integer (nullable = true)
>> 
>> On Sun, Jun 3, 2018 at 8:05 PM, Alessandro Solimando 
>> <alessandro.solima...@gmail.com> wrote:
>> Hi Pranav,
>> I don't have an answer to your issue, but what I generally do in these cases 
>> is first simplify it to the point where it is easier to check what's 
>> going on, and then add back the "pieces" one by one until I spot the error.
>> 
>> In your case I would suggest: 
>> 
>> 1) project the dataset to the problematic column only (column 21 from your 
>> log)
>> 2) use the explode function to get one element of the array per row
>> 3) flatten the struct 
>> 
>> At each step use printSchema() to double check if the types are as you 
>> expect them to be, and if they are the same for both datasets.
>> 
>> Best regards,
>> Alessandro 
>> 
>> On 2 June 2018 at 19:48, Pranav Agrawal <pranav.mn...@gmail.com> wrote:
>> can't get around this error when performing union of two datasets 
>> (ds1.union(ds2)) having complex data type (struct, list),
>> 
>> 18/06/02 15:12:00 INFO ApplicationMaster: Final app status: FAILED, 
>> exitCode: 15, (reason: User class threw exception: 
>> org.apache.spark.sql.AnalysisException: Union can only be performed on 
>> tables with the compatible column types. 
>> array<struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:string>>
>>  <> 
>> array<struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:string>>
>>  at the 21th column of the second table;;
>> 
>> As far as I can tell, they are the same. What am I doing wrong? Any help / 
>> workaround appreciated!
>> 
>> spark version: 2.2.1
>> 
>> Thanks,
>> Pranav
>> 
>> 
> 
> 
