Re: need workaround around HIVE-11625 / DISTRO-800
Any help, please?

On Tue, Aug 7, 2018 at 1:49 PM, Pranav Agrawal wrote:
> I am hitting this issue:
> https://issues.cloudera.org/browse/DISTRO-800 (related to
> https://issues.apache.org/jira/browse/HIVE-11625)
>
> I am unable to write an empty array (size 0) of type int or string
> into Parquet. Please assist or suggest a workaround.
>
> spark version: 2.2.1
> AWS EMR: 5.12, 5.13
need workaround around HIVE-11625 / DISTRO-800
I am hitting this issue:
https://issues.cloudera.org/browse/DISTRO-800 (related to
https://issues.apache.org/jira/browse/HIVE-11625)

I am unable to write an empty array (size 0) of type int or string into Parquet. Please assist or suggest a workaround.

spark version: 2.2.1
AWS EMR: 5.12, 5.13
Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)
Yes, the issue is with the array type only; I have confirmed that. I exploded the array to a struct but am still getting the same error:

*Exception in thread "main" org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. struct <> struct at the 21th column of the second table;;*

On Mon, Jun 4, 2018 at 2:55 PM, Jorge Machado wrote:
> Have you tried to narrow down the problem so that we can be 100% sure that
> it lies in the array types? Just exclude them for the sake of testing.
> If we know 100% that it is the array columns, try to explode those columns
> into simple types.
>
> Jorge Machado
Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)
I am ordering the columns before doing the union, so I think that should not be an issue:

    String[] columns_original_order = baseDs.columns();
    String[] columns = baseDs.columns();
    Arrays.sort(columns);
    baseDs = baseDs.selectExpr(columns);
    incDsForPartition = incDsForPartition.selectExpr(columns);
    if (baseDs.count() > 0) {
        return baseDs.union(incDsForPartition).selectExpr(columns_original_order);
    } else {
        return incDsForPartition.selectExpr(columns_original_order);
    }

On Mon, Jun 4, 2018 at 2:31 PM, Jorge Machado wrote:
> Try the same union with a dataframe without the array types. Could be
> something strange there, like ordering or so.
>
> Jorge Machado
Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)
The schema is exactly the same; not sure why it is failing, though.

root
 |-- booking_id: integer (nullable = true)
 |-- booking_rooms_room_category_id: integer (nullable = true)
 |-- booking_rooms_room_id: integer (nullable = true)
 |-- booking_source: integer (nullable = true)
 |-- booking_status: integer (nullable = true)
 |-- cancellation_reason: integer (nullable = true)
 |-- checkin: string (nullable = true)
 |-- checkout: string (nullable = true)
 |-- city_id: integer (nullable = true)
 |-- cluster_id: integer (nullable = true)
 |-- company_id: integer (nullable = true)
 |-- created_at: string (nullable = true)
 |-- discount: integer (nullable = true)
 |-- feedback_created_at: string (nullable = true)
 |-- feedback_id: integer (nullable = true)
 |-- hotel_id: integer (nullable = true)
 |-- hub_id: integer (nullable = true)
 |-- month: integer (nullable = true)
 |-- no_show_reason: integer (nullable = true)
 |-- oyo_rooms: integer (nullable = true)
 |-- selling_amount: integer (nullable = true)
 |-- shifting: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: integer (nullable = true)
 |    |    |-- booking_id: integer (nullable = true)
 |    |    |-- shifting_status: integer (nullable = true)
 |    |    |-- shifting_reason: integer (nullable = true)
 |    |    |-- shifting_metadata: integer (nullable = true)
 |-- suggest_oyo: integer (nullable = true)
 |-- tickets: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- ticket_source: integer (nullable = true)
 |    |    |-- ticket_status: string (nullable = true)
 |    |    |-- ticket_instance_source: integer (nullable = true)
 |    |    |-- ticket_category: string (nullable = true)
 |-- updated_at: timestamp (nullable = true)
 |-- year: integer (nullable = true)
 |-- zone_id: integer (nullable = true)

root
 |-- booking_id: integer (nullable = true)
 |-- booking_rooms_room_category_id: integer (nullable = true)
 |-- booking_rooms_room_id: integer (nullable = true)
 |-- booking_source: integer (nullable = true)
 |-- booking_status: integer (nullable = true)
 |-- cancellation_reason: integer (nullable = true)
 |-- checkin: string (nullable = true)
 |-- checkout: string (nullable = true)
 |-- city_id: integer (nullable = true)
 |-- cluster_id: integer (nullable = true)
 |-- company_id: integer (nullable = true)
 |-- created_at: string (nullable = true)
 |-- discount: integer (nullable = true)
 |-- feedback_created_at: string (nullable = true)
 |-- feedback_id: integer (nullable = true)
 |-- hotel_id: integer (nullable = true)
 |-- hub_id: integer (nullable = true)
 |-- month: integer (nullable = true)
 |-- no_show_reason: integer (nullable = true)
 |-- oyo_rooms: integer (nullable = true)
 |-- selling_amount: integer (nullable = true)
 |-- shifting: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: integer (nullable = true)
 |    |    |-- booking_id: integer (nullable = true)
 |    |    |-- shifting_status: integer (nullable = true)
 |    |    |-- shifting_reason: integer (nullable = true)
 |    |    |-- shifting_metadata: integer (nullable = true)
 |-- suggest_oyo: integer (nullable = true)
 |-- tickets: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- ticket_source: integer (nullable = true)
 |    |    |-- ticket_status: string (nullable = true)
 |    |    |-- ticket_instance_source: integer (nullable = true)
 |    |    |-- ticket_category: string (nullable = true)
 |-- updated_at: timestamp (nullable = false)
 |-- year: integer (nullable = true)
 |-- zone_id: integer (nullable = true)

On Sun, Jun 3, 2018 at 8:05 PM, Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
> Hi Pranav,
> I don't have an answer to your issue, but what I generally do in these
> cases is to first simplify the problem to a point where it is easier to
> check what's going on, and then add back "pieces" one by one until I spot
> the error.
>
> In your case I can suggest to:
>
> 1) project the dataset to the problematic column only (column 21 from your
>    log)
> 2) use the explode function to have one element of the array per line
> 3) flatten the struct
>
> At each step use printSchema() to double check that the types are as you
> expect them to be, and that they are the same for both datasets.
>
> Best regards,
> Alessandro
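The suggestion above to compare the printSchema() output of both datasets at each step can be done mechanically rather than by eye. Below is a minimal sketch in plain Java, assuming the two schema dumps have already been captured as strings (for example from ds.schema().treeString(), which should return the same text that printSchema() prints). The class name SchemaDiff and the method diff are hypothetical names chosen for illustration; they are not part of Spark. In the two schemas pasted in this message, the only differing line appears to be updated_at (nullable = true vs nullable = false); the thread does not establish whether that is related to the union error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: a plain line-by-line comparison
// of two printSchema()-style text dumps.
class SchemaDiff {

    // Returns one description for every line position where the dumps differ.
    // Leading and trailing whitespace on each line is ignored.
    static List<String> diff(String left, String right) {
        String[] a = left.trim().split("\\R");
        String[] b = right.trim().split("\\R");
        List<String> differences = new ArrayList<>();
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            String l = i < a.length ? a[i].trim() : "(missing)";
            String r = i < b.length ? b[i].trim() : "(missing)";
            if (!l.equals(r)) {
                differences.add("line " + (i + 1) + ": " + l + "  vs  " + r);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Fragments of the two schemas from the thread, shortened for the example.
        String base = String.join("\n",
                "root",
                " |-- tickets: array (nullable = true)",
                " |-- updated_at: timestamp (nullable = true)",
                " |-- year: integer (nullable = true)");
        String inc = String.join("\n",
                "root",
                " |-- tickets: array (nullable = true)",
                " |-- updated_at: timestamp (nullable = false)",
                " |-- year: integer (nullable = true)");
        // Prints the updated_at line, the only position where the dumps differ.
        diff(base, inc).forEach(System.out::println);
    }
}
```

Because the comparison is positional, both dumps need their columns in the same order first (for instance after the sorted selectExpr shown earlier in the thread); a reordered column would otherwise be reported as many differences.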
[Spark SQL] error in performing dataset union with complex data type (struct, list)
I can't get around this error when performing a union of two datasets (ds1.union(ds2)) having complex data types (struct, list):

*18/06/02 15:12:00 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. array> <> array> at the 21th column of the second table;;*

As far as I can tell, they are the same. What am I doing wrong? Any help / workaround appreciated!

spark version: 2.2.1

Thanks,
Pranav