Mich, I am using .withColumn to add another column "dt" that is a reformatted version of an existing column "timestamp". The partition column is "dt".
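For clarity, the `dt` derivation here amounts to slicing the date and the hour out of the timestamp string and appending ":00". A plain-Scala sketch of the same logic as the Spark expression shown later in the thread (the function name and sample input are illustrative, not from the thread):

```scala
// Mirrors the Spark expression
//   concat(substring($"timestamp", 1, 10), lit(" "),
//          substring($"timestamp", 12, 2), lit(":00"))
// Note: Spark's substring(col, pos, len) is 1-based; Scala's
// String.substring(begin, end) is 0-based, hence the shifted indices.
def toDtBucket(timestamp: String): String =
  timestamp.substring(0, 10) + " " + timestamp.substring(11, 13) + ":00"

// e.g. an ISO-style input truncates to an hourly bucket
println(toDtBucket("2016-06-03T10:33:07Z"))  // prints 2016-06-03 10:00
```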
We are using Spark 1.6.0 in CDH 5.7.0.

Thanks,
Ben

> On Jun 3, 2016, at 10:33 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> What version of Spark are you using?
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
> On 3 June 2016 at 17:51, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> OK, what is the new column called? You are basically adding a new column to
> an already existing table.
>
>
> On 3 June 2016 at 17:04, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> The table already exists.
> > CREATE EXTERNAL TABLE `amo_bi_events`(
> >   `event_type` string COMMENT '',
> >   `timestamp` string COMMENT '',
> >   `event_valid` int COMMENT '',
> >   `event_subtype` string COMMENT '',
> >   `user_ip` string COMMENT '',
> >   `user_id` string COMMENT '',
> >   `cookie_status` string COMMENT '',
> >   `profile_status` string COMMENT '',
> >   `user_status` string COMMENT '',
> >   `previous_timestamp` string COMMENT '',
> >   `user_agent` string COMMENT '',
> >   `referer` string COMMENT '',
> >   `uri` string COMMENT '',
> >   `request_elapsed` bigint COMMENT '',
> >   `browser_languages` string COMMENT '',
> >   `acamp_id` int COMMENT '',
> >   `creative_id` int COMMENT '',
> >   `location_id` int COMMENT '',
> >   `pcamp_id` int COMMENT '',
> >   `pdomain_id` int COMMENT '',
> >   `country` string COMMENT '',
> >   `region` string COMMENT '',
> >   `dma` int COMMENT '',
> >   `city` string COMMENT '',
> >   `zip` string COMMENT '',
> >   `isp` string COMMENT '',
> >   `line_speed` string COMMENT '',
> >   `gender` string COMMENT '',
> >   `year_of_birth` int COMMENT '',
> >   `behaviors_read` string COMMENT '',
> >   `behaviors_written` string COMMENT '',
> >   `key_value_pairs` string COMMENT '',
> >   `acamp_candidates` int COMMENT '',
> >   `tag_format` string COMMENT '',
> >   `optimizer_name` string COMMENT '',
> >   `optimizer_version` string COMMENT '',
> >   `optimizer_ip` string COMMENT '',
> >   `pixel_id` int COMMENT '',
> >   `video_id` string COMMENT '',
> >   `video_network_id` int COMMENT '',
> >   `video_time_watched` bigint COMMENT '',
> >   `video_percentage_watched` int COMMENT '',
> >   `conversion_valid_sale` int COMMENT '',
> >   `conversion_sale_amount` float COMMENT '',
> >   `conversion_commission_amount` float COMMENT '',
> >   `conversion_step` int COMMENT '',
> >   `conversion_currency` string COMMENT '',
> >   `conversion_attribution` int COMMENT '',
> >   `conversion_offer_id` string COMMENT '',
> >   `custom_info` string COMMENT '',
> >   `frequency` int COMMENT '',
> >   `recency_seconds` int COMMENT '',
> >   `cost` float COMMENT '',
> >   `revenue` float COMMENT '',
> >   `optimizer_acamp_id` int COMMENT '',
> >   `optimizer_creative_id` int COMMENT '',
> >   `optimizer_ecpm` float COMMENT '',
> >   `event_id` string COMMENT '',
> >   `impression_id` string COMMENT '',
> >   `diagnostic_data` string COMMENT '',
> >   `user_profile_mapping_source` string COMMENT '',
> >   `latitude` float COMMENT '',
> >   `longitude` float COMMENT '',
> >   `area_code` int COMMENT '',
> >   `gmt_offset` string COMMENT '',
> >   `in_dst` string COMMENT '',
> >   `proxy_type` string COMMENT '',
> >   `mobile_carrier` string COMMENT '',
> >   `pop` string COMMENT '',
> >   `hostname` string COMMENT '',
> >   `profile_ttl` string COMMENT '',
> >   `timestamp_iso` string COMMENT '',
> >   `reference_id` string COMMENT '',
> >   `identity_organization` string COMMENT '',
> >   `identity_method` string COMMENT '',
> >   `mappable_id` string COMMENT '',
> >   `profile_expires` string COMMENT '',
> >   `video_player_iframed` int COMMENT '',
> >   `video_player_in_view` int COMMENT '',
> >   `video_player_width` int COMMENT '',
> >   `video_player_height` int COMMENT '',
> >   `host_domain` string COMMENT '',
> >   `browser_type` string COMMENT '',
> >   `browser_device_cat` string COMMENT '',
> >   `browser_family` string COMMENT '',
> >   `browser_name` string COMMENT '',
> >   `browser_version` string COMMENT '',
> >   `browser_major_version` string COMMENT '',
> >   `browser_minor_version` string COMMENT '',
> >   `os_family` string COMMENT '',
> >   `os_name` string COMMENT '',
> >   `os_version` string COMMENT '',
> >   `os_major_version` string COMMENT '',
> >   `os_minor_version` string COMMENT '')
> > PARTITIONED BY (`dt` timestamp)
> > STORED AS PARQUET;
>
> Thanks,
> Ben
>
>> On Jun 3, 2016, at 8:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Hang on, are you saving this as a new table?
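One detail worth flagging in the DDL above: the table declares its partition column as `dt` timestamp, while the derived `dt` built in the DataFrame elsewhere in this thread is a string such as "2016-06-03 10:00". If the types need to agree, the string would have to be cast. A plain-JVM sketch of that parse, assuming the "yyyy-MM-dd HH:mm" shape (the variable names are illustrative):

```scala
import java.text.SimpleDateFormat

// Sketch only: parse the derived partition string into a SQL timestamp.
// In Spark 1.6 the equivalent would be a column cast, roughly
//   partitionedDf.withColumn("dt", $"dt".cast("timestamp"))
// (whether the cast accepts a seconds-less string may vary by version).
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm")
val dt = new java.sql.Timestamp(fmt.parse("2016-06-03 10:00").getTime)
println(dt)  // prints 2016-06-03 10:00:00.0
```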
>>
>> On 3 June 2016 at 14:13, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>> Does anyone know how to save data in a DataFrame to a table partitioned
>> using an existing column reformatted into a derived column?
>>
>>     val partitionedDf = df.withColumn("dt",
>>       concat(substring($"timestamp", 1, 10), lit(" "),
>>         substring($"timestamp", 12, 2), lit(":00")))
>>
>>     sqlContext.setConf("hive.exec.dynamic.partition", "true")
>>     sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
>>
>>     partitionedDf.write
>>       .mode(SaveMode.Append)
>>       .partitionBy("dt")
>>       .saveAsTable("ds.amo_bi_events")
>>
>> I am getting an ArrayIndexOutOfBoundsException. There are 83 columns in the
>> destination table, but after adding the derived column the error reports 84.
>> I assumed that the column used for the partition would not be counted.
>>
>> Can someone please help?
>>
>> Thanks,
>> Ben
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
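A workaround often suggested for this symptom on Spark 1.6, offered here only as a sketch rather than a confirmed fix for this table: since `ds.amo_bi_events` already exists, append with `insertInto` under dynamic partitioning instead of `saveAsTable`/`partitionBy`, after reordering the DataFrame's columns so the partition column comes last. Only the reordering helper below is runnable as-is; its name is illustrative, and the Spark usage is shown in comments:

```scala
// Illustrative helper: move the partition column to the end, the position
// Hive's dynamic partitioning expects when appending with insertInto.
def partitionLast(cols: Seq[String], partCol: String): Seq[String] =
  cols.filterNot(_ == partCol) :+ partCol

// Sketch of how this would plug into the thread's code (needs a cluster):
//   val ordered = partitionLast(partitionedDf.columns.toSeq, "dt")
//   partitionedDf.select(ordered.map(col): _*)
//     .write
//     .mode(SaveMode.Append)
//     .insertInto("ds.amo_bi_events")
println(partitionLast(Seq("event_type", "dt", "timestamp"), "dt"))
// prints List(event_type, timestamp, dt)
```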