Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
Mich,

Yes, it is already partitioned.

In Hive, I can do this:
INSERT OVERWRITE amo_bi_events PARTITION (dt)
SELECT event_type, timestamp, …,
concat(substring(timestamp, 1, 10), ' ', substring(timestamp, 12, 2), 
':00:00') AS dt
FROM amo_raw_events
WHERE to_date(timestamp_iso) BETWEEN ‘2016-06-01’ AND ‘2016-06-02’;

I tried the same equivalent using DataFrames but couldn’t get it to work.

Thanks,
Ben


> On Jun 3, 2016, at 11:06 AM, Mich Talebzadeh  
> wrote:
> 
> OK fine but dt is the column used for partitioning the table.
> 
> This is what I get in Hive itself
> 
> use test;
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists amo_bi_events;
> CREATE EXTERNAL TABLE `amo_bi_events`(
>`event_type` string COMMENT '',
>`timestamp` string COMMENT '',
>`event_valid` int COMMENT '',
>`event_subtype` string COMMENT '',
>`user_ip` string COMMENT '',
>`user_id` string COMMENT '',
>`cookie_status` string COMMENT '',
>`profile_status` string COMMENT '',
>`user_status` string COMMENT '',
>`previous_timestamp` string COMMENT '',
>`user_agent` string COMMENT '',
>`referer` string COMMENT '',
>`uri` string COMMENT '',
>`request_elapsed` bigint COMMENT '',
>`browser_languages` string COMMENT '',
>`acamp_id` int COMMENT '',
>`creative_id` int COMMENT '',
>`location_id` int COMMENT '',
>`pcamp_id` int COMMENT '',
>`pdomain_id` int COMMENT '',
>`country` string COMMENT '',
>`region` string COMMENT '',
>`dma` int COMMENT '',
>`city` string COMMENT '',
>`zip` string COMMENT '',
>`isp` string COMMENT '',
>`line_speed` string COMMENT '',
>`gender` string COMMENT '',
>`year_of_birth` int COMMENT '',
>`behaviors_read` string COMMENT '',
>`behaviors_written` string COMMENT '',
>`key_value_pairs` string COMMENT '',
>`acamp_candidates` int COMMENT '',
>`tag_format` string COMMENT '',
>`optimizer_name` string COMMENT '',
>`optimizer_version` string COMMENT '',
>`optimizer_ip` string COMMENT '',
>`pixel_id` int COMMENT '',
>`video_id` string COMMENT '',
>`video_network_id` int COMMENT '',
>`video_time_watched` bigint COMMENT '',
>`video_percentage_watched` int COMMENT '',
>`conversion_valid_sale` int COMMENT '',
>`conversion_sale_amount` float COMMENT '',
>`conversion_commission_amount` float COMMENT '',
>`conversion_step` int COMMENT '',
>`conversion_currency` string COMMENT '',
>`conversion_attribution` int COMMENT '',
>`conversion_offer_id` string COMMENT '',
>`custom_info` string COMMENT '',
>`frequency` int COMMENT '',
>`recency_seconds` int COMMENT '',
>`cost` float COMMENT '',
>`revenue` float COMMENT '',
>`optimizer_acamp_id` int COMMENT '',
>`optimizer_creative_id` int COMMENT '',
>`optimizer_ecpm` float COMMENT '',
>`event_id` string COMMENT '',
>`impression_id` string COMMENT '',
>`diagnostic_data` string COMMENT '',
>`user_profile_mapping_source` string COMMENT '',
>`latitude` float COMMENT '',
>`longitude` float COMMENT '',
>`area_code` int COMMENT '',
>`gmt_offset` string COMMENT '',
>`in_dst` string COMMENT '',
>`proxy_type` string COMMENT '',
>`mobile_carrier` string COMMENT '',
>`pop` string COMMENT '',
>`hostname` string COMMENT '',
>`profile_ttl` string COMMENT '',
>`timestamp_iso` string COMMENT '',
>`reference_id` string COMMENT '',
>`identity_organization` string COMMENT '',
>`identity_method` string COMMENT '',
>`mappable_id` string COMMENT '',
>`profile_expires` string COMMENT '',
>`video_player_iframed` int COMMENT '',
>`video_player_in_view` int COMMENT '',
>`video_player_width` int COMMENT '',
>`video_player_height` int COMMENT '',
>`host_domain` string COMMENT '',
>`browser_type` string COMMENT '',
>`browser_device_cat` string COMMENT '',
>`browser_family` string COMMENT '',
>`browser_name` string COMMENT '',
>`browser_version` string COMMENT '',
>`browser_major_version` string COMMENT '',
>`browser_minor_version` string COMMENT '',
>`os_family` string COMMENT '',
>`os_name` string COMMENT '',
>`os_version` string COMMENT '',
>`os_major_version` string COMMENT '',
>`os_minor_version` string COMMENT '')
>  PARTITIONED BY (`dt` timestamp)
>  STORED AS PARQUET;
> desc formatted amo_bi_events;
> 
> The output in hive is as follows:
> 
> # col_name  data_type   comment
> event_type  string
> timestamp   string
> event_valid int
> event_subtype   string
> user_ip string
> user_id string
> cookie_status   string
> profile_status  string
> user_status string
> previous_timestamp  string
> user_agent  string

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Mich Talebzadeh
OK fine but dt is the column used for partitioning the table.

This is what I get in Hive itself

use test;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
drop table if exists amo_bi_events;
CREATE EXTERNAL TABLE `amo_bi_events`(
   `event_type` string COMMENT '',
   `timestamp` string COMMENT '',
   `event_valid` int COMMENT '',
   `event_subtype` string COMMENT '',
   `user_ip` string COMMENT '',
   `user_id` string COMMENT '',
   `cookie_status` string COMMENT '',
   `profile_status` string COMMENT '',
   `user_status` string COMMENT '',
   `previous_timestamp` string COMMENT '',
   `user_agent` string COMMENT '',
   `referer` string COMMENT '',
   `uri` string COMMENT '',
   `request_elapsed` bigint COMMENT '',
   `browser_languages` string COMMENT '',
   `acamp_id` int COMMENT '',
   `creative_id` int COMMENT '',
   `location_id` int COMMENT '',
   `pcamp_id` int COMMENT '',
   `pdomain_id` int COMMENT '',
   `country` string COMMENT '',
   `region` string COMMENT '',
   `dma` int COMMENT '',
   `city` string COMMENT '',
   `zip` string COMMENT '',
   `isp` string COMMENT '',
   `line_speed` string COMMENT '',
   `gender` string COMMENT '',
   `year_of_birth` int COMMENT '',
   `behaviors_read` string COMMENT '',
   `behaviors_written` string COMMENT '',
   `key_value_pairs` string COMMENT '',
   `acamp_candidates` int COMMENT '',
   `tag_format` string COMMENT '',
   `optimizer_name` string COMMENT '',
   `optimizer_version` string COMMENT '',
   `optimizer_ip` string COMMENT '',
   `pixel_id` int COMMENT '',
   `video_id` string COMMENT '',
   `video_network_id` int COMMENT '',
   `video_time_watched` bigint COMMENT '',
   `video_percentage_watched` int COMMENT '',
   `conversion_valid_sale` int COMMENT '',
   `conversion_sale_amount` float COMMENT '',
   `conversion_commission_amount` float COMMENT '',
   `conversion_step` int COMMENT '',
   `conversion_currency` string COMMENT '',
   `conversion_attribution` int COMMENT '',
   `conversion_offer_id` string COMMENT '',
   `custom_info` string COMMENT '',
   `frequency` int COMMENT '',
   `recency_seconds` int COMMENT '',
   `cost` float COMMENT '',
   `revenue` float COMMENT '',
   `optimizer_acamp_id` int COMMENT '',
   `optimizer_creative_id` int COMMENT '',
   `optimizer_ecpm` float COMMENT '',
   `event_id` string COMMENT '',
   `impression_id` string COMMENT '',
   `diagnostic_data` string COMMENT '',
   `user_profile_mapping_source` string COMMENT '',
   `latitude` float COMMENT '',
   `longitude` float COMMENT '',
   `area_code` int COMMENT '',
   `gmt_offset` string COMMENT '',
   `in_dst` string COMMENT '',
   `proxy_type` string COMMENT '',
   `mobile_carrier` string COMMENT '',
   `pop` string COMMENT '',
   `hostname` string COMMENT '',
   `profile_ttl` string COMMENT '',
   `timestamp_iso` string COMMENT '',
   `reference_id` string COMMENT '',
   `identity_organization` string COMMENT '',
   `identity_method` string COMMENT '',
   `mappable_id` string COMMENT '',
   `profile_expires` string COMMENT '',
   `video_player_iframed` int COMMENT '',
   `video_player_in_view` int COMMENT '',
   `video_player_width` int COMMENT '',
   `video_player_height` int COMMENT '',
   `host_domain` string COMMENT '',
   `browser_type` string COMMENT '',
   `browser_device_cat` string COMMENT '',
   `browser_family` string COMMENT '',
   `browser_name` string COMMENT '',
   `browser_version` string COMMENT '',
   `browser_major_version` string COMMENT '',
   `browser_minor_version` string COMMENT '',
   `os_family` string COMMENT '',
   `os_name` string COMMENT '',
   `os_version` string COMMENT '',
   `os_major_version` string COMMENT '',
   `os_minor_version` string COMMENT '')
 PARTITIONED BY (`dt` timestamp)
 STORED AS PARQUET;
desc formatted amo_bi_events;

The output in hive is as follows:

# col_name  data_type   comment
event_type  string
timestamp   string
event_valid int
event_subtype   string
user_ip string
user_id string
cookie_status   string
profile_status  string
user_status string
previous_timestamp  string
user_agent  string
referer string
uri string
request_elapsed bigint
browser_languages   string
acamp_idint
creative_id int
location_id int
pcamp_idint
pdomain_id  int
country string
region  string
dma int
citystring
zip string
isp string
line_speed  string
gender  string
year_of_birth   int
behaviors_read  string
behaviors_written   string
key_value_pairs string
acamp_candidatesint
tag_format  string
optimizer_name  string
optimizer_version   

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
Mich,

I am using .withColumn to add another column “dt” that is a reformatted version 
of an existing column “timestamp”. The partitioned by column is “dt”.

We are using Spark 1.6.0 in CDH 5.7.0.

Thanks,
Ben

> On Jun 3, 2016, at 10:33 AM, Mich Talebzadeh  
> wrote:
> 
> what version of spark are you using
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
>  
> 
> On 3 June 2016 at 17:51, Mich Talebzadeh  > wrote:
> ok what is the new column is called? you are basically adding a new column to 
> an already existing table
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
>  
> 
> On 3 June 2016 at 17:04, Benjamin Kim  > wrote:
> The table already exists.
> 
>  CREATE EXTERNAL TABLE `amo_bi_events`(   
>`event_type` string COMMENT '',
>   
>`timestamp` string COMMENT '', 
>   
>`event_valid` int COMMENT '',  
>   
>`event_subtype` string COMMENT '', 
>   
>`user_ip` string COMMENT '',   
>   
>`user_id` string COMMENT '',   
>   
>`cookie_status` string COMMENT '', 
>   
>`profile_status` string COMMENT '',
>   
>`user_status` string COMMENT '',   
>   
>`previous_timestamp` string COMMENT '',
>   
>`user_agent` string COMMENT '',
>   
>`referer` string COMMENT '',   
>   
>`uri` string COMMENT '',   
>   
>`request_elapsed` bigint COMMENT '',   
>   
>`browser_languages` string COMMENT '', 
>   
>`acamp_id` int COMMENT '', 
>   
>`creative_id` int COMMENT '',  
>   
>`location_id` int COMMENT '',  
>   
>`pcamp_id` int COMMENT '', 
>   
>`pdomain_id` int COMMENT '',   
>   
>`country` string COMMENT '',   
>   
>`region` string COMMENT '',
>   
>`dma` int COMMENT '',  
>   
>`city` string COMMENT '',  
>   
>`zip` string COMMENT '',   
>   
>`isp` string COMMENT '',   
>   
>`line_speed` string COMMENT '',
>   
>`gender` string COMMENT '',
>   
>`year_of_birth` int COMMENT '',
>   
>`behaviors_read` string COMMENT '',
>   
>`behaviors_written` string COMMENT '', 
>   
>`key_value_pairs` string COMMENT '',   
>   
>`acamp_candidates` int COMMENT '', 
>   
>`tag_format` string COMMENT '',
>   
>`optimizer_name` string COMMENT '',
>   
>`optimizer_version` string COMMENT '', 
>   
>`optimizer_ip` string COMMENT '',  
>   
>`pixel_id` int COMMENT '', 
>   
>`video_id` string COMMENT '',  
>   
>`video_network_id` int COMMENT '', 
>   
>

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Mich Talebzadeh
what version of spark are you using

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 3 June 2016 at 17:51, Mich Talebzadeh  wrote:

> ok what is the new column is called? you are basically adding a new column
> to an already existing table
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 3 June 2016 at 17:04, Benjamin Kim  wrote:
>
>> The table already exists.
>>
>>  CREATE EXTERNAL TABLE `amo_bi_events`(
>>`event_type` string COMMENT '',
>>
>>`timestamp` string COMMENT '',
>>
>>`event_valid` int COMMENT '',
>>
>>`event_subtype` string COMMENT '',
>>
>>`user_ip` string COMMENT '',
>>
>>`user_id` string COMMENT '',
>>
>>`cookie_status` string COMMENT '',
>>
>>`profile_status` string COMMENT '',
>>
>>`user_status` string COMMENT '',
>>
>>`previous_timestamp` string COMMENT '',
>>
>>`user_agent` string COMMENT '',
>>
>>`referer` string COMMENT '',
>>
>>`uri` string COMMENT '',
>>
>>`request_elapsed` bigint COMMENT '',
>>
>>`browser_languages` string COMMENT '',
>>
>>`acamp_id` int COMMENT '',
>>
>>`creative_id` int COMMENT '',
>>
>>`location_id` int COMMENT '',
>>
>>`pcamp_id` int COMMENT '',
>>
>>`pdomain_id` int COMMENT '',
>>
>>`country` string COMMENT '',
>>
>>`region` string COMMENT '',
>>
>>`dma` int COMMENT '',
>>
>>`city` string COMMENT '',
>>
>>`zip` string COMMENT '',
>>
>>`isp` string COMMENT '',
>>
>>`line_speed` string COMMENT '',
>>
>>`gender` string COMMENT '',
>>
>>`year_of_birth` int COMMENT '',
>>
>>`behaviors_read` string COMMENT '',
>>
>>`behaviors_written` string COMMENT '',
>>
>>`key_value_pairs` string COMMENT '',
>>
>>`acamp_candidates` int COMMENT '',
>>
>>`tag_format` string COMMENT '',
>>
>>`optimizer_name` string COMMENT '',
>>
>>`optimizer_version` string COMMENT '',
>>
>>`optimizer_ip` string COMMENT '',
>>
>>`pixel_id` int COMMENT '',
>>
>>`video_id` string COMMENT '',
>>
>>`video_network_id` int COMMENT '',
>>
>>`video_time_watched` bigint COMMENT '',
>>
>>`video_percentage_watched` int COMMENT '',
>>
>>`conversion_valid_sale` int COMMENT '',
>>
>>`conversion_sale_amount` float COMMENT '',
>>
>>`conversion_commission_amount` float COMMENT '',
>>
>>`conversion_step` int COMMENT '',
>>
>>`conversion_currency` string COMMENT '',
>>
>>`conversion_attribution` int COMMENT '',
>>
>>`conversion_offer_id` string COMMENT '',
>>
>>`custom_info` string COMMENT '',
>>
>>`frequency` int COMMENT '',
>>
>>`recency_seconds` int COMMENT '',
>>
>>`cost` float COMMENT '',
>>
>>`revenue` float COMMENT '',
>>
>>`optimizer_acamp_id` int COMMENT '',
>>
>>`optimizer_creative_id` int COMMENT '',
>>
>>`optimizer_ecpm` float COMMENT '',
>>
>>`event_id` string COMMENT '',
>>
>>`impression_id` string COMMENT '',
>>
>>`diagnostic_data` string COMMENT '',
>>
>>`user_profile_mapping_source` string COMMENT '',
>>
>>`latitude` float COMMENT '',
>>
>>`longitude` float COMMENT '',
>>
>>`area_code` int COMMENT '',
>>
>>`gmt_offset` string COMMENT '',
>>
>>`in_dst` string COMMENT '',
>>
>>`proxy_type` string COMMENT '',
>>
>>`mobile_carrier` string COMMENT '',
>>
>>`pop` string COMMENT '',
>>
>>`hostname` string COMMENT '',
>>
>>`profile_ttl` string COMMENT '',
>>
>>`timestamp_iso` string COMMENT '',
>>
>>`reference_id` string COMMENT '',
>>
>>`identity_organization` string COMMENT '',
>>
>>`identity_method` string COMMENT '',
>>
>>`mappable_id` string COMMENT '',
>>
>>`profile_expires` string COMMENT '',
>>
>>`video_player_iframed` int COMMENT '',
>>
>>`video_player_in_view` int COMMENT '',
>>
>>`video_player_width` int COMMENT '',
>>
>>`video_player_height` int COMMENT '',
>>
>>`host_domain` string COMMENT '',
>>
>>`browser_type` string COMMENT '',
>>
>>`browser_device_cat` string COMMENT '',
>>
>>`browser_family` string COMMENT '',
>>
>>`browser_name` string COMMENT '',
>>
>>`browser_version` string COMMENT '',
>>
>>`browser_major_version` string COMMENT '',
>>
>>`browser_minor_version` string COMMENT '',
>>
>>`os_family` string COMMENT '',
>>
>>`os_name` string COMMENT '',
>>
>>`os_version` string COMMENT '',
>>
>>`os_major_version` string COMMENT '',
>>
>>`os_minor_version` string COMMENT '')
>>
>>  PARTITIONED BY (`dt` timestamp)
>>
>>  STORED AS PARQUET;
>>
>> Thanks,
>> Ben
>>
>>

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Mich Talebzadeh
ok what is the new column is called? you are basically adding a new column
to an already existing table



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 3 June 2016 at 17:04, Benjamin Kim  wrote:

> The table already exists.
>
>  CREATE EXTERNAL TABLE `amo_bi_events`(
>`event_type` string COMMENT '',
>
>`timestamp` string COMMENT '',
>
>`event_valid` int COMMENT '',
>
>`event_subtype` string COMMENT '',
>
>`user_ip` string COMMENT '',
>
>`user_id` string COMMENT '',
>
>`cookie_status` string COMMENT '',
>
>`profile_status` string COMMENT '',
>
>`user_status` string COMMENT '',
>
>`previous_timestamp` string COMMENT '',
>
>`user_agent` string COMMENT '',
>
>`referer` string COMMENT '',
>
>`uri` string COMMENT '',
>
>`request_elapsed` bigint COMMENT '',
>
>`browser_languages` string COMMENT '',
>
>`acamp_id` int COMMENT '',
>
>`creative_id` int COMMENT '',
>
>`location_id` int COMMENT '',
>
>`pcamp_id` int COMMENT '',
>
>`pdomain_id` int COMMENT '',
>
>`country` string COMMENT '',
>
>`region` string COMMENT '',
>
>`dma` int COMMENT '',
>
>`city` string COMMENT '',
>
>`zip` string COMMENT '',
>
>`isp` string COMMENT '',
>
>`line_speed` string COMMENT '',
>
>`gender` string COMMENT '',
>
>`year_of_birth` int COMMENT '',
>
>`behaviors_read` string COMMENT '',
>
>`behaviors_written` string COMMENT '',
>
>`key_value_pairs` string COMMENT '',
>
>`acamp_candidates` int COMMENT '',
>
>`tag_format` string COMMENT '',
>
>`optimizer_name` string COMMENT '',
>
>`optimizer_version` string COMMENT '',
>
>`optimizer_ip` string COMMENT '',
>
>`pixel_id` int COMMENT '',
>
>`video_id` string COMMENT '',
>
>`video_network_id` int COMMENT '',
>
>`video_time_watched` bigint COMMENT '',
>
>`video_percentage_watched` int COMMENT '',
>
>`conversion_valid_sale` int COMMENT '',
>
>`conversion_sale_amount` float COMMENT '',
>
>`conversion_commission_amount` float COMMENT '',
>
>`conversion_step` int COMMENT '',
>
>`conversion_currency` string COMMENT '',
>
>`conversion_attribution` int COMMENT '',
>
>`conversion_offer_id` string COMMENT '',
>
>`custom_info` string COMMENT '',
>
>`frequency` int COMMENT '',
>
>`recency_seconds` int COMMENT '',
>
>`cost` float COMMENT '',
>
>`revenue` float COMMENT '',
>
>`optimizer_acamp_id` int COMMENT '',
>
>`optimizer_creative_id` int COMMENT '',
>
>`optimizer_ecpm` float COMMENT '',
>
>`event_id` string COMMENT '',
>
>`impression_id` string COMMENT '',
>
>`diagnostic_data` string COMMENT '',
>
>`user_profile_mapping_source` string COMMENT '',
>
>`latitude` float COMMENT '',
>
>`longitude` float COMMENT '',
>
>`area_code` int COMMENT '',
>
>`gmt_offset` string COMMENT '',
>
>`in_dst` string COMMENT '',
>
>`proxy_type` string COMMENT '',
>
>`mobile_carrier` string COMMENT '',
>
>`pop` string COMMENT '',
>
>`hostname` string COMMENT '',
>
>`profile_ttl` string COMMENT '',
>
>`timestamp_iso` string COMMENT '',
>
>`reference_id` string COMMENT '',
>
>`identity_organization` string COMMENT '',
>
>`identity_method` string COMMENT '',
>
>`mappable_id` string COMMENT '',
>
>`profile_expires` string COMMENT '',
>
>`video_player_iframed` int COMMENT '',
>
>`video_player_in_view` int COMMENT '',
>
>`video_player_width` int COMMENT '',
>
>`video_player_height` int COMMENT '',
>
>`host_domain` string COMMENT '',
>
>`browser_type` string COMMENT '',
>
>`browser_device_cat` string COMMENT '',
>
>`browser_family` string COMMENT '',
>
>`browser_name` string COMMENT '',
>
>`browser_version` string COMMENT '',
>
>`browser_major_version` string COMMENT '',
>
>`browser_minor_version` string COMMENT '',
>
>`os_family` string COMMENT '',
>
>`os_name` string COMMENT '',
>
>`os_version` string COMMENT '',
>
>`os_major_version` string COMMENT '',
>
>`os_minor_version` string COMMENT '')
>
>  PARTITIONED BY (`dt` timestamp)
>
>  STORED AS PARQUET;
>
> Thanks,
> Ben
>
>
> On Jun 3, 2016, at 8:47 AM, Mich Talebzadeh 
> wrote:
>
> hang on are you saving this as a new table?
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 3 June 2016 at 14:13, Benjamin Kim  wrote:
>
>> Does anyone know how to save data in a DataFrame to a table partitioned
>> using an existing column reformatted into a 

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
The table already exists.

 CREATE EXTERNAL TABLE `amo_bi_events`(   
   `event_type` string COMMENT '',  

   `timestamp` string COMMENT '',   

   `event_valid` int COMMENT '',

   `event_subtype` string COMMENT '',   

   `user_ip` string COMMENT '', 

   `user_id` string COMMENT '', 

   `cookie_status` string COMMENT '',   

   `profile_status` string COMMENT '',  

   `user_status` string COMMENT '', 

   `previous_timestamp` string COMMENT '',  

   `user_agent` string COMMENT '',  

   `referer` string COMMENT '', 

   `uri` string COMMENT '', 

   `request_elapsed` bigint COMMENT '', 

   `browser_languages` string COMMENT '',   

   `acamp_id` int COMMENT '',   

   `creative_id` int COMMENT '',

   `location_id` int COMMENT '',

   `pcamp_id` int COMMENT '',   

   `pdomain_id` int COMMENT '', 

   `country` string COMMENT '', 

   `region` string COMMENT '',  

   `dma` int COMMENT '',

   `city` string COMMENT '',

   `zip` string COMMENT '', 

   `isp` string COMMENT '', 

   `line_speed` string COMMENT '',  

   `gender` string COMMENT '',  

   `year_of_birth` int COMMENT '',  

   `behaviors_read` string COMMENT '',  

   `behaviors_written` string COMMENT '',   

   `key_value_pairs` string COMMENT '', 

   `acamp_candidates` int COMMENT '',   

   `tag_format` string COMMENT '',  

   `optimizer_name` string COMMENT '',  

   `optimizer_version` string COMMENT '',   

   `optimizer_ip` string COMMENT '',

   `pixel_id` int COMMENT '',   

   `video_id` string COMMENT '',

   `video_network_id` int COMMENT '',   

   `video_time_watched` bigint COMMENT '',  

   `video_percentage_watched` int COMMENT '',   

   `conversion_valid_sale` int COMMENT '',  

   `conversion_sale_amount` float COMMENT '',   

   `conversion_commission_amount` float COMMENT '', 

   `conversion_step` int COMMENT '',

   `conversion_currency` string COMMENT '', 

   `conversion_attribution` int COMMENT '', 

   `conversion_offer_id` string COMMENT '', 

   `custom_info` string COMMENT '', 

   `frequency` int COMMENT '',  

   `recency_seconds` int COMMENT '',

   `cost` float COMMENT '', 

   `revenue` float COMMENT '',  

   `optimizer_acamp_id` int COMMENT '', 

   `optimizer_creative_id` int COMMENT '',  

   `optimizer_ecpm` float COMMENT '',   

Re: Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Mich Talebzadeh
hang on are you saving this as a new table?

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 3 June 2016 at 14:13, Benjamin Kim  wrote:

> Does anyone know how to save data in a DataFrame to a table partitioned
> using an existing column reformatted into a derived column?
>
> val partitionedDf = df.withColumn("dt",
> concat(substring($"timestamp", 1, 10), lit(" "), substring($"timestamp",
> 12, 2), lit(":00")))
>
> sqlContext.setConf("hive.exec.dynamic.partition", "true")
> sqlContext.setConf("hive.exec.dynamic.partition.mode",
> "nonstrict")
> partitionedDf.write
> .mode(SaveMode.Append)
> .partitionBy("dt")
> .saveAsTable("ds.amo_bi_events")
>
> I am getting an ArrayOutOfBounds error. There are 83 columns in the
> destination table. But after adding the derived column, then I get an 84
> error. I assumed that the column used for the partition would not be
> counted.
>
> Can someone please help.
>
> Thanks,
> Ben
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Save to a Partitioned Table using a Derived Column

2016-06-03 Thread Benjamin Kim
Does anyone know how to save data in a DataFrame to a table partitioned using 
an existing column reformatted into a derived column?

val partitionedDf = df.withColumn("dt", 
concat(substring($"timestamp", 1, 10), lit(" "), substring($"timestamp", 12, 
2), lit(":00")))

sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", 
"nonstrict")
partitionedDf.write
.mode(SaveMode.Append)
.partitionBy("dt")
.saveAsTable("ds.amo_bi_events")

I am getting an ArrayOutOfBounds error. There are 83 columns in the destination 
table. But after adding the derived column, then I get an 84 error. I assumed 
that the column used for the partition would not be counted.

Can someone please help.

Thanks,
Ben
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org