Mich,

Yes, it is already partitioned.

In Hive, I can do this:
INSERT OVERWRITE amo_bi_events PARTITION (dt)
SELECT event_type, timestamp, …,
        concat(substring(timestamp, 1, 10), ' ', substring(timestamp, 12, 2), 
':00:00') AS dt
FROM amo_raw_events
WHERE to_date(timestamp_iso) BETWEEN ‘2016-06-01’ AND ‘2016-06-02’;

I tried the same equivalent using DataFrames but couldn’t get it to work.

Thanks,
Ben


> On Jun 3, 2016, at 11:06 AM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> OK fine but dt is the column used for partitioning the table.
> 
> This is what I get in Hive itself
> 
> use test;
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists amo_bi_events;
> CREATE EXTERNAL TABLE `amo_bi_events`(
>    `event_type` string COMMENT '',
>    `timestamp` string COMMENT '',
>    `event_valid` int COMMENT '',
>    `event_subtype` string COMMENT '',
>    `user_ip` string COMMENT '',
>    `user_id` string COMMENT '',
>    `cookie_status` string COMMENT '',
>    `profile_status` string COMMENT '',
>    `user_status` string COMMENT '',
>    `previous_timestamp` string COMMENT '',
>    `user_agent` string COMMENT '',
>    `referer` string COMMENT '',
>    `uri` string COMMENT '',
>    `request_elapsed` bigint COMMENT '',
>    `browser_languages` string COMMENT '',
>    `acamp_id` int COMMENT '',
>    `creative_id` int COMMENT '',
>    `location_id` int COMMENT '',
>    `pcamp_id` int COMMENT '',
>    `pdomain_id` int COMMENT '',
>    `country` string COMMENT '',
>    `region` string COMMENT '',
>    `dma` int COMMENT '',
>    `city` string COMMENT '',
>    `zip` string COMMENT '',
>    `isp` string COMMENT '',
>    `line_speed` string COMMENT '',
>    `gender` string COMMENT '',
>    `year_of_birth` int COMMENT '',
>    `behaviors_read` string COMMENT '',
>    `behaviors_written` string COMMENT '',
>    `key_value_pairs` string COMMENT '',
>    `acamp_candidates` int COMMENT '',
>    `tag_format` string COMMENT '',
>    `optimizer_name` string COMMENT '',
>    `optimizer_version` string COMMENT '',
>    `optimizer_ip` string COMMENT '',
>    `pixel_id` int COMMENT '',
>    `video_id` string COMMENT '',
>    `video_network_id` int COMMENT '',
>    `video_time_watched` bigint COMMENT '',
>    `video_percentage_watched` int COMMENT '',
>    `conversion_valid_sale` int COMMENT '',
>    `conversion_sale_amount` float COMMENT '',
>    `conversion_commission_amount` float COMMENT '',
>    `conversion_step` int COMMENT '',
>    `conversion_currency` string COMMENT '',
>    `conversion_attribution` int COMMENT '',
>    `conversion_offer_id` string COMMENT '',
>    `custom_info` string COMMENT '',
>    `frequency` int COMMENT '',
>    `recency_seconds` int COMMENT '',
>    `cost` float COMMENT '',
>    `revenue` float COMMENT '',
>    `optimizer_acamp_id` int COMMENT '',
>    `optimizer_creative_id` int COMMENT '',
>    `optimizer_ecpm` float COMMENT '',
>    `event_id` string COMMENT '',
>    `impression_id` string COMMENT '',
>    `diagnostic_data` string COMMENT '',
>    `user_profile_mapping_source` string COMMENT '',
>    `latitude` float COMMENT '',
>    `longitude` float COMMENT '',
>    `area_code` int COMMENT '',
>    `gmt_offset` string COMMENT '',
>    `in_dst` string COMMENT '',
>    `proxy_type` string COMMENT '',
>    `mobile_carrier` string COMMENT '',
>    `pop` string COMMENT '',
>    `hostname` string COMMENT '',
>    `profile_ttl` string COMMENT '',
>    `timestamp_iso` string COMMENT '',
>    `reference_id` string COMMENT '',
>    `identity_organization` string COMMENT '',
>    `identity_method` string COMMENT '',
>    `mappable_id` string COMMENT '',
>    `profile_expires` string COMMENT '',
>    `video_player_iframed` int COMMENT '',
>    `video_player_in_view` int COMMENT '',
>    `video_player_width` int COMMENT '',
>    `video_player_height` int COMMENT '',
>    `host_domain` string COMMENT '',
>    `browser_type` string COMMENT '',
>    `browser_device_cat` string COMMENT '',
>    `browser_family` string COMMENT '',
>    `browser_name` string COMMENT '',
>    `browser_version` string COMMENT '',
>    `browser_major_version` string COMMENT '',
>    `browser_minor_version` string COMMENT '',
>    `os_family` string COMMENT '',
>    `os_name` string COMMENT '',
>    `os_version` string COMMENT '',
>    `os_major_version` string COMMENT '',
>    `os_minor_version` string COMMENT '')
>  PARTITIONED BY (`dt` timestamp)
>  STORED AS PARQUET;
> desc formatted amo_bi_events;
> 
> The output in hive is as follows:
> 
> # col_name              data_type               comment
> event_type              string
> timestamp               string
> event_valid             int
> event_subtype           string
> user_ip                 string
> user_id                 string
> cookie_status           string
> profile_status          string
> user_status             string
> previous_timestamp      string
> user_agent              string
> referer                 string
> uri                     string
> request_elapsed         bigint
> browser_languages       string
> acamp_id                int
> creative_id             int
> location_id             int
> pcamp_id                int
> pdomain_id              int
> country                 string
> region                  string
> dma                     int
> city                    string
> zip                     string
> isp                     string
> line_speed              string
> gender                  string
> year_of_birth           int
> behaviors_read          string
> behaviors_written       string
> key_value_pairs         string
> acamp_candidates        int
> tag_format              string
> optimizer_name          string
> optimizer_version       string
> optimizer_ip            string
> pixel_id                int
> video_id                string
> video_network_id        int
> video_time_watched      bigint
> video_percentage_watched        int
> conversion_valid_sale   int
> conversion_sale_amount  float
> conversion_commission_amount    float
> conversion_step         int
> conversion_currency     string
> conversion_attribution  int
> conversion_offer_id     string
> custom_info             string
> frequency               int
> recency_seconds         int
> cost                    float
> revenue                 float
> optimizer_acamp_id      int
> optimizer_creative_id   int
> optimizer_ecpm          float
> event_id                string
> impression_id           string
> diagnostic_data         string
> user_profile_mapping_source     string
> latitude                float
> longitude               float
> area_code               int
> gmt_offset              string
> in_dst                  string
> proxy_type              string
> mobile_carrier          string
> pop                     string
> hostname                string
> profile_ttl             string
> timestamp_iso           string
> reference_id            string
> identity_organization   string
> identity_method         string
> mappable_id             string
> profile_expires         string
> video_player_iframed    int
> video_player_in_view    int
> video_player_width      int
> video_player_height     int
> host_domain             string
> browser_type            string
> browser_device_cat      string
> browser_family          string
> browser_name            string
> browser_version         string
> browser_major_version   string
> browser_minor_version   string
> os_family               string
> os_name                 string
> os_version              string
> os_major_version        string
> os_minor_version        string
> # Partition Information
> # col_name              data_type               comment
> dt                      timestamp
> # Detailed Table Information
> Database:               test
> Owner:                  hduser
> CreateTime:             Fri Jun 03 19:03:20 BST 2016
> LastAccessTime:         UNKNOWN
> Retention:              0
> Location:               
> hdfs://rhes564:9000/user/hive/warehouse/test.db/amo_bi_events
> Table Type:             EXTERNAL_TABLE
> Table Parameters:
>         EXTERNAL                TRUE
>         transient_lastDdlTime   1464977000
> # Storage Information
> SerDe Library:          
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat:            
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat:           
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>         serialization.format    1
> Time taken: 0.397 seconds, Fetched: 124 row(s)
> 
> So effectively that table is partitioned by dt in notime
> 
> Now what I don't understand whether that table is already partitioned as you 
> said the table already exists!
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> On 3 June 2016 at 18:48, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Mich,
> 
> I am using .withColumn to add another column “dt” that is a reformatted 
> version of an existing column “timestamp”. The partitioned by column is “dt”.
> 
> We are using Spark 1.6.0 in CDH 5.7.0.
> 
> Thanks,
> Ben
> 
>> On Jun 3, 2016, at 10:33 AM, Mich Talebzadeh <mich.talebza...@gmail.com 
>> <mailto:mich.talebza...@gmail.com>> wrote:
>> 
>> what version of spark are you using
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>>  
>> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>>  
>> 
>> On 3 June 2016 at 17:51, Mich Talebzadeh <mich.talebza...@gmail.com 
>> <mailto:mich.talebza...@gmail.com>> wrote:
>> ok what is the new column is called? you are basically adding a new column 
>> to an already existing table
>> 
>> 
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>>  
>> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>>  
>> 
>> On 3 June 2016 at 17:04, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> The table already exists.
>> 
>>  CREATE EXTERNAL TABLE `amo_bi_events`(                               
>>    `event_type` string COMMENT '',                                           
>>        
>>    `timestamp` string COMMENT '',                                            
>>        
>>    `event_valid` int COMMENT '',                                             
>>        
>>    `event_subtype` string COMMENT '',                                        
>>        
>>    `user_ip` string COMMENT '',                                              
>>        
>>    `user_id` string COMMENT '',                                              
>>        
>>    `cookie_status` string COMMENT '',                                        
>>        
>>    `profile_status` string COMMENT '',                                       
>>        
>>    `user_status` string COMMENT '',                                          
>>        
>>    `previous_timestamp` string COMMENT '',                                   
>>        
>>    `user_agent` string COMMENT '',                                           
>>        
>>    `referer` string COMMENT '',                                              
>>        
>>    `uri` string COMMENT '',                                                  
>>        
>>    `request_elapsed` bigint COMMENT '',                                      
>>        
>>    `browser_languages` string COMMENT '',                                    
>>        
>>    `acamp_id` int COMMENT '',                                                
>>        
>>    `creative_id` int COMMENT '',                                             
>>        
>>    `location_id` int COMMENT '',                                             
>>        
>>    `pcamp_id` int COMMENT '',                                                
>>        
>>    `pdomain_id` int COMMENT '',                                              
>>        
>>    `country` string COMMENT '',                                              
>>        
>>    `region` string COMMENT '',                                               
>>        
>>    `dma` int COMMENT '',                                                     
>>        
>>    `city` string COMMENT '',                                                 
>>        
>>    `zip` string COMMENT '',                                                  
>>        
>>    `isp` string COMMENT '',                                                  
>>        
>>    `line_speed` string COMMENT '',                                           
>>        
>>    `gender` string COMMENT '',                                               
>>        
>>    `year_of_birth` int COMMENT '',                                           
>>        
>>    `behaviors_read` string COMMENT '',                                       
>>        
>>    `behaviors_written` string COMMENT '',                                    
>>        
>>    `key_value_pairs` string COMMENT '',                                      
>>        
>>    `acamp_candidates` int COMMENT '',                                        
>>        
>>    `tag_format` string COMMENT '',                                           
>>        
>>    `optimizer_name` string COMMENT '',                                       
>>        
>>    `optimizer_version` string COMMENT '',                                    
>>        
>>    `optimizer_ip` string COMMENT '',                                         
>>        
>>    `pixel_id` int COMMENT '',                                                
>>        
>>    `video_id` string COMMENT '',                                             
>>        
>>    `video_network_id` int COMMENT '',                                        
>>        
>>    `video_time_watched` bigint COMMENT '',                                   
>>        
>>    `video_percentage_watched` int COMMENT '',                                
>>        
>>    `conversion_valid_sale` int COMMENT '',                                   
>>        
>>    `conversion_sale_amount` float COMMENT '',                                
>>        
>>    `conversion_commission_amount` float COMMENT '',                          
>>        
>>    `conversion_step` int COMMENT '',                                         
>>        
>>    `conversion_currency` string COMMENT '',                                  
>>        
>>    `conversion_attribution` int COMMENT '',                                  
>>        
>>    `conversion_offer_id` string COMMENT '',                                  
>>        
>>    `custom_info` string COMMENT '',                                          
>>        
>>    `frequency` int COMMENT '',                                               
>>        
>>    `recency_seconds` int COMMENT '',                                         
>>        
>>    `cost` float COMMENT '',                                                  
>>        
>>    `revenue` float COMMENT '',                                               
>>        
>>    `optimizer_acamp_id` int COMMENT '',                                      
>>        
>>    `optimizer_creative_id` int COMMENT '',                                   
>>        
>>    `optimizer_ecpm` float COMMENT '',                                        
>>        
>>    `event_id` string COMMENT '',                                             
>>        
>>    `impression_id` string COMMENT '',                                        
>>        
>>    `diagnostic_data` string COMMENT '',                                      
>>        
>>    `user_profile_mapping_source` string COMMENT '',                          
>>        
>>    `latitude` float COMMENT '',                                              
>>        
>>    `longitude` float COMMENT '',                                             
>>        
>>    `area_code` int COMMENT '',                                               
>>        
>>    `gmt_offset` string COMMENT '',                                           
>>        
>>    `in_dst` string COMMENT '',                                               
>>        
>>    `proxy_type` string COMMENT '',                                           
>>        
>>    `mobile_carrier` string COMMENT '',                                       
>>        
>>    `pop` string COMMENT '',                                                  
>>        
>>    `hostname` string COMMENT '',                                             
>>        
>>    `profile_ttl` string COMMENT '',                                          
>>        
>>    `timestamp_iso` string COMMENT '',                                        
>>        
>>    `reference_id` string COMMENT '',                                         
>>        
>>    `identity_organization` string COMMENT '',                                
>>        
>>    `identity_method` string COMMENT '',                                      
>>        
>>    `mappable_id` string COMMENT '',                                          
>>        
>>    `profile_expires` string COMMENT '',                                      
>>        
>>    `video_player_iframed` int COMMENT '',                                    
>>        
>>    `video_player_in_view` int COMMENT '',                                    
>>        
>>    `video_player_width` int COMMENT '',                                      
>>        
>>    `video_player_height` int COMMENT '',                                     
>>        
>>    `host_domain` string COMMENT '',                                          
>>        
>>    `browser_type` string COMMENT '',                                         
>>        
>>    `browser_device_cat` string COMMENT '',                                   
>>        
>>    `browser_family` string COMMENT '',                                       
>>        
>>    `browser_name` string COMMENT '',                                         
>>        
>>    `browser_version` string COMMENT '',                                      
>>        
>>    `browser_major_version` string COMMENT '',                                
>>        
>>    `browser_minor_version` string COMMENT '',                                
>>        
>>    `os_family` string COMMENT '',                                            
>>        
>>    `os_name` string COMMENT '',                                              
>>        
>>    `os_version` string COMMENT '',                                           
>>        
>>    `os_major_version` string COMMENT '',                                     
>>        
>>    `os_minor_version` string COMMENT '')                                     
>>        
>>  PARTITIONED BY (`dt` timestamp)                                             
>>                         
>>  STORED AS PARQUET;
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Jun 3, 2016, at 8:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com 
>>> <mailto:mich.talebza...@gmail.com>> wrote:
>>> 
>>> hang on are you saving this as a new table?
>>> 
>>> Dr Mich Talebzadeh
>>>  
>>> LinkedIn  
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>  
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>>>  
>>> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>>>  
>>> 
>>> On 3 June 2016 at 14:13, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> Does anyone know how to save data in a DataFrame to a table partitioned 
>>> using an existing column reformatted into a derived column?
>>> 
>>>                 val partitionedDf = df.withColumn("dt", 
>>> concat(substring($"timestamp", 1, 10), lit(" "), substring($"timestamp", 
>>> 12, 2), lit(":00")))
>>> 
>>>                 sqlContext.setConf("hive.exec.dynamic.partition", "true")
>>>                 sqlContext.setConf("hive.exec.dynamic.partition.mode", 
>>> "nonstrict")
>>>                 partitionedDf.write
>>>                     .mode(SaveMode.Append)
>>>                     .partitionBy("dt")
>>>                     .saveAsTable("ds.amo_bi_events")
>>> 
>>> I am getting an ArrayOutOfBounds error. There are 83 columns in the 
>>> destination table. But after adding the derived column, then I get an 84 
>>> error. I assumed that the column used for the partition would not be 
>>> counted.
>>> 
>>> Can someone please help.
>>> 
>>> Thanks,
>>> Ben
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
>>> <mailto:user-unsubscr...@spark.apache.org>
>>> For additional commands, e-mail: user-h...@spark.apache.org 
>>> <mailto:user-h...@spark.apache.org>
>>> 
>>> 
>> 
>> 
>> 
> 
> 

Reply via email to