Mich,

I am using .withColumn to add a new column, “dt”, which is a reformatted version 
of the existing column “timestamp”. The table is partitioned by “dt”.
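
To make the reformatting concrete, here is a minimal plain-Scala sketch of what the derived value looks like. It assumes the timestamp strings look like 2016-06-03T14:13:05, which is only a guess at the format; the sample value and the names DtSketch and toDt are made up for illustration. It mirrors the substring positions used in the withColumn expression quoted further down.

```scala
// Plain-Scala sketch of the "dt" derivation; no Spark required.
object DtSketch {
  // Mirrors concat(substring(ts, 1, 10), " ", substring(ts, 12, 2), ":00").
  // Spark's substring is 1-based, so positions 1-10 and 12-13 correspond
  // to the 0-based ranges [0, 10) and [11, 13) used here.
  def toDt(timestamp: String): String =
    timestamp.substring(0, 10) + " " + timestamp.substring(11, 13) + ":00"

  def main(args: Array[String]): Unit = {
    // Hypothetical sample value; the real format of "timestamp" may differ.
    val sample = "2016-06-03T14:13:05"
    println(toDt(sample)) // prints "2016-06-03 14:00"
  }
}
```

Note that the result is a string of the form yyyy-MM-dd HH:00, whereas the table declares “dt” as a timestamp.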

We are using Spark 1.6.0 in CDH 5.7.0.

Thanks,
Ben

> On Jun 3, 2016, at 10:33 AM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> What version of Spark are you using?
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> On 3 June 2016 at 17:51, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> OK, what is the new column called? You are basically adding a new column to 
> an already existing table.
> 
> 
> On 3 June 2016 at 17:04, Benjamin Kim <bbuil...@gmail.com> wrote:
> The table already exists.
> 
>  CREATE EXTERNAL TABLE `amo_bi_events`(
>    `event_type` string COMMENT '',
>    `timestamp` string COMMENT '',
>    `event_valid` int COMMENT '',
>    `event_subtype` string COMMENT '',
>    `user_ip` string COMMENT '',
>    `user_id` string COMMENT '',
>    `cookie_status` string COMMENT '',
>    `profile_status` string COMMENT '',
>    `user_status` string COMMENT '',
>    `previous_timestamp` string COMMENT '',
>    `user_agent` string COMMENT '',
>    `referer` string COMMENT '',
>    `uri` string COMMENT '',
>    `request_elapsed` bigint COMMENT '',
>    `browser_languages` string COMMENT '',
>    `acamp_id` int COMMENT '',
>    `creative_id` int COMMENT '',
>    `location_id` int COMMENT '',
>    `pcamp_id` int COMMENT '',
>    `pdomain_id` int COMMENT '',
>    `country` string COMMENT '',
>    `region` string COMMENT '',
>    `dma` int COMMENT '',
>    `city` string COMMENT '',
>    `zip` string COMMENT '',
>    `isp` string COMMENT '',
>    `line_speed` string COMMENT '',
>    `gender` string COMMENT '',
>    `year_of_birth` int COMMENT '',
>    `behaviors_read` string COMMENT '',
>    `behaviors_written` string COMMENT '',
>    `key_value_pairs` string COMMENT '',
>    `acamp_candidates` int COMMENT '',
>    `tag_format` string COMMENT '',
>    `optimizer_name` string COMMENT '',
>    `optimizer_version` string COMMENT '',
>    `optimizer_ip` string COMMENT '',
>    `pixel_id` int COMMENT '',
>    `video_id` string COMMENT '',
>    `video_network_id` int COMMENT '',
>    `video_time_watched` bigint COMMENT '',
>    `video_percentage_watched` int COMMENT '',
>    `conversion_valid_sale` int COMMENT '',
>    `conversion_sale_amount` float COMMENT '',
>    `conversion_commission_amount` float COMMENT '',
>    `conversion_step` int COMMENT '',
>    `conversion_currency` string COMMENT '',
>    `conversion_attribution` int COMMENT '',
>    `conversion_offer_id` string COMMENT '',
>    `custom_info` string COMMENT '',
>    `frequency` int COMMENT '',
>    `recency_seconds` int COMMENT '',
>    `cost` float COMMENT '',
>    `revenue` float COMMENT '',
>    `optimizer_acamp_id` int COMMENT '',
>    `optimizer_creative_id` int COMMENT '',
>    `optimizer_ecpm` float COMMENT '',
>    `event_id` string COMMENT '',
>    `impression_id` string COMMENT '',
>    `diagnostic_data` string COMMENT '',
>    `user_profile_mapping_source` string COMMENT '',
>    `latitude` float COMMENT '',
>    `longitude` float COMMENT '',
>    `area_code` int COMMENT '',
>    `gmt_offset` string COMMENT '',
>    `in_dst` string COMMENT '',
>    `proxy_type` string COMMENT '',
>    `mobile_carrier` string COMMENT '',
>    `pop` string COMMENT '',
>    `hostname` string COMMENT '',
>    `profile_ttl` string COMMENT '',
>    `timestamp_iso` string COMMENT '',
>    `reference_id` string COMMENT '',
>    `identity_organization` string COMMENT '',
>    `identity_method` string COMMENT '',
>    `mappable_id` string COMMENT '',
>    `profile_expires` string COMMENT '',
>    `video_player_iframed` int COMMENT '',
>    `video_player_in_view` int COMMENT '',
>    `video_player_width` int COMMENT '',
>    `video_player_height` int COMMENT '',
>    `host_domain` string COMMENT '',
>    `browser_type` string COMMENT '',
>    `browser_device_cat` string COMMENT '',
>    `browser_family` string COMMENT '',
>    `browser_name` string COMMENT '',
>    `browser_version` string COMMENT '',
>    `browser_major_version` string COMMENT '',
>    `browser_minor_version` string COMMENT '',
>    `os_family` string COMMENT '',
>    `os_name` string COMMENT '',
>    `os_version` string COMMENT '',
>    `os_major_version` string COMMENT '',
>    `os_minor_version` string COMMENT '')
>  PARTITIONED BY (`dt` timestamp)
>  STORED AS PARQUET;
> 
> Thanks,
> Ben
> 
> 
>> On Jun 3, 2016, at 8:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> 
>> Hang on, are you saving this as a new table?
>> 
>> On 3 June 2016 at 14:13, Benjamin Kim <bbuil...@gmail.com> wrote:
>> Does anyone know how to save data in a DataFrame to a table partitioned 
>> using an existing column reformatted into a derived column?
>> 
>>                 val partitionedDf = df.withColumn("dt",
>>                     concat(substring($"timestamp", 1, 10), lit(" "),
>>                            substring($"timestamp", 12, 2), lit(":00")))
>> 
>>                 sqlContext.setConf("hive.exec.dynamic.partition", "true")
>>                 sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
>>                 partitionedDf.write
>>                     .mode(SaveMode.Append)
>>                     .partitionBy("dt")
>>                     .saveAsTable("ds.amo_bi_events")
>> 
>> I am getting an ArrayOutOfBounds error. The destination table has 83 
>> columns, but after adding the derived column the error reports 84. I 
>> assumed that the column used for the partition would not be counted.
>> 
>> Can someone please help?
>> 
>> Thanks,
>> Ben