Re: test - what is the wrong while adding one column in the dataframe

2016-06-16 Thread Zhiliang Zhu
This is just a test: the user mailing list seemed to have a problem earlier, 
but it looks okay now.



On Friday, June 17, 2016 12:18 PM, Zhiliang Zhu wrote:
 

 

 On Tuesday, May 17, 2016 10:44 AM, Zhiliang Zhu wrote:
 

  Hi All,
I have a DataFrame created from a Hive SQL query. I need to add one more 
column derived from an existing column, while keeping all of the previous 
columns in the resulting DataFrame.

// 30 days in milliseconds
final double DAYS_30 = 1000 * 60 * 60 * 24 * 30.0;
// DAYS_30 seems difficult to use inside the SQL string?
DataFrame behavior_df = jhql.sql(
    "SELECT cast(user_id as double) as user_id, "
    + "cast(server_timestamp as double) as server_timestamp, "
    + "url, referer, source, app_version, params FROM log.request");

// Attempt 1: this runs, but behavior_df.printSchema() shows no change.
behavior_df.withColumn("daysLater30",
    behavior_df.col("server_timestamp").plus(DAYS_30));

// Attempt 2: this runs, but behavior_df.printSchema() shows only one column,
// daysLater30. I expected the schema to keep all of the previous columns and
// add daysLater30 as one more.
behavior_df = behavior_df.withColumn("daysLater30",
    behavior_df.col("server_timestamp").plus(DAYS_30));
So, how should I do this?
Thank you,
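
For readers following the question above: Spark DataFrames are immutable, and withColumn returns a new DataFrame rather than modifying the one it is called on, which would explain why the first attempt leaves printSchema() unchanged. The sketch below is not Spark code. MiniFrame is a hypothetical stand-in class, invented here only to show the same pattern with the standard library, and its withColumn takes just a column name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an immutable DataFrame; this is NOT Spark's class.
final class MiniFrame {
    private final List<String> columns;

    MiniFrame(List<String> columns) {
        // Defensive copy so the frame cannot be changed from outside.
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    // Mirrors the withColumn contract: returns a NEW frame with the column
    // added, and leaves the receiver untouched.
    MiniFrame withColumn(String name) {
        List<String> next = new ArrayList<>(columns);
        if (!next.contains(name)) {
            next.add(name);
        }
        return new MiniFrame(next);
    }

    List<String> columns() {
        return columns;
    }

    public static void main(String[] args) {
        MiniFrame df = new MiniFrame(
            Arrays.asList("user_id", "server_timestamp", "url"));

        // Result discarded: df still has its original three columns.
        df.withColumn("daysLater30");
        System.out.println(df.columns());

        // Result reassigned: df now refers to the frame with four columns.
        df = df.withColumn("daysLater30");
        System.out.println(df.columns());
    }
}
```

In the real API the equivalent of the second call is the reassignment already shown in the second attempt; the resulting DataFrame would normally contain all previous columns plus daysLater30.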

 

The issue was resolved.
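
On the side remark that DAYS_30 seems difficult to use in the SQL: one possible approach, sketched here under assumptions, is to build the query string in Java and concatenate the constant into it. The class name Days30Query, the method buildQuery, and the constant DAYS_30_MS are hypothetical names introduced only for this illustration. The constant is a long so that the literal embedded in the SQL is a plain integer; the string would then be passed to jhql.sql(...) as in the original code.

```java
// Hypothetical helper; the names here are illustrative, not from the thread.
final class Days30Query {
    // 30 days in milliseconds. The leading 30L forces long arithmetic;
    // the same product computed purely in int would overflow.
    static final long DAYS_30_MS = 30L * 24 * 60 * 60 * 1000;

    // Builds the original SELECT with one extra computed column.
    static String buildQuery() {
        return "SELECT cast(user_id as double) as user_id, "
            + "cast(server_timestamp as double) as server_timestamp, "
            + "cast(server_timestamp as double) + " + DAYS_30_MS
            + " as daysLater30, "
            + "url, referer, source, app_version, params "
            + "FROM log.request";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery());
    }
}
```

This keeps every original column in the SELECT list and adds daysLater30 in the same query, so no later withColumn call is needed.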

   

  
