I gave myself a project to start actually writing Spark programs. I'm using 
Scala and Spark 2.2.0. In my project, I had to do some grouping and filtering 
by dates, and it was awful and took forever. I was trying to use DataFrames 
and SQL as much as possible. I see that there are date functions in the 
DataFrame API, but trying to use them was frustrating. Even following code 
samples was a headache, because the code is apparently different depending on 
which version of Spark you are using. I was really hoping for a rich set of 
date functions like you'd find in T-SQL, but I never really found them.

Is there a best practice for dealing with dates and times in Spark? Taking a 
date/time string, converting it to a date/time object, and then manipulating 
data based on the various components of the timestamp (hour, day, year, etc.) 
feels like it should be a heck of a lot easier than what I'm finding, so 
perhaps I'm just not looking in the right place.

You can see my work here: 
https://github.com/BobLovesData/Apache-Spark-In-24-Hours/blob/master/src/net/massstreet/hour10/BayAreaBikeAnalysis.scala

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.massstreet.net
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

