FWIW I was doing very similar substring (etc) machinations until we started converting output from back-end data-generation tools directly into parquet (using other tools). IMO it's a common enough format (at least in the types of data you and I likely have to work with :-) that it'd be great if there was direct support for it. If there is, I also missed it and would also be most appreciative of which incantations to use to take advantage of it.
On Wed, Oct 18, 2017 at 10:49 AM, Charles Givre <cgi...@gmail.com> wrote: > Hello Drillers, > I have a silly question which I’m a little stuck with. I have some data in > CSV format with dates in the following format: 2017-08-10T09:12:26.000Z. > I’m trying to convert this into a date time data field so that I have both > the date and the hours, however I keep running into road blocks. I’ve tried > the CAST( field AS DATE ) but in doing so I lose the time component. I’ve > tried the TO_TIMESTAMP function, however the only success I’ve had is using > the substring function to remove the timezone at the end, then use > regex_replace to get rid of the literal ’T’ in the middle of the string, then > TO_NUMBER. (See query below) > > SELECT TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), > 'yyyy-MM-dd HH:mm:ss' ) AS dt, > EXTRACT( > hour FROM > TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd > HH:mm:ss' ) > ) AS dt_hour > > I’d think you could do this directly with the TO_TIMESTAMP function however, > I can’t figure out how include the literal ’T’ in the formatting string. The > escape character seems to be the single quote which also is the only > character allowed to denote the formatting string. > > So, questions: > 1. Is there any way to include a literal character in a joda date format? > 2. Is it possible to use any character besides a single quote to mark the > beginning/end of a format string? > 3. Are there any ways to do this that I’m missing? > > Thanks! > —C >