Re: order by date
Thanks for all the feedback on this. I'll take a look at all the suggestions I received. Cheers! Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy." -- Edwin A. Abbott, Flatland
RE: order by date
If you don't want to modify your CSV files, I would suggest doing the conversion as part of the query. For that, you can either include the conversion in each query, or you can create a view of your table that includes a column with the converted date. Either way, you may want to try from_unixtime(unix_timestamp(dateColName, 'M/d/ h:mm')), which should convert it into a format that you can sort by. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions Matt Tucker -Original Message- From: Keith Wiley [mailto:kwi...@keithwiley.com] Sent: Tuesday, March 13, 2012 1:47 PM To: user@hive.apache.org Subject: Re: order by date I see how I misled you, sorry. I wasn't implying that my csv data is cleanly represented in -MM-dd format. I was just asking syntactically how to use date functions in HiveQL because I hadn't found any examples and I used -MM-dd in my example. The dates in my csv tables are often in "American" format, month first without leading zeroes, e.g., "3/31/2012 7:00". The lack of leading zeroes and the unsortabled date format make the dates difficult to work with. I was thinking I could use the date functions with some other format to sort them (I guess it would be "M/d/ h:mm" or something like that). I admit, I didn't actually go to the trouble of providing the correct pattern string in my earlier post, I was focused on the HiveQL syntax in that post, not the precise date pattern given to the date function. So yeah, I'm still trying to determine the best way to sort queries against the date-time columns. One option is to read/write the entire tables with a date conversion to a lexicographic format. Another option -- my original question in this thread -- was how I might use hive's date functions at the time a query is performed. What do you think is the best way to deal with this? Thanks. On Mar 13, 2012, at 10:35 , Tucker, Matt wrote: > I'm a bit confused. It sounds like you're already storing your dates as > strings in a '-mm-dd' format. In that case, you can just sort by > dateColName. There's no issue with using UNIX_TIMESTAMP() in the order by > clause, as it outputs integer values. > > Most of the date functions in hive take arguments in string format, with a > few functions that will translate between unix timestamps and datetime > strings. > > Matt Tucker Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use." -- Galileo Galilei
Re: order by date
Okay, what are the differences and similarities with use a date function for the same purpose? Is there a reason to go one route or the other? On Mar 13, 2012, at 11:29 , Mark Grover wrote: > Hi Keith, > You should also consider writing you own UDF that takes in the date in > "American" format and spits out a lexicographical string. > That way you don't have to modify your base data, just use this newly created > from_american_date(String date) UDF to get your new date string in > > Mark Grover, Business Intelligence Analyst > OANDA Corporation > > www: oanda.com www: fxtrade.com > e: mgro...@oanda.com Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "It's a fine line between meticulous and obsessive-compulsive and a slippery rope between obsessive-compulsive and debilitatingly slow." -- Keith Wiley
Re: order by date
Hi Keith, You should also consider writing you own UDF that takes in the date in "American" format and spits out a lexicographical string. That way you don't have to modify your base data, just use this newly created from_american_date(String date) UDF to get your new date string in Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com e: mgro...@oanda.com "Best Trading Platform" - World Finance's Forex Awards 2009. "The One to Watch" - Treasury Today's Adam Smith Awards 2009. - Original Message - From: "Keith Wiley" To: user@hive.apache.org Sent: Tuesday, March 13, 2012 1:47:12 PM Subject: Re: order by date I see how I misled you, sorry. I wasn't implying that my csv data is cleanly represented in -MM-dd format. I was just asking syntactically how to use date functions in HiveQL because I hadn't found any examples and I used -MM-dd in my example. The dates in my csv tables are often in "American" format, month first without leading zeroes, e.g., "3/31/2012 7:00". The lack of leading zeroes and the unsortabled date format make the dates difficult to work with. I was thinking I could use the date functions with some other format to sort them (I guess it would be "M/d/ h:mm" or something like that). I admit, I didn't actually go to the trouble of providing the correct pattern string in my earlier post, I was focused on the HiveQL syntax in that post, not the precise date pattern given to the date function. So yeah, I'm still trying to determine the best way to sort queries against the date-time columns. One option is to read/write the entire tables with a date conversion to a lexicographic format. Another option -- my original question in this thread -- was how I might use hive's date functions at the time a query is performed. What do you think is the best way to deal with this? Thanks. On Mar 13, 2012, at 10:35 , Tucker, Matt wrote: > I'm a bit confused. It sounds like you're already storing your dates as > strings in a '-mm-dd' format. In that case, you can just sort by > dateColName. There's no issue with using UNIX_TIMESTAMP() in the order by > clause, as it outputs integer values. > > Most of the date functions in hive take arguments in string format, with a > few functions that will translate between unix timestamps and datetime > strings. > > Matt Tucker Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use." -- Galileo Galilei
Re: order by date
I see how I misled you, sorry. I wasn't implying that my csv data is cleanly represented in -MM-dd format. I was just asking syntactically how to use date functions in HiveQL because I hadn't found any examples and I used -MM-dd in my example. The dates in my csv tables are often in "American" format, month first without leading zeroes, e.g., "3/31/2012 7:00". The lack of leading zeroes and the unsortabled date format make the dates difficult to work with. I was thinking I could use the date functions with some other format to sort them (I guess it would be "M/d/ h:mm" or something like that). I admit, I didn't actually go to the trouble of providing the correct pattern string in my earlier post, I was focused on the HiveQL syntax in that post, not the precise date pattern given to the date function. So yeah, I'm still trying to determine the best way to sort queries against the date-time columns. One option is to read/write the entire tables with a date conversion to a lexicographic format. Another option -- my original question in this thread -- was how I might use hive's date functions at the time a query is performed. What do you think is the best way to deal with this? Thanks. On Mar 13, 2012, at 10:35 , Tucker, Matt wrote: > I'm a bit confused. It sounds like you're already storing your dates as > strings in a '-mm-dd' format. In that case, you can just sort by > dateColName. There's no issue with using UNIX_TIMESTAMP() in the order by > clause, as it outputs integer values. > > Most of the date functions in hive take arguments in string format, with a > few functions that will translate between unix timestamps and datetime > strings. > > Matt Tucker Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use." -- Galileo Galilei
RE: order by date
I'm a bit confused. It sounds like you're already storing your dates as strings in a '-mm-dd' format. In that case, you can just sort by dateColName. There's no issue with using UNIX_TIMESTAMP() in the order by clause, as it outputs integer values. Most of the date functions in hive take arguments in string format, with a few functions that will translate between unix timestamps and datetime strings. Matt Tucker -Original Message- From: Keith Wiley [mailto:kwi...@keithwiley.com] Sent: Tuesday, March 13, 2012 1:01 PM To: user@hive.apache.org Subject: Re: order by date Is see, you store the date-time as a lexicographically sortable string. That's fine, but I'm operating on existing csv tables. I guess I could whip up a hadoop job to convert all the date-time columns to lexicographic strings and then wrap hive around the resulting converted tables. I was just wondering if there was a more direct approach, whether I could apply hive to the original csv tables. *Would* the date functions work in the way I suggested in my first post or do you think a lexicographic conversion is practically required to perform date-ordering on query results? On Mar 13, 2012, at 09:49 , Tucker, Matt wrote: > Hi Keith, > > We generally store date columns as a string in a similar format to ISO 8601 > (-mm-dd hh:MM:ss). This way, when we put the date column in the ORDER BY > clause, it will be sorted chronologically. It also saves us the trouble of > whipping out a unix timestamp calculator to figure out what we're looking at. > > There is supposed to be a TIMESTAMP data type in Hive 0.8, but I haven't > found any documentation on it yet. > > Matt Tucker > > -Original Message- > From: Keith Wiley [mailto:kwi...@keithwiley.com] > Sent: Tuesday, March 13, 2012 12:45 PM > To: user@hive.apache.org > Subject: order by date > > I realize that hive doesn't have a date type for the columns and I realize > that hive *does* have various date functions. I just haven't found a > concrete example of how these two issues are brought together. Ordering the > results of a sql query is done by adding "order by dateColName" to the query. > Are the date functions supposed to be used in the form "order by > UNIX_TIMESTAMP(dateColName, '-MM-dd')"? Does the function just go right > into the "order by" clause like that or is that totally wrong? > > How does one order query results by a date column? Am I on the right track? > > Thanks. Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy." -- Edwin A. Abbott, Flatland
Re: order by date
Is see, you store the date-time as a lexicographically sortable string. That's fine, but I'm operating on existing csv tables. I guess I could whip up a hadoop job to convert all the date-time columns to lexicographic strings and then wrap hive around the resulting converted tables. I was just wondering if there was a more direct approach, whether I could apply hive to the original csv tables. *Would* the date functions work in the way I suggested in my first post or do you think a lexicographic conversion is practically required to perform date-ordering on query results? On Mar 13, 2012, at 09:49 , Tucker, Matt wrote: > Hi Keith, > > We generally store date columns as a string in a similar format to ISO 8601 > (-mm-dd hh:MM:ss). This way, when we put the date column in the ORDER BY > clause, it will be sorted chronologically. It also saves us the trouble of > whipping out a unix timestamp calculator to figure out what we're looking at. > > There is supposed to be a TIMESTAMP data type in Hive 0.8, but I haven't > found any documentation on it yet. > > Matt Tucker > > -Original Message- > From: Keith Wiley [mailto:kwi...@keithwiley.com] > Sent: Tuesday, March 13, 2012 12:45 PM > To: user@hive.apache.org > Subject: order by date > > I realize that hive doesn't have a date type for the columns and I realize > that hive *does* have various date functions. I just haven't found a > concrete example of how these two issues are brought together. Ordering the > results of a sql query is done by adding "order by dateColName" to the query. > Are the date functions supposed to be used in the form "order by > UNIX_TIMESTAMP(dateColName, '-MM-dd')"? Does the function just go right > into the "order by" clause like that or is that totally wrong? > > How does one order query results by a date column? Am I on the right track? > > Thanks. Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy." -- Edwin A. Abbott, Flatland
RE: order by date
Hi Keith, We generally store date columns as a string in a similar format to ISO 8601 (-mm-dd hh:MM:ss). This way, when we put the date column in the ORDER BY clause, it will be sorted chronologically. It also saves us the trouble of whipping out a unix timestamp calculator to figure out what we're looking at. There is supposed to be a TIMESTAMP data type in Hive 0.8, but I haven't found any documentation on it yet. Matt Tucker -Original Message- From: Keith Wiley [mailto:kwi...@keithwiley.com] Sent: Tuesday, March 13, 2012 12:45 PM To: user@hive.apache.org Subject: order by date I realize that hive doesn't have a date type for the columns and I realize that hive *does* have various date functions. I just haven't found a concrete example of how these two issues are brought together. Ordering the results of a sql query is done by adding "order by dateColName" to the query. Are the date functions supposed to be used in the form "order by UNIX_TIMESTAMP(dateColName, '-MM-dd')"? Does the function just go right into the "order by" clause like that or is that totally wrong? How does one order query results by a date column? Am I on the right track? Thanks. Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "What I primarily learned in grad school is how much I *don't* know. Consequently, I left grad school with a higher ignorance to knowledge ratio than when I entered." -- Keith Wiley
order by date
I realize that hive doesn't have a date type for the columns and I realize that hive *does* have various date functions. I just haven't found a concrete example of how these two issues are brought together. Ordering the results of a sql query is done by adding "order by dateColName" to the query. Are the date functions supposed to be used in the form "order by UNIX_TIMESTAMP(dateColName, '-MM-dd')"? Does the function just go right into the "order by" clause like that or is that totally wrong? How does one order query results by a date column? Am I on the right track? Thanks. Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "What I primarily learned in grad school is how much I *don't* know. Consequently, I left grad school with a higher ignorance to knowledge ratio than when I entered." -- Keith Wiley