On Mon, Jun 20, 2016 at 8:19 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas <robertmh...@gmail.com> > wrote: > > On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > >> amul sul <sul_a...@yahoo.co.in> writes: > >>> It's look like bug in to_timestamp() function when format string has > more whitespaces compare to input string, see below: > >> > >> No, I think this is a case of "input doesn't match the format string". > >> > >> As a rule of thumb, using to_timestamp() for input that could be parsed > >> just fine by the standard timestamp input function is not a particularly > >> good idea. to_timestamp() is meant to deal with input that is in a > >> well-defined format that happens to not be parsable by timestamp_in. > >> This example doesn't meet either of those preconditions. > > > > I think a space in the format string should skip a whitespace > > character in the input string, but not a non-whitespace character. > > It's my understanding that these functions exist in no small part for > > compatibility with Oracle, and Oracle declines to skip the digit '1' > > on the basis of an extra space in the format string, which IMHO is the > > behavior any reasonable user would expect. > > So Amul and I are of one opinion and Tom is of another. Anyone else > have an opinion? > > At least Tom's position has the benefit of being consistent with current behavior. The current implementation doesn't actually care what literal value you specify - any non-special character consumes a single token from the input, period. SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD--HH24:MI:SS'); SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD-HH24:MI:SS'); Both the above exhibit the same behavior as if you used a space instead of the hyphen as the group separator. The documentation should be updated to make this particular dynamic more clear. I don't see changing the general behavior of these "date formatting" functions a worthwhile endeavor. Adding a less-liberal "parse_timestamp" function I could get behind. IOW, the user seems to be happy with the fact that the "/" in the date matches his "-" but them complains that the space matches the number "1". You don't get to have it both ways. [re-reads the third usage note] Or maybe you do. We already define space as a being a greedy operator (when not used in conjunction with FX). A greedy space-space sequence makes little sense on its face and if we are going to be helpful here we should treat it as a single greedy space matcher. Note that "returns an error because to_timestamp expects one space only" is wrong - it errors because only a single space is captured and then the attempt to parse ' JUN' using "MON" fails. The following query doesn't fail though it exhibits the same space discrepancy (it just gives the same "wrong" result). SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'FXYYYY/MM/DD HH24:MI:SS'); Given that we already partially special-case the space expression I'd be inclined to consider Robert's and Amul's position on the matter. I think I'd redefine our treatment of space to be "zero or more" instead of "one or more" and require that it only match a literal space in the input. Having considered that, I'm not convinced its worth a compatibility break. I'd much rather deprecate these <to_*> versions and write slightly-less-liberal versions named <parse_*>. In any case I'd called the present wording a bug. Instead: A single space consumes a single token of input and then, unless the FX modifier is present, consumes zero or more subsequent literal spaces. Thus, using two spaces in a row without the FX modifier, while allowed, is unlikely to give you a satisfactory result. The first space will consume all available consecutive spaces so that the second space will be guaranteed to consume a non-space token from the input. David J.