Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-03-03 Thread Bruce Momjian

Added to TODO:

* Allow the UUID type to accept non-standard formats

  http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php


---

Dawid Kuroczko wrote:
 Hello.
 
 I am currently playing with the UUID data type and trying to use it to store
 UUIDs provided by a third-party (Hewlett-Packard) application.  The problem is
 that they format UUIDs as xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxx, so I have to
 use replace(text,'-','')::uuid for this kind of data.
 
 Now, the case is quite simple, and it might be that there are other
 applications formatting UUIDs too liberally.
 
 I am working on a patch to support this format (yes, it is a simple
 modification).
 
 In the meantime, I would like to ask what you think about it.
 
 Cons: Such format is not standard.
 
 Pros: This will help UUID data type adoption. [1]  While good applications
 format their data well, there are others which don't follow standards.  Also,
 I think it is easier for a human being to enter a UUID as 8 groups of 4 digits.
 
 Your thoughts?  Should I submit a patch?
 
Regards,
  Dawid
 
 [1]: My first thought when I received the error message was "hey! this is
 not a UUID, it is too long/too short!"; only later did I check that they
 just don't format it too well.
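
The SQL workaround quoted above, replace(text,'-','')::uuid, can also be applied on the client before the value reaches the database. A minimal Python sketch of that idea follows; the function name and the sample value are made up for illustration, and only the standard-library uuid module is assumed.

```python
import uuid

def normalize_uuid_text(text: str) -> uuid.UUID:
    """Strip every hyphen, then parse the remaining 32 hex digits."""
    # Mirrors replace(text, '-', '')::uuid from the message above.
    return uuid.UUID(hex=text.replace("-", ""))

# Hypothetical sample in the 8 x 4-digit layout discussed in this thread.
sample = "a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11"
print(normalize_uuid_text(sample))  # prints a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
```

Printing the parsed value yields the RFC layout, so the database only ever sees the standard form.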
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your Subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-hackers


Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Jochem van Dieten
On Thu, Feb 28, 2008 at 1:19 AM, Tom Lane wrote:
  I think the question we have to answer is whether we want to be
  complicit in the spreading of a nonstandard UUID format.

I don't.

I have patched the UUID input and output functions to be compatible
with Adobe ColdFusion (http://adobe.com/products/coldfusion/ uses
8x-4x-4x-16x), and while I have released them I have deliberately made
the changes incompatible with other formats and will not submit them
to PostgreSQL because I want Adobe to fix ColdFusion to use the
standard format.

Jochem



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Josh Berkus
Tom,

 I think the question we have to answer is whether we want to be
 complicit in the spreading of a nonstandard UUID format.  Even if
 we answer yes for this HP case, it doesn't follow that we should
 create a mechanism for anybody to do anything with 'em.  That way
 lies the madness people already have to cope with for datetime
 data :-(

Well, I guess the question is: if we don't offer some builtin way to render 
non-standard formats built into company products, will those companies fix 
their format or just not use PostgreSQL?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Andrew Sullivan
On Thu, Feb 28, 2008 at 08:58:01AM -0800, Josh Berkus wrote:

 Well, I guess the question is: if we don't offer some builtin way to render 
 non-standard formats built into company products, will those companies fix 
 their format or just not use PostgreSQL?

Well, there is an advantage that Postgres has that some others don't: you
can extend Postgres pretty easily.  That suggests to me a reason to be
conservative in what we build in.  This is consistent with the principle,
"Be conservative in what you send, and liberal in what you accept."

A




Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Zeugswetter Andreas ADI SD

  Well, I guess the question is: if we don't offer some builtin way to render
  non-standard formats built into company products, will those companies fix
  their format or just not use PostgreSQL?
 
 Well, there is an advantage that Postgres has that some others don't: you
 can extend Postgres pretty easily.  That suggests to me a reason to be
 conservative in what we build in.  This is consistent with the principle,
 "Be conservative in what you send, and liberal in what you accept."

Well, then the uuid input function should most likely disregard all '-'
characters and accept the 4x-, 8x- formats and the like on input.

Andreas




Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Kenneth Marshall
On Thu, Feb 28, 2008 at 08:06:46PM +0100, Zeugswetter Andreas ADI SD wrote:
 
   Well, I guess the question is: if we don't offer some builtin way to render
   non-standard formats built into company products, will those companies fix
   their format or just not use PostgreSQL?
  
  Well, there is an advantage that Postgres has that some others don't: you
  can extend Postgres pretty easily.  That suggests to me a reason to be
  conservative in what we build in.  This is consistent with the principle,
  "Be conservative in what you send, and liberal in what you accept."
 
 Well, then the uuid input function should most likely disregard all '-'
 characters and accept the 4x-, 8x- formats and the like on input.
 
 Andreas
 
 
We need to support the standard definition. People not using the standard
need to know that and explicitly acknowledge that by implementing the
conversion process themselves. Accepting random input puts a performance
hit on everybody following the standard. It is the non-standard users who
should pay that cost. 

Cheers,
Ken



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread James Mansion

Kenneth Marshall wrote:

conversion process themselves. Accepting random input puts a performance
hit on everybody following the standard.

Why is that necessarily the case?

Why not have a liberal parser and a configurable switch that determines
whether non-standard forms are liberally accepted, accepted with a logged
warning, or rejected?

James
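
The three-way switch proposed above could look roughly like the following Python sketch. It is only an illustration of the idea: the enum, the function name, and the logger name are hypothetical, and nothing here reflects how PostgreSQL's input function is actually written.

```python
import enum
import logging
import string
import uuid

log = logging.getLogger("uuid_input")  # hypothetical logger name

class NonStandardPolicy(enum.Enum):
    ACCEPT = "accept"   # take non-standard forms silently
    WARN = "warn"       # take them, but log a warning
    REJECT = "reject"   # refuse anything but the standard layout

def parse_uuid(text: str, policy: NonStandardPolicy) -> uuid.UUID:
    """Liberal parse first, then apply the configured policy."""
    digits = text.replace("-", "")
    if len(digits) != 32 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a UUID in any accepted form: {text!r}")
    value = uuid.UUID(hex=digits)
    # str(value) is the lowercase 8-4-4-4-12 form; anything else is non-standard.
    if text.lower() != str(value):
        if policy is NonStandardPolicy.REJECT:
            raise ValueError(f"non-standard UUID format: {text!r}")
        if policy is NonStandardPolicy.WARN:
            log.warning("non-standard UUID format accepted: %r", text)
    return value
```

With REJECT the function behaves like a strict parser; with ACCEPT or WARN the same input is stored as an ordinary UUID value.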





Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Mark Mielke

James Mansion wrote:

Kenneth Marshall wrote:

conversion process themselves. Accepting random input puts a performance
hit on everybody following the standard.

Why is that necessarily the case?

Why not have a liberal parser and a configurable switch that 
determines whether non-standard
forms are liberally accepted, accepted with a logged warning, or 
rejected?


I recall there being a measurable performance difference between the 
most liberal parser, and the most optimized parser, back when I wrote 
one for PostgreSQL. I don't know how good the one in use for PostgreSQL 
8.3 is. As to whether the cost is noticeable to people or not - that 
depends on what they are doing. The problem is that a UUID is pretty 
big, and parsing it liberally means a loop.


My personal opinion is that this is entirely a philosophical issue, and 
that both sides have merits. There is no reason for PostgreSQL to 
support all formats, no matter how non-standard, for every single type. 
So, why would UUID be special? "Because it's easy to do" is not 
necessarily a good reason. But then, it's not a bad reason either.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]




Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread James Mansion

Mark Mielke wrote:
I recall there being a measurable performance difference between the 
most liberal parser, and the most optimized parser, back when I wrote 
one for PostgreSQL. I don't know how good the one in use for 
PostgreSQL 8.3 is. As to whether the cost is noticeable to people or 
not - that depends on what they are doing. The problem is that a UUID 
is pretty big, and parsing it liberally means a loop.


It just seems odd - I would have thought one would use re2c or ragel to 
generate something and the performance would essentially be O(n) on the 
input length in characters - using either a collection of allowed forms 
or an engine that normalises case and discards the '-' characters 
between any hex pairs.  So yes these would have a control loop.  Is that 
so bad?


Either way it's hard to imagine how parsing a string of this length could 
create a measurable performance issue compared to what will happen with 
the value post parse.


James




Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Sam Mason
On Thu, Feb 28, 2008 at 06:45:18PM -0500, Mark Mielke wrote:
 My personal opinion is that this is entirely a philosophical issue, and 
 that both sides have merits. 

I think it depends on what you're optimising for: initial development
time, maintenance time or run time.

 There is no reason for PostgreSQL to 
 support all formats, no matter how non-standard, for every single type. 
 So, why would UUID be special? Because it's easy to do is not 
 necessarily a good reason. But then, it's not a bad reason either.

I never really buy the performance argument.  I much prefer the
correctness argument: if the code is doing something strange, I'd prefer
to know about it as soon as possible.  This generally means that I'm
optimising for maintenance.

It's a similar argument to why lots of automatic casts were removed in
8.3: they generally don't hurt, but the few times they do it's going to
be bad, and if you're doing something strange to start with it's better
to be explicit about it.


  Sam



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Tom Lane
Andrew Sullivan [EMAIL PROTECTED] writes:
 Be conservative in what you send, and liberal in what you accept.

Yeah, I was about to quote that same maxim myself.  I don't have a big
problem with allowing uuid_in to accept known format variants.  (I'm
not sure about allowing a hyphen *anywhere*, because that could lead to
accepting things that weren't meant to be a UUID at all, but this HP
format seems regular enough that that's not a serious objection to it.)

What I was really complaining about was Josh's suggestion that we invent
a function to let users *output* UUIDs in random-format-of-the-week.
I can't imagine much good coming of that.  I think we should keep
uuid_out emitting only the RFC-standardized format.

regards, tom lane
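
The position above (accept a short list of known layouts on input, never a hyphen just anywhere, and always emit the RFC form on output) can be sketched as follows. This is a Python illustration under stated assumptions, not PostgreSQL's uuid_in or uuid_out: the function names and the KNOWN_LAYOUTS table are hypothetical.

```python
import re
import uuid

# Accepted hyphen layouts, written as group sizes in hex digits.
# The first is the RFC layout; the second is the HP-style layout from this thread.
KNOWN_LAYOUTS = [
    (8, 4, 4, 4, 12),
    (4, 4, 4, 4, 4, 4, 4, 4),
]

def _layout_pattern(groups):
    """Build a regex that matches exactly one hyphen layout."""
    return re.compile("-".join(r"[0-9a-fA-F]{%d}" % n for n in groups))

_PATTERNS = [_layout_pattern(groups) for groups in KNOWN_LAYOUTS]

def parse_known_layout(text: str) -> uuid.UUID:
    """Accept only the listed layouts; hyphens elsewhere are an error."""
    if any(p.fullmatch(text) for p in _PATTERNS):
        return uuid.UUID(hex=text.replace("-", ""))
    raise ValueError(f"invalid input syntax for uuid: {text!r}")

def format_canonical(value: uuid.UUID) -> str:
    """Output is always the lowercase 8-4-4-4-12 form."""
    return str(value)
```

Adding another vendor format means adding one tuple to the table, while a stray hyphen such as "a0-eebc99..." is still rejected.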



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Mark Mielke

James Mansion wrote:

Mark Mielke wrote:
I recall there being a measurable performance difference between the 
most liberal parser, and the most optimized parser, back when I wrote 
one for PostgreSQL. I don't know how good the one in use for 
PostgreSQL 8.3 is. As to whether the cost is noticeable to people or 
not - that depends on what they are doing. The problem is that a UUID 
is pretty big, and parsing it liberally means a loop.


It just seems odd - I would have thought one would use re2c or ragel 
to generate something and the performance would essentially be O(n) on 
the input length in characters - using either a collection of allowed 
forms or an engine that normalises case and discards the '-' 
characters between any hex pairs. 


Instruction level parallelism allows for multiple hex values to be 
processed in parallel, whereas a loop relies on branch prediction and 
speculative load and store? :-)


The liberal version is difficult to unroll. The strict version is easy 
to unroll.



So yes these would have a control loop.  Is that so bad?

Either way it's hard to imagine how parsing a string of this length 
could create a measurable performance issue compared to what will 
happen with the value post parse.


I think so too.

Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]
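
The structural difference between the two parsers discussed above can be sketched in Python. Python will not show the instruction-level effects, so this only contrasts the shape of the code: the strict parser reads every digit from a fixed offset, while the liberal one must walk the input and decide per character. Both function names are hypothetical and neither is PostgreSQL's actual parser.

```python
HEX = set("0123456789abcdefABCDEF")

def parse_strict(text: str) -> bytes:
    """Fixed 8-4-4-4-12 layout: every digit sits at a known offset."""
    if (len(text) != 36 or text[8] != "-" or text[13] != "-"
            or text[18] != "-" or text[23] != "-"):
        raise ValueError(f"not in standard UUID layout: {text!r}")
    digits = text[0:8] + text[9:13] + text[14:18] + text[19:23] + text[24:36]
    if not HEX.issuperset(digits):
        raise ValueError(f"non-hex character in UUID: {text!r}")
    return bytes.fromhex(digits)

def parse_liberal(text: str) -> bytes:
    """Any hyphen placement: a loop with a branch on every character."""
    digits = []
    for ch in text:
        if ch == "-":
            continue  # skip separators wherever they appear
        if ch not in HEX:
            raise ValueError(f"non-hex character in UUID: {text!r}")
        digits.append(ch)
    if len(digits) != 32:
        raise ValueError(f"expected 32 hex digits: {text!r}")
    return bytes.fromhex("".join(digits))
```

Both return the same 16 bytes for standard input; only the liberal version accepts the 8 x 4-digit layout.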




Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Tom Dunstan
On Fri, Feb 29, 2008 at 9:26 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Andrew Sullivan [EMAIL PROTECTED] writes:
   Be conservative in what you send, and liberal in what you accept.

  Yeah, I was about to quote that same maxim myself.  I don't have a big
  problem with allowing uuid_in to accept known format variants.  (I'm
  not sure about allowing a hyphen *anywhere*, because that could lead to
  accepting things that weren't meant to be a UUID at all, but this HP
  format seems regular enough that that's not a serious objection to it.)

This seems like a good enough opportunity to mention an idea that I
had while/after doing the enum patch. The patch was fairly intrusive
for something that was just adding a type because postgresql isn't
really set up for parameterized types other than core types. The idea
would be to extend the enum mechanism to allow UDTs etc to be
parameterized, and enums would just become one use of the mechanism.
Other obvious examples that I had in mind were allowing variable
lengths for that binary data type with hex IO for e.g. differently
sized checksums that people want, and allowing different formats for
uuids.

So the idea as applied to this case would be to do the enum-style
typesafe thing, ie:

create type coldfusion_uuid as generic_uuid('---');

...then just use that. I had some thoughts about whether it would be
worth allowing inline declarations of such types inside table creation
statements as well, and there are various related issues and thoughts
on implementation which I won't go into in this email. Do people think
the idea has legs, though?

  What I was really complaining about was Josh's suggestion that we invent
  a function to let users *output* UUIDs in random-format-of-the-week.
  I can't imagine much good coming of that.  I think we should keep
  uuid_out emitting only the RFC-standardized format.

Well, if the application is handing them to us in that format, it
might be a bit surprised if it gets back a fixed one. The custom
type approach wouldn't have that side effect.

Cheers

Tom
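
As a rough illustration of the parameterized-type idea above, here is a Python sketch in which each derived type is bound to one hyphen layout and uses it for both input and output. Everything in it is hypothetical: the FormattedUUID class, the way generic_uuid takes a tuple of group sizes instead of a format string, and the ColdFusion layout (8, 4, 4, 16), which is taken from the 8x-4x-4x-16x description earlier in the thread.

```python
import string
import uuid

class FormattedUUID:
    """A UUID value tied to one hyphen layout, given as group sizes."""
    groups = (8, 4, 4, 4, 12)  # default: the RFC layout

    def __init__(self, text: str):
        parts = text.split("-")
        if tuple(len(p) for p in parts) != self.groups:
            raise ValueError(f"{text!r} does not match layout {self.groups}")
        digits = "".join(parts)
        if any(c not in string.hexdigits for c in digits):
            raise ValueError(f"non-hex character in {text!r}")
        self.value = uuid.UUID(hex=digits)

    def __str__(self) -> str:
        # Re-emit the digits in this type's own layout.
        digits, out, pos = self.value.hex, [], 0
        for n in self.groups:
            out.append(digits[pos:pos + n])
            pos += n
        return "-".join(out)

def generic_uuid(name: str, groups) -> type:
    """Create a new type bound to the given layout (must total 32 digits)."""
    if sum(groups) != 32:
        raise ValueError("group sizes must add up to 32 hex digits")
    return type(name, (FormattedUUID,), {"groups": tuple(groups)})

ColdFusionUUID = generic_uuid("ColdFusionUUID", (8, 4, 4, 16))
cf = ColdFusionUUID("a0eebc99-9c0b-4ef8-bb6d6bb9bd380a11")
print(cf)        # prints a0eebc99-9c0b-4ef8-bb6d6bb9bd380a11
print(cf.value)  # prints a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
```

The application gets back the layout it sent, while the underlying value is an ordinary UUID that compares equal to the same value in RFC form.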



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-28 Thread Tom Lane
Tom Dunstan [EMAIL PROTECTED] writes:
 This seems like a good enough opportunity to mention an idea that I
 had while/after doing the enum patch. The patch was fairly intrusive
 for something that was just adding a type because postgresql isn't
 really set up for parameterized types other than core types. The idea
 would be to extend the enum mechanism to allow UDTs etc to be
 parameterized, and enums would just become one use of the mechanism.

Isn't this reasonably well covered by Teodor's work to support
typmods for user-defined types?  We've discussed how the typmod could
be effectively a key into a system catalog someplace, thus allowing it
to represent more than just an int32 worth of stuff.  I'm not seeing
where your proposal accomplishes more than that can.

regards, tom lane



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-27 Thread Josh Berkus
Dawid,

 I am working on a patch to support this format (yes, it is a simple
 modification).

I'd suggest writing a formatting function for UUIDs instead.  Not sure what 
it should be called, though.  to_char is pretty overloaded right now.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-27 Thread Gevik Babakhani

  I am working on a patch to support this format (yes, it is a simple 
  modification).

There was a proposal and a discussion about what this datatype would look like
before I started developing it. We decided to go with the format proposed in
the RFC. Unless there is a strong case, I doubt any non-standard formatting will
be accepted into core. IIRC we were also opposed to supporting Java-style
formatted UUIDs back then. This is no different.

Regards,
Gevik.




Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

2008-02-27 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I am working on a patch to support this format (yes, it is a simple
 modification).

 I'd suggest writing a formatting function for UUIDs instead.

That seems like overkill, if not outright encouragement of people to
come up with yet other nonstandard formats for UUIDs.

I think the question we have to answer is whether we want to be
complicit in the spreading of a nonstandard UUID format.  Even if
we answer yes for this HP case, it doesn't follow that we should
create a mechanism for anybody to do anything with 'em.  That way
lies the madness people already have to cope with for datetime
data :-(

regards, tom lane
