Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Added to TODO:

* Allow the UUID type to accept non-standard formats

  http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php

---

Dawid Kuroczko wrote:
> Hello.
>
> I am currently playing with the UUID data type and trying to use it to store data provided by a third-party (Hewlett-Packard) application. The problem is that they format UUIDs as 4x-4x-4x-4x-4x-4x-4x-4x, so I have to use replace(text,'-','')::uuid for this kind of data.
>
> Nooow, the case is quite simple, and it might be that there are other applications formatting UUIDs too liberally.
>
> I am working on a patch to support this format (yes, it is a simple modification). In the meantime I would like to ask what you think about it.
>
> Cons: Such a format is not standard.
> Pros: This will help UUID data type adoption. [1] While good applications format their data well, there are others which don't follow standards. Also, I think it is easier for a human being to enter a UUID as 8 times 4 digits.
>
> Your thoughts? Should I submit a patch?
>
> Regards,
> Dawid
>
> [1]: My first thought when I received the error message was "hey! this is not a UUID, it is too long/too short!"; only later did I check that they just don't format it too well.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your Subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-hackers
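To illustrate the client-side workaround Dawid describes (strip the hyphens, then cast), here is a minimal Python sketch. It is not from the thread: the function name normalize_hp_uuid and the sample value are made up for illustration, and it assumes the vendor format is exactly eight hyphen-separated groups of four hex digits, as the subject line suggests.

```python
import string
import uuid


def normalize_hp_uuid(text: str) -> uuid.UUID:
    """Convert a 4x-4x-4x-4x-4x-4x-4x-4x string into a standard UUID.

    Hypothetical helper: mirrors replace(text, '-', '')::uuid, but also
    checks that the input really has eight groups of four hex digits.
    """
    groups = text.split("-")
    if len(groups) != 8 or any(len(g) != 4 for g in groups):
        raise ValueError(f"not in 4x-4x-4x-4x-4x-4x-4x-4x form: {text!r}")
    digits = "".join(groups)
    if any(c not in string.hexdigits for c in digits):
        raise ValueError(f"non-hex character in: {text!r}")
    return uuid.UUID(hex=digits)


# str() of a uuid.UUID always renders the standard 8-4-4-4-12 layout.
print(normalize_hp_uuid("a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11"))
# prints a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
```

Doing the conversion in the application keeps the database input strict, which is the trade-off most of the replies below argue about.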
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
On Thu, Feb 28, 2008 at 1:19 AM, Tom Lane wrote:
> I think the question we have to answer is whether we want to be complicit in the spreading of a nonstandard UUID format.

I don't. I have patched the UUID input and output functions to be compatible with Adobe ColdFusion (http://adobe.com/products/coldfusion/ uses 8x-4x-4x-16x), and while I have released them I have deliberately made the changes incompatible with other formats and will not submit them to PostgreSQL because I want Adobe to fix ColdFusion to use the standard format.

Jochem
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Tom,

> I think the question we have to answer is whether we want to be complicit in the spreading of a nonstandard UUID format. Even if we answer yes for this HP case, it doesn't follow that we should create a mechanism for anybody to do anything with 'em. That way lies the madness people already have to cope with for datetime data :-(

Well, I guess the question is: if we don't offer some builtin way to render non-standard formats built into company products, will those companies fix their format or just not use PostgreSQL?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
On Thu, Feb 28, 2008 at 08:58:01AM -0800, Josh Berkus wrote:
> Well, I guess the question is: if we don't offer some builtin way to render non-standard formats built into company products, will those companies fix their format or just not use PostgreSQL?

Well, there is an advantage that Postgres has that some others don't: you can extend Postgres pretty easily. That suggests to me a reason to be conservative in what we build in. This is consistent with the principle, "Be conservative in what you send, and liberal in what you accept."

A
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
>> Well, I guess the question is: if we don't offer some builtin way to render non-standard formats built into company products, will those companies fix their format or just not use PostgreSQL?
>
> Well, there is an advantage that Postgres has that some others don't: you can extend Postgres pretty easily. That suggests to me a reason to be conservative in what we build in. This is consistent with the principle, "Be conservative in what you send, and liberal in what you accept."

Well, then the uuid input function should most likely disregard all -, and accept the 4x-, 8x- formats and the like on input.

Andreas
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
On Thu, Feb 28, 2008 at 08:06:46PM +0100, Zeugswetter Andreas ADI SD wrote:
>>> Well, I guess the question is: if we don't offer some builtin way to render non-standard formats built into company products, will those companies fix their format or just not use PostgreSQL?
>>
>> Well, there is an advantage that Postgres has that some others don't: you can extend Postgres pretty easily. That suggests to me a reason to be conservative in what we build in. This is consistent with the principle, "Be conservative in what you send, and liberal in what you accept."
>
> Well, then the uuid input function should most likely disregard all -, and accept the 4x-, 8x- formats and the like on input.
>
> Andreas

We need to support the standard definition. People not using the standard need to know that and explicitly acknowledge that by implementing the conversion process themselves. Accepting random input puts a performance hit on everybody following the standard. It is the non-standard users who should pay that cost.

Cheers,
Ken
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Kenneth Marshall wrote:
> conversion process themselves. Accepting random input puts a performance hit on everybody following the standard.

Why is that necessarily the case?

Why not have a liberal parser and a configurable switch that determines whether non-standard forms are liberally accepted, accepted with a logged warning, or rejected?

James
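The three-way switch James proposes (accept, accept with a logged warning, reject) could look roughly like the Python sketch below. This is an illustration only and not a PostgreSQL setting: the names NonStandardPolicy and parse_uuid are hypothetical, and "non-standard" is assumed here to mean "32 hex digits with hyphens in any position".

```python
import logging
import re
import string
import uuid
from enum import Enum

log = logging.getLogger("uuid_input_sketch")


class NonStandardPolicy(Enum):
    """Hypothetical switch values for handling non-standard input."""
    ACCEPT = "accept"
    WARN = "warn"
    REJECT = "reject"


# The standard 8-4-4-4-12 textual layout.
STANDARD_FORM = re.compile(
    r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def parse_uuid(text: str,
               policy: NonStandardPolicy = NonStandardPolicy.REJECT) -> uuid.UUID:
    # Fast path: standard input is never affected by the policy.
    if STANDARD_FORM.fullmatch(text):
        return uuid.UUID(hex=text)

    # Liberal path: ignore hyphens, then require exactly 32 hex digits.
    digits = text.replace("-", "")
    if len(digits) != 32 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a UUID in any accepted form: {text!r}")

    if policy is NonStandardPolicy.REJECT:
        raise ValueError(f"non-standard UUID form rejected: {text!r}")
    if policy is NonStandardPolicy.WARN:
        log.warning("accepting non-standard UUID form: %r", text)
    return uuid.UUID(hex=digits)
```

Because the standard form is tested first, the policy check is only reached for input that would otherwise have been an error, which is one way to keep the extra cost away from standard-conforming users.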
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
James Mansion wrote:
> Kenneth Marshall wrote:
>> conversion process themselves. Accepting random input puts a performance hit on everybody following the standard.
>
> Why is that necessarily the case?
>
> Why not have a liberal parser and a configurable switch that determines whether non-standard forms are liberally accepted, accepted with a logged warning, or rejected?

I recall there being a measurable performance difference between the most liberal parser and the most optimized parser, back when I wrote one for PostgreSQL. I don't know how good the one in use for PostgreSQL 8.3 is. As to whether the cost is noticeable to people or not, that depends on what they are doing. The problem is that a UUID is pretty big, and parsing it liberally means a loop.

My personal opinion is that this is entirely a philosophical issue, and that both sides have merits.

There is no reason for PostgreSQL to support all formats, no matter how non-standard, for every single type. So, why would UUID be special? "Because it's easy to do" is not necessarily a good reason. But then, it's not a bad reason either.

Cheers,
mark

--
Mark Mielke
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Mark Mielke wrote:
> I recall there being a measurable performance difference between the most liberal parser, and the most optimized parser, back when I wrote one for PostgreSQL. I don't know how good the one in use for PostgreSQL 8.3 is. As to whether the cost is noticeable to people or not - that depends on what they are doing. The problem is that a UUID is pretty big, and parsing it liberally means a loop.

It just seems odd - I would have thought one would use re2c or ragel to generate something, and the performance would essentially be O(n) in the input length in characters - using either a collection of allowed forms or an engine that normalises case and discards the '-' characters between any hex pairs.

So yes, these would have a control loop. Is that so bad?

Either way, it's hard to imagine how parsing a string of this length could create a measurable performance issue compared to what will happen with the value post parse.

James
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
On Thu, Feb 28, 2008 at 06:45:18PM -0500, Mark Mielke wrote:
> My personal opinion is that this is entirely a philosophical issue, and that both sides have merits.

I think it depends on what you're optimising for: initial development time, maintenance time or run time.

> There is no reason for PostgreSQL to support all formats, no matter how non-standard, for every single type. So, why would UUID be special? "Because it's easy to do" is not necessarily a good reason. But then, it's not a bad reason either.

I never really buy the performance argument. I much prefer the correctness argument: if the code is doing something strange, I'd prefer to know about it as soon as possible. This generally means that I'm optimising for maintenance.

It's a similar argument to why lots of automatic casts were removed from 8.3: it generally doesn't hurt, but the few times it does it's going to be bad, and if you're doing something strange to start with it's better to be explicit about it.

Sam
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Andrew Sullivan writes:
> Be conservative in what you send, and liberal in what you accept.

Yeah, I was about to quote that same maxim myself. I don't have a big problem with allowing uuid_in to accept known format variants. (I'm not sure about allowing a hyphen *anywhere*, because that could lead to accepting things that weren't meant to be a UUID at all, but this HP format seems regular enough that that's not a serious objection to it.)

What I was really complaining about was Josh's suggestion that we invent a function to let users *output* UUIDs in random-format-of-the-week. I can't imagine much good coming of that. I think we should keep uuid_out emitting only the RFC-standardized format.

regards, tom lane
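The position above (accept a closed set of known layouts on input, never a hyphen just anywhere, and always emit the standard layout on output) can be sketched in Python as follows. The names uuid_in_sketch, uuid_out_sketch and KNOWN_LAYOUTS are hypothetical stand-ins, not the actual PostgreSQL uuid_in/uuid_out C functions, and the set of layouts is an assumption limited to the two formats discussed in this message.

```python
import string
import uuid

# Group widths of the layouts accepted on input (an assumed whitelist).
KNOWN_LAYOUTS = (
    (8, 4, 4, 4, 12),  # standard layout
    (4,) * 8,          # the HP-style 4x-4x-4x-4x-4x-4x-4x-4x layout
)


def uuid_in_sketch(text: str) -> uuid.UUID:
    """Accept only whitelisted layouts, not hyphens in arbitrary places."""
    groups = text.split("-")
    widths = tuple(len(g) for g in groups)
    if widths not in KNOWN_LAYOUTS:
        raise ValueError(f"unrecognised UUID layout: {text!r}")
    digits = "".join(groups)
    if any(c not in string.hexdigits for c in digits):
        raise ValueError(f"non-hex character in: {text!r}")
    return uuid.UUID(hex=digits)


def uuid_out_sketch(value: uuid.UUID) -> str:
    """Always emit the standard 8-4-4-4-12 layout, whatever came in."""
    return str(value)
```

With this shape, supporting another vendor layout means adding one tuple to the whitelist, while output stays fixed. An application that sent the HP layout would get the standard layout back, which is the surprise Tom Dunstan raises later in the thread.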
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
James Mansion wrote:
> Mark Mielke wrote:
>> I recall there being a measurable performance difference between the most liberal parser, and the most optimized parser, back when I wrote one for PostgreSQL. I don't know how good the one in use for PostgreSQL 8.3 is. As to whether the cost is noticeable to people or not - that depends on what they are doing. The problem is that a UUID is pretty big, and parsing it liberally means a loop.
>
> It just seems odd - I would have thought one would use re2c or ragel to generate something, and the performance would essentially be O(n) in the input length in characters - using either a collection of allowed forms or an engine that normalises case and discards the '-' characters between any hex pairs.

Instruction level parallelism allows for multiple hex values to be processed in parallel, whereas a loop relies on branch prediction and speculative load and store? :-)

The liberal version is difficult to unroll. The strict version is easy to unroll.

> So yes, these would have a control loop. Is that so bad?
>
> Either way, it's hard to imagine how parsing a string of this length could create a measurable performance issue compared to what will happen with the value post parse.

I think so too.

Cheers,
mark

--
Mark Mielke
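The structural difference Mark describes can be made concrete with a small Python sketch (not from the thread; both function names are hypothetical). The strict parser knows every offset in advance, so its checks and conversions are straight-line code. The liberal parser has to walk the input character by character because it cannot know where hyphens will occur. The sketch only shows the shape of the control flow; Python timings would say nothing about the C-level cost under discussion.

```python
HEX_DIGITS = "0123456789abcdef"


def parse_strict(text: str) -> bytes:
    """Standard 8-4-4-4-12 layout only: fixed offsets, no scanning loop."""
    if (len(text) != 36 or text[8] != "-" or text[13] != "-"
            or text[18] != "-" or text[23] != "-"):
        raise ValueError(f"not in standard layout: {text!r}")
    raw = bytes.fromhex(
        text[0:8] + text[9:13] + text[14:18] + text[19:23] + text[24:36]
    )
    # bytes.fromhex skips ASCII whitespace, so confirm all 16 bytes arrived.
    if len(raw) != 16:
        raise ValueError(f"not in standard layout: {text!r}")
    return raw


def parse_liberal(text: str) -> bytes:
    """Hyphens allowed between any hex pairs: needs a per-character loop."""
    out = bytearray()
    high = None  # pending high nibble of the current byte, if any
    for ch in text:
        if ch == "-":
            if high is not None:
                raise ValueError("hyphen splits a hex pair")
            continue
        nibble = HEX_DIGITS.find(ch.lower())
        if nibble < 0:
            raise ValueError(f"non-hex character {ch!r}")
        if high is None:
            high = nibble
        else:
            out.append((high << 4) | nibble)
            high = None
    if high is not None or len(out) != 16:
        raise ValueError(f"expected 32 hex digits: {text!r}")
    return bytes(out)
```

Both functions return the same 16 bytes for standard input; only the liberal one also accepts layouts such as 4x-4x-4x-4x-4x-4x-4x-4x.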
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
On Fri, Feb 29, 2008 at 9:26 AM, Tom Lane wrote:
> Andrew Sullivan writes:
>> Be conservative in what you send, and liberal in what you accept.
>
> Yeah, I was about to quote that same maxim myself. I don't have a big problem with allowing uuid_in to accept known format variants. (I'm not sure about allowing a hyphen *anywhere*, because that could lead to accepting things that weren't meant to be a UUID at all, but this HP format seems regular enough that that's not a serious objection to it.)

This seems like a good enough opportunity to mention an idea that I had while/after doing the enum patch. The patch was fairly intrusive for something that was just adding a type because postgresql isn't really set up for parameterized types other than core types. The idea would be to extend the enum mechanism to allow UDTs etc to be parameterized, and enums would just become one use of the mechanism. Other obvious examples that I had in mind were allowing variable lengths for that binary data type with hex IO for e.g. differently sized checksums that people want, and allowing different formats for uuids.

So the idea as applied to this case would be to do the enum-style typesafe thing, ie:

create type coldfusion_uuid as generic_uuid('---');

...then just use that. I had some thoughts about whether it would be worth allowing inline declarations of such types inside table creation statements as well, and there are various related issues and thoughts on implementation which I won't go into in this email. Do people think the idea has legs, though?

> What I was really complaining about was Josh's suggestion that we invent a function to let users *output* UUIDs in random-format-of-the-week. I can't imagine much good coming of that. I think we should keep uuid_out emitting only the RFC-standardized format.

Well, if the application is handing them to us in that format, it might be a bit surprised if it gets back a fixed one. The custom type approach wouldn't have that side effect.

Cheers

Tom
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Tom Dunstan writes:
> This seems like a good enough opportunity to mention an idea that I had while/after doing the enum patch. The patch was fairly intrusive for something that was just adding a type because postgresql isn't really set up for parameterized types other than core types. The idea would be to extend the enum mechanism to allow UDTs etc to be parameterized, and enums would just become one use of the mechanism.

Isn't this reasonably well covered by Teodor's work to support typmods for user-defined types? We've discussed how the typmod could be effectively a key into a system catalog someplace, thus allowing it to represent more than just an int32 worth of stuff. I'm not seeing where your proposal accomplishes more than that can.

regards, tom lane
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Dawid,

> I am working on a patch to support this format (yes, it is a simple modification).

I'd suggest writing a formatting function for UUIDs instead. Not sure what it should be called, though. to_char is pretty overloaded right now.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
> I am working on a patch to support this format (yes, it is a simple modification).

There was a proposal and a discussion about what this datatype would look like before I started developing it. We decided to go with the format proposed in the RFC. Unless there is a strong case, I doubt any non-standard formatting will be accepted into core.

IIRC we were also opposed to supporting Java-like formatted UUIDs back then. This is no different.

Regards,
Gevik.
Re: [HACKERS] UUID data format 4x-4x-4x-4x-4x-4x-4x-4x
Josh Berkus writes:
>> I am working on a patch to support this format (yes, it is a simple modification).
>
> I'd suggest writing a formatting function for UUIDs instead.

That seems like overkill, if not outright encouragement of people to come up with yet other nonstandard formats for UUIDs.

I think the question we have to answer is whether we want to be complicit in the spreading of a nonstandard UUID format. Even if we answer yes for this HP case, it doesn't follow that we should create a mechanism for anybody to do anything with 'em. That way lies the madness people already have to cope with for datetime data :-(

regards, tom lane