Re: [HACKERS] Aussie timezone database changes incoming
>The Russian changes are perhaps not such a big deal because they've >done that sort of thing before, but this is an earful: > > Australian eastern time zone abbreviations are now AEST/AEDT not > EST, and similarly for the other Australian zones. That is, for > eastern standard and daylight saving time the abbreviations are AEST > and AEDT instead of the former EST for both; similarly, ACST/ACDT, > ACWST/ACWDT, and AWST/AWDT are now used instead of the former CST, > CWST, and WST. This change does not affect UTC offsets, only time > zone abbreviations. (Thanks to Rich Tibbett and many others.) [...] >Anyone from down under care to remark about the actual usage of old >and new abbreviations? AEST/AEDT/etc are the official abbreviations and are commonly used. They have been increasingly used over the last 20 years or so, and the EST/EDT stuff on the Olsen tz database has been a source of annoyance for a very long time, eg: http://thread.gmane.org/gmane.comp.time.tz/2262 Quite likely this change will break stuff, but my feeling is more people will be cheering than screaming. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)
>For my Python DBAPI2 PostgreSQL driver I plan the following optimizations: I suggest you have a look at my Python ocpgdb driver: http://code.google.com/p/ocpgdb/ It uses the v3 binary protocol exclusively (to avoid the usual escaping security issues). A number of gotchyas were discovered along the way - in particular, you must be a lot more careful about types (as you note in a later reply). There were also some issues with the v3 protocol, most of which have been fixed now. ocpgdb does not implement everything, just the bits I needed. That said, other people/projects are using it in production, and it's proven to be fast and stable. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>Andrew McNamara writes: >>>> The solution is to write the query in an unambiguous way: >>>> SELECT $1::date + 1; > >> You are missing the point: this is not about what types the SQL execution >> sees. It is about making sure the correct recv function is applied to >> the binary parameter data. > >Indeed, and the above locution does set that. Sure, but it requires the driver to modify the query - that isn't reasonable or practical. Expecting the user to the driver to know and correct set the type the driver will ultimately see is a recipe for disaster. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers {license}
>Obviously this is less urgent than having a driver that works now, but >it's still important. I think we would attract some goodwill from the >python community if we were helping them move to python3, rather than >sitting around waiting 'til they've already moved and decided that they >can't use postgresql. It's very, very difficult (but not impossible) to support both python 2 and 3 simultaneously, particularly if you have non-trivial C extension code. Even the python gods will admit that it's still early days. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>>>The solution is to write the query in an unambiguous way: >>> >>> SELECT $1::date + 1; >>> >>>which is good practice, anyway. If it's not obvious to the type >>>inference system, it's probably not obvious to you, and will probably >>>surprise you ;) >> >> That address this specific case, but it's ugly and not general. The right >> thing is to set the correct type when you're marshalling the parameters... > >Well, ugly is in the eye of the beholder, and it certainly is a general >solution. Any query with ambiguity in its parameters should explicitly >declare the types, inside the query itself. Having the driver indicate >the type should be the exception, not the rule. You are missing the point: this is not about what types the SQL execution sees. It is about making sure the correct recv function is applied to the binary parameter data. The server cannot reliably infer which recv function to use based in query context (although it tries). A wrong guess can lead to silent data corruption, which is utterly unacceptable. If the client (driver) sets the type OID to match the format of the binary parameter it sends, the server can unambiguously decode the data (and cast the type, if need be). I would go as far as to suggest that postgres should not accept binary parameters with an "unknown" OID - it's dangerous, unreliable and serves no purpose. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>> I'd like to see a requirement for the use of PQexecParams() over PQexec() - >> even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy. > >Such a rule seems pretty entirely pointless, unless you have a way to >enforce that the query string passed to the function hasn't been >assembled from parts somewhere along the way. The point is that if the driver is doing the right thing, the user of the driver at least has to choice to do things safely. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>That's just a matter of prioritizing the issues. Put the big ones at >the top, the trivia at the bottom, [...] I'd like to see a requirement for the use of PQexecParams() over PQexec() - even when using libpq's PQescapeStringConn(), PQexec() makes me uneasy. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>If the date is passed in binary format, it will pass it to int4recv() -- >but because the date is 4 bytes, and int4recv is defined for any 4-byte >input, it won't cause an error; it will produce a wrong result. In other >words, the binary representation for a date _is_ a valid binary >representation for an integer. The type inference has found the wrong >type, but the recv function still accepts it, which causes a problem. Yes - of the worst kind: silent data corruption. >The solution is to write the query in an unambiguous way: > > SELECT $1::date + 1; > >which is good practice, anyway. If it's not obvious to the type >inference system, it's probably not obvious to you, and will probably >surprise you ;) That address this specific case, but it's ugly and not general. The right thing is to set the correct type when you're marshalling the parameters... >Or, as Andrew suggests, you can pass the type oid along with the >parameter so that postgresql knows the right type. That's right - if using the binary parameters, you *must* pass an appropriate type oid for the data you send to the server. If you use the "unknown" oid, bad things will happen (sooner or later). While this is strictly true of both binary and text parameters, text parameters have enough redundancy built into the format that it's rarely a problem. Users have come to expect this leniency. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>For a given type, the input function may be more likely to catch an >input error than the recv function; or the reverse. Either way, it is >very type-specific, and the only difference is the whether the input is >misinterpreted (type error not caught; bad) or an error is thrown (type >error caught; better). This is the crux of the matter: the type input functions are universally more forgiving since, by their nature, text formats are designed for us fuzzy humans, and users of adapters have come to expect this. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>On Tue, 2010-02-09 at 10:46 +1100, Andrew McNamara wrote: >> The problem is deeper than that - when query parameters use the binary >> option, the server has no way to decode the binary parameter without an >> appropriate type OID. > >Postgres does not attempt to decode anything (text or binary format) >until it figures out what type it is. How does it figure out what type it is? Either by the type oid passed by the caller, or by the context if the type oid is "unknown". Now, with the text format parameters, the parser usually does the right thing, since text formats have plenty of hints for us humans. However, with the binary format, unless the caller tells us, there's no way to tell whether we're correctly parsing the data. If the context implies one type, but the user passes another, we'll either get an ugly error or, worse, silently misparse their data. Generally this isn't a big problem with python, as we have good type information available. It's only an issue because people have gotten used to the text parameter parsing being so forgiving. Using my ocpgdb module, and interacting directly with the libpq wrapping code, you can see how postgres reacts to various inputs: >>> from oclibpq import * >>> from ocpgdb import pgoid >>> db=PgConnection('') No parameters: >>> r=db.execute('select 1', ()) >>> r.status PGRES_TUPLES_OK >>> list(r) [(,)] Int4 parameter, type specified: >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(,)] Int4 parameter, type unknown, can't be determined from context: >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_FATAL_ERROR >>> r.errorMessage 'ERROR: could not determine data type of parameter $1\n' Int4 parameter, type unknown, can be determined from context: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(,)] Text parameter, type unknown, mismatching context - surprising: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(,)] Date parameter, type unknown, int context, the value gets misinterpreted: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(,)] -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>On Tue, 2010-02-09 at 09:15 +1100, Andrew McNamara wrote: >> I can't see how this would work with binary query parameters - the server >> will see a blob of binary data and have no way to know what it represents. > >Unknown is unknown, whether in binary or text format. As far as I know, >PostgreSQL never looks inside a literal of unknown type to try to >determine its type -- it only looks at the context (to what function is >it an argument?). > >For instance: > > SELECT '5'; -- has no idea what type it is > > SELECT '5' + 1; -- it's an int > > SELECT 'a' + 1; -- it's still an int > ERROR: invalid input syntax for integer: "a" > LINE 1: SELECT 'a' + 1; > > SELECT '5.0' + 1; -- still an int, bad input format > ERROR: invalid input syntax for integer: "5.0" > LINE 1: SELECT '5.0' + 1; The problem is deeper than that - when query parameters use the binary option, the server has no way to decode the binary parameter without an appropriate type OID. As you say, postgres will cast types depending on context, however this is stricter when binary parameters are used (because they only have one valid interpretation, whereas a text parameter may have several). -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>> http://code.google.com/p/ocpgdb/ > >I saw your note that you have to specify the types for date values >etc. Is this really desirable or even necessary? Can't you specify >the type as unknown (OID 705, I believe)? > >At work, we recently used to typelessness of Perl's DBD::Pg with great >effect, introducing a more compact, type-safe representation for a few >columns, without having to change all the existing Perl scripts >accessing the database. That's why I'm wondering... I can't see how this would work with binary query parameters - the server will see a blob of binary data and have no way to know what it represents. I presume DBD::Pg is using text parameters, rather than binary. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>I added you into the list at http://wiki.postgresql.org/wiki/Python Thanks. >Can you check what I put in there, confirm Windows compatibility, and >comment on Python 3.X support? I haven't tried it under Windows and I haven't had any feedback either way from Windows users. For now, ocpgdb has no Python 3 support (I don't foresee any real problems, however). >I'd be curious to hear more about the escaping bugs you ran into as well. >We already have some notes on the TODO that pushing more of this work >toward the standard libpq routines would seem appropriate for things >passing between the driver and libpq. Were the issues you ran into on >that side, or more on the Python side of how things were being formatted? It was a while ago now and I can't remember the specific details - it was more a general feeling that the existing offerings were going about it the wrong way (with respect to parameter passing and escaping). I suspect this was a historical artifact (presumably libpq didn't provide escaping facilities or parameterised queries when the adapters were written). Essentially, I just wanted a pyPgSQL with a more modern implementation. Psycopg was (is?) also using Protocol 2. I felt that the way forward was to switch to the Protocol 3 API features, in particular, parameterised queries, and none of the existing Python adapters had done that (I got the impression while writing my module that nobody was exercising the new features). -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusion over Python drivers
>Any other suggestions before I turn the above into a roadmap page on the >wiki? I got sick of the constant stream of escaping bugs impacting on psycopg and pyPgSQL, and wrote my own DB-API driver, using the more modern libpq/binary/protocol 3 APIs where ever possible. The result is BSD licensed: http://code.google.com/p/ocpgdb/ As well as using the newer APIs, I have attempted to keep the code as simple as possible, eschewing things like threading as adding too much complexity for too little gain (particularly true of Python threading), and I kept to just the code DB-API functionality. The C code exists mainly to present a pythonic view of libpq. I found that type conversion and marshalling could generally be done from python with more than acceptable performance (via the C-coded "struct" module in the standard library for common types). In my tests, ocpgdb has performed at least as well as pyPgSQL and psycopg, often a lot better, primarily due to the use of the libpq binary protocols, I think. I'm not proposing my module as your canonical implementation, although you're welcome to it if you like. Rather, it demonstrates another viable approach, minimal, and using newer libpq APIs. BTW, with respect to the discussion of the Python DB-API - I see it as specifying a lowest-common-denominator, or the subset of functionality that should be available from most databases without requiring contortions. Like eating at McDonalds, it does the job, but it's never going to delight or surprise. A PostGreSQL blessed adapter really should provide access to all the features in libpq, and I'm not sure this is directly compatible with DBAPI. Instead, the DBAPI-compliance should be layered on top. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feedback on getting rid of VACUUM FULL
>Well, Andrew McNamara just posted today: >http://archives.postgresql.org/message-id/20090916063341.0735c5ac...@longblack.object-craft.com.au > >Had VACUUM FULL not been available, though, I'm pretty sure he would've >come up with something else instead. Indeed I would have. And it was our own slackness that got us into the situation. Several people suggested using a portable drive - in this case, it would not have been practical as the machines are physically managed by another group at a remote location (the paperwork would be the real blocker). Getting more drives added to the SAN would have been even more painful. >I was just going to post that we should make a decision about this, >because ISTM there's some code in Simon's hot standby patch that is only >required to support VACUUM FULL. If we make the decision that we drop >VACUUM FULL in 8.5, we can take that part out of the patch now. It's not >a huge amount of code, but still. > >I'm in favor of removing VACUUM FULL in 8.5. To replace it, we should offer: > >1) VACUUM REWRITE, which is like CLUSTER but doesn't use an index, and My preference would be to keep the VACUUM FULL command, but to reimplement it as a table rewriter (like CLUSTER?). I see little risk to changing the behaviour without changing the name - only experts are currently aware exactly what it actually does, and they are more likely to keep an eye out for changes like this. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] remove flatfiles.c
>> That's what I want to believe. But picture if you have, say a >> 1-terabyte table which is 50% dead tuples and you don't have a spare >> 1-terabytes to rewrite the whole table. > >But trying to VACUUM FULL that table is going to be horridly painful >too, and you'll still have bloated indexes afterwards. You might as >well just live with the 50% waste, especially since if you did a >full-table update once you'll probably do it again sometime. > >I'm having a hard time believing that VACUUM FULL really has any >interesting use-case anymore. This was almost exactly the scenario I faced recently. A production database unexpectedly filled up its partition. On investigation, we found a developer had added a component to the application that updated every row in one table each day, exhausting the free space map. Over time, most of the tables in the system had grown to contain 50-70% dead tuples. The owner of the system was understandably reluctant to dump and restore the system, and there wasn't enough space left on the system to rewrite any of the large tables. In the end, I dropped a table (the one owned by the offending developer... 8-), and this gave me just enough space to VACUUM FULL one table at a time. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upcoming minor releases
>>The packager team is planning minor releases of 7.4.X to 8.4.X. The >>packaging of the releases will be done on September 3-4, with release >>due on September 9 (late to avoid a US holiday on September 7). > >Is this likely to include a 64 bit build in the fat binaries for OS X? Ah - after some more googling, I think I can answer my own question: http://www.postgresqlformac.com/news/ -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upcoming minor releases
>The packager team is planning minor releases of 7.4.X to 8.4.X. The >packaging of the releases will be done on September 3-4, with release >due on September 9 (late to avoid a US holiday on September 7). Is this likely to include a 64 bit build in the fat binaries for OS X? -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No sanity checking performed on binary TIME parameters.
On 26/05/2009, at 10:25 AM, Tom Lane wrote: Andrew McNamara writes: Are there any other cases where the binary receive functions are missing sanity checks? Possibly --- you want to go looking? Uh. I'd be lying if I said I "wanted to" - I got enough of a taste of those functions when trying to work out what they expect when doing my ocpgdb module. At the moment, however, I have a good excuse for wimping out - every waking hour is being expended on swine flu related work (although I'd almost welcome the distraction of a good *virtual* bug hunt). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No sanity checking performed on binary TIME parameters.
On 26/05/2009, at 5:41 AM, Tom Lane wrote: The only place I can find where an oversize time value behaves in a seriously bogus fashion is in time_out, or more specifically EncodeTimeOnly(): it fails to initialize its output string at all. So you could easily get garbage text output, though in my quick tests you seem to usually get an empty string instead. The odds of an actual crash seem pretty small, but not quite zero (if somehow there was no zero byte up to the end of the stack). I'm seeing all sorts of odd stuff - typically the last column value output, but occasionally other snippets of random data that don't seem related to the query. My feeling is that the error check in EncodeTimeOnly is just stupid and should be removed. That code will work fine with oversize times (and no, it won't overrun the output buffers either). The callers aren't bothering to check for error returns anyway... I'm not sure it's postgresql's job to police things like this, but returning values greater than 24 hours may violate assumptions in user code, and I would be worried about potentially causing silent failures. Of course, it should no longer be possible to get an illegal value into the database, so the risk is low - either a database that predates the fix, or database corruption. Are there any other cases where the binary receive functions are missing sanity checks? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] No sanity checking performed on binary TIME parameters.
When submitting a query via the V3 binary protocol (PQexecParams, paramFormats[n]=1), it appears the PostgreSQL server performs no range checking on the passed values. Passing values greater than 24 hours results in unpredictable results (dumps that cannot be restored, strange output when printing the column in psql, etc). Tested with version 8.1 and 8.2 (integer_datetimes is false). Using my python ocpgdb module (http://code.google.com/p/ocpgdb/): >>> db.execute('select %s::time::text', DateTimeDelta(0,23,59,59)) [('23:59:59',)] >>> db.execute('select %s::time::text', DateTimeDelta(0,28,0,0)) [('K|\x1f',)] ocpgdb has a lower-level API which is a thin layer on top of libpq - exercising this directly to rule out any problems with the mx.DateTime.DateTimeDelta class yields the same results: >>> import struct >>> import ocpgdb, oclibpq >>> db=oclibpq.PgConnection('') >>> list(db.execute('select $1::time::text', [(ocpgdb.pgoid.time, struct.pack('!d', 23*60*60))])) [(0x42a4a0>,)] >>> list(db.execute('select $1::time::text', [(ocpgdb.pgoid.time, struct.pack('!d', 48*60*60))])) [(0x42a500>,)] Apologies if this bug has already been addressed - I didn't find any references to it while googling. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers