On Wed, Jul 9, 2008 at 3:35 AM, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:
> Just clarifying for myself: you are mostly listing theoretical problems
> here, not actual "I ran it and got regression failures" problems, right?

Correct. This is why most of them point out that they are not actually real problems, just written in a less than ideal way. Like all warnings, they might be a problem, and they might not. Some of them almost certainly would cause problems, like any place where it is assumed that a "long" is large enough to hold a memory address. Many of them are completely innocent, and it is clear that they are (this is true of most of them, because they occur in string operations).

Unfortunately they generate the same warning whether they're bad or not, and it's quite hard to look through the compiler output and make sense of it. I can't really make a mental note that I "already checked that one and it seemed fine" because there are 400 of them in the main postgres daemon alone. It's hard to find the bad ones with so many of them (most not problems) on the screen at once.

I'd fix them, but would anyone be willing to commit that? I basically have to fix them anyway, just so I don't have to look at them. I can't disable the warning, because I need to see it in the instances where it is important. I included them in the summary not because they are all problems which cause failure, but because I felt they belonged in a summary of all data-model-related portability issues.

Note that whether a given instance is important can be decided only by a human. No one is going to input a user name bigger than 2^32 characters. You may not truncate a pointer. These are the exact same warning (truncating an integer).

> You spend some time arguing that long is the wrong type for lengths in
> memory but since all Datums in postgres are limited to 1GB I don't
> understand how this can be a practical problem since that can be stored
> in an int and a long on any platform.

No, I am not trying to argue this (or at least not from a design standpoint).

Here is something that probably should have been obvious, but perhaps is not: I do not know very much about postgres, or how it works. I also don't know very much about database theory/algorithms/programming. I am also not interested in adding new functionality to the system. I'm just a developer who wants to use a UDF in a 64-bit DLL with a free database, for a totally different project that I am working on. MySQL and Firebird are the only two databases in which you can currently do this. I like postgres a lot more than MySQL and Firebird, plus MySQL crashes on my very large datasets and Firebird is missing numerous features that I would like.

The easiest thing (although that is becoming less true) would be to get postgres to compile as a native 64-bit application. I honestly thought it would be a lot easier than I now think it will be. If this continues to be such a problem, I will need to move back to MySQL and hope they fix the bugs.

I know that a Datum cannot be bigger than 1 GB either way, but the documentation around the Datum typedef notes that Datum must be large enough to hold a pointer. It does not say why, or where this assumption gets used, or why it was made. It's simply a warning that somewhere, in the very large codebase, this may happen. Where does it happen? Does it ever happen? I really, really wish I knew or that someone could just tell me. Stuff like that is basically the reason that I post to this mailing list.

My assumption, though, is that this code is old and that no one really remembers all the semantics surrounding all the uses of it. Are the regression tests so strong that I don't even need to worry about it? That just seems like a very bad idea, especially since the documentation warns me that "this _will_ happen."

As for what would replace it, I think intptr_t.
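To make the intptr_t suggestion concrete, here is a minimal C sketch. It is not code from the postgres tree: DemoDatum and the demo_* helpers are hypothetical names invented for illustration, and the real Datum typedef and its conversion macros may look different. The sketch only shows the idea of checking at compile time that the type is wide enough for a pointer, and of round-tripping a pointer through it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* intptr_t */
#include <stdio.h>    /* printf */

/* Hypothetical stand-in for the Datum typedef; an assumption, not
 * the definition used by postgres. */
typedef intptr_t DemoDatum;

/* Refuse to compile on any data model where the type cannot hold a
 * pointer, instead of relying on a comment in a header. */
static_assert(sizeof(DemoDatum) >= sizeof(void *),
              "DemoDatum must be wide enough to hold a pointer");

/* Hypothetical helper: store a pointer in a DemoDatum. */
DemoDatum demo_pointer_get_datum(const void *p)
{
    return (DemoDatum) p;
}

/* Hypothetical helper: recover the pointer from a DemoDatum. */
void *demo_datum_get_pointer(DemoDatum d)
{
    return (void *) d;
}

/* Returns 1 if a pointer survives the round trip unchanged. */
int demo_round_trip_ok(void)
{
    int value = 42;
    void *back = demo_datum_get_pointer(demo_pointer_get_datum(&value));

    return back == (void *) &value;
}

/* Prints the sizes that distinguish the data models discussed here. */
void demo_print_sizes(void)
{
    printf("long=%zu void*=%zu intptr_t=%zu\n",
           sizeof(long), sizeof(void *), sizeof(intptr_t));
}
```

Calling demo_print_sizes() on an LP64 system reports the same width for all three types. With a 64-bit LLP64 compiler, long is 4 bytes while void * and intptr_t are 8, which is exactly the case where a long-based typedef would truncate a pointer.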
intptr_t has the same size as long on LP32, ILP32, LP64, and ILP64, so there would be no changes to anything that already works, plus this type can hold a pointer on LLP64 compilers. This is a change I have already made. Then I found several references to the assumption that Datum has long alignment, in code that produces no warnings. This would almost certainly create problems. It also means I would have to look at every use of Datum to find stuff like this. I must also understand it in full, even when it has little or no documentation. The story is similar with long: in other places it is assumed that it can hold memory addresses, or that it has pointer alignment.

As I read this stuff, I become more and more annoyed with Microsoft's decision to make an LLP64 compiler in the first place. To not have the larger of the two basic integer types be able to hold a pointer is just ridiculous. However, despite the embrace of gcc and Linux by the culture, I believe that by the numbers MSVC is the most common C/C++ compiler, and by a significant margin. I would get rid of it, if only I could. The project that I need the database for is for a corporation which is not interested in switching.

> Mostly you seem to be noting that whatever compiler you are using is
> much stricter than the other compilers used in the buildfarm. Clearly
> neither icc nor sun studio find these problems on other 64-bit
> platforms.

I am not familiar with these compilers, but I believe neither icc nor sun studio should have these problems. They are both LP64, so the assumption that long can always hold a pointer is correct. LP64 should still have all the string warnings that do not matter, but they could just be turned off. Any potential real problem is unique to LLP64, a data model never supported by postgres.

> I don't understand what you mean here: the Datum type has very clear
> rules about how it is stored.
> It is essentially opaque, but given the
> typlen you have enough information to know how to copy it for example.

Well, that is some good news. Where can I find these rules?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers