Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Shachar Shemesh <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> No, not unless you can make the case why this handles NaNs and denormalized numbers compatibly across platforms...
>>
>> NaNs and infinite (plus and minus) should not be a problem.
>
> Really? Need I point out that these concepts, let alone their representation, are not standardized in non-IEEE float implementations?

So what? They only need be standardized in our export format, which we define. That's why we define it, after all.

>> I'm not sure what denormalized numbers are.
>
> You should find out before proposing representation replacements for floats.

Yes, but we're still discussing the preliminary stages. At these stages, it is enough to know that they CAN be defined (and they can).

>> What is the scenario in which you would want to tell them apart? Likewise, would you really want to tell +0 and -0 apart?
>
> IIRC (it's been fifteen or so years since I did any serious numerical analysis) the arguments in favor have mostly to do with preserving maximal accuracy for intermediate results in a series of calculations. So maybe you could claim that these arguments are not so relevant to storage in a database. But personally I don't think it's the province of a database to decide that it need not accurately preserve the data it's given to store.

This is not data given to store. It's data being exported.

I think you are failing to consider something. The simple truth of the matter is that drivers are used far more often to access the server than libpq or direct TCP programming. OLE DB has been stagnant for over two years now, is only available for one platform, and the easiest install option for it is through the PostgreSQL installer, and yet it is still the third most popular download on pgFoundry (with the .NET provider being #1).

What I'm getting at is that drivers are important. It is important that they have good performance. It is important that they be stable. I really think the backend should take driver considerations more seriously. The suggested method of switching to text mode will surely work, but it will also hurt performance.

I've said it before: I find it highly unlikely that the ARM FP format will have any problem being exported, even to a 64-bit IEEE number. Not knowing the FP format, but knowing the platform, it likely just got rid of all the corner cases (NaN, denormalized numbers) merely so it can be implemented more efficiently in hardware. I find the chances that it has a wider range than IEEE in either mantissa or exponent unlikely.

The question here is a broader one, though. Should we strive for binary compatibility across all platforms of a given version? The benefit is faster drivers and being able to COPY across platforms (but, still, not across versions). The cost is a couple of types (I would really change timestamp too, while at it) that need a non-straightforward export/import function. Tom seems to think this is not a goal (though, aside from his disbelief that such a goal is attainable, I have heard no arguments against it).

What do the other people think?

Shachar
Re: [Oledb-dev] [HACKERS] Re: double precision error with pg linux server, but not with windows pg server
Shachar Shemesh wrote:
> Even the original sentence does not describe the problem we're seeing here. It does not mention cross-platform incompatibility. That's why I suggested it should be improved.
>
>> The COPY docs are probably more correct: "The BINARY key word causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the normal text mode, but a binary-format file is less portable across machine architectures and PostgreSQL versions."
>
> Again, to me this sentence spells "you have a problem, we're not going to help you out, deal with it".

Eh? It spells out what we provide and what the limitations are. You could put that spin on every documented limitation, if you were so inclined.

> This is especially true if what Tom said was true, that the text format does not maintain total precision. You are essentially telling the user "you cannot move your data reliably even between servers of the same version".

Since this is the exact mechanism used by pg_dump, we would surely have been long since deafened by complaints if this were a problem of any significance.

cheers

andrew
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> No, not unless you can make the case why this handles NaNs and denormalized numbers compatibly across platforms...
>
> NaNs and infinite (plus and minus) should not be a problem.

Really? Need I point out that these concepts, let alone their representation, are not standardized in non-IEEE float implementations?

> I'm not sure what denormalized numbers are.

You should find out before proposing representation replacements for floats.

> What is the scenario in which you would want to tell them apart? Likewise, would you really want to tell +0 and -0 apart?

There are competent authorities --- for instance, the guys who created the IEEE float standard --- who think it's worth distinguishing them. IIRC (it's been fifteen or so years since I did any serious numerical analysis) the arguments in favor have mostly to do with preserving maximal accuracy for intermediate results in a series of calculations. So maybe you could claim that these arguments are not so relevant to storage in a database. But personally I don't think it's the province of a database to decide that it need not accurately preserve the data it's given to store.

regards, tom lane
Re: [Oledb-dev] [HACKERS] Re: double precision error with pg linux server, but not with windows pg server
Andrew Dunstan wrote:
> Why the heck do the OLE DB specs care about the internals of the client-server protocol? It is documented fairly clearly that text is the only portable way to transfer data.

Is it?

> Perhaps we need to expand this sentence in the docs: "Keep in mind that binary representations for complex data types may change across server versions;"

Where is that in the docs? It does not appear in the page discussing libpq and binary data transfers (http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html). Even the original sentence does not describe the problem we're seeing here. It does not mention cross-platform incompatibility.

> The COPY docs are probably more correct: "The BINARY key word causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the normal text mode, but a binary-format file is less portable across machine architectures and PostgreSQL versions."

Again, to me this sentence spells "you have a problem, we're not going to help you out, deal with it". This is especially true if what Tom said was true, that the text format does not maintain total precision. You are essentially telling the user "you cannot move your data reliably even between servers of the same version".

Shachar
Re: [Oledb-dev] [HACKERS] Re: double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Obviously, if you are transporting the dump across platforms then that may be an impossibility. In that case you use a text dump and accept that you get an approximation.

That's something that I've been meaning to ask about, but you all seemed so sure of yourselves. What you are essentially saying is that no one aside from the server itself is allowed to get full precision. That seems like a strange decision to make.

> But there should be a way to ensure that you can have a lossless dump of whatever strange FP format your server may use, as long as you are restoring onto a similar machine.

Personally (and I know my opinion "doesn't count"), I find the "similar machine" requirement a bit hard. It is usually accepted for HA or live load balancing, but is really inappropriate for backups or data transfers. Just my humble opinion.

> If there is a guaranteed-reversible transform between the ARM FP format and IEEE format, then I'd be interested in hacking things the way you suggest

I find it highly likely that there will be. After all, the ARM format was not designed to be better packed than IEEE, just easier to implement efficiently in hardware. However

> --- but what I suspect is that such a conversion must lose either range or precision. There are only so many bits in a double.

Like I said elsewhere, a 64-bit FP only has 64 bits, but there is nothing constraining us to export a 64-bit number as 64 bits.

What I'm worried about in that regard is the other platforms that PG may be running on. Like I said above, I'm fairly sure (will get the specs and make sure) that there shouldn't be a problem in exporting ARM FP into 64-bit IEEE with no loss at all. This says nothing, however, about other platforms. Unless we map all cases, we had better choose an export format that is capable of extension.

Shachar
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Sure it's "possible". Send a Parse command, ask for Describe Statement output, then specify the column formats as desired in Bind. Now this does imply an extra server round trip, which might be annoying if your client code doesn't have another reason to need to peek at Describe output.

No, it's far worse than annoying. It kills my performance. OLE DB does have a mechanism for explicit "prepare", and I certainly COULD do it only for those cases, but it is meaningless.

>> What if I send a patch that sends a 64-bit float as a 128-bit number, containing two integers, one for mantissa and one for exponent? This format is guaranteed to never lose precision, but is wire compatible across platforms. Would that be considered a good solution?
>
> No, not unless you can make the case why this handles NaNs and denormalized numbers compatibly across platforms...

NaNs and infinite (plus and minus) should not be a problem. I'm not sure what denormalized numbers are. If you mean (switching to base 10 for a second) 2*10^3 vs. 20*10^2, then I would have to ask why you want them treated differently. What is the scenario in which you would want to tell them apart? Likewise, would you really want to tell +0 and -0 apart? If I have an export/import round trip that turns -0 into +0, is that really a problem?

Shachar
[HACKERS] Passing more context info to selectivity-estimation code
I've been thinking about what it will take to solve the problem noted here:
http://archives.postgresql.org/pgsql-performance/2007-05/msg00325.php
which briefly is that 8.2 is really bad at estimating the number of rows returned by locutions like

	SELECT ... FROM tab1 LEFT JOIN tab2 ON (tab1.x = tab2.y) WHERE tab2.y IS NULL;

the point being to extract just those rows of tab1 that have no join partner in tab2. We frequently recommend this idiom as a better-performing substitute for NOT IN. However, the selectivity of the IS NULL clause is estimated without thought for the effects of the outer join; thus for example if there are no nulls in the actual tab2.y column, the estimate will be that there are no rows in the join result. (Which gets clamped to the minimum estimate of one row, but that's not much help.) If this join is then joined to something else, 8.2 is likely to think it should put the "one-row-out" join on the outside of a nestloop join, yielding horrible performance if there are actually many rows out.

Although pre-8.2 releases also failed to consider the effects of outer joins' null insertion, they misestimated this case as returning at least as many result rows as there were in the lefthand input relation (because they didn't distinguish WHERE and JOIN/ON conditions in this context). That's perhaps less likely to give rise to a fatally bad plan.

AFAICS the only principled fix for this is to give selectivity estimator functions more knowledge about the context their argument clause appears in. For instance if we restructure the above as

	SELECT ... FROM tab1 LEFT JOIN tab2 ON (tab1.x = tab2.y AND tab2.y IS NULL);

the IS NULL estimator should certainly *not* count the effects of the outer join, even though it's looking at exactly the same clause. So there has to be some context passed in to show which outer joins we are "above" and which not.

There already is some context passed to the estimators, in the form of "varRelid" and join-type parameters. In the long term I am inclined to replace these rather ad-hoc parameters with something along the lines of "ClauseContext *context" to carry info about the context in which the estimate needs to be made. This notation would let us add or redefine fields in the ClauseContext struct without having to touch every estimator function again. But it's a bit late to be considering that for 8.3, and it's certainly a nonstarter to think of retrofitting it into 8.2.

The only way I can see to fix the problem in 8.2 is to store clause context information within the PlannerInfo data structures. This is pretty grotty, since PlannerInfo is supposed to be global state information for a planner run; but I can't see any need for selectivity estimation code to be re-entrant with respect to a particular planner invocation, so it should work without any problems.

The specific hack I'm thinking of is to extend the OuterJoinInfo structures with some fields that would indicate whether the currently considered clause is "above" or "below" each outer join, and further show the estimated percentage of injected nulls for each one we're "above". A traversal of this list would then provide enough knowledge for nulltestsel() or other routines to derive the right answer. Callers of clause_selectivity or clauselist_selectivity would need to ensure that the correct state was set up beforehand.

For the moment I'm inclined to teach only nulltestsel() how to use the info.
Probably in the long run we'd want all the selectivity estimators to incorporate this refinement, but I'm not sure how good the estimates of null-injection will be; seems prudent not to rely on them everywhere until we get more field experience with it.

This is a pretty large and ugly patch to be thinking of back-patching :-(. A quick-hack approach would be to revert this patch:
http://archives.postgresql.org/pgsql-committers/2006-11/msg00066.php
which would cause 8.2 to uniformly overestimate rather than underestimate the size of the result. That pretty well sucks too, as seen here:
http://archives.postgresql.org/pgsql-general/2006-11/msg00472.php
but maybe it's less bad than an underestimate.

Comments, better ideas?

regards, tom lane
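One possible shape for the "ClauseContext" idea sketched above, purely as an illustration: the struct and field names below are guesses, not actual planner code. The intent is just to bundle today's ad-hoc varRelid/join-type parameters with the per-outer-join information nulltestsel() would need.

	/*
	 * Illustrative only: field set and names are assumptions.
	 */
	typedef struct OuterJoinClauseInfo
	{
	    OuterJoinInfo *ojinfo;          /* which outer join this entry describes */
	    bool           clause_is_above; /* is the clause "above" this outer join? */
	    Selectivity    null_frac;       /* estimated fraction of injected nulls */
	} OuterJoinClauseInfo;

	typedef struct ClauseContext
	{
	    int        varRelid;        /* same role as the existing varRelid parameter */
	    JoinType   jointype;        /* same role as the existing join-type parameter */
	    List      *outer_joins;     /* list of OuterJoinClauseInfo, one per outer join */
	} ClauseContext;

An estimator such as nulltestsel() would then walk context->outer_joins and discount the injected-null fraction only for joins the clause sits above.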
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> I do recall someone telling me that text mode transfer could actually be faster than binary, somewhat to their (and my) surprise.
>
> Seems a bit improbable --- what was their test case?

No idea - this was idle chat on IRC I think. I am similarly skeptical. After all, we just had a discussion about improving performance of PLs by avoiding use of the input/output functions in some cases.

cheers

andrew
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I do recall someone telling me that text mode transfer could actually be faster than binary, somewhat to their (and my) surprise.

Seems a bit improbable --- what was their test case?

The only such situation that comes to mind is that some values are smaller as text than binary (eg "2" is shorter as text than as any binary numeric format), so in a situation where number of bytes sent dominates all other costs, text would win. But of course there are also many values that're smaller in binary format, so I'd think this would only happen in restricted test cases.

regards, tom lane
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh <[EMAIL PROTECTED]> writes:
> Heikki Linnakangas wrote:
>> Is it not possible to use text format in OLE DB, for floating points?
>
> It is impossible to use text format for just floating point. I often don't know in advance what type the result is going to be.

Sure it's "possible". Send a Parse command, ask for Describe Statement output, then specify the column formats as desired in Bind. Now this does imply an extra server round trip, which might be annoying if your client code doesn't have another reason to need to peek at Describe output.

An idea that's been in the back of my mind for awhile is to provide some way to let the client say things like "I want float and timestamp results in text and everything else in binary", so that one setup step at the start of the session avoids the need for the extra round trips. Haven't got a detailed proposal at the moment though.

> What if I send a patch that sends a 64bit float as 128bit number, containing two integers, one for mantissa and one for exponent. This format is guaranteed to never lose precision, but is wire compatible across platforms. Would that be considered a good solution?

No, not unless you can make the case why this handles NaNs and denormalized numbers compatibly across platforms...

regards, tom lane
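For reference, the extra round trip described above can be driven from plain libpq; a minimal sketch follows, with error handling omitted. Note a limitation of the sketch: libpq exposes only a single result format for all columns, so it falls back to text whenever a float8 column is present, whereas the protocol-level Bind message could mix formats per column.

	#include <libpq-fe.h>

	#define FLOAT8OID 701   /* pg_type OID of float8, hard-coded for the sketch */

	static PGresult *
	exec_with_safe_format(PGconn *conn, const char *query)
	{
	    PGresult   *desc;
	    int         fmt = 1;            /* prefer binary results */
	    int         i;

	    PQclear(PQprepare(conn, "s", query, 0, NULL));

	    desc = PQdescribePrepared(conn, "s");   /* the extra round trip */
	    for (i = 0; i < PQnfields(desc); i++)
	    {
	        if (PQftype(desc, i) == FLOAT8OID)
	            fmt = 0;                /* float8 present: ask for text instead */
	    }
	    PQclear(desc);

	    return PQexecPrepared(conn, "s", 0, NULL, NULL, NULL, fmt);
	}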
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh wrote:
>> Perhaps OLE is trying to use binary instead of text transmission of data?
>
> Of course it does. That's what the OLE DB specs say. Said so in my original email.

Why the heck do the OLE DB specs care about the internals of the client-server protocol? It is documented fairly clearly that text is the only portable way to transfer data.

Perhaps we need to expand this sentence in the docs: "Keep in mind that binary representations for complex data types may change across server versions;"

The COPY docs are probably more correct: "The BINARY key word causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the normal text mode, but a binary-format file is less portable across machine architectures and PostgreSQL versions."

I do recall someone telling me that text mode transfer could actually be faster than binary, somewhat to their (and my) surprise.

cheers

andrew
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Binary format has other goals that are not always compatible with 100% platform independence --- that's unfortunate, sure, but it's reality.
>
> Maybe the misunderstanding is mine. What are the goals for the binary format?

Well, the one that I'm concerned about at the moment is that dump/reload using COPY BINARY ought to be 100% guaranteed to reproduce the original datum. Obviously, if you are transporting the dump across platforms then that may be an impossibility. In that case you use a text dump and accept that you get an approximation. But there should be a way to ensure that you can have a lossless dump of whatever strange FP format your server may use, as long as you are restoring onto a similar machine.

If there is a guaranteed-reversible transform between the ARM FP format and IEEE format, then I'd be interested in hacking things the way you suggest --- but what I suspect is that such a conversion must lose either range or precision. There are only so many bits in a double.

regards, tom lane
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Heikki Linnakangas wrote:
>> But sometimes, like now, PG puts me in an impossible position. You are essentially telling me "you will get the numbers in an unknown format, you will not have any way of knowing whether you got them in a strange format or not, nor will you have any docs on what that format is going to be". That is no way to treat your driver developers.
>
> You seem to be ignoring the fact that the text format is platform-independent. That's what for example JDBC uses, and I'd imagine other interfaces as well. Is it not possible to use text format in OLE DB, for floating points?

It is impossible to use text format for just floating point. I often don't know in advance what type the result is going to be. I can switch EVERYTHING to text mode, but as the other end of the interface requires me to get things out in binary format, I get the following absurd path:

PG stores in binary
PG translates to text
PG sends to driver
driver translates to binary

As long as I get a consistent binary format, I prefer to translate between binary formats rather than between text and binary.

>> What are the "send" functions used for, beside server to client communication, anyways? You are asking me to treat the binary data as an opaque. Well, I'll counter with a question - what good is that to me?
>
> Imagine an application that does this:
>
> Issue query "SELECT foofloat FROM footable", and store the value to a variable
> Issue "INSERT INTO footable2 VALUES (?)", and send back the same value.

Why would I want to do that over "insert into footable2 select foofloat from footable"? I know, even if it makes no sense you'd want it to work. All I'm saying is that something has got to give, and there is no reason to assume that your usage is more likely than mine. For that reason, without looking into the ARM float implementation, it is just as likely that it contains LESS precision than the IEEE one. Would that change the objection?

> Don't you think footable and footable2 should now have the same value? If "SELECT * FROM footable, footable2 WHERE foofloat=foofloat" returns 0 rows, I'd be pissed.
>
> It's possible that the conversion within the driver loses some precision, depending on the data types supported by the language and platform, but the wire protocol should at least give the driver a chance to get it right.

I'm not sure why there is a difference here. How is the above example different from the exact same example written in ADODB (and thus passing through driver conversion)?

I'll take this one step further. Does the text representation never lose precision? Ever?

What if I send a patch that sends a 64-bit float as a 128-bit number, containing two integers, one for mantissa and one for exponent? This format is guaranteed to never lose precision, but is wire compatible across platforms. Would that be considered a good solution?

> The current code is not used for communicating between two servers. And it works fine as long as the client and the server are on the same platform.

So does the driver. And yet, a user complained! In my dictionary, this means that "as long as the client and server are on the same platform" is not a good enough requirement.

> Don't get me wrong, I agree that the binary format is broken as it is, but the cure mustn't be worse than the disease.
>
>> Please, give your own interface designers something to work with. Your attitude essentially leaves me out in the cold.
>
> Design a wire protocol that
> 1. Doesn't lose information on any platform
> 2. Is more efficient than text format
>
> and I'm pretty sure it'll be accepted.

I just offered one. I would hate it myself, and it would mean that pre-8.3 (or whenever it is that it goes in) will have a different representation than post the change, but it will live up to your requests.

Shachar
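A minimal sketch of what the proposed two-integer encoding could look like, assuming the host double has at most 53 mantissa bits. The WireFloat8 struct and field layout are made up for illustration; NaN and the infinities are flagged separately, since frexp() says nothing useful about them. Note that a round trip through this sketch collapses -0 to +0, which is exactly the kind of corner case being debated above.

	#include <math.h>
	#include <stdint.h>

	typedef struct WireFloat8
	{
	    int32_t special;    /* 0 = finite, 1 = +inf, 2 = -inf, 3 = NaN */
	    int32_t exponent;   /* power of two */
	    int64_t mantissa;   /* signed integer mantissa */
	} WireFloat8;

	static WireFloat8
	float8_to_wire(double f)
	{
	    WireFloat8  w = {0, 0, 0};
	    int         e;

	    if (isnan(f))
	        w.special = 3;
	    else if (isinf(f))
	        w.special = (f > 0) ? 1 : 2;
	    else
	    {
	        double  frac = frexp(f, &e);    /* f = frac * 2^e, |frac| in [0.5, 1) */

	        w.mantissa = (int64_t) ldexp(frac, 53); /* shift all 53 bits into an integer */
	        w.exponent = e - 53;
	    }
	    return w;
	}

	static double
	wire_to_float8(WireFloat8 w)
	{
	    if (w.special == 3) return NAN;
	    if (w.special == 1) return INFINITY;
	    if (w.special == 2) return -INFINITY;
	    return ldexp((double) w.mantissa, w.exponent);  /* exact for <= 53-bit mantissas */
	}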
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Binary format has other goals that are not always compatible with 100% platform independence --- that's unfortunate, sure, but it's reality.

Maybe the misunderstanding is mine. What are the goals for the binary format?

Shachar
Re: [HACKERS] Not ready for 8.3
On Sat, 19 May 2007, Andrew Dunstan wrote:
> What would making a branch actually do for you? The only advantage I can see is that it will give you a way of checkpointing your files.

Exactly. It's not as bad now, but when I was getting started there were lots of times I got something working and I wanted a clean way to save it in that state before I started messing with anything else--such that I could backtrack what I did if I later broke it. This style of work has some advantages for working on other people's patches as well, especially if you're trying to review two at once and it takes you a while to finish--the situation I always find myself in.

> I don't see how you can do it reasonably off a local cvs mirror - rsync will just blow away any changes you have checked in next time you sync with the master.

It's certainly not easy. I try not to let the fact that what I'd like to do may actually be impossible ever discourage me. There are at least three ways to approach this general problem just counting the rsync/CVS variations, each with some obvious and some subtle advantages/disadvantages. I'm still in the stage where I'm mapping out the options.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I agree in principle that the wire protocol should be platform-independent.

The *TEXT* format is for that. The problem here is that Shachar is insisting on using binary format in a context where it is inappropriate. Binary format has other goals that are not always compatible with 100% platform independence --- that's unfortunate, sure, but it's reality.

regards, tom lane
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh wrote:
> Tom Lane wrote:
>> Shachar Shemesh <[EMAIL PROTECTED]> writes:
>>> I'll reiterate - the problem is not that PG is exporting the internal ARM FP format. The problem is that the server is exporting the internal ARM FP format when the server is ARM, and the IEEE format when the server is Intel. It's not the format, it's the inconsistency.
>>
>> This is not a bug, it's intentional. While IEEE-spec floats are reasonably interchangeable these days (modulo the endianness issue), other FP formats tend to have different ranges, special cases, etc. If we try to force them to IEEE spec we may have problems with overflow, loss of precision, who knows what.
>
> Yes, but if we do not then we have a data interchange library that is useless for data interchange. I think overflow and precision loss is preferable.

I agree in principle that the wire protocol should be platform-independent. At the very least, if the server always sends in native format, the client needs to know which format it's receiving and be able to convert between all formats. Switching to a common format on all platforms would of course make life easier for client library developers.

> But sometimes, like now, PG puts me in an impossible position. You are essentially telling me "you will get the numbers in an unknown format, you will not have any way of knowing whether you got them in a strange format or not, nor will you have any docs on what that format is going to be". That is no way to treat your driver developers.

You seem to be ignoring the fact that the text format is platform-independent. That's what for example JDBC uses, and I'd imagine other interfaces as well. Is it not possible to use text format in OLE DB, for floating points?

>> (In this context "reliable" means "can reproduce the original datum exactly when transmitted back".)
>
> Who cares? If you are using the same function for binary communication inside the server and for communications to the clients (or, for that matter, another server), then there is something wrong in your design. What are the "send" functions used for, beside server to client communication, anyways? You are asking me to treat the binary data as an opaque. Well, I'll counter with a question - what good is that to me?

Imagine an application that does this:

Issue query "SELECT foofloat FROM footable", and store the value to a variable
Issue "INSERT INTO footable2 VALUES (?)", and send back the same value.

Don't you think footable and footable2 should now have the same value? If "SELECT * FROM footable, footable2 WHERE foofloat=foofloat" returns 0 rows, I'd be pissed.

It's possible that the conversion within the driver loses some precision, depending on the data types supported by the language and platform, but the wire protocol should at least give the driver a chance to get it right.

> Please note that the current code is useless for communicating binary data between two servers, even if they are guaranteed to be of the same version! How much less reliable can you get?

The current code is not used for communicating between two servers. And it works fine as long as the client and the server are on the same platform.

Don't get me wrong, I agree that the binary format is broken as it is, but the cure mustn't be worse than the disease.

> Please, give your own interface designers something to work with. Your attitude essentially leaves me out in the cold.

Design a wire protocol that
1. Doesn't lose information on any platform
2. Is more efficient than text format

and I'm pretty sure it'll be accepted.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Shachar Shemesh <[EMAIL PROTECTED]> writes:
>> I'll reiterate - the problem is not that PG is exporting the internal ARM FP format. The problem is that the server is exporting the internal ARM FP format when the server is ARM, and the IEEE format when the server is Intel. It's not the format, it's the inconsistency.
>
> This is not a bug, it's intentional. While IEEE-spec floats are reasonably interchangeable these days (modulo the endianness issue), other FP formats tend to have different ranges, special cases, etc. If we try to force them to IEEE spec we may have problems with overflow, loss of precision, who knows what.

Yes, but if we do not then we have a data interchange library that is useless for data interchange. I think overflow and precision loss is preferable.

Please remember that I'm only trying to help PostgreSQL here. I have a spec to work with on the outside. I'm more than willing to do what's necessary (see the STRRRANGE date conversion code) in order to adapt whatever PG throws my way to the no less strange representation expected of me. That's what I do as a driver hacker.

Sometimes, the specs don't help me. Windows' notion of "timezone free timestamps" is nothing short of a disgrace, and some of the hacks that are needed around that issue are, well, hacks. I don't come complaining here, because this has nothing to do with PG. It's bad design on the other end of the two ends that a driver has to make meet.

But sometimes, like now, PG puts me in an impossible position. You are essentially telling me "you will get the numbers in an unknown format, you will not have any way of knowing whether you got them in a strange format or not, nor will you have any docs on what that format is going to be". That is no way to treat your driver developers.

>> Like I said elsewhere, I'm willing to write a patch to "pq_sendfloat8" (and probably "pq_getmsgfloat8" too) to make sure it does the conversion on ARM platforms. Hell, I think I can even write it portable enough so that it will work on all non-IEEE platforms
>
> Really? Will it be faster

Absolutely. Do you honestly believe that turning a 64-bit binary number into a 40-something byte decimal number will be quicker than turning a 64-bit binary number into another 64-bit number? For one thing, I really doubt that my technique will require division, modulo or, in fact, any math operations at all. It will likely be done with a few bit shifts and that's it.

I also find it strange, though, that you berate me for using binary rather than text format, and then complain about speed. That's what makes OLE DB faster than ODBC - the binary interface.

> and more reliable than conversion to text?

Well, that depends on how you define "more reliable". If you define it to mean "exactly represents what happens in the server internals", then the answer is "no". If you define it to mean "makes more sense to the client, and has a better chance of producing results that more closely approximate the right number than the current code", then the answer is a definite yes.

> (In this context "reliable" means "can reproduce the original datum exactly when transmitted back".)

Who cares? If you are using the same function for binary communication inside the server and for communications to the clients (or, for that matter, another server), then there is something wrong in your design. What are the "send" functions used for, beside server to client communication, anyways?

You are asking me to treat the binary data as an opaque. Well, I'll counter with a question - what good is that to me? Please note that the current code is useless for communicating binary data between two servers, even if they are guaranteed to be of the same version! How much less reliable can you get?

Please, give your own interface designers something to work with. Your attitude essentially leaves me out in the cold.

Shachar
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Tom Lane wrote:
> Shachar Shemesh <[EMAIL PROTECTED]> writes:
>> I've received a bug report on the OLE DB list, which I suspect is actually a server bug. The correspondence so far is listed further on, but, in a nutshell, user runs an OLE DB client on windows (OLE DB uses the binary interface), and server version 8.1.9 on Windows, and all is fine. When the server is switched to 8.1.8 on Debian Etch on ARM9, float8 type is not received properly by OLE DB.
>
> Perhaps OLE is trying to use binary instead of text transmission of data?

Of course it does. That's what the OLE DB specs say. Said so in my original email.

> It's not a server bug if ARM has a different FP format than the client hardware;

No. The server can use, internally, whatever it wants.

> it's the client's responsibility to either use text format or be prepared to cope with whatever the binary format is.

I agree 100%. I'll reiterate - the problem is not that PG is exporting the internal ARM FP format. The problem is that the server is exporting the internal ARM FP format when the server is ARM, and the IEEE format when the server is Intel. It's not the format, it's the inconsistency.

I can (and do) handle, in PgOleDb, binary formats that are vastly different from those that I need (anyone said "timestamps"?). Handling a format that is inconsistent across same-version backends merely because of platform, now that's a server bug if I ever saw one.

Like I said elsewhere, I'm willing to write a patch to "pq_sendfloat8" (and probably "pq_getmsgfloat8" too) to make sure it does the conversion on ARM platforms. Hell, I think I can even write it portably enough so that it will work on all non-IEEE platforms (I'm not sure yet, but I have a general idea). What I'm hoping for, however, is that if I do, you (or another committer) will get it in.

Shachar
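A rough sketch of where such a conversion hook could sit, modelled on the pq_sendfloat8() comment quoted in the original report further down. This is not the actual backend code: NEED_FP_CONVERSION and float8_to_ieee754_bits() are hypothetical names for a configure-time symbol and a per-platform conversion routine that would be a no-op on IEEE machines.

	#include "postgres.h"
	#include "libpq/pqformat.h"

	static void
	pq_sendfloat8_portable(StringInfo buf, float8 f)
	{
	    union
	    {
	        float8  f;
	        int64   i;
	    } swap;

	#ifdef NEED_FP_CONVERSION
	    /* hypothetical per-platform routine: pack f into IEEE 754 binary64 bits */
	    swap.i = float8_to_ieee754_bits(f);
	#else
	    /* on IEEE platforms, reinterpret the bits and byte-swap like an int8 */
	    swap.f = f;
	#endif
	    pq_sendint64(buf, swap.i);
	}

pq_getmsgfloat8() would need the mirror-image conversion on receive, so that a value written by one platform reads back identically on another.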
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Stefan Kaltenbrunner wrote:
> Shachar Shemesh wrote:
>> Hi guys of the pgsql-hackers list.
>>
>> I've received a bug report on the OLE DB list, which I suspect is actually a server bug. The correspondence so far is listed further on, but, in a nutshell, user runs an OLE DB client on windows (OLE DB uses the binary interface), and server version 8.1.9 on Windows, and all is fine. When the server is switched to 8.1.8 on Debian Etch on ARM9, float8 type is not received properly by OLE DB.
>
> if 8.1.8 is built based on the debian packages it is probably compiled with --enable-integer-datetimes. Is the oledb client library able to cope with that?

I'm calling "GetPgStatus(&stat, _bstr_t("integer_datetimes") );" to find out what the server representation is, and cope accordingly. This only doesn't work on 7.4 servers, but otherwise I'm fine. Either way, this is off topic for this problem.

From further reading, it seems that ARM indeed uses its own representation for IEEE floats. I'll try to poll my sources, try to find out what the %([EMAIL PROTECTED]&$ this format actually is (google was no help), and try to formulate a patch for PG to export it in IEEE despite the discrepancies.

Shachar
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh <[EMAIL PROTECTED]> writes:
> I'll reiterate - the problem is not that PG is exporting the internal ARM FP format. The problem is that the server is exporting the internal ARM FP format when the server is ARM, and the IEEE format when the server is Intel. It's not the format, it's the inconsistency.

This is not a bug, it's intentional. While IEEE-spec floats are reasonably interchangeable these days (modulo the endianness issue), other FP formats tend to have different ranges, special cases, etc. If we try to force them to IEEE spec we may have problems with overflow, loss of precision, who knows what.

> Like I said elsewhere, I'm willing to write a patch to "pq_sendfloat8" (and probably "pq_getmsgfloat8" too) to make sure it does the conversion on ARM platforms. Hell, I think I can even write it portable enough so that it will work on all non-IEEE platforms

Really? Will it be faster and more reliable than conversion to text? (In this context "reliable" means "can reproduce the original datum exactly when transmitted back".)

regards, tom lane
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh <[EMAIL PROTECTED]> writes:
> I've received a bug report on the OLE DB list, which I suspect is actually a server bug. The correspondence so far is listed further on, but, in a nutshell, user runs an OLE DB client on windows (OLE DB uses the binary interface), and server version 8.1.9 on Windows, and all is fine. When the server is switched to 8.1.8 on Debian Etch on ARM9, float8 type is not received properly by OLE DB.

Perhaps OLE is trying to use binary instead of text transmission of data? It's not a server bug if ARM has a different FP format than the client hardware; it's the client's responsibility to either use text format or be prepared to cope with whatever the binary format is.

regards, tom lane
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Shachar Shemesh wrote:
> Hi guys of the pgsql-hackers list.
>
> I've received a bug report on the OLE DB list, which I suspect is actually a server bug. The correspondence so far is listed further on, but, in a nutshell, user runs an OLE DB client on windows (OLE DB uses the binary interface), and server version 8.1.9 on Windows, and all is fine. When the server is switched to 8.1.8 on Debian Etch on ARM9, float8 type is not received properly by OLE DB.

if 8.1.8 is built based on the debian packages it is probably compiled with --enable-integer-datetimes. Is the oledb client library able to cope with that?

Stefan
Re: [HACKERS] Signing off of patches (was Re: Not ready for 8.3)
"Karl O. Pinc" <[EMAIL PROTECTED]> writes: > On 05/18/2007 08:59:11 PM, Tom Lane wrote: >> I'd like to see something that emphasizes review and feedback at the >> stages of germinal idea, rough functional spec, implementation >> concept, > Speaking as a larval Postgres hacker I have trouble asking about > the germinal idea and rough functional spec parts. Without > having some clue about the implementation concept it's > difficult for me to imagine whether or not I want to > or will be able to put the effort into making the actual > code work. Well, but if you ask at an early stage it's perfectly fair to ask for comments on how much work an implementation of idea X might be. Plus people could save you from wasting time going down dead-end paths. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] COPY into a view; help w. design & patch
"Karl O. Pinc" <[EMAIL PROTECTED]> writes: > I don't really want to do this. I really want my users > to be able to use the COPY statement without worrying > about whether they are copying into a table or a view. But ... but ... the proposed feature entirely fails to achieve that. Copying into an explicit INSERT statement isn't necessarily a bad idea, but surely it's not transparent in that way. > I _could_ make tables that "correspond" > to the views and put BEFORE INSERT triggers on them and > have the triggers insert into the views (or the equalivent), > but then the users would have to use the views for most > things and the "corresponding tables" when doing a COPY > or using the application's data import function. There's been previous discussion of allowing BEFORE INSERT triggers on views, so long as the triggers always return NULL to suppress the actual insertion attempt (ie, we'd move the "can't insert into view" test out of the rewriter and put it downstream of trigger firing in the executor). So far no one's figured out how to make that idea work for UPDATE/DELETE, but maybe you could argue that even if it only worked for INSERT it'd be a useful feature. It'd certainly solve the problem for COPY. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Hi guys of the pgsql-hackers list.

I've received a bug report on the OLE DB list, which I suspect is actually a server bug. The correspondence so far is listed further on, but, in a nutshell, user runs an OLE DB client on windows (OLE DB uses the binary interface), and server version 8.1.9 on Windows, and all is fine. When the server is switched to 8.1.8 on Debian Etch on ARM9, float8 type is not received properly by OLE DB.

Since OLE DB doesn't really care what version the server is running, the chances of this being a server side bug are really high. I don't know ARM9 well enough to comment on the floating point format there.

Julian Heeb wrote:
> Shachar Shemesh schrieb:
>> Julian Heeb wrote:
>>> Hello
>>>
>>> Our accounting software can use the PostgreSQL OLE DB driver to access a postgreSQL database. With the pg server installed on windows, everything works fine.
>>>
>>> I moved now the database to a postgreSQL server on a linux server, but now every floating point number gets wrongly interpreted by the accounting software, either by replacing it with a 0 or a very large number (e.g. xxxE+308). Only the floating point numbers are affected, integer or characters are correct. pgAdmin shows even the fp numbers correctly, so I guess it has something to do with the pgoledb driver.
>>>
>>> Can someone give me a hint, how to solve the problem?
>>
>> It's hard to give a precise answer. Let's try a couple of venues.
>>
>> First of all, what platform is the Linux server? Is that an Intel, or something else?
>
> It is an ARM9 platform with Debian Etch (Linkstation Pro Fileserver with Freelink).
>
>> Also, what is the precise type of the floating point var on the server? Can you give the SQL line that generated the table?
>
> The table has been generated by the following SQL line. The problem occurs at the double precision fields.

I have some bad news. This is the comment in the PostgreSQL source code. This seems to be a core problem at the server side of things:

> /*
>  * pq_sendfloat8 - append a float8 to a StringInfo buffer
>  *
>  * The point of this routine is to localize knowledge of the external binary
>  * representation of float8, which is a component of several datatypes.
>  *
>  * We currently assume that float8 should be byte-swapped in the same way
>  * as int8. This rule is not perfect but it gives us portability across
>  * most IEEE-float-using architectures.
>  */

Could it be that ARM9 does not follow the IEEE float standard? Can anyone from the "hackers" list give any insight into this? The function for the data type import on the client side seems to be in order (switch the byte order around, and assume it's a valid "double" C type).

Shachar
Re: [HACKERS] Not ready for 8.3
Andrew Dunstan wrote:
> What would making a branch actually do for you? The only advantage I can see is that it will give you a way of checkpointing your files. As I remarked upthread, I occasionally use RCS for that. But mostly I don't actually bother.
>
> I don't see how you can do it reasonably off a local cvs mirror - rsync will just blow away any changes you have checked in next time you sync with the master.
>
> I don't think we can make CVS behave like a distributed SCM system, and ability to create local branches seems to me one of the fundamental points of such systems. If that's what the demand is for, then we should look again at moving to something like Mercurial.

I think the great thing about DCVS systems is that not everybody necessarily needs to use the *same* system. And it doesn't really matter what the central repository runs on - I think there are gateways from/to nearly everything available...

I currently use GIT for my SoC project, and it works quite well - I can create an arbitrary number of local branches, and syncing the currently active branch with CVS is achieved by just doing "cg-update pgsql-head".

greetings, Florian Pflug
Re: [HACKERS] Reducing stats collection overhead
Afaik Tom hadn't finished his patch when I was testing things, so I don't know. But we're in the process of benchmarking a new system (dual quad-core Xeon) and we'll have a look at how it performs in the postgres 8.2dev we used before, the stable 8.2.4 and a fresh HEAD checkout (which we'll call 8.3dev). I'll let you guys (or at least Tom) know how they compare in our benchmark.

Best regards,

Arjen

On 18-5-2007 15:12 Alvaro Herrera wrote:
> Tom Lane wrote:
>> Arjen van der Meijden told me that according to the tweakers.net benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed here that for small SELECT queries issued as separate transactions, there's a significant difference. I think much of the difference stems from the fact that we now have stats_row_level ON by default, and so every transaction sends a stats message that wasn't there by default in 8.2. When you're doing a few thousand transactions per second (not hard for small read-only queries) that adds up.
>
> So, did this patch make the performance problem go away?
Re: [HACKERS] Not ready for 8.3
Greg Smith wrote:
> Heikki's great summary helps (I think the one piece I was screwing up is covered there), and Pavan's comments add some useful bits. The still missing part is how to make a real branch to work in, which is much easier to work with once you figure out how to do it than either using just diffs or tagging.

What would making a branch actually do for you? The only advantage I can see is that it will give you a way of checkpointing your files. As I remarked upthread, I occasionally use RCS for that. But mostly I don't actually bother.

I don't see how you can do it reasonably off a local cvs mirror - rsync will just blow away any changes you have checked in next time you sync with the master.

I don't think we can make CVS behave like a distributed SCM system, and the ability to create local branches seems to me one of the fundamental points of such systems. If that's what the demand is for, then we should look again at moving to something like Mercurial.

cheers

andrew