On Thu, Jul 20, 2017 at 9:31 PM, Robert Haas <robertmh...@gmail.com> wrote: > I was initially surprised that your testing managed to pass, but then > I noticed that this sanity test is using && where it should really be > using ||; it will only fail if ALL of the data types are wrong. Oops.
Oh, oh. If this was right from the beginning I would have hit this issue. Yes that's a clear oversight of the original code. > This code plays pretty fast and loose with off_t vs. size_t vs. uint64 > vs. int64, but those definitely don't all agree in signedness and may > not even agree in width (size_t, at least, might be 4 bytes). We use > stat() to get the size of a file as an off_t, and then store it into a > uint64. Perhaps off_t is always uint64 anyway, but I'm not sure. > Then, that value gets passed to fetch_file_range(), still as a uint64, > and it then gets sent to the server to be stored into an int8 column, > which is signed rather than unsigned. That will error out if there's > a file size >= 2^63, but filesystem holding more than 8 exabytes are > unusual and will probably remain so for some time. The server sends > back an int64 which we pass to write_target_range(), whose argument is > declared as off_t, which is where we started. I'm not sure there's > any real problem with all of that, but it's a little nervous-making > that we keep switching types. Similarly, libpqProcessFileList gets > the file size as an int64 and then passes it to process_source_file() > which is expecting size_t. So we manage to use 4 different data types > to represent a file size. On most systems, all of them except SQL's > int8 are likely to be 64-bit unsigned integers, but I'm not sure what > will happen on obscure platforms. Yes, I felt uneasy as well when hacking the patch with all the type switches that have been done, but I kept my focus on bringing out a minimally invasive patch to fix the issue and any other holes I found. One thing that I think could be added in the patch is an overflow check in libpqGetFile(), because I think that we still want to use size_t for this code path. What do you think? Note I am not much worrying on signedness of the values yet (Postgres still lacks a proper in-core unsigned type, this gives an argument for one), as long as the value are correctly stored on 8 bytes, and we still have some margin until we reach that. We could add a comment in the code to underline that assumption, though I am not sure that this is really necessary... > Still, it can't be worse than the status quo, where instead of int64 > we're using int and int32, so maybe we ought to back-patch it as-is > for now and look at any further cleanup that is needed as a > master-only improvement. Yes. I don't like playing much with the variable types on back-branches, as long as the initial amount of bytes is large enough we will be safe for some time. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers