Re: Windows installation problem at post-install step
On Mon, Jul 22, 2024 at 11:51 PM Sandeep Thakkar wrote:
> EDB's windows installer gets the locales on the system using the
> https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/scripts/windows/getlocales/getlocales.cpp
> and then substitute some patterns
> (https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/pgserver.xml.in#L2850)
> I'm not sure why we do that but that is the old code and probably @Dave Page
> may know but I'm not sure if that piece of code is responsible for this
> change in encoding in this case.

Ah, so it's calling EnumSystemLocales(). Interestingly, the documentation for that function says:

"Note  For interoperability reasons, the application should prefer the EnumSystemLocalesEx function to EnumSystemLocales because Microsoft is migrating toward the use of locale names instead of locale identifiers for new locales. Any application that will be run only on Windows Vista and later should use EnumSystemLocalesEx."

That seems to be talking about this exact issue: we're supposed to be using "locale names". I'm a little confused about the terminology for the various types of names and identifiers, but if you follow the link to an example program[1] you can see that it's talking about the BCP47 "en-US" kind, which is what we want. (That quote makes it sound like a new thing, but Vista came out ~17 years ago.)

So one idea would be that in v18, we not only change initdb.exe to pick a BCP47 locale name by default as I proposed in that other thread[2], but also in the v18 version of the EDB installer you consider switching that code over to EnumSystemLocalesEx(). Then we can start to kiss goodbye to the bad old names. People would still propagate them into the future with pg_upgrade I guess, and it'd be up to users to replace them by updating their catalogs manually. Does that make sense?
[1] https://learn.microsoft.com/en-us/windows/win32/intl/nls--name-based-apis-sample [2] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com
Re: Windows installation problem at post-install step
On Mon, Jul 22, 2024 at 11:58 AM Ertan Küçükoglu wrote:
> Thomas Munro wrote on Sun, Jul 21, 2024 at 23:27:
>> 2. Some existing database clusters which had been installed with the
>> name "Turkish_Turkey.1254" became unstartable when the OS upgrade
>> renamed that locale to "Turkish_Türkiye.1254". I'm trying to provide
>> a pathway[2] to fix such systems in core PostgreSQL in the next minor
>> release. Everyone affected probably already found another way but at
>> least next time a country is renamed this might help with the next
>> point too.
>
> I was also hit by that OS update.
> There is a Microsoft tool for creating a locale installer
> https://www.microsoft.com/en-us/download/details.aspx?id=41158
> Using that tool and adding a second locale Turkish_Turkey.1254 (name before
> Microsoft update) in the OS can fix your broken PostgreSQL.
> I believe most people simply choose this path.
> There are also several blogs/articles written in Turkish about the problem.

If that's easy and good enough then maybe I should abandon that on-the-fly renaming patch and we should just do a little documentation note...

>> 3. I'd also like to teach initdb to use BCP47 names like "tr-TR"
>> instead of those names by default (ie if you don't specify a locale
>> name explicitly), and have proposed that before[3] but it hasn't gone
>> in due to lack of testing/reviews from Windows users. It seems like
>> that doesn't matter much in practice to all the people using the
>> popular EDB installer, since it apparently takes control of picking
>> the locale and explicitly passes it in (and screws up the encoding as
>> we have now learned).
>
> If I am not mistaken BCP47 names are already used in Linux systems.
> Using them would make PostgreSQL use the same locale names across Linux and
> Windows systems.

Not exactly. POSIX systems use [language[_territory][.codeset][@modifier]], but POSIX doesn't say what any of those components are[1] (are they ISO country codes? English words? Hieroglyphs?), so, curiously, those Windows names like "English_United States.1252" are probably POSIX-conforming. Every real POSIX system of course uses ISO language and country codes these days (though I still recall other names being used years ago), so they look similar to the simpler kinds of BCP47 tags, which are just language-country with the same ISO codes but a different separator. They diverge further once you get into the finer points with more components.

Incidentally, that lack of standardisation is the reason you can't say that the glibc ".utf8" ending is "wrong", even though it is obviously stupid :-p (all systems I know accept .UTF-8, 'cause that's what Ken Thompson, Rob Pike and the Unicode standard called it). I suspect that Windows accepts the POSIX style en_US too, but it's not what the manual tells you to use.

But really we shouldn't have to know or care how locales are named; we should get the names from the OS in the first place, and then we should remember them and give them back to the OS at the right times. The two problems here are that Windows has two kinds of names, one unstable over time and with illegal (for us) characters in it, and one stable; we need to find all the places where the old unstable ones can get into our system, and block them off. I'm aware of two places now: the EDB installer, and initdb's default for people who run it on the command line without giving an explicit name.

> I can help with the testing part. Let me know the details, please.

Thanks! I will rebase that patch, and CC you on the thread.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html
Re: Windows installation problem at post-install step
On Mon, Jul 22, 2024 at 7:29 AM Adrian Klaver wrote:
> On 7/21/24 12:00, Ertan Küçükoglu wrote:
> > My main purpose was and still is to reach EDB people using the forum and
> > let them know about the problem.
> > I believe it is something to be fixed for future installations. I would
> > like to provide additional information if needed.
>
> You could try a back door method and post here:
>
> https://www.postgresql.org/list/pgadmin-support/
>
> pgAdmin comes from EDB also, maybe someone on that list could pass your
> issue on.

I guess this is where EDB installer issues should go:

https://github.com/EnterpriseDB/edb-installers/issues

It seems like there are about 3 different problems associated with the new Turkish_Türkiye.1254 locale name:

1. EDB's installer apparently has a problem with the encoding of the name of the locale itself. Looking at your log file with my pager, it shows:

    The database cluster will be initialized with locale "Turkish_Trkiye.1254".

I think that means that it had the name of the locale encoded as "CP437" at some point (where ü is 0x81, apparently[1]), but then somewhere it was reencoded to the sequence 0xc2 0x81 (shown by my pager as ), which is nonsense. The way to get there would be to believe falsely that the source encoding was Latin1, I guess. I'm not even sure what encoding it should be giving to initdb (maybe the ACP of your system?), and in fact it's a bit undefined for PostgreSQL at least, but that seems to be double-confused. I suspect the solution to this might be for EDB's installer to somehow convert your selected language to the modern short code format, like "tr-TR". Those are pure ASCII. I don't know where it should get the list from.

2. Some existing database clusters which had been installed with the name "Turkish_Turkey.1254" became unstartable when the OS upgrade renamed that locale to "Turkish_Türkiye.1254". I'm trying to provide a pathway[2] to fix such systems in core PostgreSQL in the next minor release. Everyone affected probably already found another way but at least next time a country is renamed this might help with the next point too.

3. I'd also like to teach initdb to use BCP47 names like "tr-TR" instead of those names by default (ie if you don't specify a locale name explicitly), and have proposed that before[3] but it hasn't gone in due to lack of testing/reviews from Windows users. It seems like that doesn't matter much in practice to all the people using the popular EDB installer, since it apparently takes control of picking the locale and explicitly passes it in (and screws up the encoding as we have now learned).

As for your immediate problem, you can also use initdb.exe directly to set up a cluster, and tell it to use locale tr-TR. I can't recommend all the switches you'd need to pass it for best compatibility with the EDB GUI tools though, but maybe the ones from your log.

[1] https://en.wikipedia.org/wiki/%C3%9C#Computing_codes
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKGJTOgnTzu4VD6Am0X6g67atkQHFVk%2BC-w5wkGrGiao-%3DQ%40mail.gmail.com#556557efd6b83cd7a336b62507efe347
[3] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com
Re: libpq v17 PQsocketPoll timeout is not granular enough
On Tue, Jun 11, 2024 at 2:36 AM Dominique Devienne wrote:
> Hi. I've noticed [that libpq API in v17 beta1][1], and wanted to use
> it to replace an existing Boost.ASIO-based async polling of the
> connection's socket, waiting for notifications. The use case being
> using PostgreSQL LISTEN/NOTIFY for a simple message queue. The code
> needs to be cross-platform Windows and Linux. My goal would be to
> eliminate that Boost.ASIO dependency for that, to use just libpq.

One idea I have wondered about is why you wouldn't just use poll() directly, if you want a facility that works on all known operating systems without extra portability libraries. Windows has WSAPoll(), which AFAIK was designed to be Unix-compatible and a drop-in replacement, requiring just a rename but otherwise having the same macros and struct etc. For some period of time, people who had to deal with socket connection events (that includes us) avoided it, with the Curl guys' blog being the most often cited public explanation for why:

https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

However, as far as I know, that was fixed ~4 years ago:

https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsapoll

"Note  As of Windows 10 version 2004, when a TCP socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is indicated."

I wonder if that means that it's now completely usable on all reasonable versions of the OS. I think so? I don't use Windows myself; my interest in this topic is that I have a slow-moving background project to figure out how and when to remove all remaining uses of select() from our tree, and this one is on my hit list.

> PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
> Despite the [underlying implementation in fe-misc.c][2] supporting at
> least milliseconds.

Yeah, that is not nice and your complaint is very reasonable, and we should probably do something like what Tom suggested.
Hmm, but if what I speculated above is true, I wonder if the extern function is even worth its bits... but I don't know how to determine that completely.
Re: Preallocation changes in Postgresql 16
On Fri, Apr 26, 2024 at 4:37 AM Riku Iki wrote:
> I am wondering if there were preallocation related changes in PG16, and if it
> is possible to disable preallocation in PostgreSQL 16?

I have no opinion on the btrfs details, but I was wondering if someone might show up with a system that doesn't like that change. Here is a magic 8, tuned on "some filesystems":

        /*
         * If available and useful, use posix_fallocate() (via
         * FileFallocate()) to extend the relation. That's often more
         * efficient than using write(), as it commonly won't cause the kernel
         * to allocate page cache space for the extended pages.
         *
         * However, we don't use FileFallocate() for small extensions, as it
         * defeats delayed allocation on some filesystems. Not clear where
         * that decision should be made though? For now just use a cutoff of
         * 8, anything between 4 and 8 worked OK in some local testing.
         */
        if (numblocks > 8)

I wonder if it wants to be a GUC.
Re: could not open file "global/pg_filenode.map": Operation not permitted
On Sat, Mar 23, 2024 at 3:01 AM Nick Renders wrote:
> We now have a second machine with this issue: it is an Intel Mac mini running
> macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple instances
> running.

BTW if you're running databases on mains-powered Macs, I have a patch that you might be interested in, which so far hasn't attracted any reviews. The short version is that I bet you can at least lose many seconds of commits (because WAL doesn't really hit the durable part of disk), and possibly also fail to recover (pg_control hits disk before WAL, not sure if this is really possible), if you yank the power and you're using the default settings for wal_sync_method. I'd like to rationalise the settings for that stuff and make it safe by default. I don't know anything about the USB storage pathway but I'd be surprised if it's different.

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BF0EL4Up6yVYbbcWse4xKaqW4wc2xpw67Pq9FjmByWVg%40mail.gmail.com
Re: could not open file "global/pg_filenode.map": Operation not permitted
On Thu, Apr 4, 2024 at 3:11 AM Nick Renders wrote:
> In the macOS Finder, when you show the Info (command+i) for an external drive
> (or any partition that is not the boot drive), there is a checkbox "Ignore
> ownership on this volume" in the Permissions section. I think it is by
> default "on" for external drives.

Hmm. Sounds suspicious, but why would only this file be affected?

> The external SSD is an Orico drive that is connected with USB-C. It is
> initialised as a GUID Partition Map with a single AFPS partition.
>
> We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came
> out last year. We didn't have any problems with 16 until recently, after
> upgrading to Sonoma.

Interesting. So the rename might have something to do with it, though I don't have a theory for how. Can you show what the permissions and ownership look like for pg_* under there, normally, and once the system reaches this state? Something like:

tmunro@phonebox postgresql % ls -slap pgdata/global/pg_*
16 -rw-------  1 tmunro  staff  8192  4 Apr 09:50 pgdata/global/pg_control
 8 -rw-------  1 tmunro  staff   524  4 Apr 09:50 pgdata/global/pg_filenode.map

I'm asking for "pg_*" because I want to see pg_control as well, to understand the permissions for the other files in the cluster, and because I want to see if there are any stray remnants of a temporary file, which would be called pg_filenode.map.tmp.
Re: could not open file "global/pg_filenode.map": Operation not permitted
On Fri, Mar 29, 2024 at 4:47 AM Nick Renders wrote:
> Looking at the 2 machines that are having this issue (and the others that
> don't), I think it is somehow related to the following setup:
> - macOS Sonoma (14.4 and 14.4.1)
> - data directory on an external drive
>
> That external drive (a Promise RAID system in one case, a simple SSD in the
> other) has the option "ignore ownership" on by default. I have tried
> disabling that, and updating the data directory to have owner + read/write
> access for the postgres user. It seemed to work at first, but just now the
> issue re-appeared again.
>
> Any other suggestions?

I don't have any specific ideas, and I have no idea what "ignore ownership" means... what kind of filesystem is running on it? For the simple SSD, is it directly connected, running a normal Apple APFS filesystem, or something more complicated?

I wonder if this could be related to the change in 16 which started to rename that file:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0682a08a

Did you ever run 15 or earlier on that system?
Re: How do you optimize the disk IO when you cannot assume a file will start at a boundary then?
On Thu, Feb 22, 2024 at 2:23 PM Siddharth Jain wrote:
> I understand the storage layer in databases goes to great lengths to ensure:
> - a row does not cross a block boundary
> - read/writes/allocation happen in units of blocks
> etc. The motivation is that at the OS level, it reads and writes pages
> (blocks), not individual bytes. I am only concerned about SSDs but I think
> the principle applies to HDD as well.
>
> but how can we do all this when we are not even guaranteed that the beginning
> of a file will be aligned with a block boundary? refer this.

Interesting question. I was aware of FFS/UFS fragments 'cause I've poked at that code a bit (one of my learn-to-hack-kernels side projects was to add O_DSYNC support), but not some of those others mentioned. I don't think they are a problem for anything PostgreSQL does. Even with O_DIRECT, FFS (at least in its FreeBSD descendant) just quietly skips its raw fast path if the I/O is not *sector* aligned, but otherwise quietly falls back to the usual buffered path; actual interesting sized tables wouldn't use that feature anyway, so really it's just an internal edge case space optimisation that we don't have to worry about. Generally, that family of systems interprets O_DIRECT to mean "*try* to bypass buffers" anyway, so there is no way for it to bite us. On the other hand, it does perform pretty badly if you use logical blocks that are too large: that's why in src/tools/ci/gcp_freebsd_repartition.sh I set up a new partition with 8KB blocks for CI testing (I originally did that in a branch doing a lot more with O_DIRECT, where the slowdown due to block mismatch is worse). I think we just have to call that an administrator's problem to worry about.
I have idly wondered before about some way to set a preferred 'grain' size on a tablespace so that, for example, a parallel seq scan over a very large read-only compressed ZFS/BTRFS/whatever file with 128KB blocks could try to align the allocation of block ranges with those boundaries in mind, just as a sort of performance hint. Or perhaps read it directly from st_blksize[1]. Combined with the I/O merging thing I'm working on right now which can build up single 128KB preadv() calls, that'd allow parallel workers to line those up neatly, and avoid accessing the same underlying block from two processes, which might have advantages in terms of ZFS's decompression threads and other kinds of contentions. I haven't looked into the potential benefits of that though, that is just speculation...

For direct I/O, which we're still working on bringing to PostgreSQL in useful form, we had to do this:

commit faeedbcefd40bfdf314e048c425b6d9208896d90
Author: Thomas Munro
Date:   Sat Apr 8 10:38:09 2023 +1200

    Introduce PG_IO_ALIGN_SIZE and align all I/O buffers.

... to avoid EINVAL errors, falling back to buffered mode or pathologically bad performance (depending on the system). It probably also helps buffered I/O performance for non-huge-pages a tiny bit (by touching the minimum number of memory pages). There is no industry standard to tell us what exactly is required for O_DIRECT, but we make sure addresses, offsets and lengths are all aligned to our macro PG_IO_ALIGN_SIZE, which is 4KB, because this is enough to satisfy all the systems we are aware of today (on some systems the true offset/length requirement is based on real or historical sectors, either 4KB or 512 bytes, which may be smaller than the filesystem block size, while on others it may be the filesystem block size which is usually some power-of-two multiple of those numbers).
Since you can theoretically set PostgreSQL's data or WAL block size to less than that at compile time, we just don't let you turn direct I/O on in that case (perhaps we should, in case it does work, and just let it error out if it's going to? It's early days...). If the file system's block size is larger than that, you may have some issues, though, depending on the file system. Nobody standardised this stuff, and we certainly can't make any guarantees that anything we come up with will work on future systems, eg if someone invents 8KB sectors maybe it will need some work. That's OK, because direct I/O will likely always be a non-default optional mode, and many systems don't support it at all today.

> Further, I don't see any APIs exposing I/O operations in terms of blocks. All
> File I/O APIs I see expose a file as a randomly accessible contiguous byte
> buffer. Would it not have been easier if there were APIs that exposed I/O
> operations in terms of blocks?

There are/were operating systems with block- or record-oriented I/O (eg IBM ones). Unix was about minimalism and simplicity and explicitly rejected that idea (first bullet point in [2]). Buffered, byte-oriented, synchronous I/O suited most applications. But yeah, not so much databases. Dr Stonebraker, who started this project, wrote a bunch of papers
Re: Query crash with 15.5 on debian bookworm/armv8
On Wed, Dec 27, 2023 at 5:17 AM Clemens Eisserer wrote:
> > FWIW, since this crash is inside LLVM you could presumably dodge the bug
> > by setting "jit" to off.
>
> Thanks, this indeed solved the crash.
> Just to make sure this crash doesn't have anything to do with my
> setup/config (I'd changed quite a few settings in postgresql.conf),
> I gave it a try on a fresh bookworm install and it also crashed immediately.
>
> > As for an actual fix, perhaps a newer version of LLVM is needed?
> > I don't see a problem testing this query on my RPI with Ubuntu 23.10
> > (LLVM 16).
>
> I also gave Ubuntu 23.10 a try (15.4 built with llvm-15) and it worked
> as expected, explain analyze even mentioned the JIT was active.

I can't reproduce this on LLVM 14 on an aarch64 Mac FWIW (after setting jit_*_cost to 0, as required since the table is empty).

> I've filed a debian bug report with a link to this discussion and a
> plea to build postgresql against llvm >= 15:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059476

I doubt they'll change that, and in any case we'll need to get to the bottom of this. Perhaps an assertion build of LLVM will fail in some illuminating internal assertion? Unfortunately it's a non-trivial business to get a debug build of LLVM going (it takes oodles of disk and CPU and a few confusing-to-me steps)...

. o O ( It would be wonderful if assertion-enabled packages were readily available for a common platform like Debian. I've finally been spurred on to reach out to the maintainer of apt.llvm.org to ask about that. It'd also be very handy for automated next-version monitoring. )
Re: How to generate random bigint
On Thu, Dec 21, 2023 at 7:21 PM Tom Lane wrote:
> Phillip Diffley writes:
> > Postgres's random() function generates a random double. That can be
> > converted to a random int for smaller integers, but a double can't
> > represent all of the values in a bigint. Is there a recommended way to
> > generate a random bigint in Postgres?
>
> Doesn't look like there's anything directly exposed for that.
> Since PG v13 you could use gen_random_uuid, if you're careful
> to extract only bits that aren't fixed by the v4 UUID spec.
> pgcrypto's pg_random_bytes() function offers another
> some-assembly-required solution that'd work considerably
> further back. Or you could make a custom C function that
> leverages pg_strong_random(). Also pg_read_binary_file('/dev/urandom', 0, 8) could be useful (assuming you're on Unix) if you can figure out how to cast it...
Re: psql crash with custom build on RedHat 7
On Wed, Dec 20, 2023 at 4:41 AM Dominique Devienne wrote:
> On Tue, Dec 19, 2023 at 2:02 PM Thomas Munro wrote:
>> On Wed, Dec 20, 2023 at 1:39 AM Dominique Devienne wrote:
>> > Program received signal SIGSEGV, Segmentation fault.
>> > 0x004232b8 in slash_yylex ()
>>
>> I think this might have to do with flex changing. Does it help if you
>> "make maintainer-clean"?
>
> My colleague who did the custom build double-checked the flex/bison
> requirements, and the version of the packages on the RH7 machine he built
> on, and they check out (see below).
>
> He also tells me he builds debug and release versions off different
> workspaces/checkouts, thus there are no remnants of previous builds,
> assuming that's what `make maintainer-clean` is for.

OK, but be warned that if you're using tarballs, we shipped lexer remnants in the tree (until https://github.com/postgres/postgres/commit/721856ff, an interesting commit to read). The slash lexer is a kind of extension that (IIRC) shares the same PsqlScanState (opaque pointer to private lexer state), but if these two things are compiled to C by different flex versions, they may contain non-identical 'struct yyguts_t' (and even if the structs were identical, what the code does with them might still be incompatible, but I guess the struct itself would be a good first thing to look at, along with the versions mentioned near the top of the .c):

src/fe_utils/psqlscan.l -> psqlscan.c
src/bin/psql/psqlscanslash.l -> psqlscanslash.c

The usual "clean" doesn't remove those .c files in PG < 17, which means that if your pipeline involves tarballs but you finished up regenerating one of the files, or some other sequence involving different flex versions, you could get that. I've seen it myself on a few systems, a decade ago when I guess flex rolled out an incompatible change (maybe contemporaneous with RHEL7) and flex was upgraded underneath my feet. I remember that "maintainer-clean" (or maybe I'm misremembering and it was "distclean") fixed it.
Re: psql crash with custom build on RedHat 7
On Wed, Dec 20, 2023 at 1:39 AM Dominique Devienne wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x004232b8 in slash_yylex ()

I think this might have to do with flex changing. Does it help if you "make maintainer-clean"?
Re: fsync data directory after DB crash
On Wed, Jul 19, 2023 at 2:09 PM Pandora wrote:
> Yes, I saw the usage of syncfs in PG14, but it is recommended to use it on
> Linux 5.8 or higher. If my OS version is lower than 5.8, can I still enable
> it?

Nothing stops you from enabling it; it's fairly ancient and should work. It just doesn't promise to report errors before Linux 5.8, which is why we don't recommend it, so you have to figure out the risks. One way to think about the risks: all we do is log the errors, but you could probably also check the kernel logs for errors.

The edge cases around writeback failure are a tricky subject. If the reason we are running crash recovery is because we experienced an I/O error and PANIC'd before, then it's possible for recovery_init_sync_method=fsync to succeed while there is still phantom data in the page cache masquerading as "clean" (ie will never be sent to the disk by Linux). So at least in some cases, it's no better than older Linux's syncfs for our purposes.

(I think the comment that Michael quoted assumes the default FreeBSD caching model: that cached data stays dirty until it's transferred to disk or the file system is force-removed, whereas the Linux model is: cached data stays dirty until the kernel has attempted to transfer it to disk just once, and then it'll report an error to user space one time (or, in older versions, sometimes fewer) and it is undefined (ie depends on file system) whether the affected data is forgotten from cache, or still present as phantom data that is bogusly considered clean. The reason this probably isn't a bigger deal than it sounds may be that "transient" I/O failures are probably rare -- it's more likely that a system with failing storage just completely self-destructs and you never reach these fun edge cases. But as database hackers, we try to think about this stuff... perhaps one day soon we'll be able to just go around this particular molehill with direct I/O.)
Re: fsync data directory after DB crash
On Wed, Jul 19, 2023 at 12:41 PM Michael Paquier wrote:
> On Tue, Jul 18, 2023 at 04:50:25PM +0800, Pandora wrote:
> > I found that starting from version 9.5, PostgreSQL will do fsync on
> > the entire data directory after DB crash. Here's a question: if I
> > have FPW = on, why is this step still necessary?
>
> Yes, see around the call of SyncDataDirectory() in xlog.c:
>  * - There might be data which we had written, intending to fsync it, but
>  *   which we had not actually fsync'd yet. Therefore, a power failure in
>  *   the near future might cause earlier unflushed writes to be lost, even
>  *   though more recent data written to disk from here on would be
>  *   persisted. To avoid that, fsync the entire data directory.

FTR there was some discussion and experimental patches that would add recovery_init_sync_method=none and recovery_init_sync_method=wal, which are based on the OP's observation + an idea for how to make it work even without FPWs enabled:

https://www.postgresql.org/message-id/flat/CA%2BhUKGKgj%2BSN6z91nVmOmTv2KYrG7VnAGdTkWdSjbOPghdtooQ%40mail.gmail.com#576caccf21cb6c3e883601fceb28d36b

Only recovery_init_sync_method=syncfs actually went in from that thread. It works better for some setups (systems where opening squillions of files just to perform a no-op fsync() is painfully expensive).
Re: "PANIC: could not open critical system index 2662" - twice
On Tue, May 9, 2023 at 10:04 AM Tom Lane wrote:
> Michael Paquier writes:
> > One thing I was wondering about to improve the odds of the hits is to
> > be more aggressive with the number of relations created at once, so as
> > we are much more aggressive with the number of pages extended in
> > pg_class from the origin database.
>
> Andres seems to think it's a problem with aborting a DROP DATABASE.
> Adding more data might serve to make the window wider, perhaps.

Here's an easy way:

@@ -1689,6 +1689,14 @@ dropdb(const char *dbname, bool missing_ok, bool force)

        /* Close all smgr fds in all backends. */
        WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

+       /* XXX pretend one of the above steps got interrupted by a statement timeout or ^C */
+       if (random() % 2 == 0)
+       {
+               QueryCancelPending = true;
+               InterruptPending = true;
+               CHECK_FOR_INTERRUPTS();
+       }

postgres=# create database db2;
CREATE DATABASE
postgres=# drop database db2;
ERROR:  canceling statement due to user request

$ psql db2
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: PANIC:  could not open critical system index 2662

$ od -t x1 base/17/2662
0000000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0100000
$ od -t x1 base/17/2837
0000000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0040000
$ od -t x1 base/17/2840
0000000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0100000
Re: "PANIC: could not open critical system index 2662" - twice
On Mon, May 8, 2023 at 2:24 PM Michael Paquier wrote:
> I can reproduce the same backtrace here. That's just my usual laptop
> with ext4, so this would be a Postgres bug. First, here are the four
> things running in parallel so as I can get a failure in loading a
> critical index when connecting:

That sounds like good news, but I'm still confused: do you see all 0s in the target database (popo)'s catalogs, as reported (and if so can you explain how they got there?), or is it regression that is corrupted in more subtle ways also involving 1s?
Re: "PANIC: could not open critical system index 2662" - twice
On Mon, May 8, 2023 at 4:10 AM Evgeny Morozov wrote:
> On 6/05/2023 11:13 pm, Thomas Munro wrote:
> > Would you like to try requesting FILE_COPY for a while and see if it
> > eventually happens like that too?
> Sure, we can try that.

Maybe you could do some one way and some the other, so that we try to learn more?

> Ubuntu 18.04.6
> Kernel 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023
> x86_64 x86_64 x86_64 GNU/Linux
> zfsutils-linux package version 0.7.5-1ubuntu16.12 amd64

I tried for a few hours to reproduce this by trying to make as many things as similar to yours as I could based on info in this thread (Linux: up-to-date Ubuntu 18.04 in vagrant, which has nearly the same kernel 4.15.0-208-generic and a close but slightly different version of ancient ZFS 0.7.5-1ubuntu15, not sure why; ZFS: mirror (I used a pair of loopback files), recordsize=16kB, compression=lz4; PG: compiled from tag REL_15_2, data_checksums=on, full_page_writes=off, wal_recycle=off, wal_init_zero=off), with what I thought might be roughly what you're doing (creating three DBs, two clones of the first, with various modifications at various points, with various overlapping activities, and then checking for catalog corruption). No cigar. Hrmph.
Re: "PANIC: could not open critical system index 2662" - twice
On Sun, May 7, 2023 at 1:21 PM Tom Lane wrote: > Thomas Munro writes: > > Did you previously run this same workload on versions < 15 and never > > see any problem? 15 gained a new feature CREATE DATABASE ... > > STRATEGY=WAL_LOG, which is also the default. I wonder if there is a > > bug somewhere near that, though I have no specific idea. > > Per the release notes I was just writing ... > > Fix potential corruption of the template (source) database after > CREATE DATABASE with the STRATEGY WAL_LOG option (Nathan Bossart, Ryo Matsumura) Hmm. That bug seems to have caused corruption (backwards time travel) of blocks in the *source* DB's pg_class, by failing to write back changes. We seem to have zeroed pages in the *target* database, for all catalogs (apparently everything copied by RelationCopyStorageUsingBuffer()), even though the template is still fine. It is as if RelationCopyStorageUsingBuffer() created the zero-filled file with smgrextend(), but then the buffer data was never written out even though we memcpy'd it into a buffer and set the buffer dirty. Bug-in-PostgreSQL explanations could include that we forgot it was dirty, or some backend wrote it out to the wrong file; but if we were forgetting something like permanent or dirty, would there be a more systematic failure? Oh, it could require special rare timing if it is similar to 8a8661828's confusion about permanence level or otherwise somehow not setting BM_PERMANENT, but in the target blocks, so I think that'd require a checkpoint AND a crash. It doesn't reproduce for me, but perhaps more unlucky ingredients are needed. Bug-in-OS/FS explanations could include that a whole lot of writes were mysteriously lost in some time window, so all those files still contain the zeroes we write first in smgrextend(). I guess this previously rare (previously limited to hash indexes?)
use of sparse file hole-punching could be a factor in an it's-all-ZFS's-fault explanation:

openat(AT_FDCWD,"base/16390/2662",O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC,0600) = 36 (0x24)
openat(AT_FDCWD,"base/1/2662",O_RDWR|O_CLOEXEC,00) = 37 (0x25)
lseek(37,0x0,SEEK_END) = 32768 (0x8000)
lseek(37,0x0,SEEK_END) = 32768 (0x8000)
pwrite(36,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,8192,0x6000) = 8192 (0x2000) <-- smgrextend(final block)
lseek(36,0x0,SEEK_END) = 32768 (0x8000)

I was trying to think about how I might go about trying to repro the exact system setup. Evgeny, do you mind sharing your "zfs get all /path/to/pgdata" (curious to see the block size, compression settings, and anything else of note) and your postgresql.conf? And your exact Ubuntu kernel version and ZFS package versions?
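To make the sparse-file angle concrete, here is a small Python sketch (nothing PostgreSQL ships; the file name is made up) of the creation pattern visible in that trace: the file is created empty and only its final 8KB block is written, so a filesystem is free to store the first 24KB as a hole.

```python
import os
import tempfile

# Mimic the trace: create the relation file, then "smgrextend" the
# final block by writing 8KB of zeroes at offset 0x6000.
path = os.path.join(tempfile.mkdtemp(), "2662")
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
os.pwrite(fd, b"\0" * 8192, 0x6000)
size = os.fstat(fd).st_size          # 0x8000, matching lseek(36,0x0,SEEK_END)
# Where the first non-hole byte lives is filesystem-dependent: 0x6000 on
# a filesystem that kept blocks 0-2 sparse, 0 on one that didn't.
first_data = os.lseek(fd, 0, os.SEEK_DATA)
os.close(fd)
print(hex(size), hex(first_data))
```

Whether the zeroed range in a real damaged file is an unmaterialised hole or explicitly written zeroes is exactly the kind of thing SEEK_DATA can distinguish.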
Re: "PANIC: could not open critical system index 2662" - twice
On Sun, May 7, 2023 at 10:23 AM Jeffrey Walton wrote: > This may be related... I seem to recall the GNUlib folks talking about > a cp bug on sparse files. It looks like it may be fixed in coreutils > release 9.2 (2023-03-20): > https://github.com/coreutils/coreutils/blob/master/NEWS#L233 > > If I recall correctly, it had something to do with the way > copy_file_range worked. (Or maybe, it did not work as expected). > > According to the GNUlib docs > (https://www.gnu.org/software/gnulib/manual/html_node/copy_005ffile_005frange.html): > > This function has many problems on Linux > kernel versions before 5.3 That's quite interesting, thanks (we've been talking about making direct use of copy_file_range() in a few threads, I'll definitely be looking into that history), but we don't currently use copy_file_range() or any coreutils stuff in the relevant code paths here -- this data is copied by plain old pread() and pwrite().
Re: "PANIC: could not open critical system index 2662" - twice
On Sun, May 7, 2023 at 12:29 AM Evgeny Morozov wrote: > On 6/05/2023 12:34 pm, Thomas Munro wrote: > > So it does indeed look like something unknown has replaced 32KB of > > data with 32KB of zeroes underneath us. Are there more non-empty > > files that are all-zeroes? Something like this might find them: > >
> > for F in base/1414389/*
> > do
> >   if [ -s $F ] && ! xxd -p $F | grep -qEv '^(00)*$' > /dev/null
> >   then
> >     echo $F
> >   fi
> > done
> Yes, a total of 309 files are all-zeroes (and 52 files are not). > > I also checked the other DB that reports the same "unexpected zero page > at block 0" error, "test_behavior_638186280406544656" (OID 1414967) - > similar story there. I uploaded the lists of zeroed and non-zeroed files > and the ls -la output for both as > https://objective.realityexists.net/temp/pgstuff3.zip > > I then searched recursively such all-zeroes files in $PGDATA/base and > did not find any outside of those two directories (base/1414389 and > base/1414967). None in $PGDATA/global, either. So "diff -u zeroed-files-1414967.txt zeroed-files-1414389.txt" shows that they have the same broken stuff in the range cloned from the template database by CREATE DATABASE STRATEGY=WAL_LOG, and it looks like it's *all* the cloned catalogs, and then they have some non-matching relfilenodes > 140, presumably stuff you created directly in the new database (I'm not sure if I can say for sure that those files are broken, without knowing what they are). Did you previously run this same workload on versions < 15 and never see any problem? 15 gained a new feature CREATE DATABASE ... STRATEGY=WAL_LOG, which is also the default. I wonder if there is a bug somewhere near that, though I have no specific idea. If you explicitly added STRATEGY=FILE_COPY to your CREATE DATABASE commands, you'll get the traditional behaviour.
It seems like you have some kind of high frequency testing workload that creates and tests databases all day long, and just occasionally detects this corruption. Would you like to try requesting FILE_COPY for a while and see if it eventually happens like that too? My spidey sense is leaning away from filesystem bugs. We've found plenty of filesystem bugs on these mailing lists over the years and of course it's not impossible, but I dunno... it seems quite suspicious that all the system catalogs have apparently been wiped during or moments after the creation of a new database that's running new PostgreSQL 15 code...
Re: "PANIC: could not open critical system index 2662" - twice
On Sat, May 6, 2023 at 9:58 PM Evgeny Morozov wrote: > Right - I should have realised that! base/1414389/2662 is indeed all > nulls, 32KB of them. I included the file anyway in > https://objective.realityexists.net/temp/pgstuff2.zip OK so it's not just page 0, you have 32KB or 4 pages of all zeroes. That's the expected length of that relation when copied from the initial template, and consistent with the pg_waldump output (it uses FPIs to copy blocks 0-3). We can't see the block contents but we know that block 2 definitely is not all zeroes at that point because there are various modifications to it, which not only write non-zeroes but must surely have required a sane page 0. So it does indeed look like something unknown has replaced 32KB of data with 32KB of zeroes underneath us. Are there more non-empty files that are all-zeroes? Something like this might find them:

for F in base/1414389/*
do
  if [ -s $F ] && ! xxd -p $F | grep -qEv '^(00)*$' > /dev/null
  then
    echo $F
  fi
done
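For anyone repeating this check on a big data directory, the xxd pipeline gets slow; here's an equivalent sketch in Python (nothing PostgreSQL-specific), demonstrated against throwaway files rather than a real base/ directory:

```python
import os
import tempfile

def all_zero_nonempty_files(directory):
    """Find non-empty files that contain nothing but zero bytes,
    like the shell loop above, reading in 64KB chunks."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            continue
        with open(path, "rb") as f:
            if all(chunk == b"\0" * len(chunk)
                   for chunk in iter(lambda: f.read(65536), b"")):
                hits.append(path)
    return hits

# Demo with throwaway files (a real run would point this at base/1414389):
d = tempfile.mkdtemp()
open(os.path.join(d, "empty"), "wb").close()          # skipped: zero length
with open(os.path.join(d, "zeroed"), "wb") as f:
    f.write(b"\0" * 16384)                            # reported
with open(os.path.join(d, "healthy"), "wb") as f:
    f.write(b"\0" * 8192 + b"\x17\x27")               # has real data
print(all_zero_nonempty_files(d))  # only the all-zero file
```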
Re: "PANIC: could not open critical system index 2662" - twice
On Fri, May 5, 2023 at 7:50 PM Evgeny Morozov wrote: > The OID of the bad DB ('test_behavior_638186279733138190') is 1414389 and > I've uploaded base/1414389/pg_filenode.map and also base/5/2662 (in case > that's helpful) as https://objective.realityexists.net/temp/pgstuff1.zip Thanks. That pg_filenode.map looks healthy to me.

tmunro@build1:~/junk $ od -t x1 pg_filenode.map
0000000  17 27 59 00 11 00 00 00 eb 04 00 00 eb 04 00 00
0000020  e1 04 00 00 e1 04 00 00 e7 04 00 00 e7 04 00 00
0000040  df 04 00 00 df 04 00 00 14 0b 00 00 14 0b 00 00
0000060  15 0b 00 00 15 0b 00 00 4b 10 00 00 4b 10 00 00
0000100  4c 10 00 00 4c 10 00 00 82 0a 00 00 82 0a 00 00
0000120  83 0a 00 00 83 0a 00 00 8f 0a 00 00 8f 0a 00 00
0000140  90 0a 00 00 90 0a 00 00 62 0a 00 00 62 0a 00 00
0000160  63 0a 00 00 63 0a 00 00 66 0a 00 00 66 0a 00 00
...

hex(2662) is 0xa66, and we see 66 0a 00 00 followed by 66 0a 00 00 in that last line as expected, so that rules out the idea that it's somehow trashed that map file and points to the wrong relation file. Next can you share the file base/1414389/2662? ("5" was from the wrong database.) > > Maybe you still have enough WAL if it happened recently? > Maybe! What should I do with pg_waldump? I've never used it before. Try something like:

pg_waldump -R 1663/1414389/2662 -F main 00010001 00010007

... but change that to the range of files you have in your pg_wal.
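For the curious, those map bytes can be decoded mechanically. This is a rough sketch based only on the layout visible in the dump (a 4-byte magic, a 4-byte mapping count, then little-endian (oid, filenode) uint32 pairs); the real struct in relmapper.c has further fields (such as a CRC) that are ignored here.

```python
import struct

def filenode_for(data: bytes, oid: int):
    """Look up a relation OID in pg_filenode.map-style bytes.
    Assumed layout: 4-byte magic, 4-byte mapping count, then
    little-endian (oid, filenode) uint32 pairs."""
    magic, count = struct.unpack_from("<II", data, 0)
    for i in range(count):
        oid_i, filenode = struct.unpack_from("<II", data, 8 + 8 * i)
        if oid_i == oid:
            return filenode
    return None

# Synthetic two-entry map: pg_class (1259) and pg_class_oid_index (2662)
# mapped to filenodes equal to their OIDs, as in a freshly created database.
demo = struct.pack("<IIIIII", 0x592717, 2, 1259, 1259, 2662, 2662)
print(filenode_for(demo, 2662))  # 2662
```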
Re: "PANIC: could not open critical system index 2662" - twice
On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote: > What does select > pg_relation_filepath('pg_class_oid_index') show in the corrupted > database, base/5/2662 or something else? Oh, you can't get that far, but perhaps you could share the pg_filenode.map file? Or alternatively strace -f PostgreSQL while it's starting up to see which file it's reading, just to be sure. One way to find clues about whether PostgreSQL did something wrong, once we definitely have the right relfilenode for the index, aside from examining its contents, would be to search the WAL for references to that block with pg_waldump. Maybe you still have enough WAL if it happened recently?
Re: "PANIC: could not open critical system index 2662" - twice
On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote: > Now *that* is a piece of > logic that changed in PostgreSQL 15. It changed from sector-based > atomicity assumptions to a directory entry swizzling trick, in commit > d8cd0c6c95c0120168df93aae095df4e0682a08a. Hmm. I spoke too soon, that only changed in 16. But still, it means there are two files that could be corrupted here, pg_filenode.map which might somehow be pointing to the wrong file, and the relation (index) main fork file.
Re: "PANIC: could not open critical system index 2662" - twice
On Fri, May 5, 2023 at 6:11 AM Evgeny Morozov wrote: > Meanwhile, what do I do with the existing server, though? Just try to > drop the problematic DBs again manually? That earlier link to a FreeBSD thread is surely about bleeding edge new ZFS stuff that was briefly broken then fixed, being discovered by people running code imported from OpenZFS master branch into FreeBSD main branch (ie it's not exactly released, not following the details but I think it might soon be 2.2?), but you're talking about an LTS Ubuntu release from 2018, which shipped "ZFS on Linux" version 0.7.5, unless you installed a newer version somehow? So it doesn't sound like it could be related. That doesn't mean it couldn't be a different ZFS bug though. While looking into file system corruption issues that had similar symptoms on some other file system (which turned out to be a bug in btrfs) I did bump into a claim that ZFS could produce unexpected zeroes in some mmap coherency scenario, OpenZFS issue #14548. I don't immediately see how PostgreSQL could get tangled up with that problem though, as we aren't doing that... It seems quite interesting that it's always pg_class_oid_index block 0 (the btree meta-page), which feels more like a PostgreSQL bug, unless the access pattern of that particular file/block is somehow highly unusual compared to every other block and tickling bugs elsewhere in the stack. How does that file look, in terms of size, and how many pages in it are zero? I think it should be called base/5/2662. Oooh, but this is a relation that goes through RelationMapOidToFilenumber. What does select pg_relation_filepath('pg_class_oid_index') show in the corrupted database, base/5/2662 or something else? Now *that* is a piece of logic that changed in PostgreSQL 15. It changed from sector-based atomicity assumptions to a directory entry swizzling trick, in commit d8cd0c6c95c0120168df93aae095df4e0682a08a. Hmm.
Re: src/test/examples/testlibpq2.c where the HAVE_SYS_SELECT_H is defined.
On Fri, Mar 17, 2023 at 7:48 PM jian he wrote: > Hi, > playing around with $[0] testlibpq2.c example. I wondered where > HAVE_SYS_SELECT_H is defined? > > I searched on the internet, founded that people also asked the same question > in $[1]. > > In my machine, I do have . > system version: Ubuntu 22.04.1 LTS > gcc version: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 > gcc compile command: gcc pg_testlibpq2.c -I/home/jian/postgres/pg16/include \ > -L/home/jian/postgres/pg16/lib -lpq > > [0]https://www.postgresql.org/docs/current/libpq-example.html > [1]https://stackoverflow.com/questions/37876850/in-compilation-time-how-to-find-the-macro-is-defined-in-which-header-file In 15 and earlier, it is defined in pg_config.h, which is created by configure. But in 16, that particular macro was removed by commit 7e50b4e3c. It looks like you are using PostgreSQL 16 sources, but looking at PostgreSQL 15 examples?
Re: 13.x, stream replication and locale(?) issues
On Wed, Mar 1, 2023 at 10:30 AM Thomas Munro wrote: > On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin wrote: > > 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux > > and compile it on FreeBSD - will it help ? > > Out of curiosity (I'm not saying it's a good idea!), do you know if > FreeBSD's localedef can compile glibc's collation definitions? In > theory they are in a format standardised by POSIX... I suspect there > may be extensions and quirks... Another thought: if you upgrade to 15, you could use ICU as the default collation provider, and then make sure you have the same major version of ICU on your Linux and FreeBSD systems (which probably means compiling from source on FreeBSD, as FreeBSD tends to have a newish ICU in packages, while popular stable Linux distributions tend to have a fairly old one). As far as I know, Linux and FreeBSD systems on amd64 arch should match in every other important detail already (they both follow the System V ABI, so there shouldn't be any layout differences in eg structs that are sent over the wire AFAIK). For what it's worth, for the future we are trying to figure out how to support multi-version ICU so that you could explicitly set the provider to ICU v72 to get that sort of thing working across OS versions and even "close enough" cousins like your case, without having to compile anything from source, but unfortunately we didn't manage to get it ready in time for 16.
Re: Interval in hours but not in days Leap second not taken into account
On Mon, Feb 27, 2023 at 8:26 PM PALAYRET Jacques wrote: > # PostgreSQL does not take into account the additional second (leap second) > in some calendar days ; eg. 2016, 31 dec. : > SELECT to_timestamp('20170102 10:11:12','yyyymmdd hh24:mi:ss') - > to_timestamp('20161230 00:00:00','yyyymmdd hh24:mi:ss') intervalle ; > intervalle > ----------------- > 3 days 10:11:12 Bonjour Jacques, Just for fun:

postgres=# SELECT utc_to_tai(to_timestamp('20170102 10:11:12','yyyymmdd hh24:mi:ss')) -
                  utc_to_tai(to_timestamp('20161230 00:00:00','yyyymmdd hh24:mi:ss')) intervalle;
   intervalle
-----------------
 3 days 10:11:13
(1 row)

PostgreSQL could, in theory, provide built-in UTC/TAI conversion functions using a leap second table that would be updated in each minor release, considering that the leap second table is included in the tzdata package that PostgreSQL vendors (ie includes a copy of), but it doesn't do anything like that or know anything about leap seconds. Here's a quick and dirty low technology version of the above:

CREATE TABLE leap_seconds (time timestamptz primary key, off int);

-- refresh leap second table from ietf.org using not-very-secure hairy shell code
BEGIN;
CREATE TEMP TABLE import_leap_seconds (s int8, off int);
COPY import_leap_seconds FROM PROGRAM
  'curl -s https://www.ietf.org/timezones/data/leap-seconds.list | grep -v ''^#'' | cut -f1,2';
TRUNCATE TABLE leap_seconds;
INSERT INTO leap_seconds (time, off)
SELECT '1900-01-01 00:00:00Z'::timestamptz + interval '1 second' * s, off
FROM import_leap_seconds;
DROP TABLE import_leap_seconds;
COMMIT;

CREATE OR REPLACE FUNCTION leap_seconds_before_utc_time(t timestamptz)
RETURNS int STRICT LANGUAGE SQL AS
$$
  SELECT off
    FROM leap_seconds
   WHERE time <= t
   ORDER BY time DESC
   FETCH FIRST ROW ONLY
$$;

CREATE OR REPLACE FUNCTION utc_to_tai(t timestamptz)
RETURNS timestamptz STRICT LANGUAGE SQL AS
$$
  SELECT t + interval '1 second' * coalesce(leap_seconds_before_utc_time(t), 0);
$$;

CREATE OR REPLACE FUNCTION tai_to_utc(t timestamptz)
RETURNS timestamptz STRICT LANGUAGE SQL AS
$$
  SELECT t - interval '1 second' * coalesce(leap_seconds_before_utc_time(t), 0);
$$;
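The same lookup logic works in plain Python; here's a sketch with a hand-typed two-row excerpt of the leap second table standing in for the IETF data the SQL above downloads (the real table starts in 1972 and has dozens of rows):

```python
from datetime import datetime, timedelta

# TAI-UTC offsets in effect from each moment onward (tiny excerpt).
LEAP_SECONDS = [
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]

def tai_utc_offset(t):
    """Equivalent of leap_seconds_before_utc_time(): latest offset <= t."""
    off = 0
    for when, o in LEAP_SECONDS:
        if when <= t:
            off = o
    return off

def utc_to_tai(t):
    return t + timedelta(seconds=tai_utc_offset(t))

a = utc_to_tai(datetime(2017, 1, 2, 10, 11, 12))
b = utc_to_tai(datetime(2016, 12, 30))
print(a - b)  # 3 days, 10:11:13 -- one second longer than the naive UTC difference
```

The extra second appears because the 2016-12-31 leap second falls between the two instants, bumping the TAI-UTC offset from 36 to 37.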
Re: 13.x, stream replication and locale(?) issues
On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin wrote: > 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and > compile it on FreeBSD - will it help ? Out of curiosity (I'm not saying it's a good idea!), do you know if FreeBSD's localedef can compile glibc's collation definitions? In theory they are in a format standardised by POSIX... I suspect there may be extensions and quirks... At a wild guess, the data you showed doesn't even look like it contains non-ASCII characters (it looks like machine-readable identifiers or something, and perhaps it's the sort order of '-' that is causing you trouble), so it might also be possible to use the "ucs_basic" locale for that column and then all computers will agree on the sort order, but of course that doesn't address the more general problem; presumably you might also have Russian language text in your system too. As for ".utf8" vs ".UTF-8", which one is selected by initdb as the database default seems to be something that varies between Linux distributions, so I guess maybe the installers use different techniques for discovering and selecting default locale names. FreeBSD doesn't do any name mangling at all when mapping LC_COLLATE to a pathname to find the file, whereas glibc downcases and removes '-', so you can find both formats of name in the various places...
Re: Getting PSQL in Windows to support TAB/Autocomplete via modified readline...
On Tue, Nov 22, 2022 at 4:25 AM Kirk Wolak wrote: > In researching this problem, it appears that the decision was made like > 17yrs ago, when windows did not have a realistic "terminal" type interface. > Assuming we target Windows 8.1 or higher, I believe this goes away. FWIW PostgreSQL 16 will require Windows 10+. Not a Windows user myself, but I read somewhere that Windows 8.1 has already started showing full screen warnings at startup that security patches etc end in January, while PostgreSQL 16 (the soonest release that could include your changes) is expected late in the year. > What we are looking for is a simple bullet point list of what would make > such a contribution acceptable. And how far should we get through that list > on our own, before getting some help, especially from the build teams? [Our > goal would be an NEW Config type flag: READLINE_FOR_WINDOWS (you guys name > it, and control the default setting)] Some thoughts: Re configuration flags: don't waste time with the old perl-based build system. The configuration should be done only with the new meson build system (soon to be the only way to build on Windows). I didn't quite understand if you were saying that readline itself needs patches for this (I gather from earlier threads about this that there were some problems with dll symbol export stuff, so maybe that's it?). In passing, I noticed that there is also a Windows port of editline AKA libedit, the BSD-licensed alternative to readline. It has a compatible API and PostgreSQL can use that too. I have no idea which is easier, more supported, etc on Windows. It's OK to post a work-in-progress patch to pgsql-hackers, even if it doesn't work right yet. With any luck, people will show up to help with problems. I am 100% sure that our Windows user community would love this feature. It would be good if the tests in src/bin/psql/t/010_tab_completion.pl pass on Windows, but if that's hard, don't let that stop you sharing a patch.
Re: PANIC: could not flush dirty data: Cannot allocate memory
On Wed, Nov 16, 2022 at 1:24 AM wrote: > Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV. > Kernel is 5.15.0-52-generic. > > We have not seen this with Ubutnu 18.04 and 20.04 (although we might not > have noticed it). > > I guess upgrading to postgresql 13/14/15 does not help as the problem > happens in the kernel. > > Do you have any advice how to go further? Shall I lookout for certain > kernel changes? In the kernel itself or in ext4 changelog? It'd be good to figure out what is up with Linux or tuning. I'll go write a patch to reduce that error level for non-EIO errors, to discuss for the next point release. In the meantime, you could experiment with setting checkpoint_flush_after to 0, so the checkpointer/bgwriter/other backends don't call sync_file_range() all day long. That would have performance consequences for checkpoints which might be unacceptable though. The checkpointer will fsync relations one after another, with less I/O concurrency. Linux is generally quite lazy at writing back dirty data, and doesn't know about our checkpointer's plans to fsync files on a certain schedule, which is why we ask it to get started on multiple files concurrently using sync_file_range(). https://www.postgresql.org/docs/15/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS
Re: PANIC: could not flush dirty data: Cannot allocate memory
On Tue, Nov 15, 2022 at 10:54 AM Christoph Moench-Tegeder wrote: > ## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at): > > On several servers we see the error message: PANIC: could not flush > > dirty data: Cannot allocate memory > Of these three places, there's an sync_file_range(), an posix_fadvise() > and an msync(), all in src/backend/storage/file/fd.c. "Cannot allocate > memory" would be ENOMEM, which posix_fadvise() does not return (as per > it's docs). So this would be sync_file_range(), which could run out > of memory (as per the manual) or msync() where ENOMEM actually means > "The indicated memory (or part of it) was not mapped". Both cases are > somewhat WTF for this setup. It must be sync_file_range(). The others are fallbacks that wouldn't apply on a modern Linux. It has been argued before that we might have been over-zealous applying the PANIC promotion logic to sync_file_range(). It's used to start asynchronous writeback to make the later fsync() call fast, so it's "only a hint", but I have no idea if it could report a writeback error from the kernel that would then be consumed and not reported to the later fsync(), so I defaulted to assuming that it could.
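To illustrate the hint-then-fsync pattern in a hedged way (Python standing in for the C code in fd.c, and posix_fadvise() standing in for sync_file_range(), which Python doesn't expose but which is one of the fallbacks the quoted text lists): the early writeback hint can fail harmlessly, because durability comes from the fsync() that follows.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 8192)
try:
    # Advisory: ask the kernel to start writing dirty pages back now,
    # roughly what pg_flush_data() wants from sync_file_range().
    if hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
except OSError:
    pass  # "only a hint": losing it costs performance, not durability
os.fsync(fd)  # the later checkpoint fsync() whose failure genuinely matters
os.close(fd)
os.remove(path)
print("flushed and durable")
```

This is exactly the argument for not PANICking on the hint: if the hint's error could swallow a writeback error that fsync() would otherwise report, the PANIC is justified; if not, a warning would do.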
Re: Segmentation Fault PG 14
On Tue, Nov 8, 2022 at 11:45 AM Willian Colognesi wrote: > root@ip-10-x-x-x:/home/ubuntu# pg_config --configure > ... --with-extra-version= (Ubuntu 14.5-2.pgdg20.04+2)' ... > ... '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-10' ... > There is no llvm installed on ubuntu server, postgresql was installed via apt > package `apt install postgresql-14` We can see from the pg_config output that it's built with LLVM 10. Also that looks like it's the usual pgdg packages which are certainly built against LLVM and will install it automatically.
Re: Strange collation names ("hu_HU.UTF-8")
On Wed, Aug 3, 2022 at 1:43 AM Tom Lane wrote: > I believe most if not all variants of Unix are > permissive about the spelling of the encoding part. I've only seen glibc doing that downcase-and-strip-hyphens thing to the codeset part of a locale name when looking for locale definition files. Other systems like FreeBSD expect to be able to open /usr/share/locale/$LC_COLLATE/LC_COLLATE directly without any kind of munging. On a Mac it's probably a little fuzzy because the filenames are case insensitive...
Re: AIX and EAGAIN on open()
On Mon, Jun 20, 2022 at 9:53 PM Christoph Berg wrote: > IBM's reply to the issue back in December 2020 was this: > > The man page / infocenter document is not intended as an exhaustive > list of all possible error codes returned and their circumstances. > "Resource temporarily unavailable" may also be returned for > O_NSHARE, O_RSHARE with O_NONBLOCK. > > Afaict, PG does not use these flags either. > > We also ruled out that the system is using any anti-virus or similar > tooling that would intercept IO traffic. > > Does anything of that ring a bell for someone? Is that an AIX bug, a > PG bug, or something else? No clue here. Anything unusual about the file system (NFS etc)? Can you truss/strace the system calls, to sanity check the flags arriving into open(), and see if there's any unexpected other activity around open() calls that might be coming from something you're linked against?
Re: Order of rows in statement triggers NEW/OLD tables
On Fri, May 6, 2022 at 6:20 PM hubert depesz lubaczewski wrote: > when defining statement triggers on update I can use: > > REFERENCING OLD TABLE AS xxx NEW TABLE as YYY > > these "pseudo" tables contain rows that were before and after. > > Is the order guaranteed? > > Can I assume that "first" row returned by select from xxx, will be older > version of first row returned by select from yyy? Good question, and one I've wondered about before. I *think* that is currently true, due to implementation details, but it could change. The trigger code fills up both tuplestores (= memory/disk containers for transition tables) in sync with each other in AfterTriggerSaveEvent(), and then NamedTuplestoreScan node emits tuples in insertion order. We already banned the use of transition tables when there is "ON UPDATE OR INSERT" etc so you can't get mixed INSERT/UPDATE/DELETE results which would desynchronise the old and new tuples, and I also wondered if something tricky like FK ON DELETE CASCADE/SET NULL in a self-referencing table could mix some old-with-no-new into UPDATE results, but I can't see how to do that, and I can't think of any other way off the top of my head. Of course, joins etc could destroy the order higher in your query plan. While reading about proposed incremental materialized view ideas, I once idly wondered if it'd be useful, as an extension to the standard, to be able to use WITH ORDINALITY for transition tables (normally only used for unnest(), and in PG also any SRF) and then you could use that + ORDER BY to explicitly state your desired order (and possibly teach the planner that ORDINALITY attributes are path keys by definition so it doesn't need to insert sort nodes in simple cases). That is, instead of relying on scan order. In any case, an in-core IMV feature is allowed to peek deeper into the implementation and doesn't even need SQL here, so I didn't follow that thought very far... 
I am not sure about this, but I wonder if any portable-across-SQL-implementations user-level scheme for replication/materialization built on top of trigger transition tables would need to require immutable unique keys in the rows in order to be able to match up before/after tuples.
Re: Improve configurability for IO related behavoir
On Sun, May 29, 2022 at 4:29 AM 浩辰 何 wrote: > Furthermore, the results above are also related to IO API supported by OS. > MySQL support synchronized IO and Linux libaio. It seems > that PostgreSQL only supports synchronized IO, so shall we support more IO > engines? like io_uring which is very popular in recent years. Hi Haochen, There is an active project to bring these things to PostgreSQL. https://wiki.postgresql.org/wiki/AIO has some information and links. The short version is that there is a development patch set to add these GUCs: io_method=worker,io_uring,posix_aio,iocp,... io_data_direct=on,off io_wal_direct=on,off It also adds a bunch of specialised logic that knows how to initiate IO in key places (scans, writeback, recovery, ...), because it's not enough to just turn off kernel I/O buffering, we also have to do all the work the kernel is doing for us or performance will be terrible.
Re: Pg14 possible index corruption after reindex concurrently
On Wed, May 25, 2022 at 6:17 AM Aleš Zelený wrote: > SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname) This may be related to bug #17485, discussed at: https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org
Re: SELECT creates millions of temp files in a single directory
On Sun, Apr 24, 2022 at 8:00 AM Peter wrote: > More than a million files in a single directory, this is > inacceptable. You didn't show EXPLAIN (ANALYZE) but if [Parallel] Hash is making insane numbers of temporary files then something is not working as intended... and I can take a guess at what it is. I tried to recreate this with randomly distributed file.pathid, same size tables as you, and I got 32 batches and a nice manageable number of temporary files. Adding millions of extra file rows with duplicate pathid=42 gets me something like "Batches: 524288 (originally 32)" in EXPLAIN (ANALYZE) output. I guess that's the sort of distribution you have here? Extensive discussion of the cause of that and potential (hard) solutions over here: https://www.postgresql.org/message-id/flat/CA%2BhUKGKWWmf%3DWELLG%3DaUGbcugRaSQbtm0tKYiBut-B2rVKX63g%40mail.gmail.com To summarise, when the hash table doesn't fit in work_mem, we "partition" (spill part of the data to disk) to make twice as many (hopefully) smaller hash tables that do fit. Sometimes partitioning produces one or more hash tables that are still too big because of uneven key distribution, so we go around again, doubling the number of partitions (and thus temporary files) every time. I would say that once we get past hundreds of partitions, things are really turning to custard (file descriptors, directory entries, memory overheads, ... it just doesn't work well anymore), but currently we don't give up until it becomes very clear that repartitioning is not helping. This algorithmic problem existed before parallelism was added, but it's possible that the parallel version of the meltdown hurts a lot more (it has extra per-process files, and in multi-attempt scenarios the useless earlier attempts hang around until the end of the query instead of being cleaned up sooner, which doubles the number of files).
Hopefully that gives some clues about how one might rewrite the query to avoid a massive unsplittable set of duplicate keys in hash tables, assuming I'm somewhere close to the explanation here (maybe some subselect with UNIQUE or GROUP BY in it, or some way to rewrite the query to avoid having the problematic duplicates on the "inner" side, or a completely different plan, or just cranking up work_mem massively for this query so that you don't need a partition step at all). Obviously it would be nice for PostgreSQL to be hardened against this risk eventually, though. As for merge join planning: unfortunately merge joins aren't fully parallelisable, and in the plan you showed, a separate copy of the big sort runs in every process, which isn't nice (we don't have a Parallel Sort that could fix that yet).
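Here's a toy Python simulation of that doubling loop (in no way PostgreSQL's actual code) showing why duplicate keys defeat it: adding hash bits splits every partition except one full of equal keys, which can never shrink.

```python
# Toy model of hash join repartitioning: hash each key, keep the low
# nbits bits as the partition number, and double the partition count
# until every partition fits.  Duplicate keys all land in the same
# partition no matter how many bits are used.
def partitions_needed(keys, fit_limit, max_bits=20):
    nbits = 1
    while nbits < max_bits:
        buckets = {}
        for k in keys:
            b = hash(k) & ((1 << nbits) - 1)
            buckets[b] = buckets.get(b, 0) + 1
        if max(buckets.values()) <= fit_limit:
            break
        nbits += 1
    return 1 << nbits

# Evenly distributed keys: splitting works as intended.
even = partitions_needed(range(100_000), fit_limit=5_000)
# 50k duplicates of pathid=42: no amount of splitting helps, so the
# loop gives up only at the (arbitrary) bit limit.
skewed = partitions_needed([42] * 50_000 + list(range(50_000)), fit_limit=5_000)
print(even, skewed)  # 32 1048576
```

The jump from 32 to over a million partitions mirrors the "Batches: 524288 (originally 32)" symptom above, with each batch backed by temporary files.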
Re: frequency of positive and negative numbers, sizes of numbers and frequency of alteration of polarity
On Fri, Feb 18, 2022 at 10:42 AM Shaozhong SHI wrote: > Given 2 or more such columns, is there any measure that can be calculated to > tell which one alternates more than others? Well, you could report non-flips as NULL and flips as magnitude, and then wrap that query in another query to compute whatever statistical properties you need... and you could have multiple columns so you're computing those numbers for each input column... I was mainly trying to point out the LAG() facility, which lets you compare a row with the preceding row, according to some sort order, which I think you'd want to build your query on top of. Hope that helps...

postgres=# with flips as (select time,
                                 value,
                                 case when sign(lag(value) over (order by time)) != sign(value)
                                      then abs(lag(value) over (order by time) - value)
                                 end as flip_magnitude
                            from time_series)
           select count(flip_magnitude) as num_flips,
                  avg(flip_magnitude) as avg_magnitude
             from flips;
 num_flips |    avg_magnitude
-----------+---------------------
         2 | 14.0000000000000000
(1 row)
Re: frequency of positive and negative numbers, sizes of numbers and frequency of alteration of polarity
On Fri, Feb 18, 2022 at 9:11 AM Shaozhong SHI wrote: > How to calculate frequency of positive and negative numbers and define and > calculate frequency of alteration of polarity? > > Surely, we can use frequency of alteration of polarity and level of change > (e.g., size of positive and negative numbers) to measure degree and frequency > of alteration. > > Any ideas in doing so in postgres tables' columns full of positive and > negative numbers? Window functions might be useful to detect polarity changes:

postgres=# create table time_series (time int, value int);
CREATE TABLE
postgres=# insert into time_series values (1, -5), (2, -5), (3, 10), (4, -3);
INSERT 0 4
postgres=# select time, value, sign(lag(value) over (order by time)) != sign(value) as flipped from time_series;
 time | value | flipped
------+-------+---------
    1 |    -5 |
    2 |    -5 | f
    3 |    10 | t
    4 |    -3 | t
(4 rows)
Re: Compile 14.1 in EL5.8
On Thu, Feb 10, 2022 at 2:23 PM Tom Lane wrote: > Thomas Munro writes: > > ... I wondered about also removing the leftover comment > > "We assume that any system that has Linux epoll() also has Linux > > signalfd()" which was my attempt to explain that there wasn't a > > separate configure check for signalfd.h, but I guess the sentence is > > still true in a more general sense, so we can just leave it there. > > Oh, I didn't notice that comment, or I probably would have tweaked it. > Perhaps along the line of "there are too few systems that have epoll > and not signalfd to justify maintaining a separate code path"? WFM, though I remain a little unclear on whether our support policy is stochastic or declarative :-D
Re: Compile 14.1 in EL5.8
On Thu, Feb 10, 2022 at 3:38 AM Tom Lane wrote: > Gabriela Serventi writes: > > Hi! I'm trying to compile release 14.1 in a very old Linux Server (Red Hat > > Enterprise Linux Server release 5.8). I can run configure successfully, but > > when I try to run make, I get the following error: > > latch.c:85:26: error: sys/signalfd.h: No such file or directory > > It looks like since 6a2a70a02, latch.c effectively uses HAVE_SYS_EPOLL_H > to decide whether it can include , which seems not too > well thought out. A proper fix will require another configure check, > but you could hack it by adding -DWAIT_USE_POLL to CPPFLAGS in > src/Makefile.global after running configure. I mean, I did think about it, but I thought it wasn't worth the configure cycles to detect EOL'd kernels manufactured 2005-2007, since we've said before that we don't support RHEL5 or kernels that old[1]. My primary goal was not to have to write the support for the epoll/self-pipe combination with zero coverage and zero users. Apparently I was off by one on the users, and since it's so easy to fall back to poll() I agree that we might as well! BTW after seeing your commit today I wondered about also removing the leftover comment "We assume that any system that has Linux epoll() also has Linux signalfd()" which was my attempt to explain that there wasn't a separate configure check for signalfd.h, but I guess the sentence is still true in a more general sense, so we can just leave it there. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com
Re: sort order for UTF-8 char column with Japanese UTF-8
On Fri, Feb 4, 2022 at 8:11 AM Matthias Apitz wrote:
> On my FreeBSD laptop the same file sorts as
>
> guru@c720-r368166:~ $ LANG=de_DE.UTF-8 sort swd
> A
> ゲアハルト・A・リッター
> ゲルハルト・A・リッター
> チャールズ・A・ビアード
> A010STRUKTUR
> A010STRUKTUR
> A010STRUKTUR
> A0150SUPRALEITER

Wow, so it's one thing to have a different default "script order" than glibc and ICU (which is something you can customise IIRC), but isn't something broken here if the Japanese text comes between "A" and "A0..."?? Hmm, it's almost as if it completely ignored the Japanese text. From my FreeBSD box:

tmunro=> select * from t order by x collate "de_DE.UTF-8";
            x
--------------------------
 ゲアハルト
 A
 ゲアハルト・A・リッター
 A0
 A010STRUKTUR
 AA
 ゲアハルト・AA・リッター
 ゲアハルト・B・リッター
(8 rows)

tmunro=> select * from t order by x collate "ja_JP.UTF-8";
            x
--------------------------
 A
 A0
 A010STRUKTUR
 AA
 ゲアハルト
 ゲアハルト・AA・リッター
 ゲアハルト・A・リッター
 ゲアハルト・B・リッター
(8 rows)

Seems like something to investigate in FreeBSD land.
Re: could not accept SSL connection: Success
On Thu, Jan 20, 2022 at 12:06 AM Carla Iriberri wrote: > On Wed, Jan 19, 2022 at 5:42 AM Michael Paquier wrote: >> "On an unexpected EOF, versions before OpenSSL 3.0 returned >> SSL_ERROR_SYSCALL, nothing was added to the error stack, and errno was >> 0. Since OpenSSL 3.0 the returned error is SSL_ERROR_SSL with a >> meaningful error on the error stack." > Thanks, Michael, that's it, indeed! I had missed that part of the > OpenSSL docs. These PG instances are running on Ubuntu Focal hosts that come > with OpenSSL 1.1.1. Good news, I'm glad they nailed that down. I recall that this behaviour was a bit of a moving target in earlier versions: https://www.postgresql.org/message-id/CAEepm%3D3cc5wYv%3DX4Nzy7VOUkdHBiJs9bpLzqtqJWxdDUp5DiPQ%40mail.gmail.com
Re: create database hangs forever on WSL - autovacuum deadlock?
On Sun, Jan 9, 2022 at 2:15 PM Alicja Kucharczyk wrote: > On Sat, Jan 8, 2022 at 10:40 PM Thomas Munro wrote: >> On Sat, Jan 8, 2022 at 8:11 PM Alicja Kucharczyk >> wrote: >> > this is postgres running on windows wsl: PostgreSQL 14.1 (Ubuntu >> > 14.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu >> > 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit >> >> Is this WSL1 (some kind of Linux system call emulator running on an NT >> kernel) or WSL2 (some kind of virtual machine running a Linux kernel)? > > > lookd like WSL1: > NAME    STATE    VERSION > * Ubuntu    Running    1 I don't think you're going to like this answer. In PostgreSQL 14, we started using a signalfd in an epoll set (multiplexed with other fds for sockets etc) to receive "latch wakeups". This works pretty well on a real Linux kernel, but apparently it is not emulated well enough to work on WSL1. I don't know the exact reason why, but if someone can figure it out I'd be interested in seeing if we can fix it (though I thought everyone moved to WSL2 which has none of these problems?). There is a WSL1 machine in the build farm, but it's using -DWAIT_USE_POLL to avoid this problem; that's useless if you're trying to run with stock PostgreSQL packages from Ubuntu or whatever, though; it only helps if you compile from source. For what it's worth, running PostgreSQL compiled for Windows using Wine on a Unix system also fails in various ways. It turns out that emulating a foreign kernel is super hard... Previous discussion: https://www.postgresql.org/message-id/flat/CAEP4nAymAZP1VEBNoWAQca85ZtU5YxuwS95%2BVu%2BXW%2B-eMfq_vQ%40mail.gmail.com
Re: create database hangs forever on WSL - autovacuum deadlock?
On Sat, Jan 8, 2022 at 8:11 PM Alicja Kucharczyk wrote: > this is postgres running on windows wsl: PostgreSQL 14.1 (Ubuntu > 14.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit Is this WSL1 (some kind of Linux system call emulator running on an NT kernel) or WSL2 (some kind of virtual machine running a Linux kernel)?
Re: Need to know more about pg_test_fsync utility
On Mon, Dec 13, 2021 at 3:04 PM PGSQL DBA wrote: > As you mentioned in question-8, "I'd investigate whether data is being cached > unexpectedly, perhaps indicating that committed transactions could be lost in a > system crash event." So, I would like to know that if we configure the disk > for the WALs with read+write disk cache then will it create any performance > issue and show the attached output? Which OS and file system are you running and what kind of storage system is it? If you can turn write caching on and off in your storage layer without affecting the ops/sec that's a pretty good clue that it's respecting the cache control commands that the OS sends. The numbers you showed are reassuringly low. Hmm, I wonder why you have such a low number for non-sync'd writes (the last line).

I have a concrete example of using this tool to learn something about an unfamiliar-to-me operating system, from this morning: I saw a system that reported ~6k IOPS for open_datasync and only ~600 IOPS for fdatasync. With some limited knowledge of file systems I expect the latter to involve sending a SYNCHRONIZE CACHE command to flush device caches, and the former to do that, or use a more efficient FUA flag to flush just particular writes and not the whole write cache. I didn't expect it to go 10x faster, so something had to be wrong. With some more investigation and reading I learned that the storage drivers I was using do indeed ignore FUA cache control flags, so that wal_sync_method is not crash safe; if you lose power and come back up, you might lose a bunch of committed transactions from the end of the WAL. I turned off write caches in the storage device, and then open_datasync became a lot slower, matching my expectations. These are primitive tools, but can help to check some assumptions...

Obligatory remark: PostgreSQL 9.5 is out of support, please see https://www.postgresql.org/support/versioning/.
> I also would like to know is there any best Practice from PostgreSQL which > mentions what is the disk latency required for the WAL & DATA disk? No particular latency is required by PostgreSQL, and I don't have a general answer to this. Is it fast enough for you? What sort of workload is it running, OLTP, reporting, ...?
Re: Need to know more about pg_test_fsync utility
On Fri, Dec 10, 2021 at 3:20 PM PGSQL DBA wrote: > 1) How to interpret the output of pg_test_fsync? The main interesting area is probably the top section that compares the different wal_sync_method settings. For example, it's useful to verify the claim that fdatasync() is faster than fsync() (because it only flushes data, not meta-data like file modified time). It may also be useful for measuring the effects of different caching settings on your OS and storage. Unfortunately open_datasync is a bit misleading; we don't actually use O_DIRECT with open_datasync anymore, unless you set wal_level=minimal, which almost nobody ever does. > 2) What is the meaning of ops/sec & usecs/op? Number of times it managed to flush data to disk per second sequentially, and the same information expressed as microseconds per flush. > 3) How does this utility work internally? It just does a loop over some system calls, or to be more precise, https://github.com/postgres/postgres/blob/master/src/bin/pg_test_fsync/pg_test_fsync.c > 4) What is the IO pattern of this utility? serial/sequence IO or Multiple > thread with Parallel IO? Sequential, no threads. > 5) Can we change the testing like FIO with multiple threads and parallel IO? Nope. This is a simple tool. Fio is much more general and useful. > 6) How a commit happened in the background while executing this utility? Nothing happens in the background, it uses synchronous system calls from one thread. > 7) How can we use this tool to measure the I/O issue? It's a type of micro-benchmark that gives you an idea of a sort of baseline you can expect from a single PostgreSQL session committing to the WAL. > 8) In which area or section in the output do we need to focus while > troubleshooting I/O issues? 
If PostgreSQL couldn't commit small sequential transactions about that fast I'd be interested in finding out why, and if fdatasync is performing faster than published/device IOPS suggest should be possible then I'd investigate whether data is being cached unexpectedly, perhaps indicating that committed transactions could be lost in a system crash event. > 9) What is the meaning of “Non-sync’ed 8kB writes? Calling the pwrite() system call, which writes into your operating system's page cache but (usually) doesn't wait for any I/O. Should be somewhere north of 1 million/sec.
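The "loop over some system calls" mentioned above can be approximated in a few lines of Python (a crude sketch of the idea, not pg_test_fsync itself, which is written in C and tests several wal_sync_method variants):

```python
import os
import tempfile
import time

def flushes_per_second(seconds=0.2, datasync=True):
    """Sequentially rewrite the same 8kB block, flushing after every
    write -- roughly what a stream of tiny WAL commits does."""
    fd, path = tempfile.mkstemp()
    try:
        buf = b"\0" * 8192
        # fall back to fsync() where fdatasync() isn't available
        flush = os.fdatasync if datasync and hasattr(os, "fdatasync") else os.fsync
        deadline = time.monotonic() + seconds
        ops = 0
        while time.monotonic() < deadline:
            os.pwrite(fd, buf, 0)  # write 8kB at offset 0
            flush(fd)              # wait until it reaches stable storage
            ops += 1
        return ops / seconds
    finally:
        os.close(fd)
        os.unlink(path)

print(f"{flushes_per_second():.0f} flushes/sec")
```

The same sanity check applies here as to pg_test_fsync: if the reported rate is far above what the device's published flush latency allows, a cache somewhere is probably lying.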
Re: Wrong sorting on docker image
On Sun, Oct 17, 2021 at 4:42 AM Tom Lane wrote: > Speaking of ICU, if you are using an ICU-enabled Postgres build, > maybe you could find an ICU collation that acts the way you want. > This wouldn't be a perfect solution, because we don't yet have > the ability to set an ICU collation as a database's default. > But you can attach ICU collations to individual text columns, > and maybe that would be a good enough workaround. For what it's worth, ICU's "ru-RU-x-icu" and FreeBSD's libc agree with glibc on these sort orders, so I suspect this might be coming from CLDR/UCA/DUCET/ISO 14651 common/synchronised data. It does look quite suspicious to me, but I don't know Russian and I'm only speculating wildly here: it does look as if ё is perhaps getting a lower weight than it should. That said, it seems strange that something so basic should be wrong. Nosing around in the unicode.org issue tracker, it seems as though some people might think there is something funny about Ё (and I wonder if there are/were similar issues with й/Й): https://unicode-org.atlassian.net/browse/CLDR-2745?jql=text%20~%20%22%D0%81%22 https://unicode-org.atlassian.net/browse/CLDR-1974?jql=text%20~%20%22%D0%81%22 (and more) It's probably not a great idea, but for the record, you can build your own collation for glibc and other POSIX-oid systems. For example, see glibc commit 159738548130d5ac4fe6178977e940ed5f8cfdc4, where they previously had customisations on top of the iso14651_t1 file to reorder a special Ukrainian character in ru_RU, so in theory you could reorder ё/Ё with a similar local hack and call it ru_RU_X... I also wonder if there is some magic switch you can put after an @ symbol on ICU collations that would change this, perhaps some way to disable the "contractions" that are potentially implicated here. Not sure.
Re: To JIT (or not!) in Ubuntu packages
On Wed, Sep 15, 2021 at 3:30 PM Ben Chobot wrote: > So I've installed > http://apt.postgresql.org/pub/repos/apt/pool/14/p/postgresql-14/postgresql-14_14~beta3-1.pgdg18.04+1_arm64.deb, > after which I see: Ahhh, so you're on 18.04, an old LTS. I remember now, there was this issue in LLVM 3.9 on that aarch64 (which I saw on Debian 9 during testing of the JIT stuff), resolved by later LLVM versions (maybe 6?): https://www.postgresql.org/message-id/flat/CAEepm%3D0HqkxWk2w8N2nXQXC_43Mucn-v%3D8QdY8vOG5ojo9kJRA%40mail.gmail.com#5d656391e0b1d49d3e577b7a41e69b7c I don't know Ubuntu versioning policies etc, but maybe something to do with which LLVM versions it's allowed to depend on?
Re: To JIT (or not!) in Ubuntu packages
On Tue, Sep 14, 2021 at 10:11 AM Ben Chobot wrote: > We've noticed that the Ubuntu postgresql-12 package has --with-llvm > enabled on x86_64, but not on aarch64. Does anybody know if this was > intentional, or just an oversight? > > For what it's worth, it seems the beta postgresql-14 package for Ubuntu > still doesn't have --with-llvm. > > I'm not sure if this explains why our aarch64 DBs are missing a > jit_provider GUC variable, but I expect it does explain why > pg_jit_available() tells me no. Hmm. No Ubuntu here and I don't know the answer (CC Christoph). Can you show exactly where the package is coming from, and what pg_config outputs? For what it's worth, it does look like it's enabled in the Ubuntu aarch64 packages from apt.postgresql.org, according to a quick look using caveman techniques:

$ curl -sO http://apt.postgresql.org/pub/repos/apt/pool/14/p/postgresql-14/postgresql-14_14~beta3-1.pgdg+1_arm64.deb
$ ar x postgresql-14_14~beta3-1.pgdg+1_arm64.deb
$ tar xf data.tar.xz
$ strings usr/lib/postgresql/14/bin/postgres | grep 'with-llvm'
'--build=aarch64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/aarch64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-icu' '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' 'PYTHON=/usr/bin/python3' '--mandir=/usr/share/postgresql/14/man' '--docdir=/usr/share/doc/postgresql-doc-14' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/14' '--bindir=/usr/lib/postgresql/14/bin' '--libdir=/usr/lib/aarch64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--with-extra-version= (Debian 14~beta3-1.pgdg+1)' '--enable-nls' '--enable-thread-safety' '--enable-tap-tests' '--enable-debug' '--enable-dtrace' '--enable-cassert' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld' '--with-lz4' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-11' 'CLANG=/usr/bin/clang-11' '--with-systemd' '--with-selinux' 'MKDIR_P=/bin/mkdir -p' 'PROVE=/usr/bin/prove' 'TAR=/bin/tar' 'XSLTPROC=xsltproc --nonet' 'CFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' '--with-gssapi' '--with-ldap' 'build_alias=aarch64-linux-gnu' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security'
$ find . -name '*jit*.so'
./usr/lib/postgresql/14/lib/llvmjit.so

It's certainly expected to work on this arch, and we have snakefly and eelpout testing it in the build farm.
Re: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory
On Thu, Sep 9, 2021 at 9:19 AM Celia McInnis wrote: > Note that the file does exist:! (How do I know if it is looking under the > correct directory? Other times I have done similar temporary table creations > with no problems!): PostgreSQL internally uses relative paths. It's probably not a very good idea to use 'chdir' in a procedure.
Re: Is there something similar like flashback query from Oracle planned for PostgreSQL
On Thu, Jun 24, 2021 at 6:54 AM Dirk Krautschick wrote: > Is there something planned to get a behaviour like Oracle's flashback query > based on the old values > before deleted by vacuum? > > So a feature to recreate old versions of rows if still there? > > Or are there any related extensions or tools doing this? There are some things like pg_dirtyread and probably more. You might be interested in some of the references in this thread: https://www.postgresql.org/message-id/flat/CAKLmikOkK%2Bs0V%2B3Pi1vS2GUWQ0FAj8fEkVj9WTGSwZE9nRsCbQ%40mail.gmail.com As for the SQL standard's approach to this, there are some active -hackers threads on that with patches in development... look for "temporal tables" and "system versioned".
Re: order by
On Thu, Jun 10, 2021 at 8:40 PM Luca Ferrari wrote: > On Thu, Jun 10, 2021 at 10:15 AM Vijaykumar Jain > > when you provide a table in query in the order by clause, it is > > ordered by cols of that table in that order. > > Clever, thanks! > I also realized that this "table to tuples" expansion works for GROUP BY too. > However, I'm not able to find this documented in GROUP BY, WHERE, > ORDER BY clauses sections into select documentation > https://www.postgresql.org/docs/12/sql-select.html>. Could be > my fault, of course. There's something about this here: https://www.postgresql.org/docs/13/rowtypes.html#ROWTYPES-USAGE
Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?
On Wed, Jun 2, 2021 at 7:15 AM Vijaykumar Jain wrote: > i only get workers to create mv, but refresh mv plan does not use workers for > the same conf params. Yeah, this changed in v14: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9e7ccd9ef64d05e87ceb1985d459bef9031205c0
Re: Chain Hashing
On Thu, May 6, 2021 at 9:48 PM Jian He wrote: > The following part is about the Chain Hashing. >> >> Maintain a linked list of buckets for each slot in the hash table. >> Resolve collisions by placing all elements with the same hash key into the >> same bucket. >> → To determine whether an element is present, hash to its bucket and scan >> for it. >> → Insertions and deletions are generalizations of lookups. > > I still don't get it. Stackoverflow seems don't have good answers yet. So > I come here, asking Not sure which part you're asking about, but here's an example. Suppose you want to put three objects (in our case tuples) into a hash table, where the first attribute is the key:

('AAA', 'Aardvark'), ('BBB', 'Bumblebee'), ('CCC', 'Cat')

Suppose your hash table has 4 buckets. We can use the lower 2 bits to map hash values to bucket numbers.

hash('AAA') = 0xfe7f3cba, hash('AAA') & 0x3 = 2
hash('BBB') = 0x87e3287b, hash('BBB') & 0x3 = 3
hash('CCC') = 0x194bcedf, hash('CCC') & 0x3 = 3

'BBB' and 'CCC' collided: they both want to be in bucket 3. To insert, a chaining hash table just adds each object to the computed bucket's chain:

+---+
| 0 |
+---+
| 1 |
+---+
| 2 |->('AAA', 'Aardvark')
+---+
| 3 |->('BBB', 'Bumblebee')->('CCC', 'Cat')
+---+

When looking up key 'CCC' during the "probe" phase of a hash join, we'll again compute hash('CCC') & 0x3 = 3, look in bucket 3, and then compare the key of every tuple in that list with 'CCC' to see if we can find any matches. That's called "lookup" in the text you quoted. It also mentions deletion, which is pretty much just lookup followed by removing the matching entry from the list, but that's a general comment about hash tables. It doesn't apply to their use in hash joins: there is never a need to remove individual keys, we just build, probe and destroy the whole table.
Another difference between general purpose hash tables such as you might find in a typical programming language standard library and the hash tables used to implement hash joins is the latter need to be able to tolerate duplicate keys, so the 'scan' of a bucket doesn't give up as soon as it finds a match (unless it's a semi-join): it normally has to emit all of the matches. PostgreSQL uses chaining for hash joins, but it also uses Robin Hood hash tables in some other places, including hash-based GROUP BY.
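Putting the above together, a toy chaining hash table might look like this in Python (an illustrative sketch only; PostgreSQL's real hash join table is C code in the executor):

```python
class ChainingHashTable:
    """Minimal chaining hash table, hash-join style: buckets are lists,
    and probes return ALL matches because keys may be duplicated."""

    def __init__(self, nbuckets=4):
        self.buckets = [[] for _ in range(nbuckets)]
        self.mask = nbuckets - 1  # nbuckets must be a power of two

    def insert(self, key, value):
        self.buckets[hash(key) & self.mask].append((key, value))

    def probe(self, key):
        # Scan the whole chain: we can't stop at the first match,
        # because duplicate keys are allowed.
        return [v for k, v in self.buckets[hash(key) & self.mask] if k == key]

ht = ChainingHashTable()
ht.insert('AAA', 'Aardvark')
ht.insert('BBB', 'Bumblebee')
ht.insert('CCC', 'Cat')
ht.insert('CCC', 'Caterpillar')  # duplicate key: fine for a join
print(ht.probe('CCC'))           # → ['Cat', 'Caterpillar']
```

Note how probe() emits every match in the chain, which is exactly the "don't give up at the first match" behaviour a hash join needs (except for semi-joins).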
Re: Huge performance penalty with parallel queries in Windows x64 v. Linux x64
On Wed, May 5, 2021 at 2:12 PM Thomas Munro wrote: > It might be interesting to know how that 40ms time scales as you add > more workers. ... Another thought: I'd also try tests like that in large databases (ie large virtual memory) vs small ones, and with and without huge/locked memory pages configured[1], to see how much of the overhead is due to virtual memory work (though I know nothing about Windows VM, it's just an idea). [1] https://www.postgresql.org/docs/13/runtime-config-resource.html#GUC-HUGE-PAGES
Re: Huge performance penalty with parallel queries in Windows x64 v. Linux x64
On Wed, May 5, 2021 at 3:50 AM Hans Buschmann wrote: > (BTW: Is this cost multiplied by the real count of workers choosen > (max_parallel_workers_per_gather) or only a value independent of the number > of workers?. This would matter in windows-high-parallel scenarios) It's not multiplied: https://github.com/postgres/postgres/blob/50e17ad281b8d1c1b410c9833955bc80fbad4078/src/backend/optimizer/path/costsize.c#L398 It might be interesting to know how that 40ms time scales as you add more workers. For what it's worth, I see that the following query takes about 6ms + 1.5ms per worker on my FreeBSD machine, and on Linux it's harder to pin down, varying a lot, usually a bit slower (sorry I didn't have time to do proper statistics).

create table t ();
alter table t set (parallel_workers=8);
set min_parallel_table_scan_size = 0;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set max_parallel_workers_per_gather = 1;
explain analyze select count(*) from t;
...
set max_parallel_workers_per_gather = 7;
explain analyze select count(*) from t;
Re: Huge performance penalty with parallel queries in Windows x64 v. Linux x64
On Tue, May 4, 2021 at 7:40 PM Hans Buschmann wrote: > The problem seems that this (probably inherent) performance disadvantage of > windows is not reflected in the cost model. https://www.postgresql.org/docs/13/runtime-config-query.html#GUC-PARALLEL-SETUP-COST is for that. It might be interesting to do some profiling to see exactly what is slow. Presumably CreateProcess(), but I wonder why exactly. It'd be nice if we had a way to reuse parallel workers, but alas we do not, yet. Or threads.
Re: Huge performance penalty with parallel queries in Windows x64 v. Linux x64
On Tue, May 4, 2021 at 4:05 AM Hans Buschmann wrote: > The main difference is the time shown for the Gather Merge step (65 ms vs. 7 > ms) No Windows here, but could it be super slow at launching workers? How does a trivial parallel query compare, something like? SET force_parallel_mode = on; EXPLAIN ANALYZE SELECT 42;
Re: -1/0 virtualtransaction
On Wed, Apr 28, 2021 at 2:25 AM Mike Beachy wrote: > On Tue, Apr 27, 2021 at 2:56 AM Laurenz Albe wrote: >> >> Not sure, but do you see prepared transactions in "pg_prepared_xacts"? > > No, the -1 in the virtualtransaction > (https://www.postgresql.org/docs/11/view-pg-locks.html) for pg_prepared_xacts > was another clue I saw! But, it seems more or less a dead end as I have > nothing in pg_prepared_xacts. > > Thanks for the idea, though. There is another way to get a pg_locks entry that shows -1/0 there: if we run out of SERIALIZABLEXACT objects, we transfer the locks of the oldest committed transaction to a single SERIALIZABLEXACT object called OldCommittedSxact, and it has an invalid virtual xid. You can see this if you recompile with TEST_SUMMARIZE_SERIAL defined to force that path, and then run three overlapping transactions like this:

tx1: BEGIN;
tx1: SELECT * FROM t WHERE id = 42;
tx2: BEGIN;
tx2: SELECT * FROM t WHERE id = 999;
tx1: COMMIT;

Even though it committed, at this point we still see tx1's locks, along with tx2's.

tx3: BEGIN;
tx3: SELECT * FROM t WHERE id = 1234;

At this point we see tx1's locks still but their vxid has changed to -1/0. > I still need to put more effort into Tom's idea about SIReadLock hanging out > after the transaction, but some evidence pointing in this direction is that > I've reduced the number of db connections and found that the '-1/0' locks > will eventually go away! I interpret this as the db needing to find time when > no overlapping read/write transactions are present. This doesn't seem > completely correct, as I don't have any long lived transactions running while > these locks are hanging out. Confusion still remains, for sure. But do you have lots of short overlapping transactions so that there is never a moment where there are zero transactions running? As mentioned, locks (and transactions, and conflicts) hang around after you commit.
That's because things that your finished transaction saw can cause transactions that are still running to become uncommittable, by forming part of a cycle. The rules for when the locks can be freed change once you reach the degraded "summarised" mode, though, due to lack of bookkeeping space. Not sure of the exact details without rereading the source code 42 times with much coffee, but it's something like: summarised locks can only be freed at times when there are zero active read/write serializable transactions, because that's when "CanPartialClearThrough" advances, while in the normal case they can be freed as soon as there are no SSI snapshots that were taken before it committed, because that's when "SxactGlobalXmin" advances. The number of SERIALIZABLEXACT objects is (max_connections + max_prepared_transactions) * 10. So, you could try increasing max_connections (without increasing the actual number of connections) to see if you can get to a point where you don't see these invalid virtual xids, and then maybe it'll be able to clean up locks more aggressively.
Re: Postgres crashes at memcopy() after upgrade to PG 13.
On Mon, Mar 15, 2021 at 1:29 PM Avinash Kumar wrote: > Is this expected when replication is happening between PostgreSQL databases > hosted on different OS versions like Ubuntu 16 and Ubuntu 20 ? Or, do we > think this is some sort of corruption ? Is this index on a text datatype, and using a collation other than "C"? https://wiki.postgresql.org/wiki/Locale_data_changes Not that I expect it to crash if that's the cause, I thought it'd just get confused.
Re: SV: Log files polluted with permission denied error messages after every 10 seconds
Hi Andrus,

On Thu, Mar 11, 2021 at 2:21 AM Andrus wrote:
> Windows Resource manger shows that wal files are used by large number of postgres processes:
>
> postgres.exe  22656  File  C:\Program Files\PostgreSQL\13\data\pg_wal\0001000A0075
> postgres.exe  30788  File  C:\Program Files\PostgreSQL\13\data\pg_wal\0001000A0075
> postgres.exe  14144  File  C:\Program Files\PostgreSQL\13\data\pg_wal\0001000A0074
> ...

This is normal -- postgres.exe holds open various files it's interested in, and it's supposed to be OK for them to be renamed or unlinked at any time by another process because they are opened with special FILE_SHARE_XXX flags that allow that. That list doesn't show the open flags, but it looks like nothing *else* has the files open. Usually when these types of errors are reported on the mailing list, it turns out to be due to some other program opening the file without those flags. It's also possible that this is a real permissions problem, and not a sharing violation. I'd definitely look into this feedback: On Tue, Mar 9, 2021 at 11:19 PM Thomas Kellerer wrote: > The data directory should not be stored in "C:\Program File"s on Windows. > > I wouldn't be surprised if "Program Files" has some additional security > settings that come into play here. > > %ProgramData% is a better location for the data directory.
Re: SV: Log files polluted with permission denied error messages after every 10 seconds
On Tue, Mar 9, 2021 at 9:43 PM Andrus wrote: > > Any hints in Windows event viewer? Events occurring at the same time > showing up there. > > Looked into Administrative Events/Custom views and few others. There are no > messages about this. Windowsi perfomance monitor and Filemon show files > opened by process. > > How to do reverse: log processes and threads which use files in pg_wal > directory ? Maybe this does the reverse? https://docs.microsoft.com/en-us/sysinternals/downloads/handle
Re: Log files polluted with permission denied error messages after every 10 seconds
On Sat, Mar 6, 2021 at 2:36 PM Michael Paquier wrote: > On Fri, Mar 05, 2021 at 07:36:37PM +0200, Andrus wrote: > > Then turned real-time protection off: > > > > Problem persists. New entry is written after every 10 seconds. > > On which files are those complaints? It seems to me that you may have > more going on in this system that interacts with your data folder than > you think. Suggestion received off-list from my colleague Bevan Arps, when I was complaining about this general variety of problem: maybe we should look into using Windows' RestartManager[1][2] API to find out which processes (at least the pids, maybe also names) currently have a file open? Then, if it is indeed a sharing violation that's causing the problem, we might at least be able to log a message that says who's blocking us once we reach that dreaded retry loop. There are other ways to get that information too, I believe, no idea which API would be best, but this one looks to be the best documented. I'm unlikely to work on this myself as a card-carrying Unix hacker, so I'm just passing on this insight in case it's useful... Another thought: if it's not a sharing violation, I wonder if we should consider dumping more raw Windows error information in the messages we log, because, if I recall correctly, we're converting many Windows error codes into few Unix-style error numbers and thereby throwing away valuable clues. It makes it a bit more confusing when trying to ask a Windows expert what might be happening. [1] https://docs.microsoft.com/en-us/windows/win32/rstmgr/restart-manager-portal [2] https://devblogs.microsoft.com/oldnewthing/20120217-00/?p=8283
Re: [LDAPS] Test connection user with ldaps server
On Tue, Feb 16, 2021 at 4:32 AM Laurenz Albe wrote: > What I would do is experiment with the "ldapsearch" executable from OpenLDAP > and see > if you can reproduce the problem from the command line. Also, maybe try doing this as the "postgres" user (or whatever user PostgreSQL runs as), just in case there are some environmental differences affecting the behaviour.
Re: How to post to this mailing list from a web based interface
On Fri, Jan 29, 2021 at 4:27 AM Alvaro Herrera wrote: > On 2021-Jan-28, Ravi Krishna wrote: > > I recollect there use to be a website from where one can reply from web. > > The community does not maintain such a service. > > There used to be a Gmane archive of this list that you could use to > post. Seems it's still online at postgresql-archive.org. They have a > "Reply" button and it says to require your account, but given SPF and > DMARC and other restrictions on email generation, it seems pretty > uncertain that emails posted that way would work correctly. I think we > would even reject such emails if they reached our mailing list servers. While catching up with some interesting new threads just now I was quite confused by the opening sentence of this message (which also arrived in my mailbox): https://www.postgresql.org/message-id/1611355191319-0.post%40n3.nabble.com ... until I got to the last line. I wonder if the "Resend" facility on our own archives could be better advertised, via a "Want to join this thread?" link in the Quick Links section that explains how to use it and what problem it solves, or something...
Re: Unable To Drop Tablespace
On Fri, Feb 5, 2021 at 12:43 PM Ian Lawrence Barwick wrote: > 2021年2月5日(金) 3:52 Pavan Pusuluri : >> We are trying to drop a table space on RDS Postgres . We have removed the >> objects etc, but it still won't drop. >> >> I have checked and there's no reference anywhere to this tablespace but it >> complains it's not empty. >> >> I checked if it is a default for a database, revoked all privileges on the >> tablespace. >> >> We dropped the database but underpinning tablespace remained but when I >> query to see if any reference i get no hits. >> >> "Select c.relname,t.spcname from pg_class c JOIN pg_tablespace t ON >> c.reltablespace=t.oid where t.spcname='mytablespace' >> >> I dont find any objects referencing. Kindly let me know if anything else >> needs to be checked? > > > There's a handy function "pg_tablespace_databases()" to check which databases > might still have objects in a database. There are a couple of useful queries > demonstrating > usage here: > > https://pgpedia.info/p/pg_tablespace_databases.html It's also possible for there to be stray files in there, in some crash scenarios where PostgreSQL doesn't currently clean up relation files that it ideally should. The one with the widest window AFAIK is where you crash after creating a table but before committing[1]. You'd need a directory listing to investigate that. [1] https://www.postgresql.org/message-id/flat/CAEepm%3D0ULqYgM2aFeOnrx6YrtBg3xUdxALoyCG%2BXpssKqmezug%40mail.gmail.com
Re: Unable to compile postgres 13.1 on Slackware current x64
On Tue, Nov 17, 2020 at 8:02 PM Condor wrote: > I try to compile postgres again with (cd src/backend/commands; sed > 's/TRUE/true/' collationcmds.c > collationcmds.c; ) and it's compiled > but get new error on linking: Doesn't that produce an empty file collationcmds.c? The shell truncates the redirection target before sed ever reads it. I think you want: sed 's/TRUE/true/' collationcmds.c > collationcmds.c.tmp && mv collationcmds.c.tmp collationcmds.c
Re: Unable to compile postgres 13.1 on Slackware current x64
On Mon, Nov 16, 2020 at 10:10 PM Laurenz Albe wrote: > On Mon, 2020-11-16 at 09:15 +0200, Condor wrote: > > collationcmds.c: In function ‘get_icu_language_tag’: > > collationcmds.c:467:51: error: ‘TRUE’ undeclared (first use in this > > function); did you mean ‘IS_TRUE’? > >467 | uloc_toLanguageTag(localename, buf, sizeof(buf), TRUE, ); > >| ^~~~ > >| IS_TRUE > > > "UBool" and "TRUE" are defined in "umachine.h", which is a header file for the > "libicu" library. > > PostgreSQL includes "unicode/ucol.h", which will include "umachine.h" > (via "utypes.h"), so that should be fine. > > Are your libicu headers installed under /usr/include/unicode? > Do you get any messages about missing include files earlier? It looks like something happened to ICU's boolean macros. See this commit in FreeBSD ports: https://github.com/freebsd/freebsd-ports/commit/81a88b4506ec06d07be10d199170ef4003eb0e30 ... which references: https://github.com/unicode-org/icu/commit/c3fe7e09d844
Re: PostgreSQL on Windows' state
On Wed, Sep 23, 2020 at 10:53 PM Alessandro Dentella wrote: > Thanks Pavel, but psycopg2 (that I always use is just for Python). T > hey claim/complain that from c# there's no native solution. Maybe https://www.npgsql.org/?
Re: Check replication lag
On Thu, Aug 6, 2020 at 7:02 AM Sreejith P wrote: > IN SYNC mode of replication what would be the impact on Master DB server in > terms of over all performance ? The pg_stat_replication columns write_lag, flush_lag and replay_lag are designed to tell you how long to expect commits to take for synchronous standbys, based on recent history, if synchronous_commit is set to remote_write, on, or remote_apply, respectively. Those times tell you about commit latency, which limits the sequential commit rate for each session.
Re: PG 9.5.5 cores on AIX 7.1
On Sun, Jul 19, 2020 at 11:01 PM Abraham, Danny wrote: > Segmentation fault in _alloc_initial_pthread at 0x9521474 > 0x9521474 (_alloc_initial_pthread+0x1d4) e803 ld > r0,0x0(r3) > (dbx) where > _alloc_initial_pthread(??) at 0x9521474 > __pth_init(??) at 0x951f390 > uload(??, ??, ??, ??, ??, ??) at 0x9fff000ab70 load1(??, ??, ??, ??) at > 0x9000b74 load(??, ??, ??) at 0x9001ef0 loadAndInit(??, ??, > ??) at 0x905b38c dlopen(??, ??) at 0x909bfe0 > internal_load_library(??) at 0x10014c684 > RestoreLibraryState(??) at 0x10014d79c > ParallelWorkerMain(??) at 0x1000bb2d0 > StartBackgroundWorker() at 0x10026cd94 > maybe_start_bgworkers() at 0x10003834c > sigusr1_handler(??) at 0x10003902c > __fd_select(??, ??, ??, ??, ??) at 0x91567fc > ServerLoop() at 0x1004cec90 > PostmasterMain(??, ??) at 0x10003a4e8 > main(??, ??) at 0x108f8 FWIW there was a report a decade ago that looked at least superficially similar: https://www.postgresql.org/message-id/flat/09B23E7BF70425478C1330D893A722C602FEC019BD%40MailSVR.invera.com
Re: Same query taking less time in low configuration machine
On Tue, Jul 14, 2020 at 9:27 PM Vishwa Kalyankar wrote: > Function Scan on kseb_geometry_trace_with_barrier_partition > (cost=0.25..10.25 rows=1000 width=169) (actual time=11626.548..11626.568 > rows=254 loops=1) > Function Scan on kseb_geometry_trace_with_barrier_partition > (cost=0.25..10.25 rows=1000 width=169) (actual time=22304.425..22304.448 > rows=254 loops=1) I have no idea what that function does, but perhaps it runs more queries, and you can't see the plans for those here. If you set up auto_explain[1], and turn on auto_explain.log_nested_statements, then you'll be able to see the query plans for the internal stuff happening in that function. I'd look at EXPLAIN (BUFFERS) or auto_explain.log_buffers to get more information on cache hits and misses. I'd look for settings differences with EXPLAIN (SETTINGS) to see if there's anything accidentally set differently (maybe JIT or parallelism or something like that). I'd look at pg_stat_activity repeatedly while it runs to see what the processes are doing, especially the wait_event column. I'd also look at the CPU and I/O on the systems with operating system tools like iostat, htop, perf to try to find the difference. [1] https://www.postgresql.org/docs/current/auto-explain.html
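For reference, a minimal configuration sketch for that auto_explain setup; the parameter names are from the auto_explain documentation, and the threshold here is arbitrary:

```
# postgresql.conf (changing shared_preload_libraries requires a restart)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '1s'      # only log plans for slow queries
auto_explain.log_nested_statements = on   # plans for queries run inside functions
auto_explain.log_buffers = on             # cache hit/miss detail, like EXPLAIN (BUFFERS)
```

With that in place, the plans for the queries run inside the function should appear in the server log whenever the function takes longer than the threshold.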
Re: Definition of REPEATABLE READ
On Mon, Jun 22, 2020 at 12:25 AM Peter J. Holzer wrote: > I've read http://jepsen.io/analyses/postgresql-12.3 which reports a > problem in PostgreSQL's SERIALIZABLE isolation leven (which has since been > fixed) and also shows an example of a violation of what they consider to > be the correct definition of REPEATABLE READ. In response to that report, we added a paragraph to the manual to state explicitly that what we offer is snapshot isolation: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1575fbf1ac3c8464b2dade245deff537a3aa2498 https://www.postgresql.org/docs/devel/transaction-iso.html#XACT-REPEATABLE-READ > Since those papers are now 25 and 20 years old, respectively, and there > have been several revisions of the SQL standard in the meantime, has the > SQL committee come around to that view (SQL/2003 seems unchanged, I > couldn't find a later revision online)? And if it has, did it follow > those definitions or come up with different ones (it seems to me that > G2-item is much stronger that warranted by the wording in the standard)? SQL:2016 is the latest, and I don't have it to hand right now, but SQL:2011 still defined four degrees of isolation in terms of the three phenomena (pages 133-135 of part II, "Foundations", at least in the draft copy you can find on the net). As for what else PostgreSQL should do about this historical confusion, I suspect that a patch to pgsql-hackers to accept the name SNAPSHOT ISOLATION as an alternative would be popular, and we could probably even spin it as the "true" name of the level (while still accepting REPEATABLE READ; there is no point in breaking that), and maybe add a second table to the manual's Transaction Isolation section to cover the later formalisations in a digested, user-friendly format (if that is possible).
Re: troubleshooting postgresql ldap authentication
On Tue, Jun 9, 2020 at 9:05 AM Chris Stephens wrote: > hostsslall all 0.0.0.0/0 ldap > ldapserver="ldaps://xxx" ldapbasedn="yyy" ldaptls=1 > does anyone know what might be causing "LDAP: Bad parameter to an ldap > routine" You probably want ldapurl="ldaps://xxx" (note: ldapurl, not ldapserver). Or you could use ldapscheme="ldaps" and ldapserver="xxx".
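To illustrate, here are the two corrected forms as pg_hba.conf lines, a sketch that keeps the placeholder server and base DN from above:

```
# pg_hba.conf: put the scheme in ldapurl rather than ldapserver ...
hostssl all all 0.0.0.0/0 ldap ldapurl="ldaps://xxx" ldapbasedn="yyy"

# ... or spell out the scheme and server separately
hostssl all all 0.0.0.0/0 ldap ldapscheme="ldaps" ldapserver="xxx" ldapbasedn="yyy"
```

As I understand it, ldaptls=1 requests StartTLS, which is an alternative to the ldaps scheme rather than something to combine with it, so it's omitted here.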
Re: Shared memory error
On Fri, Jun 5, 2020 at 1:00 AM Sonam Sharma wrote: > The dynamic_shared_memory_type was set to POSIX . Because of this it was > using tmpfs /dev/shm. When the query was running I saw the file system was > filling. So I extended the file system and luckily the query worked for that > time Oh, hmm. When you say "filling", maybe you hit the leaking shm problem that was fixed in 11.7? That bug would also leave messages somewhere in your log about a lack of DSM slots. I don't have an explanation for the exact error you're seeing, though, and it's a bit tricky to speculate about older releases with so many bug fixes missing. Please let us know if you still see the problem after your next scheduled upgrade.
Re: Shared memory error
>> Do you see any other errors around this one, in the PostgreSQL logs? > No , only this is the error from db and jdbc end .. and queries are failing If you need a workaround right now you could always set max_parallel_workers_per_gather=0 so that it doesn't try to use parallel query. That could make some queries run more slowly. When you say you extended the /dev/shm filesystem, what are the mount options, or how did you configure it? It's interesting that the problem went away for a while when you did that.
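A sketch of that stopgap as a configuration change (the setting name is real; the value 0 simply disables parallel query):

```
# postgresql.conf: disable parallel query until the root cause is found
max_parallel_workers_per_gather = 0
```

The same thing can be tried per-session with SET max_parallel_workers_per_gather = 0 before running one of the failing queries, which is a low-risk way to confirm that parallel query is involved.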
Re: Shared memory error
On Thu, Jun 4, 2020 at 6:18 AM Sonam Sharma wrote: >>> 1) Postgres version : 11.2 FYI This is missing over a year's worth of bugfixes. That said, I don't know of anything fixed that has this symptom. >>> 4) Is this only with one query and if so what is it doing? : No , few >>> queries work, few fails .. mostly all are complex select queries. It's a long shot, because I expect you're running the usual packages under a user set up by the package to be a "system" user, but just in case: there is one known way for POSIX shm to be unlinked while you're still using it: systemd. For example, this might happen if you have cronjobs running as the same user. Some more info: https://www.postgresql.org/docs/11/kernel-resources.html#SYSTEMD-REMOVEIPC Do you see any other errors around this one, in the PostgreSQL logs?
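If systemd's IPC cleanup does turn out to be the culprit, the linked page suggests disabling it; roughly like this (whether this applies depends on your environment, of course):

```
# /etc/systemd/logind.conf
RemoveIPC=no
```

Followed by restarting systemd-logind (or rebooting) for the change to take effect.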
Re: Can we get SQL Server-like cross database queries
On Thu, Jun 4, 2020 at 4:26 PM Ron wrote: > On 6/3/20 2:57 PM, Rob Sargent wrote: > >> On Jun 3, 2020, at 1:46 PM, Tom Lane wrote: > >> Guyren Howe writes: > >>> Is it practical to provide the SQL Server-like feature in Postgres? > >> No. > > That got me chuckling. > > I had just decided not to bother posting, but well, here goes. > > > > I call bs on the “cross db query” notion of tsql - but I admit I haven’t > > used it since it was a Sybase thing. > > > > Is db.owner.table (iirc) is really anything more than nuance on > > schema.table. Does a db for automotive parts need to be > > up-close-and-personal with a db payroll? > > Those aren't the only two databases that exist. Think of a federated system > where you've got a "reference" database full of lookup tables, and one > database for every 10,000 customers. For 45,000 customers you've got five > databases, and they all need to access the reference database, plus "all > customers" queries need to access all five databases. There's no doubt it's useful, and it's also part of the SQL spec, which says you can do catalog.schema.table. I would guess that we might get that as a byproduct of any project to make PostgreSQL multithreaded. That mountain-moving operation will require us to get rid of all the global state that currently ties a whole process to one session and one database, and replace it with heap objects with names like Session and Database that can be passed around between worker threads.
Re: Help understanding SIReadLock growing without bound on completed transaction
On Fri, May 22, 2020 at 7:48 AM Mike Klaas wrote: > It's my understanding that these locks should be cleared when there are no > conflicting transactions. These locks had existed for > 1 week and we have > no transactions that last more than a few seconds (the oldest transaction in > pg_stat_activity is always < 1minute old). > Why would a transaction that is finished continue accumulating locks over > time? Predicate locks are released by ClearOldPredicateLocks(), which releases SERIALIZABLEXACTs once they are no longer interesting. It has a conservative idea of what is no longer interesting: it waits until the lowest xmin across active serializable snapshots is >= the transaction's finishedBefore xid, which was the system's next xid (an xid that hasn't been used yet*) at the time the SERIALIZABLEXACT committed. One implication of this scheme is that SERIALIZABLEXACTs are cleaned up in commit order. If you somehow got into a state where a few of them were being kept around for a long time, but others committed later were being cleaned up (which I suppose must be the case or your system would be complaining about running out of SERIALIZABLEXACTs), that might imply that there is a rare leak somewhere in this scheme. In the past I have wondered if there might be a problem with wraparound in the xid tracking for finished transactions, but I haven't worked out the details (transaction ID wraparound is both figuratively and literally the Groundhog Day of PostgreSQL bug surfaces). *Interestingly, it takes an unlocked view of that value, but that doesn't seem relevant here; it could see a value that's too low, not too high.
Re: Help understanding SIReadLock growing without bound on completed transaction
On Fri, May 22, 2020 at 7:48 AM Mike Klaas wrote: > locktype: page > relation::regclass::text: _pkey > virtualtransaction: 36/296299968 > granted:t > pid:2263461 That's an unusually high looking pid. Is that expected, for example did you crank Linux's pid_max right up, or is this AIX, or something?
Re: Ubuntu 20.04: apt.postgresql.org/pub/repos/apt focal-pgdg InRelease' doesn't support architecture 'i386'
On Sun, May 17, 2020 at 10:45 AM Hugh wrote: > While this doesn't appear to be a bug that causes problems of any kind, I do > have a question about its cause. > > The "error" listed in the Subject: line is basically what I'm seeing. The > entire message is below, particularly the 'N:' at the end. Is there a repo > setting I should change to prevent the request for '386' architecture? Thank > you in advance for your assistance. I'm not sure, but it seems related to this complaint and the answer might be to tell your sources.list that that source has only amd64: https://www.postgresql.org/message-id/flat/16402-1f2d77e819f9e1f2%40postgresql.org
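Concretely, that would mean adding an architecture qualifier to the repository line, something like this (the file name and "focal-pgdg main" suite are guesses based on a typical pgdg setup for Ubuntu 20.04; adjust to match your actual sources file):

```
# /etc/apt/sources.list.d/pgdg.list
deb [arch=amd64] http://apt.postgresql.org/pub/repos/apt focal-pgdg main
```

With the [arch=amd64] qualifier, apt should stop asking that repository for i386 packages.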
Re: Is there a significant difference in Memory settings between 9.5 and 12
On Tue, May 12, 2020 at 2:52 PM Tory M Blue wrote: > It took the change but didn't help. So 10GB of shared_buffers in 12 is still > a no go. I'm down to 5GB and it works, but this is the same hardware, the > same exact 9.5 configuration. So I'm missing something. WE have not had to > mess with kernel memory settings since 9.4, so this is an odd one. > > I'll keep digging, but i'm hesitant to do my multiple TB db's with half of > their shared buffer configs, until I understand what 12 is doing differently > than 9.5 Which exact version of 9.5.x are you coming from? What's the exact error message on 12 (you showed the shared_memory_type=sysv error, but with the default value (mmap) how does it look)? What's your huge_pages setting? Can you reproduce the problem with a freshly created test cluster? As a regular user, assuming regular RHEL packaging, something like /usr/pgsql-12/bin/initdb -D test_pgdata, and then /usr/pgsql-12/bin/postgres -D test_pgdata -c shared_buffers=10GB (then ^C to stop it). If that fails to start in the same way, it'd be interesting to see the output of the second command with strace in front of it, in the part where it allocates shared memory. And perhaps it'd be interesting to see the same output with /usr/pgsql-9.5/bin/XXX (if you still have the packages). 
For example, on my random dev laptop that looks like:

openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(6, "MemTotal: 16178852 kB\nMemF"..., 1024) = 1024
read(6, ":903168 kB\nShmemHugePages: "..., 1024) = 311
close(6) = 0
mmap(NULL, 11016339456, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 11016003584, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7ff74e579000
shmget(0x52e2c1, 56, IPC_CREAT|IPC_EXCL|0600) = 3244038
shmat(3244038, NULL, 0) = 0x7ff9df5ad000

The output is about the same on REL9_5_STABLE and REL_12_STABLE for me, only slightly different sizes. If that doesn't fail in the same way on your system with 12, perhaps there are some more settings from your real clusters required to make it fail. You could add them one by one with -c foo=bar or in the throwaway test_pgdata/postgresql.conf, and perhaps that process might shed some light? I was going to ask if it might be a preloaded extension that is asking for gobs of extra memory in 12, but we can see from your "Failed system call was shmget(key=5432001, size=11026235392, 03600)" that it's in the same ballpark as my total above for shared_buffers=10GB.
Re: 12.2: Howto check memory-leak in worker?
On Tue, May 5, 2020 at 10:13 AM Peter wrote: > BTW, I would greatly appreciate if we would reconsider the need for > the server to read the postmaster.pid file every few seconds (probably > needed for something, I don't know). > That makes it necessary to set atime=off to get a spindown, and I > usually prefer to have atime=on so I can see what my stuff is > currently doing. That'd be this:

    /*
     * Once a minute, verify that postmaster.pid hasn't been removed or
     * overwritten. If it has, we force a shutdown. This avoids having
     * postmasters and child processes hanging around after their database
     * is gone, and maybe causing problems if a new database cluster is
     * created in the same place. It also provides some protection
     * against a DBA foolishly removing postmaster.pid and manually
     * starting a new postmaster. Data corruption is likely to ensue from
     * that anyway, but we can minimize the damage by aborting ASAP.
     */
    if (now - last_lockfile_recheck_time >= 1 * SECS_PER_MINUTE)
    {
        if (!RecheckDataDirLockFile())
Re: Transition tables for column-specific UPDATE triggers
On Wed, Oct 9, 2019 at 3:59 PM Guy Burgess wrote: > The manual says: > https://www.postgresql.org/docs/current/sql-createtrigger.html > > A column list cannot be specified when requesting transition relations. > > And (I think the same point): > > The standard allows transition tables to be used with column-specific UPDATE > triggers, but then the set of rows that should be visible in the transition > tables depends on the trigger's column list. This is not currently > implemented by PostgreSQL. > > Are there any plans to allow transition tables to be used with > column-specific UPDATE triggers? Or, is there another way for a trigger > function to see the rows changed by a column-specific UPDATE trigger? Hi Guy, Answering an ancient message that went unanswered... I'm not aware of anyone working on that, and there isn't another way: the transition tables feature simply won't let you create such a trigger. The last I've seen anyone say about that was in the following commit: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0f79440fb0b4c5a9baa9a95570c01828a9093802 Excerpt: "Also, forbid using transition tables with column-specific UPDATE triggers. The spec requires such transition tables to show only the tuples for which the UPDATE trigger would have fired, which means maintaining multiple transition tables or else somehow filtering the contents at readout. Maybe someday we'll bother to support that option, but it looks like a lot of trouble for a marginal feature." The code preventing it is here:

    /*
     * We currently don't allow column-specific triggers with
     * transition tables. Per spec, that seems to require
     * accumulating separate transition tables for each combination of
     * columns, which is a lot of work for a rather marginal feature.
     */
    if (stmt->columns != NIL)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("transition tables cannot be specified for triggers with column lists")));

In theory you could do some filtering in your trigger procedure, by comparing rows in OLD TABLE and NEW TABLE to see which row-pairs represent changes to the columns you care about, but that's slightly complicated: you can join OLD and NEW using whatever keys you have defined, but that only works if there's no danger of the keys themselves changing. I wondered about inventing something like WITH ORDINALITY so that you get unique ordered numbers in an extra column that can be used to join the two transition tables without knowing anything about the user-defined keys, but among other problems I couldn't figure out how to fit it in syntax-wise. I suppose PostgreSQL could do this internally to make the feature you want work: whenever you scan either table, in an UPDATE OF trigger, it could scan both transition tables in sync and filter out rows that didn't change your columns of interest. Or it could do that filtering up front, before your trigger fires, to create two brand new tuplestores just for your trigger invocation. Or there could be a third spillable data structure, that records which triggers should be able to see each old/new-pair, or which columns changed, and is scanned in sync with the others. Just some first thoughts; I am not planning to work on this any time soon.
Re: EINTR while resizing dsm segment.
On Tue, Apr 7, 2020 at 8:58 PM Nicola Contu wrote: > So that seems to be a bug, correct? > Just to confirm, I am not using NFS, it is directly on disk. > > Other than that, is there a particular option we can set in the postgres.conf > to mitigate the issue? Hi Nicola, Yeah, I think it's a bug. We're not sure exactly where yet.
Re: EINTR while resizing dsm segment.
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi wrote: > I provided the subject, and added -hackers. > > > Hello, > > I am running postgres 11.5 and we were having issues with shared segments. > > So I increased the max_connection as suggested by you guys and reduced my > > work_mem to 600M. > > > > Right now instead, it is the second time I see this error : > > > > ERROR: could not resize shared memory segment "/PostgreSQL.2137675995" to > > 33624064 bytes: Interrupted system call > > The function posix_fallocate is protected against EINTR. > > | do > | { > | rc = posix_fallocate(fd, 0, size); > | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending)); > > But not for ftruncate and write. Don't we need to protect them from > ENTRI as the attached? We don't handle EINTR for write() generally because that's not supposed to be necessary on local files (local disks are not "slow devices", and we document that if you're using something like NFS you should use its "hard" mount option so that it behaves that way too). As for ftruncate(), you'd think it'd be similar, and I can't think of a more local filesystem than tmpfs (where POSIX shmem lives on Linux), but I can't seem to figure that out from reading man pages; maybe I'm reading the wrong ones. Perhaps in low memory situations, an I/O wait path reached by ftruncate() can return EINTR here rather than entering D state (non-interruptable sleep) or restarting due to our SA_RESTART flag... anyone know? Another thought: is there some way for the posix_fallocate() retry loop to exit because (ProcDiePending || QueryCancelPending), but then for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to reporting the EINTR?
Re: \COPY to accept non UTF-8 chars in CHAR columns
On Sat, Mar 28, 2020 at 4:46 AM Tom Lane wrote: > Matthias Apitz writes: > > In short, it there a way to let \COPY accept such broken ISO bytes, just > > complaining about, but not stopping the insert of the row? > > No. We don't particularly believe in the utility of invalid data. > > If you don't actually care about what encoding your data is in, > you could use SQL_ASCII as the database "encoding" and thereby > disable all UTF8-specific behavior. Otherwise, maybe this conversion > is a good time to clean up the mess? Something like this approach might be useful for fixing the CSV file: https://codereview.stackexchange.com/questions/185821/convert-a-mix-of-latin-1-and-utf-8-to-proper-utf-8 I haven't tested that program but it looks like the right sort of approach; I remember writing similar logic to untangle the strange mixtures of Latin 1, Windows 1252, and UTF-8 that late 90s browsers used to send. That sort of approach can't fix every theoretical problem (some valid Latin1 sequences are also valid UTF-8 sequences) but it's doable with text in European languages.
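As a sketch of the idea (my own toy version, not the linked program; it simplifies by not rejecting overlong encodings or surrogates, and as noted above it can misread a Latin-1 run that happens to form a valid UTF-8 sequence):

```c
#include <stddef.h>
#include <string.h>

/* Length of the valid UTF-8 sequence starting at s[0] within len bytes,
 * or 0 if the bytes there are not valid UTF-8. */
static size_t
utf8_seq_len(const unsigned char *s, size_t len)
{
    size_t n, i;

    if (s[0] < 0x80)
        return 1;
    else if ((s[0] & 0xE0) == 0xC0)
        n = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        n = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        n = 4;
    else
        return 0;
    if (n > len)
        return 0;
    for (i = 1; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;
    return n;
}

/* Rewrite src (a mix of Latin-1 and UTF-8) into dst as UTF-8.
 * dst must have room for 2 * strlen(src) + 1 bytes (worst case). */
static void
fix_mixed_encoding(const char *src, char *dst)
{
    const unsigned char *s = (const unsigned char *) src;
    unsigned char *d = (unsigned char *) dst;
    size_t      len = strlen(src);

    while (len > 0)
    {
        size_t      n = utf8_seq_len(s, len);

        if (n > 0)
        {
            /* Already valid UTF-8: copy the sequence through. */
            memcpy(d, s, n);
            d += n;
            s += n;
            len -= n;
        }
        else
        {
            /* Stray Latin-1 byte: transcode it to a 2-byte sequence. */
            *d++ = 0xC0 | (*s >> 6);
            *d++ = 0x80 | (*s & 0x3F);
            s++;
            len--;
        }
    }
    *d = '\0';
}
```

For example, the Latin-1 bytes for "Grüße" ("Gr\xFC\xDFe") come out as the UTF-8 bytes "Gr\xC3\xBC\xC3\x9Fe", while input that is already valid UTF-8 passes through unchanged.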
Re: LDAP with TLS is taking more time in Postgresql 11.5
On Wed, Feb 26, 2020 at 7:37 AM Adrian Klaver wrote: > On 2/25/20 10:23 AM, Mani Sankar wrote: > > Hi Adrian, > > > > Both the machines are in same network and both are pointing towards the > > same LDAP server > > I don't see any errors in the Postgres logs. > > You probably should take a look at the LDAP server logs to see if there > is anything there. > > You could also turn up the logging detail in Postgres to see if it > reveals anything. A couple more ideas: If you take PostgreSQL out of the picture and run the equivalent LDAP queries with the ldapsearch command line tool, do you see the same difference in response time? If so, I'd trace that with strace etc with timings to see where the time is spent -- for example, is it simply waiting for a response from the LDAP (AD?) server? If not, I'd try tracing the PostgreSQL process and looking at the system calls (strace -tt -T for high res times and elapsed times), perhaps using PostgreSQL's pre_auth_delay setting to get time to attach strace. A wild stab in the dark: if it's slow from one computer and not from another, perhaps the problem has something to do with a variation in reverse DNS lookup speed on the LDAP server side when it's verifying the certificate. Or something like that.
Re: Postgres 12.1 : UPPER() in WHERE clause restarts server
On Mon, Feb 10, 2020 at 4:35 AM Marc wrote: > We will keep the 12.1 in place so that we can run additional tests to assist > to pin-point the issue. > > Feel free to ask but allow us to recover from these hectic days ;-) Here's how to get a stack so we can see what it was doing, assuming you have the Apple developer tools installed: 1. Find the PID of the backend you're connected to with SELECT pg_backend_pid(). 2. "lldb -p PID" from a shell to attach to the process, then "cont" to let it continue running. 3. Run the query in that backend and wait for the SIGKILL. 4. In the lldb session, type "bt". It'll only make sense if your PostgreSQL build has debug symbols, but let's see.
Re: Postgres 12.1 : UPPER() in WHERE clause restarts server
On Sun, Feb 9, 2020 at 11:46 AM Tom Lane wrote: > "Nick Renders" writes: > > When we do the following statement: > > SELECT * FROM f_gsxws_schedule WHERE UPPER(gwsc_dossier) = 'TEST' > > the Postgres service restarts. > > Hm. > > > Here is what is logged: > > 2020-02-08 20:21:19.942 CET [83892] LOG: server process (PID 85456) was > > terminated by signal 9: Killed: 9 > > 2020-02-08 20:21:19.942 CET [83892] DETAIL: Failed process was running: > > SELECT * FROM f_gsxws_schedule WHERE UPPER(gwsc_dossier) = 'TEST' > > That's mighty interesting, because signal 9 is an external SIGKILL, > not an internal-to-Postgres software fault. > > If you were running on Linux I would hypothesize that your process > was getting killed by the infamous OOM killer, in which case we could > guess that for some reason this query is consuming an unreasonable > amount of memory and thereby attracting the wrath of the OOM killer. > However, I'm not aware that any such mechanism exists on macOS. macOS's thing like that appears as "kernel[0]: memorystatus_thread: idle exiting pid XXX [some program]" in system.log, which seems like a bit of an understatement to me but that's what they call it. Some details here: http://newosxbook.com/articles/MemoryPressure.html Nick, did you see that?
Re: Unable to startup postgres: Could not read from file "pg_clog/00EC"
On Fri, Feb 7, 2020 at 1:47 AM Nick Renders wrote: > Thank you for the feedback, Alvaro. > > Unfortunately, the database is no longer "dumpable". We were able to do > a pg_dump yesterday morning (12 hours after the crash + purging the > pg_clog) but if we try one now, we get the following error: > > unexpected chunk number 1 (expected 0) for toast value 8282331 in > pg_toast_38651 > > Looking at our data, there seem to be 6 tables that have corrupt > records. Doing a SELECT * for one of those records, will return a > similar error: > > missing chunk number 0 for toast value 8288522 in pg_toast_5572299 > > > What is the best way to go from here? Is tracking down these corrupt > records and deleting them the best / only solution? > Is there a way to determine of there are issues with new data (after the > crash)? > > Any help and advice is very much appreciated. This error indicates that the file did exist already, it was just shorter than we expected: 2020-02-04 15:20:44 CET DETAIL: Could not read from file "pg_clog/00EC" at offset 106496: Undefined error: 0. What was the length of the file before you overwrote it? Are there 00EB and 00ED files, and if so what size? When your server rebooted, did crash recovery run or had it shut down cleanly? Do you know if the machine lost power, or the kernel crashed, or if it was a normal reboot? What are your settings for "fsync" and "wal_sync_method"? What is the output of pg_controldata -D pgdata? I wonder if that part of the clog file was supposed to be created before the checkpoint (ie the checkpoint is somehow borked), or was supposed to be created during recovery after that checkpoint (something else is borked, but I don't know what), or if the xid is somehow corrupted. Here's a dirty trick that might help rescue some data. 
Assuming you have a copy of the original file before you zeroed it, you could write a 256kB file full of 0x55 (that's 01010101 and represents 4 commits, so if you fill the file up with that it means 'all transactions in this range committed', which is probably closer to the truth than all zeroes), and then copy the original shorter file over the top of it, so that at least the range of transactions represented by the earlier part of the file that did make it to disk is preserved, and we just have bogus force-everything-to-look-committed data after that. But as Alvaro said, this is a pretty bad situation: this is key metadata used to interpret all other data files, so all bets are off here; this is restore-from-backups territory.
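For the record, the trick described above could be scripted along these lines (a sketch only; it assumes the standard 256kB clog segment size, and you should obviously work on copies of the files, never on the live data directory):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CLOG_SEG_SIZE (256 * 1024)

/*
 * Build a rescue clog segment: start with 256kB of 0x55 bytes (four
 * "committed" status bits per byte, i.e. every transaction in the range
 * looks committed), then copy whatever survives of the original short
 * file over the front, preserving the statuses that did reach disk.
 */
static int
build_rescue_segment(const char *orig_path, const char *out_path)
{
    unsigned char *buf = malloc(CLOG_SEG_SIZE);
    FILE       *in;
    FILE       *out;
    int         result = -1;

    if (buf == NULL)
        return -1;
    memset(buf, 0x55, CLOG_SEG_SIZE);

    in = fopen(orig_path, "rb");
    if (in != NULL)
    {
        /* A short read is expected: the original file was truncated. */
        (void) fread(buf, 1, CLOG_SEG_SIZE, in);
        fclose(in);

        out = fopen(out_path, "wb");
        if (out != NULL)
        {
            if (fwrite(buf, 1, CLOG_SEG_SIZE, out) == CLOG_SEG_SIZE)
                result = 0;
            if (fclose(out) != 0)
                result = -1;
        }
    }
    free(buf);
    return result;
}
```

The output file would then be copied into pg_clog in place of the zero-filled one, with all the caveats above still applying.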
Re: ERROR: too many dynamic shared memory segments
On Fri, Jan 31, 2020 at 11:05 PM Nicola Contu wrote: > Do you still recommend to increase max_conn? Yes, as a workaround of last resort. The best thing would be to figure out why you are hitting the segment limit, and see if there is something we could tune to fix that. If you EXPLAIN your queries, do you see plans that have a lot of "Gather" nodes in them, perhaps involving many partitions? Or are you running a lot of parallel queries at the same time? Or are you running queries that do very, very large parallel hash joins? Or something else?
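To eyeball the first of those possibilities, you can count the Gather nodes in the output of EXPLAIN; here is a minimal sketch (the helper and the sample plan are illustrative, not from the thread):

```python
def count_gather_nodes(plan_text: str) -> int:
    """Count Gather / Gather Merge nodes in textual EXPLAIN output.
    Each Gather launches a set of parallel workers, and those need
    dynamic shared memory segments, so a plan with many Gathers
    (for example, one per partition) eats through the slots quickly."""
    return sum(
        1
        for line in plan_text.splitlines()
        if line.lstrip(" ->").startswith("Gather")
    )

# Example: a partitioned plan with one Gather per partition.
plan = """Append
  ->  Gather
        Workers Planned: 2
        ->  Parallel Seq Scan on sales_2019
  ->  Gather
        Workers Planned: 2
        ->  Parallel Seq Scan on sales_2020
"""
print(count_gather_nodes(plan))  # -> 2
```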
Re: ERROR: too many dynamic shared memory segments
On Thu, Jan 30, 2020 at 12:26 AM Thomas Munro wrote: > On Wed, Jan 29, 2020 at 11:24 PM Julian Backes wrote: > > we only had the "too many dynamic shared memory segments" > > error but no segmentation faults. The error started occurring after > > upgrading from postgres 10 to postgres 12 (server has 24 cores / 48 > > threads, i.e. many parallel workers). The error itself was not that much of > > a problem but /dev/shm started filling up with orphaned files which > > probably (?) had not been cleaned up by postgres after the parallel workers > > died. In consequence, after some time, /dev/shm was full and everything > > crashed. > > Oh, thanks for the report. I think I see what was happening there, and > it's a third independent problem. The code in dsm_create() does > DSM_OP_DESTROY (ie cleans up) in the DSM_CREATE_NULL_IF_MAXSEGMENTS > case, but in the case where you see "ERROR: too many dynamic shared > memory segments" it completely fails to clean up after itself. I can > reproduce that here. That's a terrible bug, and has been sitting in > the tree for 5 years. I committed a fix for that. It'll be in the new releases that are due out in a couple of weeks. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=93745f1e019543fe7b742d0c5e971aad8d08fd56 > > Unfortunately, the only "solution" we found so far was to increase max > > connections from 100 to 1000. After that (about 2 months ago I think), the > > error had gone. > > I'll take that as a vote for increasing the number of slots. I committed something to do this for 13 (due out end of year), but I'm not game to back-patch it to the release branches. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d061ea21fc1cc1c657bb5c742f5c4a1564e82ee2
Re: ERROR: too many dynamic shared memory segments
On Wed, Jan 29, 2020 at 11:53 PM Thomas Munro wrote: > On Wed, Jan 29, 2020 at 10:37 PM Nicola Contu wrote: > > This is the error on postgres log of the segmentation fault : > > > > 2020-01-21 14:20:29 GMT [] [4]: [108-1] db=,user= LOG: server process > > (PID 2042) was terminated by signal 11: Segmentation fault > > 2020-01-21 14:20:29 GMT [] [4]: [109-1] db=,user= DETAIL: Failed > > process was running: select pid from pg_stat_activity where query ilike > > 'REFRESH MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats' > > 2020-01-21 14:20:29 GMT [] [4]: [110-1] db=,user= LOG: terminating any > > other active server processes > That gave me an idea... I hacked my copy of PostgreSQL to flip a coin > to decide whether to pretend there are no slots free (see below), and > I managed to make it crash in the regression tests when doing a > parallel index build. It's late here now, but I'll look into that > tomorrow. It's possible that the parallel index code needs to learn > to cope with that. Hi Nicola, Without more information I can't know if I found the same bug you experienced, but I think it's likely. I have committed a fix for that, which will be available in the next release (mid-February). https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74618e77b43cfce670b4725d5b9a300a2afd12d1
Re: ERROR: too many dynamic shared memory segments
On Wed, Jan 29, 2020 at 10:37 PM Nicola Contu wrote: > This is the error on postgres log of the segmentation fault : > > 2020-01-21 14:20:29 GMT [] [4]: [108-1] db=,user= LOG: server process > (PID 2042) was terminated by signal 11: Segmentation fault > 2020-01-21 14:20:29 GMT [] [4]: [109-1] db=,user= DETAIL: Failed process > was running: select pid from pg_stat_activity where query ilike 'REFRESH > MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats' > 2020-01-21 14:20:29 GMT [] [4]: [110-1] db=,user= LOG: terminating any > other active server processes Ok, this is a bug. Do you happen to have a core file? I don't recall where CentOS puts them. > > If you're on Linux, you can probably see them with "ls /dev/shm". > > I see a lot of files there, and doing a cat they are empty. What can I do > with them? Not much, but it tells you approximately how many 'slots' are in use at a given time (ie because of currently running parallel queries), if they were created since PostgreSQL started up (if they're older ones they could have leaked from a crashed server, but we try to avoid that by trying to clean them up when you restart). > Those are two different problems I guess, but they are related because right > before the Segmentation Fault I see a lot of shared segment errors in the > postgres log. That gave me an idea... I hacked my copy of PostgreSQL to flip a coin to decide whether to pretend there are no slots free (see below), and I managed to make it crash in the regression tests when doing a parallel index build. It's late here now, but I'll look into that tomorrow. It's possible that the parallel index code needs to learn to cope with that. 
#2  0x00a096f6 in SharedFileSetInit (fileset=0x80b2fe14c, seg=0x0) at sharedfileset.c:71
#3  0x00c72440 in tuplesort_initialize_shared (shared=0x80b2fe140, nWorkers=2, seg=0x0) at tuplesort.c:4341
#4  0x005ab405 in _bt_begin_parallel (buildstate=0x7fffc070, isconcurrent=false, request=1) at nbtsort.c:1402
#5  0x005aa7c7 in _bt_spools_heapscan (heap=0x801ddd7e8, index=0x801dddc18, buildstate=0x7fffc070, indexInfo=0x80b2b62d0) at nbtsort.c:396
#6  0x005aa695 in btbuild (heap=0x801ddd7e8, index=0x801dddc18, indexInfo=0x80b2b62d0) at nbtsort.c:328
#7  0x00645b5c in index_build (heapRelation=0x801ddd7e8, indexRelation=0x801dddc18, indexInfo=0x80b2b62d0, isreindex=false, parallel=true) at index.c:2879
#8  0x00643e5c in index_create (heapRelation=0x801ddd7e8, indexRelationName=0x7fffc510 "pg_toast_24587_index", indexRelationId=24603, parentIndexRelid=0,

I don't know if that's the bug that you're hitting, but it definitely could be: REFRESH MATERIALIZED VIEW could be rebuilding an index.

===

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 90e0d739f8..f0b49d94ee 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -468,6 +468,13 @@ dsm_create(Size size, int flags)
 	nitems = dsm_control->nitems;
 	for (i = 0; i < nitems; ++i)
 	{
+		/* BEGIN HACK */
+		if (random() % 10 > 5)
+		{
+			nitems = dsm_control->maxitems;
+			break;
+		}
+		/* END HACK */
 		if (dsm_control->item[i].refcnt == 0)
 		{
 			dsm_control->item[i].handle = seg->handle;
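On the "ls /dev/shm" suggestion earlier in this message: POSIX DSM segments show up there as files named PostgreSQL.<handle>, so a quick count of in-use slots can be sketched like this (the helper name is made up; point it at /dev/shm on a real Linux server):

```python
import os

def count_dsm_segments(shm_dir: str) -> int:
    """Count PostgreSQL dynamic shared memory segment files in the
    given directory (normally /dev/shm on Linux).  Each file named
    PostgreSQL.<number> corresponds to one DSM segment, so the count
    approximates current slot usage."""
    return sum(
        1 for name in os.listdir(shm_dir) if name.startswith("PostgreSQL.")
    )

# Usage on a live server:
# count_dsm_segments("/dev/shm")
```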
Re: ERROR: too many dynamic shared memory segments
On Wed, Jan 22, 2020 at 4:06 AM Nicola Contu wrote: > after a few months, we started having this issue again. > So we reverted the work_mem parameter to 600MB instead of 2GB. > But the issue is still there. A query hit a segmentation fault, the DB went > to recovery mode and our app went to read only for a few minutes. Hi Nicola, Hmm, a segmentation fault sounds like a different problem. Can you please share the exact error messages from PostgreSQL and OS logs? > I understand we can increase max_connections so we can have many more > segments. > > My question is: is there a way to understand the number of segments we > reached? If you're on Linux, you can probably see them with "ls /dev/shm". > Currently we have 220 max_conn so as your formula is 64 + 2 * 220 we have > about 500 shared segments. > We would like to increase that number to 300 or 400 but it would be great to > understand if there is a way to make sure we will solve the issue as it > requires a restart of the service. > > I know you were also talking about redesigning this part of PostgreSQL. Do you > know if anything has changed in any of the newer versions after 11.5? It's possible that we should increase a couple of constants used in the formula -- I'll look into that again. But first I'd like to see if we're even investigating the right problem here.
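For reference, the slot budget quoted above (64 + 2 * max_connections, the formula in play on PostgreSQL 11) is easy to tabulate; a throwaway sketch:

```python
def dsm_slot_limit(max_connections: int) -> int:
    """Approximate DSM control slot limit per the formula quoted in
    the thread for PostgreSQL 11: 64 + 2 * max_connections."""
    return 64 + 2 * max_connections

# max_connections = 220 gives 504 slots -- the "about 500" figure
# mentioned above.
print(dsm_slot_limit(220))  # -> 504
```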