Re: [HACKERS] libpq and psql not on same page about SIGPIPE
Tom Lane wrote:

Not really: it only solves the problem *if you change the application*, which is IMHO not acceptable. In particular, why should a non-threaded app expect to have to change to deal with this issue? But we can't safely build a thread-safe libpq.so for general use if it breaks non-threaded apps that haven't been changed.

No, non-threaded apps do not need to change. The default is the old 7.3 code: change the signal handler around the write calls. This means that non-threaded apps are guaranteed to work without any changes, regardless of the libpq thread safety setting.

Threaded apps would have to change, but how many threaded apps use libpq? They have to check their code anyway: either just add PQinitLib(), or review (and potentially update) their signal handling code if it matches any of the gotchas of the transparent handling.

--
Manfred

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] libpq and psql not on same page about SIGPIPE
Bruce Momjian wrote:

Comments? This seems like our only solution.

This would be a transparent solution. Another approach would be:

- Use the old 7.3 approach by default. This means perfect backward compatibility for single-threaded apps and broken multithreaded apps.
- Add a new PQinitDB(int disableSigpipeHandler) initialization function. Document that multithreaded apps must call the function with disableSigpipeHandler=1 and handle SIGPIPE for libpq. Perhaps with a reference implementation in libpq (i.e. a sigpipeMode with 0 for the old approach, 1 for do nothing, 2 for install our own handler).

I would prefer that approach: it means that the multithreaded libpq apps must be updated [are there any?], but the solution is simpler and less fragile than calling 4 signal handling functions in a row to selectively block SIGPIPE per-thread.

--
Manfred
Re: [HACKERS] libpq and psql not on same page about SIGPIPE
Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

His idea of pthread_sigmask/send/sigpending/sigwait/restore-mask. Seems we could also check errno for EPIPE rather than calling sigpending. He has a concern about an application that already blocked SIGPIPE and has a pending SIGPIPE signal waiting already. One idea would be to check for sigpending() before the send() and clear the signal only if SIGPIPE wasn't pending before the call. I realize that if our send() also generates a SIGPIPE it would remove the previous realtime signal info but that seems a minor problem.

Supposing that we don't touch the signal handler at all, then it is possible that the application has set it to SIG_IGN, in which case a SIGPIPE would be discarded rather than going into the pending mask. So I think the logic has to be:

    pthread_sigmask to block SIGPIPE and save existing signal mask
    send();
    if (errno == EPIPE)
    {
        if (sigpending indicates SIGPIPE pending)
            use sigwait to clear SIGPIPE;
    }
    pthread_sigmask to restore prior signal mask

The only case that is problematic is where the application had already blocked SIGPIPE and there is a pending SIGPIPE signal when we are entered, *and* we get SIGPIPE ourselves. If the C library does not support queued signals then our sigwait will clear both our own EPIPE and the pending signal. This is annoying but it doesn't seem fatal --- if the app is writing on a closed pipe then it'll probably try it again later and get the signal again. If the C library does support queued signals then we will read the existing SIGPIPE condition and leave our own signal in the queue. This is no problem to the extent that one pending SIGPIPE looks just like another --- does anyone know of platforms where there is additional info carried by a SIGPIPE event?

Linux stores pid/uid together with the signal. pid doesn't matter and no sane programmer will look at the uid, so it seems to be possible.

This seems workable as long as we document the possible gotchas.

Is that really worthwhile? The proposal makes half a dozen assumptions about the C library and about the kernel-internal efficiency of the signal handling functions. Adding a PQinitLib function is obviously a larger change, but it solves the problem.

I'm aware of one minor gotcha: PQinSend() is not usable right now: it relies on the initialization of pq_thread_in_send, which is only created in the middle of the first connectDB(). That makes proper signal handling for the first connection impossible.

--
Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Why frequently updated tables are an issue
[EMAIL PROTECTED] wrote a few months ago:

PostgreSQL's behavior on these cases is poor. I don't think anyone who has tried to use PG for this sort of thing will disagree, and yes it is getting better. Does anyone else consider this to be a problem? If so, I'm open for suggestions on what can be done. I've suggested a number of things, and admittedly they have all been pretty weak ideas, but they were potentially workable.

What about a dblink style interface to a non-MVCC SQL database? I think someone on this list mentioned that there are open source in-memory SQL databases.

--
Manfred

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] tweaking MemSet() performance - 7.4.5
[EMAIL PROTECTED] wrote:

If the memset bypasses the cache then the following access will cause a cache line miss, which can be so slow that using the faster memset can result in a net performance loss.

Could you suggest some structs to test? If I get your meaning, I would make a loop that sets then reads from the structure.

Read the sources and the cpu specs. Benchmarking such problems is virtually impossible.

I don't have OS-X, thus I checked the Linux-kernel sources: It seems that the power architecture doesn't have the same problem as x86. There is a special clear cacheline instruction for large memsets and the rest is done through carefully optimized store byte/halfword/word/double word sequences.

Thus I'd check what happens if you memset not perfectly aligned buffers. That's another point where over-optimized functions sometimes break down. If there is no slowdown, then I'd replace the postgres function with the OS provided function.

I'd add some __builtin_constant_p() optimizations, but I guess Tom won't like gcc hacks ;-)

--
Manfred

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] tweaking MemSet() performance - 7.4.5
Marc Colosimo wrote:

Oops, I used the same setting as in the old hacking message (-O2, gcc 3.3). If I understand what you are saying, then it turns out yes, PG's MemSet is faster for smaller blocksizes (see below, between 32 and 64). I just replaced the whole MemSet with memset and it is not very low when I profile.

Could you check what the OS-X memset function does internally? One trick to speed up memset is to bypass the cache and bulk-write directly from write buffers to main memory. i386 cpus support that and in microbenchmarks it's 3 times faster (or something like that). Unfortunately it's a loss in real-world tests: Typically a structure is initialized with memset and then immediately accessed. If the memset bypasses the cache then the following access will cause a cache line miss, which can be so slow that using the faster memset can result in a net performance loss.

I could squeeze more out of it if I spent more time trying to understand it (change MEMSET_LOOP_LIMIT to 32 and then add memset after that?). I'm now working on understanding Spin Locks and friends. Putting in a sync call (in s_lock.h) is really a time killer and bad for performance (it takes up 35 cycles).

That's the price you pay for weakly ordered memory access. Linux on ppc uses eieio, on ppc64 lwsync is used. Could you check if they are faster?

--
Manfred
Re: [HACKERS] futex
Josh Berkus wrote:

Gaetano,

I knew there was an evaluation on the futex vs spinlock, and Josh Berkus on IRC told me that there was only a 20% performance increase, is this increase to throw away ?

Before we get totally off track here I evaluated futexes strictly as an attempt to solve the context switch storm bug. I did NOT test whether they improved performance overall.

What did you test exactly and could you explain a bit about the context switch storm? Did you use the futex interface directly or pthread_rwlock_rdlock?

--
Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] fsync and hardware write cache
[EMAIL PROTECTED] wrote:

Something to think about: if you run PostgreSQL with fsync on, but you use the hardware write cache on your disk drives, how likely are you to lose data? Obviously, this is a fairly limited problem, as it only applies to power down (which you can control) or power loss where the risks may be reduced but not eliminated with a UPS. Does it make sense to add a platform specific call that will flush a write cache when fsync is enabled?

Peter Zaitsev from mysql wrote that there is a special call on Mac OS. Quoting him:

Mac OS X also has this optimization, but at least it provides an alternative flush method for Database Servers: fcntl(fd, F_FULLFSYNC, NULL) can be used instead of fsync() to get true fsync() behavior.

I couldn't confirm this with a quick google search - perhaps someone with MacOS docs (or mysql sources) should check it.

What might be useful is a test tool that benchmarks fsync: if a write plus fsync completes faster than one rotation of a 15k rpm disk, then probably someone caches the write calls.

--
Manfred
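
The fsync benchmark idea from this message could look roughly like the sketch below. All function names are hypothetical, and the threshold logic is only the rule of thumb stated in the message: rewriting the same block and syncing it should cost about one platter rotation, which is 60/15000 s = 4 ms on a 15k rpm disk. Treat the result as a hint, not a proof, since filesystems, controllers and non-rotating media can all change the picture.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * time_fsync(): rewrite one 8k block `loops` times, calling fsync() after
 * each write.  Returns the average seconds per write+fsync, or -1.0 on
 * error.  The scratch file at `path` is removed afterwards.
 */
double
time_fsync(const char *path, int loops)
{
    char            block[8192];
    struct timeval  start;
    struct timeval  stop;
    int             fd;
    int             i;

    if (loops <= 0)
        return -1.0;

    memset(block, 'x', sizeof(block));
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    gettimeofday(&start, NULL);
    for (i = 0; i < loops; i++)
    {
        /* Always rewrite the same block so each sync hits the same spot. */
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, block, sizeof(block)) != (ssize_t) sizeof(block) ||
            fsync(fd) != 0)
        {
            close(fd);
            unlink(path);
            return -1.0;
        }
    }
    gettimeofday(&stop, NULL);

    close(fd);
    unlink(path);
    return ((stop.tv_sec - start.tv_sec) +
            (stop.tv_usec - start.tv_usec) / 1e6) / loops;
}

/* Time for one full platter rotation, in seconds. */
double
rotation_seconds(double rpm)
{
    return 60.0 / rpm;
}

/* Nonzero if a sync was faster than one rotation, i.e. probably cached. */
int
looks_cached(double secs_per_fsync, double rpm)
{
    return secs_per_fsync < rotation_seconds(rpm);
}
```

A setup check would call time_fsync() on a file in the future data directory with a few hundred loops and warn when looks_cached(result, 15000.0) is true.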
Re: [HACKERS] NOT LOGGED options (was Point in Time Recovery )
[EMAIL PROTECTED] wrote:

Tom Lane wrote

NOT LOGGED options on CREATE INDEX and COPY, to allow users to take advantage of the no logging optimization without turning off PITR system wide. (Just as this is possible in Oracle and Teradata).

Isn't this in direct conflict with your opinion above? And I cannot say that I think this one is a good idea. We do not have support for selective catalog xlogging;

Is it possible to skip the xlog fsync for NOT LOGGED transactions?

--
Manfred
Re: [HACKERS] hot spare / log shipping work on
Gaetano Mendola wrote:

a1) If it exists, check that it is a 16MB file (the request can arrive during the copy),

I think this will fail under windows: copy first sets the file size and then transfers the data. I wouldn't rule out that some Unices use the same implementation.

a2) If the file does not exist, this means that it is not yet recycled and is a partial file present in the partial directory; check if the alive file is older than 2 minutes.
  a21) If the file is older than 2 minutes I assume that the master is dead:

I'd concentrate on cold failover: the user (or the OS) must call a script to cause a fail-over. The tricky things are the various partial connection losses between master and spare: perhaps the alive file is not updated anymore due to a net split, but the master is still alive. Unless you are really careful both master and spare could run.

I think SAP DB / MaxDB supports failover - perhaps it would be interesting to check their failover scripts.

--
Manfred

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] fsync vs open_sync
[EMAIL PROTECTED] wrote:

I have been considering a full sweep in my test lab off client time later on. ext2, ext3, jfs, xfs, and ReiserFS, fsync on with fdatasync or open_sync, and fsync off.

Before you start: double check that the disks are not lying. At least the suse 2.4 kernel sends cache flush commands to ide disks on fsync(), but not with O_SYNC:

http://marc.theaimsgroup.com/?l=linux-kernel&m=107964507113585

--
Manfred
Re: [HACKERS] switch WAL segment
Andreas Pflug wrote:

Tom Lane wrote:

Do we have a TODO for allowing users to force switching to a new WAL file segment? Together with PITR, this might make sense?

Another idea: Has anyone tried to put the WAL segment directory on a cluster filesystem and use that for cold (perhaps even hot) failover? The archive script could apply completed wal segments to the backup node. If the primary node fails, the last (partial) segment is applied as well and the backup node is activated.

--
Manfred
Re: [HACKERS] fsync vs open_sync
Tom Lane wrote:

[EMAIL PROTECTED] writes:

The improvements were REALLY astounding, and I would like to know if other Linux users see this performance increase, I mean, it is almost 8~10 times faster than using fsync. Furthermore, it seems to also have the added benefit of reducing the I/O storm at checkpoints over a system running with fsync off.

What size transactions are you using in your tests? For a system with small transactions (not much more than 1 page worth of WAL traffic per transaction) I'd be pretty surprised if there was any real difference at all. There certainly should not be any difference in terms of the number of physical writes. We have seen some platforms where fsync() is inefficiently implemented and requires more kernel overhead than is reasonable --- not for I/O, but just to look through the kernel buffers and confirm that none of them need flushing. But I didn't think Linux was one of these.

IDE or scsi? If IDE: Write cache on or off? Which 2.4 kernel? The numbers are very high - it could be a side effect of write caching by the disks.

I think some Suse 2.4 kernels have partial support for reliable fsync even if the write cache is on (i.e. fsync issues a cache flush command to the disk), but not all code paths are handled. Perhaps fsync is handled and O_SYNC is not handled. I could try to find the details.

--
Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] xeon processors
Christopher Browne wrote:

The fix for this problem is to rewrite all of your applications so that they become conscious of which bits of memory they're using so they can tune their own behaviour. This, of course, requires discarding useful notions such as virtual memory that are _assumed_ by most modern operating systems.

This is misleading: PAE means that a 32-bit cpu can have more than 4 GB physical memory. Each process can map at most 4 (in reality: ~2) GB memory.

Many databases manage their own, huge buffer pool and read/write the database tables with O_DIRECT. These apps must support buffer pools > 2 GB, which requires some work. Linux and Solaris contain a special syscall that helps Oracle to manage its buffer pool for such setups (remap_page_range()).

OTOH postgres has a small user space buffer pool, the majority of the file buffers are handled by the OS. Thus no changes are required inside postgres for PAE, all it needs is an OS that supports PAE for the buffer pool.

Regarding hyperthreading, I'm aware of two changes:

- busy loops must contain PAUSE instructions. Postgres does that.
- virtual aliases should be avoided: If two processes access memory at the same virtual address, then this can cause cache collisions and then misses. I think this is handled by the C library by randomizing the return addresses of malloc() and Intel mitigated the issue by improving the cache.

--
Manfred
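
To illustrate the first hyperthreading point, here is a small sketch of a busy-wait loop with a PAUSE hint. This is not the postgres s_lock.h code: demo_spinlock and the demo_spin_* functions are hypothetical names, the lock is built on C11 atomic_flag, and the inline assembly assumes a gcc-compatible compiler on x86 (on other architectures the hint compiles to nothing).

```c
#include <stdatomic.h>

/* Hypothetical demo type, not the postgres slock_t. */
typedef struct
{
    atomic_flag held;
} demo_spinlock;

/* Tell the CPU we are spinning; a no-op on non-x86 targets. */
static inline void
cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause");
#endif
}

void
demo_spin_init(demo_spinlock *lock)
{
    atomic_flag_clear(&lock->held);
}

/* Returns 1 if the lock was acquired, 0 if somebody else holds it. */
int
demo_spin_trylock(demo_spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

void
demo_spin_lock(demo_spinlock *lock)
{
    /* The PAUSE inside the loop is the point of this example. */
    while (!demo_spin_trylock(lock))
        cpu_relax();
}

void
demo_spin_unlock(demo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

Without the hint the spinning logical cpu competes with its sibling for execution resources, which is why the hint matters on hyperthreaded Xeons in particular.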
Re: [HACKERS] [PATCHES] Compiling libpq with VisualC
[EMAIL PROTECTED] wrote:

What is the recommended way to create mutex objects (CreateMutex) from Win32 libraries? There must be a clean way like there is in pthreads.

A mutex is inherently a global object. CreateMutex(NULL, FALSE, NULL) will return a handle to an unowned mutex.

That's not the problem. Under pthread, it's possible to initialize a mutex at compile time:

    static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;

This means that the mutex is immediately valid, no races with the initialization. I couldn't find an equivalent Win32 feature.

--
Manfred
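
A short sketch of why the static initializer matters: the mutex can guard one-time setup without needing any setup of its own. The names ensure_initialized() and do_one_time_setup() are hypothetical and only serve the illustration; init_mutex is the declaration from the message.

```c
#include <pthread.h>

/* Valid from program start: no runtime call is needed to create it. */
static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;

static int  initialized = 0;
static int  setup_calls = 0;

/* Stand-in for whatever library-wide setup has to happen exactly once. */
static void
do_one_time_setup(void)
{
    setup_calls++;
}

/*
 * Safe to call from any thread at any time.  Returns how often the setup
 * ran so far, which stays at 1 after the first call.
 */
int
ensure_initialized(void)
{
    int         calls;

    pthread_mutex_lock(&init_mutex);
    if (!initialized)
    {
        do_one_time_setup();
        initialized = 1;
    }
    calls = setup_calls;
    pthread_mutex_unlock(&init_mutex);
    return calls;
}
```

With CreateMutex() the handle itself has to be created at run time, so the creation needs its own protection against two threads racing through the first call, which is the problem described above.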
Re: [HACKERS] Table Spaces
Bruce Momjian wrote:

The only downside to removal is that folks without symlinks (I believe Win32 only) will lose that functionality with nothing to replace it. However, I think the clarity of removing it is worth it. Also, I think someone had a special way to do symlinks on Win32 and we should look into that.

Windows 2000 and later support mount points - you can attach a new partition as C:\pgsql\data\xlog instead of D:\. That might be enough for most users. IIRC there was a tool to create arbitrary links, but it was removed just before W2K final.

--
Manfred

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Linux 2.6.6 also
Gregory Stark wrote:

This patch also looks relevant to Postgres for two reasons. This part seems like it might expose some bugs that otherwise might have remained hidden:

This affects I/O scheduling potentially quite significantly. It is no longer the case that the kernel will submit pages for I/O in the order in which the application dirtied them. We instead submit them in file-offset order all the time.

The part about part-file fdatasync calls seems like it could be really useful. It seems like that's just speculation about future directions though?

Correct. The kernel could do that now, but it's not exposed to user space.

But the change highlights one point: the order in which file blocks are written to disk is undefined. Theoretically the wal checkpoint record could be on the platter, but the preceding pages were not written. Is that case handled by the wal replay code?

--
Manfred
Re: [HACKERS] Flush to Disk
Diego Montenegro wrote:

Hello all, Can anyone point me to where in the code does Postgres Flush all the Data to disk??? When XLogFlush is called, it only flushes the XLOG to disk, right? Does the entire Data get flushed at the same time as the Log?

In src/backend/storage/smgr/md.c, mdsync(): During a checkpoint, the whole system cache is synced to the disk. Note that checkpoints should be rare - I think every few minutes. The xlog contains enough data to recover a transaction after a system crash, therefore only the xlog is forced to the disk during transaction commit.

--
Manfred
Re: [PERFORM] [HACKERS] fsync method checking
[EMAIL PROTECTED] wrote:

Compare file sync methods with one 8k write:
    (o_dsync unavailable)
    open o_sync, write     6.270724
    write, fdatasync      13.275225
    write, fsync          13.359847

Odd. Which filesystem, which kernel? It seems fdatasync is broken and syncs the inode, too.

--
Manfred
Re: [PERFORM] [HACKERS] fsync method checking
Tom Lane wrote:

[EMAIL PROTECTED] writes:

I could certainly do some testing if you want to see how DBT-2 does. Just tell me what to do. ;)

Just do some runs that are identical except for the wal_sync_method setting. Note that this should not have any impact on SELECT performance, only insert/update/delete performance.

I've made a test run that compares fsync and fdatasync. The performance was identical:

- with fdatasync: http://khack.osdl.org/stp/290607/
- with fsync: http://khack.osdl.org/stp/290483/

I don't understand why. Mark - is there a battery backed write cache in the raid controller, or something similar that might skew the results? The test generates quite a lot of wal traffic - around 1.5 MB/sec. Perhaps the writes are so large that the added overhead of syncing the inode is not noticeable? Is the pg_xlog directory on a separate drive?

Btw, it's possible to request such tests through the web-interface, see http://www.osdl.org/lab_activities/kernel_testing/stp/script_param.html

--
Manfred
Re: [HACKERS] Why O_SYNC is faster than fsync on ext3
Yusuf Goolamabbas wrote:

I sent this to Bruce but forgot to cc pgsql-hackers. The patches are likely to go into 2.6.6. People interested in extremely safe fsync writes should also follow the "IDE barrier" thread and the "true fsync() in Linux on IDE" thread.

Actually the most interesting part of the thread was the initial post from Peter Zaitsev on a fcntl(fd, F_FULLFSYNC, NULL): He wrote that this is necessary for Mac OS X to force a flush of the write caches in the disks. Unfortunately I can't find anything about this flag with google.

Another interesting point is that right now, ide write caches must be disabled for reliable fsync operations with Linux. Recent suse kernels contain partial support. If the existing patches are completed and merged, it will be safe to enable write caching.

Perhaps Bruce's cache flush test could be modified slightly to check that the OS isn't lying about fsync: if fsync is faster than the rotational delay of the disks, then the setup is not suitable for postgres. This could be recommended as a setup test in the install document.

--
Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] WAL write of full pages
Marty Scholes wrote:

2. Put them on an actual (or mirrored actual) spindle
   Pros:
   * Keeps WAL and data file I/O separate
   Cons:
   * All of the non array drives are still slower than the array

Are you sure this is a problem? The dbt-2 benchmarks from osdl run on an 8-way Intel computer with several raid arrays distributed to 40 disks. IIRC it generates around 1.5 MB wal logs per second - well within the capability of a single drive. My laptop can write around 10 MB/sec (measured with dd if=/dev/zero of=fill and vmstat), fast drives should be above 20 MB/sec.

How much wal data is generated by large postgres setups? Are there any setups that are limited by the wal logs?

--
Manfred
Re: [HACKERS] libpq thread safety
Bruce Momjian wrote:

Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches I will try to apply it within the next 48 hours.

You are too fast: the patch was a proof of concept, not really tested (actually quite buggy). Attached are two patches:

- ready-sigpipe: check_sigpipe_handler skips pthread_key_create if a signal handler was installed. This is wrong - the key is always required.
- ready-locking: locking around kerberos and openssl.

The patches pass the regression tests on i386 linux. Kerberos is untested, ssl only partially tested due to the lack of a test setup.

I'm still not sure if the new code is the right thing for the openssl initialization: libpq calls SSL_library_init() unconditionally. If the calling app uses ssl, too, this might confuse openssl.

Could you replace my initial proposal with these two patches? Btw, is it intentional that THREAD_SUPPORT is not set in src/template/linux?

--
Manfred

Index: src/backend/libpq/md5.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/libpq/md5.c,v
retrieving revision 1.22
diff -c -r1.22 md5.c
*** src/backend/libpq/md5.c 29 Nov 2003 19:51:49 - 1.22
--- src/backend/libpq/md5.c 14 Mar 2004 10:46:54 -
***************
*** 271,277 ****
  static void
  bytesToHex(uint8 b[16], char *s)
  {
!     static char *hex = "0123456789abcdef";
      int q,
          w;
--- 271,277 ----
  static void
  bytesToHex(uint8 b[16], char *s)
  {
!     static const char *hex = "0123456789abcdef";
      int q,
          w;
Index: src/interfaces/libpq/fe-auth.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-auth.c,v
retrieving revision 1.89
diff -c -r1.89 fe-auth.c
*** src/interfaces/libpq/fe-auth.c 7 Jan 2004 18:56:29 - 1.89
--- src/interfaces/libpq/fe-auth.c 14 Mar 2004 10:46:55 -
***************
*** 590,595 ****
--- 590,596 ----
      case AUTH_REQ_KRB4:
  #ifdef KRB4
+     pglock_thread();
      if (pg_krb4_sendauth(PQerrormsg, conn->sock,
          (struct sockaddr_in *) &conn->laddr.addr,
          (struct sockaddr_in *) &conn->raddr.addr,
***************
*** 597,604 ****
--- 598,607 ----
      {
          snprintf(PQerrormsg, PQERRORMSG_LENGTH,
              libpq_gettext("Kerberos 4 authentication failed\n"));
+         pgunlock_thread();
          return STATUS_ERROR;
      }
+     pgunlock_thread();
      break;
  #else
      snprintf(PQerrormsg, PQERRORMSG_LENGTH,
***************
*** 608,620 ****
--- 611,626 ----
      case AUTH_REQ_KRB5:
  #ifdef KRB5
+     pglock_thread();
      if (pg_krb5_sendauth(PQerrormsg, conn->sock,
          hostname) != STATUS_OK)
      {
          snprintf(PQerrormsg, PQERRORMSG_LENGTH,
              libpq_gettext("Kerberos 5 authentication failed\n"));
+         pgunlock_thread();
          return STATUS_ERROR;
      }
+     pgunlock_thread();
      break;
  #else
      snprintf(PQerrormsg, PQERRORMSG_LENGTH,
***************
*** 722,727 ****
--- 728,734 ----
      if (authsvc == 0)
          return NULL;    /* leave original error message in place */
+     pglock_thread();
  #ifdef KRB4
      if (authsvc == STARTUP_KRB4_MSG)
          name = pg_krb4_authname(PQerrormsg);
***************
*** 759,763 ****
--- 766,771 ----
      if (name && (authn = (char *) malloc(strlen(name) + 1)))
          strcpy(authn, name);
+     pgunlock_thread();
      return authn;
  }
Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.268
diff -c -r1.268 fe-connect.c
*** src/interfaces/libpq/fe-connect.c 10 Mar 2004 21:12:47 - 1.268
--- src/interfaces/libpq/fe-connect.c 14 Mar 2004 10:46:56 -
***************
*** 2902,2908 ****
  PQsetClientEncoding(PGconn *conn, const char *encoding)
  {
      char qbuf[128];
!     static char query[] = "set client_encoding to '%s'";
      PGresult *res;
      int status;
--- 2902,2908 ----
Re: [HACKERS] libpq thread safety
Bruce Momjian wrote:

How can we test if libpq needs to call that? Seems that is an issue whether we are threaded or not, no?

I think it's always an issue: in the non-threaded case, it's just not fatal. At least some openssl init functions are protected with if (done) return; done = 1;, and in the worst case, it's a memory leak. With threaded apps, it might corrupt a concurrent ssl transaction. Perhaps PQenableSSLLocks could handle that case, too - a special flag to skip SSL_library_init().

There is a new test program in src/tools/thread that needs to be run for every platform for 7.5. We can't use the 7.4.X tests because it didn't report individual function tests, just one general value. We need individual test reports for 7.5. Run the test program and post the results and I will get it updated. The test output on my bsd/os machine is:

RedHat Fedora Core 1 and Debian 3.0 both report

    Make sure you have added any needed 'THREAD_CPPFLAGS' and 'THREAD_LIBS' defines to your template/$port file before compiling this program.

    Add this to your template/$port file:

    STRERROR_THREADSAFE=yes
    GETPWUID_THREADSAFE=no
    GETHOSTBYNAME_THREADSAFE=no

The uname's are

    Linux snip 2.4.25-1-686 #1 Tue Feb 24 10:55:59 EST 2004 i686 unknown unknown GNU/Linux

and

    Linux ab 2.4.22-1.2174.nptl #1 Wed Feb 18 16:38:32 EST 2004 i686 i686 i386 GNU/Linux

Both glibc 2.3.2, one with nptl, one with linuxthreads as the pthread library.

--
Manfred
Re: [HACKERS] Log rotation
Bruce Momjian wrote:

Which basically shows one fsync, no O_SYNC's, and setting of the flag only for klog reads.

Which sysklogd do you look at? The version from RedHat 9 contains this block:

    /*
     * Crack a configuration file line
     */
    void cfline(line, f)
        char *line;
        register struct filed *f;
    {
        register char *p;
    [snip]
        if (*p == '-')
        {
            syncfile = 0;
            p++;
        } else
            syncfile = 1;
    [snip]
        if (syncfile)
            f->f_flags |= SYNC_FILE;

And then the fsync depends on SYNC_FILE. As documented in man syslog.conf:

You may prefix each entry with the minus ``-'' sign to omit syncing the file after every logging. Note that you might lose information if the system crashes right behind a write attempt. Nevertheless this might give you back some performance, especially if you run programs that use logging in a very verbose manner.

It's sysklogd-1.4.1rh, I'm not sure what parts of it are Redhat specific.

--
Manfred
Re: [HACKERS] libpq thread safety
Bruce Momjian wrote:

What killed the idea of doing ssl or kerberos locking inside libpq was that there was no way to be sure that outside code didn't also access those routines.

A callback based implementation can handle that: libpq has a default implementation for apps that do not use openssl or kerberos themselves. If the app wants to use the libraries, too, then it must replace the hooks with its own locks. I've attached a simple proposal, just for kerberos 4. If you agree on the general approach, I'll add it to all functions that are not thread safe.

I have documented that SSL and Kerberos are not thread-safe in the libpq docs. Let's wait and see If we need additional work in this area.

It means that multithreading is not usable: As Tom explained, the connect string is often set directly by the end user. Setting sslmode would result in races - impossible to support. At the very least, sslmode and Kerberos would have to fail if the app is multithreaded.

--
Manfred

Index: src/interfaces/libpq/fe-auth.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-auth.c,v
retrieving revision 1.89
diff -u -r1.89 fe-auth.c
--- src/interfaces/libpq/fe-auth.c 7 Jan 2004 18:56:29 - 1.89
+++ src/interfaces/libpq/fe-auth.c 12 Mar 2004 20:07:02 -
@@ -590,6 +590,7 @@
     case AUTH_REQ_KRB4:
 #ifdef KRB4
+    pglock_thread();
     if (pg_krb4_sendauth(PQerrormsg, conn->sock,
         (struct sockaddr_in *) &conn->laddr.addr,
         (struct sockaddr_in *) &conn->raddr.addr,
@@ -597,8 +598,10 @@
     {
         snprintf(PQerrormsg, PQERRORMSG_LENGTH,
             libpq_gettext("Kerberos 4 authentication failed\n"));
+        pgunlock_thread();
         return STATUS_ERROR;
     }
+    pgunlock_thread();
     break;
 #else
     snprintf(PQerrormsg, PQERRORMSG_LENGTH,
@@ -722,6 +725,7 @@
     if (authsvc == 0)
         return NULL;    /* leave original error message in place */
+    pglock_thread();
 #ifdef KRB4
     if (authsvc == STARTUP_KRB4_MSG)
         name = pg_krb4_authname(PQerrormsg);
@@ -759,5 +763,6 @@
     if (name && (authn = (char *) malloc(strlen(name) + 1)))
         strcpy(authn, name);
+    pgunlock_thread();
     return authn;
 }
Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.268
diff -u -r1.268 fe-connect.c
--- src/interfaces/libpq/fe-connect.c 10 Mar 2004 21:12:47 - 1.268
+++ src/interfaces/libpq/fe-connect.c 12 Mar 2004 20:07:03 -
@@ -3163,4 +3163,34 @@
 #undef LINELEN
 }
+/*
+ * To keep the API consistent, the locking stubs are always provided, even
+ * if they are not required.
+ */
+pgthreadlock_t *g_threadlock;
+static pgthreadlock_t default_threadlock;
+static void
+default_threadlock(bool acquire)
+{
+#if defined(ENABLE_THREAD_SAFETY)
+    static pthread_mutex_t singlethread_lock = PTHREAD_MUTEX_INITIALIZER;
+    if (acquire)
+        pthread_mutex_lock(&singlethread_lock);
+    else
+        pthread_mutex_unlock(&singlethread_lock);
+#endif
+}
+
+pgthreadlock_t *
+PQregisterThreadLock(pgthreadlock_t *newhandler)
+{
+    pgthreadlock_t *prev;
+
+    prev = g_threadlock;
+    if (newhandler)
+        g_threadlock = newhandler;
+    else
+        g_threadlock = default_threadlock;
+    return prev;
+}
Index: src/interfaces/libpq/libpq-fe.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-fe.h,v
retrieving revision 1.102
diff -u -r1.102 libpq-fe.h
--- src/interfaces/libpq/libpq-fe.h 9 Jan 2004 02:02:43 - 1.102
+++ src/interfaces/libpq/libpq-fe.h 12 Mar 2004 20:07:03 -
@@ -274,6 +274,22 @@
 PQnoticeProcessor proc,
 void *arg);
+typedef void (pgsigpipehandler_t)(bool enable, void **state);
+
+extern pgsigpipehandler_t *
+PQregisterSigpipeCallback(pgsigpipehandler_t *newhandler);
+
+/*
+ * Used to set callback that prevents concurrent access to
+ * non-thread safe functions that libpq needs.
+ * The default implementation uses a libpq internal mutex.
+ * Only required for multithreaded apps that use kerberos
+ * both within their app and for postgresql connections.
+ */
+typedef void (pgthreadlock_t)(bool acquire);
+
+extern pgthreadlock_t *
Re: [HACKERS] friday 13 bug?
zohn_ming wu wrote: swap_free: Bad swap file entry 0004 Do you use ECC memory, and is ECC enabled in the BIOS [and does it work - some vendors lie about ECC support]? I would bet that it's a soft memory error: means not used. One bit differs, and the kernel complains about the invalid value. I think the following oops is a side effect of the bad swap entry. Do you have timestamps in the system log? Is the swap error just before the BUG in buffer.c? -- Manfred
Re: [HACKERS] libpq thread safety
Bruce Momjian wrote: However, we really have two types of function tested. The first, strerror, can be thread safe by using thread-local storage _or_ by returning pointers to static strings. The other two function tests require thread-local storage to be thread-safe. You are completely ignoring that libpq is a library: what if the app itself wants to call gethostbyname or strerror, too? Right now libpq has its own private mutex. This doesn't work - the locking must be process-wide. The current implementation could be the default, and apps that want to use gethostbyname [or kerberos authentication, etc.] outside libpq must fill in appropriate callbacks. -- Manfred
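A minimal sketch of the process-wide locking arrangement argued for above, assuming a library-side default lock that the application may replace with its own. The names `lib_threadlock_t`, `lib_register_thread_lock`, `lib_lock`, `lib_unlock` and `app_threadlock` are hypothetical and are not libpq API; the shape loosely follows the OpenSSL-style callback registration mentioned in this thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: acquire=true locks, acquire=false unlocks. */
typedef void (*lib_threadlock_t)(bool acquire);

/* Library default: a private mutex. Sufficient only if nothing outside
 * the library calls the same non-thread-safe functions. */
static void default_threadlock(bool acquire)
{
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    if (acquire)
        pthread_mutex_lock(&lock);
    else
        pthread_mutex_unlock(&lock);
}

static lib_threadlock_t g_lock = default_threadlock;

/* Install a new callback (NULL restores the default); returns the old one. */
lib_threadlock_t lib_register_thread_lock(lib_threadlock_t newhandler)
{
    lib_threadlock_t prev = g_lock;

    g_lock = newhandler ? newhandler : default_threadlock;
    return prev;
}

/* The library brackets every non-thread-safe call with these two. */
void lib_lock(void)   { g_lock(true); }
void lib_unlock(void) { g_lock(false); }

/* Application side: one process-wide mutex that the app also takes
 * around its own gethostbyname()/kerberos calls. */
static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;
static int app_lock_calls = 0;

void app_threadlock(bool acquire)
{
    if (acquire) {
        pthread_mutex_lock(&app_mutex);
        app_lock_calls++;          /* counted under the lock, for illustration */
    } else {
        pthread_mutex_unlock(&app_mutex);
    }
}
```

An application that also calls the unsafe functions itself would register `app_threadlock` once at startup, before creating threads; registration is not synchronized in this sketch.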
Re: [HACKERS] Mixing threaded and non-threaded
Bruce Momjian wrote: Woh, as far as I know, any application should run fine with -lpthread, threaded or not. What OS are you on? This is the first I have heard of this problem. Perhaps we should try to figure out how other packages handle multithreaded/singlethreaded libraries? I'm looking at openssl right now, and openssl never links against libpthread: The caller is responsible for registering the locking primitives. -- Manfred
Re: [HACKERS] Disaster!
Greg Stark wrote: I do know that AFS returns quota failures on close. This was unusual enough that when AFS was deployed at school unix tools failed left and right over precisely this issue. Though it mostly just meant they returned the wrong exit status. That means open(); write(); sync(); could succeed, but the data is not stored on disk, correct? -- Manfred
Re: [HACKERS] LWLock/ShmemIndex startup question
Tom Lane wrote: Claudio Natoli writes: Or, maybe we'll just use the tas() implementation that already exists for __i386__/__x86_64__ in s_lock.h. How did I miss that? Move along. Nothing to see here. Actually, I was expecting you to complain that the s_lock.h coding is gcc-specific. Which compilers do we need to support on Windows? I think Intel's compiler supports the gcc syntax. At least the Linux version can compile the Linux kernel. MSVC has its own syntax that is very primitive, and AFAIK not supported by the 64-bit Windows versions. The AMD64 version definitely doesn't support inline assembly at all. What are the chances for Win64 support? sizeof(unsigned long) remains 4, sizeof(void*) is 8. -- Manfred
Re: [HACKERS] LWLock/ShmemIndex startup question
Tom Lane wrote: Manfred Spraul writes: What are the chances for Win64 support? sizeof(unsigned long) remains 4, sizeof(void*) is 8. If you can tell me what type Datum should be (unsigned long long maybe?), we could probably handle that. Probably uintptr_t: That's the official C99 integer type for storing pointers. I'm not sure if it's guaranteed to be wide enough for ULONG_MAX (or only UINT_MAX). -- Manfred
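A small sketch of the `uintptr_t` suggestion, assuming a Datum-like type that must hold either a pointer or an `unsigned long`. The names `datum_t`, `pointer_get_datum`, `datum_get_pointer` and `datum_holds_ulong` are hypothetical and do not reflect PostgreSQL's actual definitions. C99 only promises that a `void *` survives the round trip through `uintptr_t` (and the type itself is optional), so the width question raised above is answered here with an explicit check rather than assumed.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical Datum-like type; not PostgreSQL's actual definition. */
typedef uintptr_t datum_t;

/* Store a pointer in a datum and get it back unchanged. */
static inline datum_t pointer_get_datum(const void *p)
{
    return (datum_t) p;
}

static inline void *datum_get_pointer(datum_t d)
{
    return (void *) d;
}

/* Returns 1 if every unsigned long value also fits in datum_t.
 * The standard does not promise this, so a port should test it. */
static inline int datum_holds_ulong(void)
{
    return UINTPTR_MAX >= ULONG_MAX;
}
```

On a Win64-style data model (32-bit `long`, 64-bit pointers) the check would pass; a platform with pointers narrower than `long` would need a different choice.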
[HACKERS] libpq thread safety
libpq needs additional changes for complete thread safety:
- openssl needs different initialization.
- kerberos is not thread safe.
- functions such as gethostbyname are not thread safe, and could be used by kerberos. Right now they are protected with a libpq-specific mutex.
- ditto for getpwuid and strerror.
openssl is trivial: just proper flags are needed for the init function. But what about kerberos: I'm a bit reluctant to add a fourth mutex: what if kerberos calls gethostbyname or getpwuid internally? Usually I would use one single_thread mutex and use that mutex for all operations - races are just too difficult to debug. Any better ideas? Otherwise I'd start searching for the non-threadsafe functions and add pthread mutex locking around them. Actually I'm not even sure if it should be a libpq-specific mutex: what if the calling app needs to access openssl or kerberos as well? Perhaps libpq should use a system similar to openssl: http://www.openssl.org/docs/crypto/threads.html -- Manfred
[HACKERS] PQinSend question
From fe-secure.c:

/*
 * Indicates whether the current thread is in send()
 * For use by SIGPIPE signal handlers; they should
 * ignore SIGPIPE when libpq is in send(). This means
 * that the backend has died unexpectedly.
 */
pqbool
PQinSend(void)
{
#ifdef ENABLE_THREAD_SAFETY
    return (pthread_getspecific(thread_in_send) /* has it been set? */ &&
            *(char *) pthread_getspecific(thread_in_send) == 't') ? true : false;
#else
    return false; /* No threading, so we can't be in send() */

Why not? Signal delivery can interrupt send() even with single-threaded users. I really like the openssl interface: what about something like

typedef void (*pgsigpipehandler_t)(bool enable);
void PQregisterSignalCallback(pgsigpipehandler_t new);

The callback is global, and called around the send() calls. The default handler uses the sigaction code from 7.4. The current autodetection code is less flexible than a callback, and it's not 100% backward compatible. -- Manfred
Re: [HACKERS] libpq thread safety
Tom Lane wrote: Manfred Spraul writes: But what about kerberos: I'm a bit reluctant to add a fourth mutex: what if kerberos calls gethostbyname or getpwuid internally? Wouldn't help anyway, if some other part of the app also calls kerberos. That's why I've proposed to use the system from openssl: The libpq user must implement a lock callback, and libpq calls it around the critical sections. Attached is an untested prototype patch. What do you think? -- Manfred

Index: src/interfaces/libpq/fe-connect.c
===
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.267
diff -u -r1.267 fe-connect.c
--- src/interfaces/libpq/fe-connect.c 9 Jan 2004 02:02:43 - 1.267
+++ src/interfaces/libpq/fe-connect.c 11 Jan 2004 16:54:06 -
@@ -885,12 +885,6 @@
     struct addrinfo hint;
     const char *node = NULL;
     int ret;
-#ifdef ENABLE_THREAD_SAFETY
-    static pthread_once_t check_sigpipe_once = PTHREAD_ONCE_INIT;
-
-    /* Check only on first connection request */
-    pthread_once(&check_sigpipe_once, check_sigpipe_handler);
-#endif
 
     if (!conn)
         return 0;
Index: src/interfaces/libpq/fe-secure.c
===
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v
retrieving revision 1.36
diff -u -r1.36 fe-secure.c
--- src/interfaces/libpq/fe-secure.c 9 Jan 2004 02:17:15 - 1.36
+++ src/interfaces/libpq/fe-secure.c 11 Jan 2004 16:54:07 -
@@ -146,11 +146,6 @@
 static SSL_CTX *SSL_context = NULL;
 #endif
 
-#ifdef ENABLE_THREAD_SAFETY
-static void sigpipe_handler_ignore_send(int signo);
-pthread_key_t thread_in_send;
-#endif
-
 /* */
 /* Hardcoded values */
 /* */
@@ -212,6 +207,26 @@
 /* */
 
 /*
+ * Sigpipe handling.
+ * Dummy provided even for WIN32 to keep the API consistent
+ */
+pgsigpipehandler_t default_sigpipehandler;
+
+void default_sigpipehandler(bool enable, void **state)
+{
+#ifndef WIN32
+    if (enable) {
+        *state = (void*) pqsignal(SIGPIPE, SIG_IGN);
+    } else {
+        pqsignal(SIGPIPE, (pqsigfunc)*state);
+    }
+#endif
+}
+
+static pgsigpipehandler_t *g_sigpipehandler = default_sigpipehandler;
+
+
+/*
  * Initialize global context
  */
 int
@@ -356,12 +371,9 @@
 {
     ssize_t n;
 
-#ifdef ENABLE_THREAD_SAFETY
-    pthread_setspecific(thread_in_send, "t");
-#else
 #ifndef WIN32
-    pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN);
-#endif
+    void *sigstate;
+    g_sigpipehandler(true, &sigstate);
 #endif
 
 #ifdef USE_SSL
@@ -420,12 +432,8 @@
 #endif
     n = send(conn->sock, ptr, len, 0);
 
-#ifdef ENABLE_THREAD_SAFETY
-    pthread_setspecific(thread_in_send, "f");
-#else
 #ifndef WIN32
-    pqsignal(SIGPIPE, oldsighandler);
-#endif
+    g_sigpipehandler(false, &sigstate);
 #endif
 
     return n;
@@ -1066,62 +1074,18 @@
 
 #endif /* USE_SSL */
 
-
-#ifdef ENABLE_THREAD_SAFETY
 /*
- * Check SIGPIPE handler and perhaps install our own.
+ * PQregisterSigpipeCallback
  */
-void
-check_sigpipe_handler(void)
+pgsigpipehandler_t *
+PQregisterSigpipeCallback(pgsigpipehandler_t *newhandler)
 {
-    pqsigfunc pipehandler;
+    pgsigpipehandler_t *prev;
 
-    /*
-     * If the app hasn't set a SIGPIPE handler, define our own
-     * that ignores SIGPIPE on libpq send() and does SIG_DFL
-     * for other SIGPIPE cases.
-     */
-    pipehandler = pqsignalinquire(SIGPIPE);
-    if (pipehandler == SIG_DFL) /* not set by application */
-    {
-        /*
-         * Create key first because the signal handler might be called
-         * right after being installed.
-         */
-        pthread_key_create(&thread_in_send, NULL);
-        pqsignal(SIGPIPE, sigpipe_handler_ignore_send);
-    }
-}
-
-/*
- * Threaded SIGPIPE signal handler
- */
-void
-sigpipe_handler_ignore_send(int signo)
-{
-    /*
-     * If we have gotten a SIGPIPE outside send(), exit.
-     * Synchronous signals are delivered to the thread
-     * that caused the signal.
-     */
-    if (!PQinSend())
-        exit(128 + SIGPIPE); /* typical return value for SIG_DFL */
-}
-#endif
-
-/*
- * Indicates whether the current thread is in send()
- * For use by SIGPIPE signal handlers; they should
- * ignore SIGPIPE when libpq is in send(). This means
- * that the backend has died unexpectedly
Re: [HACKERS] libpq thread safety
Tom Lane wrote: Personally I find diff -u format completely unreadable :-(. Send diff -c if you want useful commentary. diff -c is attached. I've removed the signal changes, they are unrelated. I'll resend them separately. -- Manfred

Index: src/interfaces/libpq/libpq-fe.h
===
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-fe.h,v
retrieving revision 1.102
diff -c -r1.102 libpq-fe.h
*** src/interfaces/libpq/libpq-fe.h 9 Jan 2004 02:02:43 - 1.102
--- src/interfaces/libpq/libpq-fe.h 11 Jan 2004 17:29:38 -
***
*** 458,463
--- 458,480
   */
  pqbool PQinSend(void);
  
+ /* === in thread.c === */
+ 
+ /*
+  *    Used to set callback that prevents concurrent access to
+  *    non-thread safe functions that libpq needs.
+  *    The default implementation uses a libpq internal mutex.
+  *    Only required for multithreaded apps on platforms that
+  *    do not support the thread-safe equivalents and that want
+  *    to use the functions, too.
+  *    List of functions:
+  *    - stderror, getpwuid, gethostbyname.
+  *    TODO: the mutex must be used around kerberos calls, too.
+  */
+ typedef void (pgthreadlock_t)(bool acquire);
+ 
+ extern pgthreadlock_t * PQregisterThreadLock(pgthreadlock_t *newhandler);
+ 
  #ifdef __cplusplus
  }
  #endif
Index: src/interfaces/libpq/libpq-int.h
===
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v
retrieving revision 1.84
diff -c -r1.84 libpq-int.h
*** src/interfaces/libpq/libpq-int.h 9 Jan 2004 02:02:43 - 1.84
--- src/interfaces/libpq/libpq-int.h 11 Jan 2004 17:29:38 -
***
*** 448,453
--- 448,460
  #ifdef ENABLE_THREAD_SAFETY
  extern void check_sigpipe_handler(void);
  extern pthread_key_t thread_in_send;
+ 
+ extern pgthreadlock_t *g_threadlock;
+ #define pglock_thread() g_threadlock(true);
+ #define pgunlock_thread() g_threadlock(false);
+ #else
+ #define pglock_thread() ((void)0)
+ #define pgunlock_thread() ((void)0)
  #endif
  
  /*
Index: src/port/thread.c
===
RCS file: /projects/cvsroot/pgsql-server/src/port/thread.c,v
retrieving revision 1.14
diff -c -r1.14 thread.c
*** src/port/thread.c 29 Nov 2003 22:41:31 - 1.14
--- src/port/thread.c 11 Jan 2004 17:29:38 -
***
*** 65,70
--- 65,105
   *    non-*_r functions.
   */
  
+ #if defined(FRONTEND)
+ #include "libpq-fe.h"
+ #include "libpq-int.h"
+ /*
+  * To keep the API consistent, the locking stubs are always provided, even
+  * if they are not required.
+  */
+ pgthreadlock_t *g_threadlock;
+ 
+ static pgthreadlock_t default_threadlock;
+ static void
+ default_threadlock(bool acquire)
+ {
+ #if defined(ENABLE_THREAD_SAFETY)
+     static pthread_mutex_t singlethread_lock = PTHREAD_MUTEX_INITIALIZER;
+     if (acquire)
+         pthread_mutex_lock(&singlethread_lock);
+     else
+         pthread_mutex_unlock(&singlethread_lock);
+ #endif
+ }
+ 
+ pgthreadlock_t *
+ PQregisterThreadLock(pgthreadlock_t *newhandler)
+ {
+     pgthreadlock_t *prev;
+ 
+     prev = g_threadlock;
+     if (newhandler)
+         g_threadlock = newhandler;
+     else
+         g_threadlock = default_threadlock;
+     return prev;
+ }
+ #endif
  
  /*
   * Wrapper around strerror and strerror_r to use the former if it is
***
*** 82,96
  #else
  
  #if defined(FRONTEND) && defined(ENABLE_THREAD_SAFETY) && defined(NEED_REENTRANT_FUNCS) && !defined(HAVE_STRERROR_R)
!     static pthread_mutex_t strerror_lock = PTHREAD_MUTEX_INITIALIZER;
!     pthread_mutex_lock(&strerror_lock);
  #endif
  
      /* no strerror_r() available, just use strerror */
      StrNCpy(strerrbuf, strerror(errnum), buflen);
  
  #if defined(FRONTEND) && defined(ENABLE_THREAD_SAFETY) && defined(NEED_REENTRANT_FUNCS) && !defined(HAVE_STRERROR_R)
!     pthread_mutex_unlock(&strerror_lock);
  #endif
  
      return strerrbuf;
--- 117,130
  #else
  
  #if defined(FRONTEND) && defined(ENABLE_THREAD_SAFETY) && defined(NEED_REENTRANT_FUNCS) && !defined(HAVE_STRERROR_R)
!     g_threadlock(true);
  #endif
  
      /* no strerror_r() available, just use strerror */
      StrNCpy(strerrbuf, strerror(errnum), buflen);
  
  #if defined(FRONTEND) && defined(ENABLE_THREAD_SAFETY) && defined(NEED_REENTRANT_FUNCS) && !defined(HAVE_STRERROR_R)
!     g_threadlock(false);
  #endif
  
      return strerrbuf;
***
*** 118,125
  #else
  
  #if defined(FRONTEND) && defined(ENABLE_THREAD_SAFETY) && defined(NEED_REENTRANT_FUNCS) && !defined(HAVE_GETPWUID_R)
!     static pthread_mutex_t getpwuid_lock = PTHREAD_MUTEX_INITIALIZER;
!     pthread_mutex_lock(&getpwuid_lock);
  #endif
  
      /* no getpwuid_r() available, just use getpwuid() */
--- 152,158
  #else
Re: [HACKERS] libpq thread safety
Tom Lane wrote: Wait a minute. I am *not* buying into any proposal that we need to support ENABLE_THREAD_SAFETY on machines where libc is not thread-safe. We have other things to do than adopt an open-ended commitment to work around threading bugs on obsolete platforms. I don't believe that any sane application programmer is going to try to implement a multi-threaded app on such a platform anyway. I'd agree - convince Bruce and I'll replace the mutexes in thread.c with #error. But I think libpq should support a mutex around kerberos (or at least fail at runtime) - right now it's too easy to corrupt the kerberos authentication state. -- Manfred
Re: [HACKERS] using stp for dbt2 + postgresql
Bruce Momjian wrote: [EMAIL PROTECTED] wrote: Hi Manfred, Just wanted to let you know I tried your patch-spinlock-i386 patch on our STP (our automated test platform) 8-way systems and saw a 5.5% improvement with Pentium III Xeons. If you want to see those results: PostgreSQL 7.4.1: http://khack.osdl.org/stp/285062/ PostgreSQL 7.4.1 w/ your patch: http://khack.osdl.org/stp/285087/ Impressive. Thanks. The best thing is that we can try our own postgres patches with STP now: this gives us a chance to run tests on up to 8-way systems, with 4 GB memory and 40 spindles. From my experience, the typical turnaround time is half a day - submit the patch [web interface], start the benchmark run, and after a few hours you get a mail that contains the output. With oprofile, it's very detailed - % cpu time for each function, down to individual asm instructions, plus the ability for custom logging into the postmaster log. I think we should try to use that to find a cache replacement policy that is SMP scalable, i.e. doesn't need a global lock - I searched a few minutes on citeseer, but couldn't find anything that doesn't rely on global lists. -- Manfred
Re: [HACKERS] [PATCHES] update i386 spinlock for hyperthreading
Bruce Momjian wrote: Anyone see an attack path here? Should we have one lock per hash bucket rather than one for the entire hash? That's the simple part. The problem is the aging strategy: we need a strategy that doesn't rely on a global list that's updated after every lookup. If I understand the ARC code correctly, there is a STRAT_MRU_INSERT(cdb, STRAT_LIST_T2) that happens on every lookup. -- Manfred
Re: [HACKERS] [PATCHES] update i386 spinlock for hyperthreading
Jan Wieck wrote: Moving the Cache Directory Block (cdb) on a hit to the MRU position of the appropriate queue is the bookkeeping of this strategy. The whole algorithm is based on it, and I don't see yet how to avoid that without opening a huge can of worms that look like deadlocks. But I'll think about it for a while. I feared that. Are there strategies that do not rely on a global lock? The Linux kernel uses a lazy LRU with referenced bits: on access, the referenced bit is set. The freespace logic takes pages from the end of a linked list, and checks that bit: if it's set, then the page is moved back to the top of the list. Otherwise it's a candidate for replacement. Pages start at the head of that pseudo-LRU list, with the referenced bit clear: that way a page that is accessed only once has a lower priority than a frequently accessed page. At least that's how I understand the algorithm. -- Manfred
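A sketch of the referenced-bit idea described above, written as the classic clock (second-chance) variant: it uses a fixed circular array and a sweeping hand instead of the linked list in the description, which is a swapped-in simplification and not the Linux kernel's or PostgreSQL's code. All names (`struct cache`, `cache_access`, `NPAGES`) are hypothetical. The point it illustrates is that a hit only sets a bit, so no shared list has to be reordered on every lookup.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* One cache slot: the page it holds and its "referenced" bit. */
struct slot {
    int  page;          /* -1 means the slot is empty */
    bool referenced;
};

struct cache {
    struct slot slots[NPAGES];
    size_t      hand;   /* next eviction candidate */
};

void cache_init(struct cache *c)
{
    for (size_t i = 0; i < NPAGES; i++) {
        c->slots[i].page = -1;
        c->slots[i].referenced = false;
    }
    c->hand = 0;
}

/* Returns true on a hit. A hit only sets a bit; nothing is moved. */
bool cache_access(struct cache *c, int page)
{
    for (size_t i = 0; i < NPAGES; i++) {
        if (c->slots[i].page == page) {
            c->slots[i].referenced = true;
            return true;
        }
    }

    /* Miss: sweep forward, clearing referenced bits, until an empty
     * or unreferenced slot is found. Terminates after at most one
     * full turn, because every visited bit is cleared. */
    while (c->slots[c->hand].page != -1 && c->slots[c->hand].referenced) {
        c->slots[c->hand].referenced = false;   /* second chance */
        c->hand = (c->hand + 1) % NPAGES;
    }

    c->slots[c->hand].page = page;
    c->slots[c->hand].referenced = false;       /* new pages start cold */
    c->hand = (c->hand + 1) % NPAGES;
    return false;
}
```

The linear search stands in for a hash lookup; in a concurrent setting only the miss path would need a lock, which is the property being asked for in the message above.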
Re: [HACKERS] [PATCHES] update i386 spinlock for hyperthreading
[EMAIL PROTECTED] wrote: Hi Manfred, I'm using unixware 7 but couldn't compile your source with native cc, I had to compile it with gcc. here are the results: Thanks. The test app compares the time needed for three different short loops: a loop with six empty function calls, a loop with six function calls and one nop in the middle, and a loop with a rep;nop; in the middle. Result: - nop needs 0 cycles - executed in parallel. - rep;nop needs between 24 and 60 cycles - long enough that the pipeline is emptied. I've searched around for further info regarding the recommended spinlock algorithm: - The optimization manual (google for Intel 248966) contains a section about pause instructions: The memory ordering violation is from the multiple simultaneous reads that are executed due to pipelining the busy loop. - It references the Application Note AP-949 Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor for further details. Unfortunately the app notes are stored on cedar.intel.com, and that server appears to be down :-( -- Manfred
Re: [HACKERS] Issue with Linux+Pentium SMP Context Switching
Josh Berkus wrote: Initial debug logging of a test on one Xeon system demonstrating this issue showed a very large number of unattributed semop() calls. We are still following up on this. Postgres has its own user space spinlock and semaphore implementation. Both fall back to semop if there is contention. Hmm. You wrote that the problem is Xeon specific, and that AthlonMP are unaffected. Perhaps Xeon cpus do not like the s_lock implementation? It doesn't follow Intel's recommendations: - no pause instructions. - always TAS. The recommended approach is nonatomic tests until the value is 0, then an atomic TAS. Attached is a gross hack that adds pause instructions. If this doesn't magically fix your problem, then we must figure out what causes the semop calls, and avoid them. Could you ask your Linux hackers why they blame the shared memory implementation in postgres? I don't see any link between shared memory and lock contention. -- Manfred

Index: backend/storage/lmgr/s_lock.c
===
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.16
diff -c -r1.16 s_lock.c
*** backend/storage/lmgr/s_lock.c 8 Aug 2003 21:42:00 - 1.16
--- backend/storage/lmgr/s_lock.c 19 Dec 2003 20:01:33 -
***
*** 111,116
--- 111,117
              spins = 0;
          }
+         __asm__ __volatile__("rep;nop\n": : : "memory");
      }
  }
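A sketch of the recommended pattern described above (plain reads until the lock looks free, then one atomic test-and-set, with a pause in the wait loop), written with C11 atomics. This is not the s_lock.c code: the names `spin_t`, `spin_lock`, `spin_trylock`, `spin_unlock` and `cpu_relax` are hypothetical, and the sleep/semop fallback that PostgreSQL uses under contention is left out.

```c
#include <stdatomic.h>

/* "rep; nop" is the encoding of the x86 PAUSE instruction; on other
 * architectures this sketch simply does nothing in the wait loop. */
#if defined(__i386__) || defined(__x86_64__)
#define cpu_relax() __asm__ __volatile__("rep; nop" ::: "memory")
#else
#define cpu_relax() ((void) 0)
#endif

typedef struct {
    atomic_int locked;  /* 0 = free, 1 = held */
} spin_t;

void spin_init(spin_t *s)
{
    atomic_init(&s->locked, 0);
}

/* One atomic test-and-set; returns 1 if the lock was acquired. */
int spin_trylock(spin_t *s)
{
    return atomic_exchange_explicit(&s->locked, 1,
                                    memory_order_acquire) == 0;
}

void spin_lock(spin_t *s)
{
    for (;;) {
        /* Wait with plain loads only, so the cache line stays shared
         * instead of bouncing between CPUs on every iteration. */
        while (atomic_load_explicit(&s->locked,
                                    memory_order_relaxed) != 0)
            cpu_relax();

        /* The lock looked free: now try the atomic exchange. */
        if (spin_trylock(s))
            return;
    }
}

void spin_unlock(spin_t *s)
{
    atomic_store_explicit(&s->locked, 0, memory_order_release);
}
```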
Re: [HACKERS] fsync method checking
Bruce Momjian wrote:
write                  0.000360
write fsync            0.001391
write, close fsync     0.001308
open o_fsync, write    0.000924
That's 1 millisecond vs. 1.3 milliseconds. Neither value is realistic - I guess the hw cache is on and the OS doesn't issue cache flush commands. Realistic values are probably 5 ms vs 5.3 ms - 6%, not 30%. How large is the syscall latency with BSD/OS 4.3? One advantage of a separate write and fsync call is better performance for the writes that are triggered within AdvanceXLInsertBuffer: I'm not sure how often that's necessary, but it's a write while holding both the WALWriteLock and WALInsertLock. If every write contains an implicit sync, that call would be much more expensive than necessary. -- Manfred
Re: [HACKERS] Double linked list with one pointer
Tom Lane wrote: Greg Stark writes: Treating pointers as integers is technically nonportable but realistically you would be pretty hard pressed to find any architecture anyone runs postgres on where there isn't some integer datatype that you can cast both directions from pointers safely. ... like, say, Datum. We already make that assumption, so there's no new portability risk involved. There is a new type in C99 for an integer that can hold a pointer value. I think it's called intptr_t, or uintptr_t for the unsigned variant, but I don't have the standard around. It will be necessary for a 64-bit Windows port: Microsoft decided that pointers are 64-bit on WIN64, while int and long remain 32-bit. Microsoft's own typedefs are UINT_PTR, DWORD_PTR, INT_PTR. -- Manfred
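For readers unfamiliar with the thread's subject, a sketch of a doubly linked list that stores a single link word per node (the XOR of the neighbours' addresses), using `uintptr_t` as discussed above. The names `struct xnode`, `xlist_append` and `xlist_walk` are hypothetical. It assumes, as virtually all current platforms do, that a null pointer converts to the integer 0.

```c
#include <stddef.h>
#include <stdint.h>

struct xnode {
    int       value;
    uintptr_t link;     /* (uintptr_t) prev XOR (uintptr_t) next */
};

/* Given one neighbour and the link word, recover the other neighbour. */
static struct xnode *xor_ptr(struct xnode *a, uintptr_t link)
{
    return (struct xnode *) ((uintptr_t) a ^ link);
}

/* Append n after the current tail (NULL for an empty list);
 * returns the new tail. */
struct xnode *xlist_append(struct xnode *tail, struct xnode *n)
{
    n->link = (uintptr_t) tail;         /* prev = tail, next = NULL */
    if (tail)
        tail->link ^= (uintptr_t) n;    /* tail's next: NULL -> n */
    return n;
}

/* Walk from either end, copying up to max values into out.
 * Returns the number of nodes visited. */
size_t xlist_walk(struct xnode *start, int *out, size_t max)
{
    struct xnode *prev = NULL;
    struct xnode *cur = start;
    size_t n = 0;

    while (cur && n < max) {
        struct xnode *next = xor_ptr(prev, cur->link);

        out[n++] = cur->value;
        prev = cur;
        cur = next;
    }
    return n;
}
```

The same walk function traverses forwards when started at the head and backwards when started at the tail, which is the whole trick; the cost is that a node cannot be unlinked without knowing one of its neighbours.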
[HACKERS] libpq thread safety
Hi, I've searched through libpq and looked for global or static variables as indicators of non-threadsafe code. I found:
- Win32 and BeOS: there is a global ioctlsocket_ret variable, but it seems to be a dummy variable that is always discarded.
- pg_krb4_init(): Are the kerberos libraries thread safe? Additionally, setting init_done is racy.
- pg_krb4_authname(): uses a static buffer.
- kerberos 5: Is the library thread safe? The initialization could run twice, I'm not sure if that's intentional.
- pg_krb5_authname(): relies on the global variable pg_krb5_name.
- PQoidStatus: uses a static buffer.
- libpq_gettext: setting already_bound is racy.
- openssl: According to http://www.openssl.org/docs/crypto/threads.html libpq must register locking callbacks within openssl, otherwise there will be random corruptions. Additionally the SSL_context initialization is not properly synchronized, and SSLerrmessage relies on a static buffer.
PQoidStatus is already documented as not thread safe, but what about OpenSSL and kerberos? It seems openssl needs support with callbacks, and according to google searches MIT kerberos 5 is not thread safe, and libpq must use mutexes to prevent concurrent calls into the kerberos library. -- Manfred
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Greg Stark wrote: I'm assuming fsync syncs writes issued by other processes on the same file, which isn't necessarily true though. It was already pointed out that we can't rely on that assumption. So the NetBSD and Sun developers I checked with both asserted fsync does in fact guarantee this. And SUSv2 seems to back them up: At least Linux had one problem: fsync() syncs the inode to disk, but not the directory entry: if you rename a file, open it, write to it, fsync, and the computer crashes, then it's not guaranteed that the file rename is on the disk. I think only the old ext2 is affected, not the journaling filesystems. -- Manfred
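A sketch of the usual workaround for the directory-entry problem described above: after the rename, open the containing directory and fsync it as well. The function name `rename_and_sync_dir` is hypothetical and the caller is assumed to pass the directory that contains `newpath`; this is an illustration, not code from PostgreSQL.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Rename oldpath to newpath (both inside directory dir), then fsync
 * the directory so the new directory entry itself reaches the disk.
 * Returns 0 on success, -1 on failure with errno set by the failing call. */
int rename_and_sync_dir(const char *dir, const char *oldpath,
                        const char *newpath)
{
    int dfd;
    int rc;

    if (rename(oldpath, newpath) != 0)
        return -1;

    dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;

    rc = fsync(dfd);            /* flushes the directory's metadata */
    if (close(dfd) != 0)
        rc = -1;
    return rc;
}
```

The file's own data still needs its own fsync on the file descriptor; the directory fsync only covers the name.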
Re: [HACKERS] Performance features the 4th
Jan Wieck wrote: _Vacuum page delay_: Tom Lane's napping during vacuums with another tuning option. I replaced the usleep() call with a PG_DELAY(msec) macro in miscadmin.h, which does use select(2) instead. That should address the possible portability problems. What about skipping the delay if there are no outstanding disk operations? Then vacuum would get the full disk bandwidth if the system is idle. -- Manfred
Re: [HACKERS] Performance features the 4th
Tom Lane wrote: Manfred's idea is interesting but AFAICS completely unimplementable in any portable fashion. You'd have to have hooks into the kernel. I thought about outstanding operations from postgres - I don't know enough about the buffer layer to say whether it's possible to keep a counter of the currently running read() and write() operations, or something similar. -- Manfred
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
[EMAIL PROTECTED] wrote: On 1 Nov, Tom Lane wrote: Manfred Spraul writes: signal handlers are a process property, not a thread property - that code is broken for multi-threaded apps. Yeah, that's been mentioned before, but I don't see any way around it. What we really want is to turn off SIGPIPE delivery on our socket (only), but AFAIK there is no API to do that. Will this be a problem for multi-threaded apps with any of the client interfaces? Anyone working on making it threadsafe? The POSIX API is not thread safe: signal handlers are per process, and libpq would like to block SIGPIPE for its send() calls. For single threaded apps, libpq just calls sigaction and sets the handler to SIG_IGN around the syscalls. For multithreaded apps, this is not possible: sigaction is per process. Thus the calling application must handle the SIGPIPE signals for libpq - either by blocking or ignoring them. We are still discussing the exact API. Probably a global state that is accessible through a new function. One thread-safe alternative might be the combination of sigprocmask / pthread_sigmask and sigwait, but I think this would be too fragile. -- Manfred
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
Tom Lane wrote: Manfred Spraul writes: For multithreaded apps, this is not possible: sigaction is per process. Thus the calling application must handle the SIGPIPE signals for libpq - either by blocking or ignoring them. We are still discussing the exact API. Probably a global state that is accessible through a new function. I think we should also take a hard look at avoiding the problem by using MSG_NOSIGNAL on platforms that have it, I think that's the second step. First we need a portable solution, then we can optimize it. The fastest solution is one signal(SIGPIPE, SIG_IGN) in main(), but that requires a change in all libpq users. OTOH there shouldn't be that many multithreaded users. sigprocmask + sigwait could work, but the behavior of sigprocmask is undefined if multiple threads are running. Is there a portable approach for weak links? libpq would have to call pthread_sigmask if linked against libpthread, and sigprocmask if not linked against libpthread. With gcc, I could use 'void pthread_sigmask () __attribute__ ((weak, alias ("_sigprocmask")));' or something similar, but this wouldn't be portable either. -- Manfred
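A sketch of the MSG_NOSIGNAL variant raised above. The flag asks the kernel not to raise SIGPIPE for this one `send()` call, so no signal handling is needed at all on platforms that have it. The wrapper name `send_nosignal` is hypothetical; defining the flag to 0 where it is missing is only a placeholder in this sketch, and such platforms would still need one of the other strategies from this thread.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0      /* not available: no protection from this wrapper */
#endif

/* send() that reports a dead peer as -1 with errno set to EPIPE
 * instead of raising SIGPIPE, where the platform supports the flag. */
ssize_t send_nosignal(int sock, const void *buf, size_t len)
{
    return send(sock, buf, len, MSG_NOSIGNAL);
}
```

Unlike the sigaction and sigmask approaches, this is per call and therefore naturally thread safe.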
Re: [HACKERS] adding support for posix_fadvise()
Neil Conway wrote: The present Linux implementation doesn't do this, AFAICS -- all it does is increase the readahead for this file: AFAIK Linux uses a modified LRU that automatically puts pages that were touched only once at a lower priority than frequently accessed pages. Neil: what about calling posix_fadvise for the whole file immediately after issue_xlog_fsync() in XLogWrite? According to the comment, it's guaranteed that this will happen only once. Or: add a posix_fadvise call into issue_xlog_fsync(), for the range just sync'ed. Btw, how much xlog traffic does a busy postgres site generate? -- Manfred
Re: Avoiding SIGPIPE (was Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL
Tom Lane wrote: It strikes me that sigpipe handling will be a global affair in any particular application --- it's unlikely that it would be correct for some PG connections and wrong for others. So one possibility is to make the control variable be global (static) and thus it could be set before creating the first PGconn. What about the attached patches? I hope I found all places that must be updated when a new function is added to libpq. -- Manfred

Index: doc/src/sgml/libpq.sgml
===
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/libpq.sgml,v
retrieving revision 1.141
diff -c -r1.141 libpq.sgml
*** doc/src/sgml/libpq.sgml 1 Nov 2003 01:56:29 - 1.141
--- doc/src/sgml/libpq.sgml 3 Nov 2003 20:35:57 -
***
*** 645,650
--- 645,693
      </listitem>
     </varlistentry>
  
+    <varlistentry>
+     <term><function>PQsetsighandling</function><indexterm><primary>PQsetsighandling</></></term>
+     <term><function>PQgetsighandling</function><indexterm><primary>PQgetsighandling</></></term>
+     <listitem>
+      <para>
+       Set/query SIGPIPE signal handling.
+ <synopsis>
+ void PQsetsighandling(int internal_sigign);
+ </synopsis>
+ <synopsis>
+ int PQgetsighandling(void);
+ </synopsis>
+      </para>
+ 
+      <para>
+       These functions allow to query and set the SIGPIPE signal handling
+       of libpq: by default, Unix systems generate a (fatal) SIGPIPE signal
+       on a send to a socket that lost it's connection. Most callers expect
+       a normal error return instead of the signal. A normal error return
+       can be achieved by blocking or ignoring the SIGPIPE signal. This can
+       be done either globally in the application or inside libpq.
+      </para>
+      <para>
+       If internal signal handling is enabled (this is the default), then
+       libpq sets the SIGPIPE handler to SIG_IGN before every socket send
+       operation and restores it afterwards. This prevents libpq from
+       killing the application, at the cost of a slight performance
+       decrease. This approach is not reliable for multithreaded applications.
+      </para>
+      <para>
+       If internal signal handling is disabled, then the caller is
+       responsible for blocking or handling SIGPIPE signals. This is
+       recommended for multithreaded applications.
+      </para>
+      <para>
+       The signal handler setting is a global flag, it affects all
+       connections. The setting has no effect for Win32 clients - Win32
+       doesn't generate SIGPIPE events.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+ 
    </variablelist>
   </para>
  </sect1>
Index: src/interfaces/libpq/blibpqdll.def
===
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/blibpqdll.def,v
retrieving revision 1.9
diff -c -r1.9 blibpqdll.def
*** src/interfaces/libpq/blibpqdll.def 13 Aug 2003 16:29:03 - 1.9
--- src/interfaces/libpq/blibpqdll.def 3 Nov 2003 20:35:59 -
***
*** 113,118
--- 113,120
      _PQfformat @ 109
      _PQexecPrepared @ 110
      _PQsendQueryPrepared @ 111
+     _PQsetsighandling @ 112
+     _PQgetsighandling @ 113
  
  ; Aliases for MS compatible names
      PQconnectdb = _PQconnectdb
***
*** 226,228
--- 228,232
      PQfformat = _PQfformat
      PQexecPrepared = _PQexecPrepared
      PQsendQueryPrepared = _PQsendQueryPrepared
+     PQsetsighandling = _PQsetsighandling
+     PQgetsighandling = _PQgetsighandling
Index: src/interfaces/libpq/fe-secure.c
===
RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v
retrieving revision 1.32
diff -c -r1.32 fe-secure.c
*** src/interfaces/libpq/fe-secure.c 29 Sep 2003 16:38:04 - 1.32
--- src/interfaces/libpq/fe-secure.c 3 Nov 2003 20:35:59 -
***
*** 198,203
--- 198,204
  -----END DH PARAMETERS-----\n";
  #endif
  
+ static int do_sigaction = 1;
  /* */
  /* Procedures common to all secure sessions */
  /* */
***
*** 348,354
      ssize_t n;
  
  #ifndef WIN32
!     pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN);
  #endif
  
  #ifdef USE_SSL
--- 349,358
      ssize_t n;
  
  #ifndef WIN32
!     pqsigfunc oldsighandler = NULL;
! 
!     if (do_sigaction)
!         oldsighandler = pqsignal(SIGPIPE, SIG_IGN);
  #endif
  
  #ifdef USE_SSL
***
*** 408,417
      n = send(conn->sock, ptr, len, 0);
  
  #ifndef WIN32
!     pqsignal(SIGPIPE, oldsighandler);
  #endif
  
      return n;
  }
  
  /* */
--- 412,432
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
Mark Wong wrote: On Sat, Nov 01, 2003 at 10:29:34PM +0100, Manfred Spraul wrote: Mark Wong wrote: Yeah, my dbt2 applications are multithreaded.
Do you need SIGPIPE delivery in your app? If no, could you try what happens if you apply the attached patch to postgres, and perform the signal(SIGPIPE, SIG_IGN); once in your dbt2 app?
Wow, that patch made a pretty big difference: http://developer.osdl.org/markw/dbt2-pgsql/191/ - metric 1605.51 So no one has to look for older mail before I applied that patch: http://developer.osdl.org/markw/dbt2-pgsql/190/ - metric 1427.24 Looks like about a 12% improvement in the overall metric. The first thing I noticed is that do_sigaction in the kernel profile almost disappeared. Cool. The top few functions in the database profile don't appear to have changed much.
I've looked at the profile: The only unusual line is the memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey)): it copies 144 bytes and needs ~5.3% global cpu time, from the 12.1% in SearchCatCache. The cachelines (line size 128 bytes) of cc_skey are shared with cc_bucket. 1.8% cpu time is spent in DLMoveToFront, the function that moves cache entries around. Perhaps a scalability problem of the hash table? The implementation moves the entries around all the time, i.e. the worst case for cache line transfers.
-- Manfred
Re: Avoiding SIGPIPE (was Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL
AgentM wrote: That wouldn't offer a solution for people who use SIGPIPE for other things during the lifetime of the program (after creating the connection) and if a SIGPIPE handler is called due to the connection, the handler won't be expecting the source, and polling signal for state is essentially what you do now. Instead, I propose a PQsigpipeOK/PQacceptsigpipe/PQrecvsigpipe(PGconn*) or something to that effect which skips this check for the connection. That way, programmers are aware that the connection could call their SIGPIPE handler because they explicitly request it and the library remains backwards-compatible.
If I understand the libpq sources correctly, the first packets are sent during connection setup - PQsigpipeOK(PGconn *) would be too late. That's why I added sigpipe=caller as a new flag for PQconnectdb.
-- Manfred
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
[EMAIL PROTECTED] wrote: Results from 7.4beta5 http://developer.osdl.org/markw/dbt2-pgsql/188/ - metric 1446.01
CPU: P4 / Xeon with 2 hyper-threads, speed 1497.51 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 10
samples   %       app name  symbol name
15369575  9.6780  postgres  SearchCatCache
13714258  8.6357  vmlinux   .text.lock.signal
10611912  6.6822  vmlinux   do_sigaction
4400461   2.7709  vmlinux   rm_from_queue
18% cpu time in the kernel signal handlers. What are signals used for by postgres? I've seen the sigalarm to implement timeouts, what else?
-- Manfred
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
I've straced
$ pgbench -c 5 -s 6 -t 1000
total 157k syscalls, 70k of them are rt_sigaction(SIGPIPE):
1754 poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
1754 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
1754 send(3, "\0\0\0%\0\3\0\0user\0postgres\0database\0t"..., 37, 0) = 37
1754 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
1754 poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
1754 recv(3, "R\0\0\0\10\0\0\0\0S\0\0\0\36client_encoding\0SQ"..., 16384, 0) = 169
1754 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
1754 send(3, "Q\0\0\0\35SET search_path = public\0", 30, 0) = 30
1754 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
1754 poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
1754 recv(3, "C\0\0\0\10SET\0Z\0\0\0\5I", 16384, 0) = 15
1754 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
and so on. Is that really necessary? Mark: could you strace your dbt2 app? I guess your app creates a similar stream of rt_sigaction calls.
-- Manfred
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
Tom Lane wrote: Manfred Spraul [EMAIL PROTECTED] writes: signal handlers are a process property, not a thread property - that code is broken for multi-threaded apps.
Yeah, that's been mentioned before, but I don't see any way around it.
Do not handle SIGPIPE on multithreaded apps, and ask the caller to do that? The current code doesn't block SIGPIPE reliably, which makes it totally useless (except that it's a debugging nightmare, because triggering it depends on the right timing).
What we really want is to turn off SIGPIPE delivery on our socket (only), but AFAIK there is no API to do that.
Linux has a MSG_NOSIGNAL flag for send(), but that seems to be Linux-specific.
-- Manfred
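A minimal sketch of the per-send approach mentioned above, assuming a platform that defines MSG_NOSIGNAL (Linux does). The helper names send_nosigpipe and demo_send_to_closed_peer are made up for illustration and are not part of libpq. The demo sends on a socket whose peer is already closed and reports the errno it gets back instead of dying from SIGPIPE.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send without raising SIGPIPE when the peer is gone.
 * Where MSG_NOSIGNAL does not exist this falls back to a plain send(),
 * and the caller must then ignore or block SIGPIPE itself.
 */
static ssize_t
send_nosigpipe(int sock, const void *buf, size_t len)
{
#ifdef MSG_NOSIGNAL
	return send(sock, buf, len, MSG_NOSIGNAL);
#else
	return send(sock, buf, len, 0);
#endif
}

/*
 * Hypothetical demo: returns the errno of a send to a closed peer,
 * 0 if the send unexpectedly succeeded, -1 if the socketpair failed.
 */
static int
demo_send_to_closed_peer(void)
{
	int		sv[2];
	ssize_t	n;
	int		err;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	close(sv[1]);				/* the peer goes away */
	n = send_nosigpipe(sv[0], "x", 1);
	err = (n < 0) ? errno : 0;
	close(sv[0]);
	return err;
}
```

As noted in the thread, this only covers send() calls libpq makes itself; it does not help for writes issued inside libssl.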
Re: Avoiding SIGPIPE (was Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL
Tom Lane wrote: A bigger objection is that we couldn't get libssl to use it (AFAIK). The flag really needs to be settable on the socket (eg, via fcntl), not per-send. It's a per-send flag, it's not possible to force it on with a fcntl :-( What about an option to skip the sigaction calls for apps that can handle SIGPIPE? I'm not sure if an option at connect time, or a flag accessible through a function like PQsetnonblocking() is the better approach. Attached is a patch that adds a connstr option, but I don't like it. -- Manfred Index: fe-connect.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.260 diff -c -r1.260 fe-connect.c *** fe-connect.c5 Sep 2003 02:08:36 - 1.260 --- fe-connect.c1 Nov 2003 21:02:04 - *** *** 65,70 --- 65,71 #else #define DefaultSSLModedisable #endif + #define DefaultSIGPIPEModesigaction /* -- *** *** 152,157 --- 153,161 {sslmode, PGSSLMODE, DefaultSSLMode, NULL, SSL-Mode, , 8}, /* sizeof(disable) == 8 */ + {sigpipemode, PGSIGPIPEMODE, DefaultSIGPIPEMode, NULL, + SIGPIPE-Mode, , 10},/* sizeof(sigaction) == 10 */ + /* Terminating entry --- MUST BE LAST */ {NULL, NULL, NULL, NULL, NULL, NULL, 0} *** *** 369,374 --- 373,380 conn-sslmode = strdup(require); } #endif + tmp = conninfo_getval(connOptions, sigpipemode); + conn-sigpipemode = tmp ? 
strdup(tmp) : NULL; /* * Free the option info - all is in conn now *** *** 478,483 --- 484,508 else conn-sslmode = strdup(DefaultSSLMode); + /* +* validate sigpipemode option +*/ + if (conn-sigpipemode) + { + if (strcmp(conn-sigpipemode, caller) != 0 +strcmp(conn-sigpipemode, sigaction) != 0) + { + conn-status = CONNECTION_BAD; + printfPQExpBuffer(conn-errorMessage, +libpq_gettext(unrecognized sigpipemode: \%s\\n), + conn-sigpipemode); + return false; + } + } + else + conn-sigpipemode = strdup(DefaultSIGPIPEMode); + + return true; } *** *** 951,956 --- 976,986 else if (conn-sslmode[0] == 'a') /* allow */ conn-wait_ssl_try = true; #endif + if (conn-sigpipemode[0] == 's') /* sigaction */ + conn-do_sigaction = true; + else + conn-do_sigaction = false; + /* * Set up to try to connect, with protocol 3.0 as the first attempt. *** *** 2033,2038 --- 2063,2070 free(conn-pgpass); if (conn-sslmode) free(conn-sslmode); + if (conn-sigpipemode) + free(conn-sigpipemode); /* Note that conn-Pfdebug is not ours to close or free */ if (conn-notifyList) DLFreeList(conn-notifyList); Index: fe-secure.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.30 diff -c -r1.30 fe-secure.c *** fe-secure.c 5 Sep 2003 02:08:36 - 1.30 --- fe-secure.c 1 Nov 2003 21:02:06 - *** *** 348,354 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL --- 348,357 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = NULL; ! ! if (conn-do_sigaction) ! oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL *** *** 408,414 n = send(conn-sock, ptr, len, 0); #ifndef WIN32 ! pqsignal(SIGPIPE, oldsighandler); #endif return n; --- 411,418 n = send(conn-sock, ptr, len, 0); #ifndef WIN32 ! if (conn-do_sigaction) ! 
pqsignal(SIGPIPE, oldsighandler); #endif return n; Index: libpq-int.h === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v retrieving revision 1.82 diff -c -r1.82 libpq-int.h *** libpq-int.h 5 Sep 2003 02:08:36 - 1.82 --- libpq-int.h 1 Nov 2003 21:02:07 - *** *** 250,255 --- 250,256 char *pguser; /* Postgres username and password, if any */ char *pgpass; char *sslmode;/* SSL mode
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
Mark Wong wrote: Yeah, my dbt2 applications are multithreaded.
Do you need SIGPIPE delivery in your app? If no, could you try what happens if you apply the attached patch to postgres, and perform the signal(SIGPIPE, SIG_IGN); once in your dbt2 app?
-- Manfred
--- pgsql.orig/src/interfaces/libpq/fe-secure.c 2003-11-01 22:28:13.0 +0100
+++ pgsql/src/interfaces/libpq/fe-secure.c 2003-11-01 22:27:21.0 +0100
@@ -348,7 +348,7 @@
 	ssize_t n;

 #ifndef WIN32
-	pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN);
+/*	pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN); */
 #endif

 #ifdef USE_SSL
@@ -408,7 +408,7 @@
 	n = send(conn->sock, ptr, len, 0);

 #ifndef WIN32
-	pqsignal(SIGPIPE, oldsighandler);
+/*	pqsignal(SIGPIPE, oldsighandler); */
 #endif

 	return n;
Re: Avoiding SIGPIPE (was Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL
Tom Lane wrote: Manfred Spraul [EMAIL PROTECTED] writes: What about an option to skip the sigaction calls for apps that can handle SIGPIPE?
If the app is ignoring SIGPIPE globally, then our calls will have no effect anyway.
Wrong. From the opengroup manpage: SIG_IGN - ignore signal [snip] - Setting a signal action to SIG_IGN for a signal that is pending will cause the pending signal to be discarded, whether or not it is blocked. This is why the kernel spends 20% cpu time processing the SIG_IGN: it must walk through all threads of the process and check if there are any SIGPIPE signals pending.
I don't see that this proposal adds any security.
It's not about security: Right now multithreaded apps must call signal(SIGPIPE, SIG_IGN), otherwise they could get killed by sudden SIGPIPE signals. Additionally, they can't rely on sigpending, because the pending bits are cleared regularly. On top of that, they get a noticeable performance hit. My proposal means that apps that know what they are doing (SIGPIPE either SIG_IGN, or blocked, or a suitable handler) can avoid the signal(SIGPIPE, SIG_IGN) in pqsecure_write. With backward compatibility, because the current system works for single threaded apps.
-- Manfred
Re: [HACKERS] O_DIRECT in freebsd
Greg Stark wrote: Manfred Spraul [EMAIL PROTECTED] writes: One problem for WAL is that O_DIRECT would disable the write cache - each operation would block until the data arrived on disk, and that might block other backends that try to access WALWriteLock. Perhaps a dedicated backend that does the writeback could fix that. aio seems a better fit. Has anyone tried to use posix_fadvise for the wal logs? http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html Linux supports posix_fadvise, it seems to be part of xopen2k. Odd, I don't see it anywhere in the kernel. I don't know what syscall it's using to do this tweaking. At least in 2.6: linux/mm/fadvise.c, the syscall is fadvise64 or 64_64 This is the only option that seems useful for postgres for both the WAL and vacuum (though in other threads it seems the problems with vacuum lie elsewhere): POSIX_FADV_DONTNEED attempts to free cached pages associated with the specified region. This is useful, for example, while streaming large files. A program may periodically request the kernel to free cached data that has already been used, so that more useful cached pages are not discarded instead. Pages that have not yet been written out will be unaffected, so if the application wishes to guarantee that pages will be released, it should call fsync or fdatasync first. I agree. Either immediately after each flush syscall, or just before closing a log file and switching to the next. Perhaps POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL could be useful in a backend before starting a sequential scan or index scan, but I kind of doubt it. IIRC the recommendation is ~20% total memory for the postgres user space buffers. That's quite a lot - it might be sufficient to protect that cache from vacuum or sequential scans. AddBufferToFreeList already contains a comment that this is the right place to try buffer replacement strategies. 
-- Manfred
Re: [HACKERS] O_DIRECT in freebsd
Tom Lane wrote: Not for WAL --- we never read the WAL at all in normal operation. (If it works for writes, then we would want to use it for writing WAL, but that's not apparent from what Christopher quoted.) At least under Linux, it works for writes. Oracle uses O_DIRECT to access (both read and write) disks that are shared between multiple nodes in a cluster - their database kernel must know when the data is visible to the other nodes. One problem for WAL is that O_DIRECT would disable the write cache - each operation would block until the data arrived on disk, and that might block other backends that try to access WALWriteLock. Perhaps a dedicated backend that does the writeback could fix that. Has anyone tried to use posix_fadvise for the wal logs? http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html Linux supports posix_fadvise, it seems to be part of xopen2k. -- Manfred
Re: [HACKERS] Database Kernels and O_DIRECT
Andrew Dunstan wrote: I have wondered (somewhat fruitlessly) for several years about the possibilities of special purpose lightweight file systems that could relax some of the assumptions and checks used in general purpose file systems. Such a thing might provide most of the benefits of a database kernel without imposing anything extra on the database application layer.
CPU is usually cheap compared to disk io. There are two things that might be worth looking into: Oracle released their cluster filesystem (ocfs) as a GPL driver for Linux. It might be interesting to check how it performs if used for postgres, but I fear that it implicitly assumes that the bulk of the caching is performed by the database in user space. And using O_DIRECT for the WAL logs - the logs are never read.
-- Manfred
Re: [HACKERS] compile warning
Andrew Dunstan wrote: Bruce Momjian wrote: This seems to be a bug in gcc-3.3.1.
-fstrict-aliasing is enabled by -O2 or higher optimization in gcc 3.3.1. According to the C standard, it's illegal to access data through a pointer of the wrong type. The only exception is char *. This can be used by compilers to pipeline loops, or to reorder instructions. For example
void dummy(double *out, int *in, int len)
{
	int j;
	for (j = 0; j < len; j++)
		out[j] = 1.0/in[j];
}
can be pipelined if a compiler relies on strict aliasing: it's guaranteed that writing to out[5] won't overwrite in[6]. I think MemSet violates strict aliasing: it writes to the given address with (int32*). gcc might move the instructions around. I would disable strict aliasing with -fno-strict-aliasing. In the Linux kernel, you can see this in include/linux/tcp.h: /* * The union cast uses a gcc extension to avoid aliasing problems * (union is compatible to any of its members) * This means this part of the code is -fstrict-aliasing safe now. */ The kernel is still compiled with -fno-strict-aliasing - I'm not sure if there are outstanding problems, or if it's just a safety precaution.
-- Manfred
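To illustrate the two aliasing-safe ways of reinterpreting memory touched on above, here is a small sketch. It is not the MemSet code; the function names are invented for the example, and it assumes a 32-bit IEEE-754 float. The first variant copies the bytes with memcpy(), the second goes through a union as in the kernel comment quoted above.

```c
#include <stdint.h>
#include <string.h>

/* The example only makes sense if both types have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/*
 * Read the bit pattern of a float without casting float * to uint32_t *.
 * Copying the bytes is valid regardless of -fstrict-aliasing.
 */
static uint32_t
float_bits_memcpy(float f)
{
	uint32_t bits;

	memcpy(&bits, &f, sizeof(bits));
	return bits;
}

/*
 * The same through a union: store one member, read the other.
 * gcc documents this form as supported under -fstrict-aliasing.
 */
static uint32_t
float_bits_union(float f)
{
	union
	{
		float		f;
		uint32_t	u;
	} pun;

	pun.f = f;
	return pun.u;
}
```

What stays invalid under strict aliasing is the direct cast, i.e. dereferencing (uint32_t *) &f, which is the pattern the message suspects in MemSet.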
Re: [HACKERS] IDE Drives and fsync
scott.marlowe wrote: OK, I've done some more testing on our IDE drive machine. First, some background. The hard drives we're using are Seagate drives, model number ST380023A. Firmware version is 3.33. The machine they are in is running RH9. The setup string I'm feeding them on startup right now is: hdparm -c3 -f -W1 /dev/hdx where: -c3 sets I/O to 32 bit w/sync (uh huh, sure...)
sync has nothing to do with sync to disk. The sync means read from three magic io ports before transferring data to or from the device.
-f sets the drive to flush buffer cache on exit
-f shouldn't have any effect: it means that the buffer cache in the OS is flushed after hdparm exits, it has no long-term effect on the disk.
-W1 turns on write caching
That's the problem: turning on write caching causes corruptions. What's needed is partial write caching: write cache on, and fsync() sends a barrier to the disk, and only after the disk reports that the barrier is completed, then fsync() returns. I consider that an OS/driver problem, not a problem for postgres.
The drives come up using DMA. turning unmask IRQ on / off has no effect on the tests I've been performing.
Of course. irq unmasking is about interrupt latency if DMA is not used: DMA off and dma masking off results in dropped bytes on serial links.
Without the -f switch, data corruption due to sudden power down is almost certain.
It's odd that adding -f reduces the corruptions - probably it changes available memory, and thus the writeback of data from kernel to disk.
Tom, you had mentioned adding a delay of some kind to the fsync logic, and I'd be more than willing to try out any patch you'd like to toss out to me to see if we can get a semi-stable behaviour out of IDE drives with the -W1 and -f switches turned on.
I'm not aware that there is any safe delay. Disks with write caches reorder io operations, and some hold back write operations indefinitely. Unfortunately Linux doesn't implement write barriers, and the support in some IDE disks is missing, too :-(
-- Manfred
Re: [HACKERS] 2-phase commit
Peter Eisentraut wrote: Tom Lane writes: No. The real problem with 2PC in my mind is that its failure modes occur *after* you have promised commit to one or more parties. In multi-master, if you fail you know it before you have told the client his data is committed.
I have a book here which claims that the solution to the problems of 2-phase commit is 3-phase commit, which goes something like this:
coordinator                               participant
-----------                               -----------
INITIAL                                   INITIAL
               prepare -->
WAIT
               <-- vote commit
                                          READY
(all voted commit)
               prepare-to-commit -->
PRE-COMMIT
               <-- ready-to-commit
                                          PRE-COMMIT
               global-commit -->
COMMIT                                    COMMIT
If the coordinator fails and all participants are in state READY, they can safely decide to abort after some timeout. If some participant is already in state PRE-COMMIT, it becomes the new coordinator and sends the global-commit message. Details are left as an exercise. :-)
Ok. Let's assume one coordinator, two participants. Global commit is sent to both by the coordinator. One replies with ok, the other one remains silent. What should the coordinator do? It can't fail the transaction - the first participant has committed its part. It can't complete the transaction, because the ok from the 2nd participant is still outstanding. I think Bruce is right: It's an admin decision. If a timeout expires, a user supplied app should be called, with a safe default (database shutdown?).
-- Manfred
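The recovery rule quoted from the book can be written down as a tiny decision function. This is only a sketch of that one rule: the type and function names are hypothetical, and treating a participant that already reached COMMIT like PRE-COMMIT is an assumption of this example, not something the thread states. Every other combination is reported as undecided, which is exactly the case the reply above argues needs an administrator.

```c
#include <stddef.h>

/* Participant states from the diagram, plus ABORT for completeness. */
typedef enum
{
	ST_INITIAL,
	ST_READY,
	ST_PRE_COMMIT,
	ST_COMMIT,
	ST_ABORT
} PeerState;

typedef enum
{
	DECIDE_ABORT,
	DECIDE_COMMIT,
	DECIDE_UNDECIDED
} Decision;

/*
 * What the surviving participants may conclude after the coordinator
 * timed out, given the states they can observe among themselves.
 */
static Decision
decide_after_coordinator_timeout(const PeerState *states, size_t n)
{
	size_t	i;
	int		all_ready = (n > 0);

	for (i = 0; i < n; i++)
	{
		/* Someone got past READY: finish the commit (COMMIT is assumed). */
		if (states[i] == ST_PRE_COMMIT || states[i] == ST_COMMIT)
			return DECIDE_COMMIT;
		if (states[i] != ST_READY)
			all_ready = 0;
	}
	/* Everybody is still in READY: aborting is safe. */
	return all_ready ? DECIDE_ABORT : DECIDE_UNDECIDED;
}
```

The function only sees participants that answer. A silent participant, as in the objection above, is simply missing from the array, so the rule cannot tell a crashed peer from one that already moved on.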
Re: [HACKERS] Threads vs Processes (was: NuSphere and PostgreSQL
Tom Lane wrote: Claudio Natoli [EMAIL PROTECTED] writes: How are you dealing with the issue of wanting some static variables to be per-thread and others not? To be perfectly honest, I'm still trying to familiarize myself with the code sufficiently well so that I can tell which variables need to be per-thread and which are shared (and, in turn, which of these need to be protected from concurrent access).
No. Not protected from concurrent access. Each thread must have its own copy.
Well, the first-order approximation would be to duplicate the current fork semantics: *all* static variables are per-thread, and should be copied from the parent thread at thread creation. If there is some reasonably non-invasive way to do that, we'd have a long leg up on the problem.
There is a __declspec(thread) that makes a global variable per-thread. AFAIK it uses linker magic to replace the actual memory accesses with calls to TlsAlloc() etc. Note that __declspec(thread) doesn't work from within dynamic link libraries, but that shouldn't be a big problem.
-- Manfred
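A small sketch of per-thread static variables, using the gcc spelling __thread with POSIX threads rather than the Win32 __declspec(thread) discussed above; the variable and function names are invented. It also shows a difference from the fork semantics mentioned in the quote: a new thread starts with the variable's initial value, not with a copy of the creating thread's current value.

```c
#include <pthread.h>
#include <stddef.h>

/* One independent copy of this variable exists per thread. */
static __thread int per_thread_counter = 0;

/* Worker: modifies only its own copy and reports what it saw. */
static void *
worker(void *arg)
{
	int	   *seen = arg;

	per_thread_counter += 5;	/* starts from 0, not from the parent's 1 */
	*seen = per_thread_counter;
	return NULL;
}

/*
 * Hypothetical demo: set the variable to 1 in the calling thread, run a
 * worker, and return the caller's value afterwards (-1 on thread
 * creation failure). The value seen in the worker is stored through
 * the pointer argument.
 */
static int
demo_thread_local(int *seen_in_worker)
{
	pthread_t	t;

	per_thread_counter = 1;
	if (pthread_create(&t, NULL, worker, seen_in_worker) != 0)
		return -1;
	pthread_join(t, NULL);
	return per_thread_counter;	/* untouched by the worker */
}
```

So thread-local storage alone gives "each thread has its own copy" but not "copied from the parent thread at thread creation"; that part would need explicit code.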
Re: [HACKERS] semtimedop instead of setitimer/semop/setitimer
Tom Lane wrote: AFAIK, semops are not done unless we actually have to yield the processor, so saving a syscall or two in that path doesn't sound like a big win. I'd be more interested in asking why you're seeing long series of semops in the first place. Virtually all semops yield the processor, that part works. I couldn't figure out what exactly causes the long series of semops. I tried to track it down (enable LOCK_DEBUG): - postgres 7.3.3. - pgbench -c 30 -t 300 - database stored on ramdisk - laptop disks are just too slow. The long series of semops are caused by lots of processes that try to acquire a lock that is held exclusively by another process. Something like * 10 processes are waiting for a ShareLock on lock c568c. One of them already owns an ExclusiveLock on lock c91b4. * everyone receives the shared lock A, does something, drops it. * then the 9 processes try to acquire a ShareLock on lock B, and go to sleep. Is there are simple way to figure out what lock c91b4 is? Here is the log: I've added getpid() to the elog calls and I've overridden LOCK_DEBUG_ENABLED to write out everything always. Additionally, I've printed the caller address for LockAcquire Process 29420 acquires a lock exclusively: LockAcquire for pid 29420 called by 0x81147d6 (XactLockTableInsert) LockAcquire: new: 29420 lock(c91b4) tbl(1) rel(376) db(0) obj(1439) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock) LockAcquire: new: 29420 holder(c95e8) lock(c91b4) tbl(1) proc(a47b0) xid(1439) hold(0,0,0,0,0,0,0)=0 LockCheckConflicts: no conflict: 29420 holder(c95e8) lock(c91b4) tbl(1) proc(a47b0) xid(1439) hold(0,0,0,0,0,0,0)=0 GrantLock: 29420 lock(c91b4) tbl(1) rel(376) db(0) obj(1439) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock) [ Snip] Process 29420 acquires another lock shared, goes to sleep. 
LockAcquire for pid 29420 called by 0x811484a (XactLockTableWait) LockAcquire: found: 29420 lock(c568c) tbl(1) rel(376) db(0) obj(1421) grantMask(80) req(0,0,0,0,2,0,1)=3 grant(0,0,0,0,0,0,1)=1 wait(2) type(ShareLock) LockAcquire: new: 29420 holder(c62c0) lock(c568c) tbl(1) proc(a47b0) xid(1439) hold(0,0,0,0,0,0,0)=0 LockCheckConflicts: conflicting: 29420 holder(c62c0) lock(c568c) tbl(1) proc(a47b0) xid(1439) hold(0,0,0,0,0,0,0)=0 WaitOnLock: sleeping on lock: 29420 lock(c568c) tbl(1) rel(376) db(0) obj(1421) grantMask(80) req(0,0,0,0,3,0,1)=4 grant(0,0,0,0,0,0,1)=1 wait(2) type(ShareLock) ProcSleep from 0x8115763, pid 29420, proc 0xbf2f57b0 for 0xbf31668c, mode 5. omitted: several other processes sleep on the same lock. omitted: LockReleaseAll grants the lock to everyone that was sleeping on c568c For several threads: LOG: ProcSleep from 0x8115763, pid 29436, proc 0xbf2f52f0 for 0xbf31668c done. LOG: WaitOnLock: wakeup on lock: 29436 lock(c568c) tbl(1) rel(376) db(0) obj(1421) grantMask(20) req(0,0,0,0,3,0,0)=3 grant(0,0,0,0,3,0,0)=3 wait(0) type(ShareLock) LOG: LockAcquire: granted: 29436 holder(c6274) lock(c568c) tbl(1) proc(a42f0) xid(1446) hold(0,0,0,0,1,0,0)=1 LOG: LockAcquire: granted: 29436 lock(c568c) tbl(1) rel(376) db(0) obj(1421) grantMask(20) req(0,0,0,0,3,0,0)=3 grant(0,0,0,0,3,0,0)=3 wait(0) type(ShareLock) LOG: LockRelease: found: 29436 lock(c568c) tbl(1) rel(376) db(0) obj(1421) grantMask(20) req(0,0,0,0,3,0,0)=3 grant(0,0,0,0,3,0,0)=3 wait(0) type(ShareLock) LOG: LockRelease: found: 29436 holder(c6274) lock(c568c) tbl(1) proc(a42f0) xid(1446) hold(0,0,0,0,1,0,0)=1 LOG: LockRelease: updated: 29436 lock(c568c) tbl(1) rel(376) db(0) obj(1421) grantMask(20) req(0,0,0,0,2,0,0)=2 grant(0,0,0,0,2,0,0)=2 wait(0) type(ShareLock) LOG: LockRelease: updated: 29436 holder(c6274) lock(c568c) tbl(1) proc(a42f0) xid(1446) hold(0,0,0,0,0,0,0)=0 LOG: LockRelease: deleting: 29436 holder(c6274) lock(c568c) tbl(1) proc(a42f0) xid(1446) hold(0,0,0,0,0,0,0)=0 LOG: 
LockAcquire for pid 29436 called by 0x811484a. (XactLockTableWait) LOG: LockAcquire: found: 29436 lock(c91b4) tbl(1) rel(376) db(0) obj(1439) grantMask(80) req(0,0,0,0,2,0,1)=3 grant(0,0,0,0,0,0,1)=1 wait(2) type(ShareLock) LOG: LockAcquire: new: 29436 holder(c6274) lock(c91b4) tbl(1) proc(a42f0) xid(1446) hold(0,0,0,0,0,0,0)=0 LOG: LockCheckConflicts: conflicting: 29436 holder(c6274) lock(c91b4) tbl(1) proc(a42f0) xid(1446) hold(0,0,0,0,0,0,0)=0 LOG: WaitOnLock: sleeping on lock: 29436 lock(c91b4) tbl(1) rel(376) db(0) obj(1439) grantMask(80) req(0,0,0,0,3,0,1)=4 grant(0,0,0,0,0,0,1)=1 wait(2) type(ShareLock) LOG: ProcSleep from 0x8115763, pid 29436, proc 0xbf2f52f0 for 0xbf31a1b4, mode 5. Hmm. The initial exclusive lock is from XactLockTableInsert, the ShareLock waits are from XactLockTableWait. Everyone tries to start a transaction on the same entry? I've uploaded a larger part (500 kB) of the log to http://www.colorfullife.com/~manfred/sql-log.gz -- Manfred ---(end of broadcast)---
Re: [HACKERS] semtimedop instead of setitimer/semop/setitimer
Tom Lane wrote: Oh, pgbench ;-). Are you aware that you need a scale factor (-s) larger than the number of clients to avoid unreasonable levels of contention in pgbench?
No. What about adding a few reasonable examples to README? I've switched to pgbench -c 10 -s 11 -t 1000 test. Is that ok? Now the semop calls are virtually gone. That leaves the question why sysv sem showed up high in the dbt2 benchmarks, but that's another question. I'm back to my original idea: align the data buffers to speed up the user space/kernel space transfers. It looks good:
before: (with/without connection)
105.031776//105.093682
105.201246//105.260008
after aligning:
112.664320//112.730542
111.031901//111.098496
111.685869/111.751130
Tested with 7.3.4. Initially I tried to increase MAX_ALIGNOF to 16, but the result didn't work: pgbench failed with:
ERROR: CREATE DATABASE cannot be executed from a function
createdb: database creation failed
For my test I've manually edited shmem and aligned all allocations to 16 byte offsets. I'll try to compile the 7.4 cvs tree, probably someone makes wrong assumptions about the alignment values.
-- Manfred
Re: [HACKERS] semtimedop instead of setitimer/semop/setitimer
Tom Lane wrote: Manfred Spraul [EMAIL PROTECTED] writes: ... Initially I tried to increase MAX_ALIGNOF to 16, but the result didn't work: You would need to do a full recompile and initdb to alter MAX_ALIGNOF.
I think I did that, but it still failed. 7.4cvs works, I'll ignore it. MAX_ALIGNOF affects the on-disk format, correct? Then I agree that it's wrong to change it.
However, if you are wanting to raise it past about 8, that's probably not the way to go anyway; it would create padding wastage in too many places. It would make more sense to allocate the buffers using a variant ShmemAlloc that could be told to align this particular object on an N-byte boundary. Then it costs you no more than N bytes in the one place.
I agree, I'll write a patch.
(BTW, I wonder whether there would be any win in allocating the buffers on a 4K or 8K page boundary... do any kernels use virtual memory mapping tricks to replace data copying in such cases?)
Linux doesn't. Page table games are considered evil, because tlb flushing is expensive, especially on SMP.
-- Manfred
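The "variant ShmemAlloc with a per-object alignment" idea can be sketched with a simple bump allocator. This is not the PostgreSQL shared memory code: Arena, ALIGN_UP and arena_alloc_aligned are hypothetical names, and the alignment is assumed to be a power of two. The point is that the padding is paid once, for the object that asked for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Round an address up to the next multiple of align (a power of two). */
#define ALIGN_UP(addr, align) \
	(((uintptr_t) (addr) + ((uintptr_t) (align) - 1)) & ~((uintptr_t) (align) - 1))

/* A trivial bump allocator over one contiguous memory region. */
typedef struct
{
	char   *base;				/* start of the region */
	size_t	size;				/* total bytes available */
	size_t	used;				/* bytes handed out so far, padding included */
} Arena;

/*
 * Allocate len bytes whose start address is a multiple of align.
 * Returns NULL when the region is exhausted.
 */
static void *
arena_alloc_aligned(Arena *a, size_t len, size_t align)
{
	uintptr_t	start = ALIGN_UP(a->base + a->used, align);
	size_t		offset = (size_t) (start - (uintptr_t) a->base);

	if (offset > a->size || len > a->size - offset)
		return NULL;
	a->used = offset + len;
	return a->base + offset;
}
```

A request with align 16 wastes at most 15 bytes in front of that one object, while small objects allocated with align 1 stay densely packed.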
[HACKERS] semtimedop instead of setitimer/semop/setitimer
I've noticed that postgres strace output contains long groups of setitimer/semop/setitimer. Just FYI: semtimedop is a special syscall that implements a semop with a timeout. It was added just for the purpose of avoiding the setitimer calls. I know that it's supported by Solaris and recent Linux versions, I'm not sure about other operating systems. Has anyone tried to use it? Oracle pushed it to Linux, it seems to be worth the effort: http://www.ussg.iu.edu/hypermail/linux/kernel/0211.3/0485.html
-- Manfred
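For illustration, a sketch of waiting on a SysV semaphore with a timeout in a single call, assuming Linux with glibc (where semtimedop() needs _GNU_SOURCE). The helper names are made up; this is not how the postgres semaphore code is structured. The demo waits on a semaphore whose value is 0, so the call is expected to time out with EAGAIN.

```c
#define _GNU_SOURCE				/* for semtimedop() in glibc */
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <time.h>

/* Linux leaves it to the caller to define the semctl() argument union. */
union semun_arg
{
	int				val;
	struct semid_ds *buf;
	unsigned short *array;
};

/*
 * Decrement semaphore 0 of the set, giving up after timeout_ms.
 * Replaces the setitimer/semop/setitimer sequence with one syscall.
 * Returns 0 on success, -1 with errno set to EAGAIN on timeout.
 */
static int
sem_wait_timeout(int semid, long timeout_ms)
{
	struct sembuf	op = {.sem_num = 0, .sem_op = -1, .sem_flg = 0};
	struct timespec ts;

	ts.tv_sec = timeout_ms / 1000;
	ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
	return semtimedop(semid, &op, 1, &ts);
}

/*
 * Hypothetical demo: create a private semaphore with value 0, wait on
 * it for 50 ms and return the errno of the timed-out wait
 * (0 if the wait unexpectedly succeeded, -1 on setup failure).
 */
static int
demo_semtimedop(void)
{
	int				semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	union semun_arg arg;
	int				err;

	if (semid < 0)
		return -1;
	arg.val = 0;
	if (semctl(semid, 0, SETVAL, arg) != 0)
	{
		semctl(semid, 0, IPC_RMID);
		return -1;
	}
	err = (sem_wait_timeout(semid, 50) < 0) ? errno : 0;
	semctl(semid, 0, IPC_RMID);	/* remove the semaphore set again */
	return err;
}
```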
[HACKERS] Memory buffer alignment
Hi, When analyzing the kernel profile from osdl dbt benchmarks, I noticed that around 50% of the kernel time is spent in __copy_user_intel. http://khack.osdl.org/stp/280060/profile/ This function is one of two functions that does the actual memory copy from/to kernel space to/from user space. Unfortunately it's the slower one: Intel cpus have a microcode fastpath for memcopies that are 8-byte aligned. This fastpath is around 50% faster than the manual copy that is used for misaligned (i.e. only 4-byte aligned) pointers. I don't know enough about other cpus, but I'd expect that most cpus prefer well-aligned buffers. How are the user space buffers allocated? So far I found buffile.c, but struct BufFile.buffer is at offset 32, i.e. aligned, although by chance. What is the alignment of the output of palloc? Is buffile.c the main code that reads/writes data to disk?
-- Manfred
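One way to answer the "what is the alignment" question for a given pointer, and to get a buffer with a guaranteed alignment from the C library, is sketched below. It assumes a POSIX system with posix_memalign(); is_aligned and alloc_io_buffer are illustrative names and say nothing about what palloc actually returns.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is a multiple of align bytes. */
static int
is_aligned(const void *p, size_t align)
{
	return ((uintptr_t) p % align) == 0;
}

/*
 * Allocate an I/O buffer with an explicit alignment.
 * align must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free().
 */
static void *
alloc_io_buffer(size_t len, size_t align)
{
	void   *p = NULL;

	if (posix_memalign(&p, align, len) != 0)
		return NULL;
	return p;
}
```

A check like is_aligned(buf, 8) on the pointers passed to read() and write() would show whether a buffer can hit the 8-byte fast path described above.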
Re: [HACKERS] [PATCHES] Reorganization of spinlock defines
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: He is uncomfortable with the port/*.h changes at this point, so it seems I am going to have to add Itanium/Opteron tests to most of those files. Why don't you try to put together a proposed patch of that kind, and then we can look to see how big and ugly it is compared to the other? If the alternative is shown to be really messy, that would sway my opinion, maybe Marc's too. OK, here is an Opteron/Itanium patch that might work. I say might because I don't have a lot of confidence in the current spinlock detection code. There is an uncoupling between the definition of HAS_TEST_AND_SET, the data type used by slock_t, and the assembler code.
Is the Itanium tas implementation correct? I think it should be xchg4.aqv instead of just xchg4 - as far as I know a normal atomic exchange is not a memory barrier on Itanium. At least the Linux kernel version contains cmpxchg4.aqv.
-- Manfred
Re: [HACKERS] [PATCHES] Reorganization of spinlock defines
Manfred Spraul wrote: Is the Itanium tas implementation correct? I think it should be xchg4.acq instead of just xchg4: as far as I know, a normal atomic exchange is not a memory barrier on Itanium. At least the Linux kernel version contains cmpxchg4.acq. Sorry for the noise, I'm wrong: Itanium automatically uses acquire semantics with xchg. See the top of page 16 of http://h21007.www2.hp.com/dspp/files/unprotected/itanium/spinlocks.pdf -- Manfred
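For readers unfamiliar with the acquire requirement discussed here, a portable sketch using C11 atomics may help. This is not the PostgreSQL s_lock.h code and not Itanium assembler: demo_slock_t, demo_tas and demo_unlock are hypothetical names, and the compiler is left to pick the matching instruction for the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock type for illustration; not PostgreSQL's slock_t. */
typedef atomic_int demo_slock_t;

/* Test-and-set: atomically store 1 and return the previous value.
 * A return value of 0 means the lock was free and the caller now holds it.
 * memory_order_acquire keeps later loads and stores of the critical
 * section from being reordered before the exchange. */
static int demo_tas(demo_slock_t *lock)
{
    assert(lock != NULL);
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Release the lock: the release ordering keeps the critical section's
 * accesses from being reordered after the store. */
static void demo_unlock(demo_slock_t *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

A caller would spin on demo_tas until it returns 0, do its work, then call demo_unlock.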
Re: [HACKERS] FreeBSD/i386 thread test
Jeroen Ruigrok/asmodai wrote: -On [20030908 23:52], Peter Eisentraut ([EMAIL PROTECTED]) wrote: Why would FreeBSD have a library of thread-safe libc functions (libc_r) if the functions weren't thread-safe? I think the test is faulty. A thread-safe library has a per-thread errno value (i.e. errno is a #define to a function call), thread-safe I/O buffers for stdio, etc. Some of these changes cause a noticeable overhead, hence a separate library for those users who want to avoid that overhead. Reentrancy is independent of _r: if you look at the prototype of gethostbyname(), it's just not possible to make that thread-safe with reasonable effort, since the C library would have to keep one buffer per thread around. Having libc_r is not a guarantee that all functions of libc are represented in that library as thread-safe functions. gethostbyname_r() is a notable reentrant function which is absent in FreeBSD. Is there a thread-safe alternative to gethostbyname() for FreeBSD? -- Manfred
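The prototype problem mentioned above can be shown with a small sketch. demo_name and demo_name_r are hypothetical functions, not part of any libc: the first returns a pointer into a single static buffer, as gethostbyname() does, so a second call overwrites the first result; the second follows the _r convention where the caller supplies the storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Non-reentrant style: every caller gets a pointer to the same static
 * buffer, so the result is only valid until the next call. */
static const char *demo_name(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "host-%d", id);
    return buf;
}

/* _r style: the caller owns the buffer, so concurrent callers do not
 * share state. Returns 0 on success, -1 if the buffer is too small. */
static int demo_name_r(int id, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "host-%d", id);
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

With the first form, the only way for a library to become thread-safe without changing the prototype is to keep one buffer per thread, which is the effort the message refers to.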
Re: [osdldbt-general] Re: [HACKERS] Prelimiary DBT-2 Test results
Another question: Is it possible to apply patches to postgresql before a DBT-2 run, or is only patching the kernel supported? -- Manfred
Re: [HACKERS] Prelimiary DBT-2 Test results
[EMAIL PROTECTED] wrote: http://developer.osdl.org/markw/44/ I threw together (kind of sloppily) a web page of the data I was starting to collect for our DBT-2 workload (TPC-C derivative) on PostgreSQL 7.3.4. Keep in mind not much database tuning has been done yet. Feel free to ask any questions. The kernel readprofile output is very odd: sys_ipc receives lots of hits, but that function is a trivial multiplexer. sys_timedsemop and try_atomic_semop got 0 hits, although they are the main implementation of SysV semaphores. Could you double-check your readprofile scripts? -- Manfred
Re: [pgsql-advocacy] [HACKERS] [GENERAL] Postgresql AMD x86-64
Bruce Momjian wrote:
  if test $enable_debug = yes && test $ac_cv_prog_cc_g = yes; then
    CFLAGS="$CFLAGS -g"
  fi
+
+ /* Compile AMD Opteron using gcc in 64-bit mode */
+ if test $GCC = yes; then
+   case $host in
+     ia64-*) CFLAGS="$CFLAGS -m64"
+             LDFLAGS="$LDFLAGS -melf_x86_64";;
+   esac
+ fi
+
Sorry, I think I confused you: ia64-* is Intel's Itanium system. Those are 64-bit only CPUs (the 32-bit emulation is too slow to be usable). Itanium is supported by multiple operating systems, among them HP-UX, Linux and Windows. As far as I can see it's supported directly by 7.3.3; at least Red Hat builds their ia64 version without any patches. x86_64 is AMD's Opteron/Athlon 64 system. Those CPUs support 32-bit and 64-bit concurrently. Right now they are only supported by Linux; BSD and Windows support is expected soon. Thus the test must be for x86_64-*. Martin: you are using debian-testing, correct? I've asked a SuSE developer, and on their Linux distribution -m64 is the default, i.e. you don't need any switches. -- Manfred
Re: [HACKERS] ECPG thread-safety
Shridhar Daithankar wrote: 2) Native freeBSD threads pthread.h in /usr/include and lc_r Do you know if FreeBSD supports pthread_rwlock with PTHREAD_PROCESS_SHARED? I'm trying to replace the LWLocks with pthread_rwlocks. What about other Unices? -- Manfred