Re: Disk sync at shutdown and fusefs filesystems
On Mon, 17 Dec 2007 17:41:16 +0100 Csaba Henk <[EMAIL PROTECTED]> wrote: > [This message has also been posted to gmane.os.freebsd.devel.hackers.] > On 2007-12-17, Alejandro Pulver <[EMAIL PROTECTED]> wrote: > > > > - Some "got hung in unmount" issues are to be sorted out (these > >appeared on Linux, and they might or might not appear on FreeBSD). > > > > > > IIRC you are saying that any user could make umount hang. And you said > > this is an unintended behavior caused by the implementation, which > > appeared on Linux and we don't know if it will happen on FreeBSD. > > > > Otherwise the daemon would synchronize the fs and let umount return > > normally, and this wouldn't happen, right? > > > > If this always happens then what is the difference between happening on > > a root/non-root mount, as it will hang anyways? > > > > If I missed the point again please correct me, and clarify the following: > > Does the hang (point 2)/umount stuck (point 3) issues consist of the > > same (I assumed so)? If not, please point out the differences. > > Oh sorry, I see my wording was still not clear enough... > > Point 3) and point 2) are completely different issues. > > Point 2) is about a _bug_ which might make unmount hang (contrary to our > intentions). Point 3) is about the access control _policy_ of the > "mounting with sync unmount" feature which is DOS capable: it enables > some malicious code to make the shutdown sequence hung. > I understand now (about 3). So a user could write a FUSE daemon which never replies properly (or doesn't reply at all) to the DESTROY code, and the kernel module would be waiting indefinitely. Stalling the shutdown sequence. Maybe this could be solved with a timeout (see below). > [Btw, I used/use the "hang", "block", "make stuck" expressions > interchangeably, I'm sorry if it's not correct English or just sounds > unnaturally in some cases.] > I'm not a native English speaker, so these all seemed the same to me. Thanks for the clarifications. 
> > The same statement with more details: > > Point 2) is about a defect of a naive, straightforward implementation of > handling the DESTROY message in the FUSE library. That is, if the daemon > is mounted with synch umount, under certain circumstances (if I > understood correctly, this amounts to killing the daemon with a SIGTERM [ie., > the FUSE session terminates due the sigterm and not because of doing an > unmount(2) on the fs]) the umount code of the lib falls into an infinite > loop. This is a bug which may or may not affect FreeBSD once DESTROY is > implemented for it -- the umount code in the lib is platform specific, > anyway. So this is just about a possible bug which really falls into > the "we will see it when we get there" category. > O.K. > OTOH, by the essence of the synch umount, mounting an fs daemon w/ synch > umount means that the daemon gets the control over the termination > of the unmount syscall. So being able to mount w/ synch umount assumes > some kind of trusted state -- it enables a malicious daemon to block its > unmounting. It's not a real risk if the unmount is done manually -- when > the person who unmounts the fs observes that the daemon is blocking the > unmount, she can turn to either a forced unmount or killing the daemon. > However, during the shutdown sequence, which is automated, noone will be > there to forcedly terminate the FUSE session, and shutdown might get > stuck this way. > > So we have to decide how to control the access to mounting with synch > umount. This is point 3). > I can see 2 approaches to solving this (not sure if both are possible though): Maybe adding a timeout to the FUSE kernel module? I've seen the syncer daemon say "giving up on ..." when I had ATA errors related to configuration/cable problems (it retried for a while, and as disk couldn't sync, it terminated anyways). So maybe something similar could be implemented. 
Otherwise IMHO only root should be allowed to do such mounts, or there could be a sysctl, disabled by default, that allows users to do this (like vfs.usermount).

> Probably I just provide an implementation of it for the kernel module
> and add a "sync_umount" option to mount_fusefs(8) and let it be as is
> and we'll see how well it works out, and what to do about access control
> -wise.
>
> Csaba

When the implementation is ready, and if these problems are sorted out, do you think it could be enabled by default (at least for root)? I think that's the behavior most filesystems would prefer.

Best Regards,
Ale
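The timeout idea above is proposed for the FUSE kernel module itself. As a rough userland analogue only (not the kernel-side fix discussed in the thread), a shutdown script could refuse to wait forever on a single command. The helper name `run_with_timeout` and the exit code 124 are assumptions made for this sketch:

```sh
#!/bin/sh
# Hypothetical helper: run a command, but give up after SECS seconds.
# Returns the command's exit status, or 124 if it had to be killed.
run_with_timeout() {
    secs=$1; shift
    "$@" &                      # start the command in the background
    cmd_pid=$!
    elapsed=0
    # Poll once per second until the command exits or time runs out.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$secs" ]; then
            kill -9 "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$cmd_pid"             # collect the real exit status
}
```

It could be used as `run_with_timeout 10 umount /mnt/ntfs || umount -f /mnt/ntfs`, where /mnt/ntfs is only an example mount point. A process blocked inside the kernel may not react to the kill, which is one reason a timeout in the kernel module would be the more robust place for this.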
Re: Disk sync at shutdown and fusefs filesystems
On Mon, 17 Dec 2007 02:21:53 +0100 Csaba Henk <[EMAIL PROTECTED]> wrote: > >> They have already discovered issues with system shutdown on Linux, and > >> Miklos has implemented a solution for this dubbed as "synchronous > >> umount". According to this, the protocol is enhanced with a new message > >> called DESTROY. Upon unmounting the fs, the kernel sends a DESTROY to > >> the daemon and waits for answer. That is, unmount(2) won't complete > >> until the fs says to the kernel "OK, I'm done". > >> [...] > > Hmm, I don't know much of this, but isn't the Linux problem related to > > flushing its own block device cache? In FreeBSD it doesn't exist (i.e. > > ublio is only user-space), so I wonder if just unmounting before > > shutdown solves the issue. I mean, does the kernel still keep > > information after a FUSE filesystem is unmounted? > > > > Please correct me if I'm wrong. > > > > At least the currently discussed trick only works because it waits a few > > seconds after unmounting to let it flush the cache (but I think it's a > > common fact that filesystems get registered/unregistered with a small > > delay, and may not be related to that). > > The point in synch umount is that you don't need to wait for an ad hoc > amount of time in order to have the various caches flushed / media > sync'd -- it enables the filesystem daemon itself to notify the umount > procedure that it's done and the world can go on. > > The exact nature of caches (userspace / in-kernel) doesn't really make a > difference from this POV. (Implementing the appropriate synchronization > mechanisms is up to the fs daemon.) > I see, thanks for the clarification. > >> - Some "got hung in unmount" issues are to be sorted out (these > >>appeared on Linux, and they might or might not appear on FreeBSD). > >> > > > > Never seen this, but also never unmounted at shutdown before. I have a > > patch for it (see thread). 
Then we could easily see if it get stalled > > at shutdown (or when manually stopping the rc.d script). > > Of course you've never seen this! -- these have appeared as consequences > of the synchronous umount (ie., the DESTROY message) which is not yet > implemented on FreeBSD. > That's logical. I missed the point. > The actual question is whether it is worth to implement it. For me it > seems "yes", but I don't know the ins and outs of the FreeBSD > init/shutdown system, that's why I'd like to hear the opinion of people > like you about this before I go and code it. > I'm not in the kernel side, but I think it's the correct thing to do. The kernel syncer does the same with other filesystems, so... > Whether hangs occur or not if fuse4bsd does sync umount is not that > important. I mean, first I would code a basic implementation of DESTROY > (that's pretty simple to do!) and we'd see how well that works and if we > see problems I try to tune the implementation. That's just business as > usual. > For what you said, we won't know if there are hangs until we have an implementation of DESTROY. So this will be attended later as you said. > The next issue is more important... > > >> - Security issue: with synch unmount, any user who can mount (w/ synch > >>unmount), is capable of making the unmount stuck (which is easy to > >>fix when the system is up -- just kill the fs daemon -- but can > >>make the shutdown process hopelessly stuck). So we'd have to > >>decide who/when shall be able to do mounts for which the unmount is > >>synchronous. (The current criteria for this on Linux -- ie., > >>is the fuseblk fs variant being used? -- is N/A to FreeBSD for > >>reasons which are OT here. However, Miklos decided to > >>change this so that sych unmount will be tied to the "allow_other" > >>option, which is tied to root privileges, and does make sense > >>on FreeBSD, too. I'd be happy to hear more suitable criteria. > >> > > > > This would depend on the previous point. 
> > It's more important to specify a suitable security policy -- who/when > should be capable of mounting an fs in a way so that the umount will > be synchronous? (Short term for this: "mounting with synch umount"). > > I think it's quite a "standalone" question, it depends on nothing else. - Some "got hung in unmount" issues are to be sorted out (these appeared on Linux, and they might or might not appear on FreeBSD). IIRC you are saying that any user could make umount hang. And you said this is an unintended behavior caused by the implementation, which appeared on Linux and we don't know if it will happen on FreeBSD. Otherwise the daemon would synchronize the fs and let umount return normally, and this wouldn't happen, right? If this always happens then what is the difference between happening on a root/non-root mount, as it will hang anyways? If I missed the point again please correct me, and clarify the following: Does the hang (point 2)/umount stuck (point 3) issues consist of the same (I assumed so)
Re: Disk sync at shutdown and fusefs filesystems
On Wed, 12 Dec 2007 03:00:07 +0100 Csaba Henk <[EMAIL PROTECTED]> wrote: > [This message has also been posted to gmane.os.freebsd.devel.hackers.] > On 2007-12-11, Alejandro Pulver <[EMAIL PROTECTED]> wrote: > > The problem with NTFS-3G (and all other FUSE based drivers maybe) is > > that it doesn't flush the cache data to the disk at shutdown, but it > > does when unmounted (and I guess this doesn't happen automatically). I > > noticed this when files I write before manually unmounting persist, and > > otherwise sometimes they don't. > > I just happen to discuss this issue with Szaka (ntfs-3g developer) and > Miklos Szeredi (FUSE developer). At least, we're discussing something > which might have a relevance here. > > They have already discovered issues with system shutdown on Linux, and > Miklos has implemented a solution for this dubbed as "synchronous > umount". According to this, the protocol is enhanced with a new message > called DESTROY. Upon unmounting the fs, the kernel sends a DESTROY to > the daemon and waits for answer. That is, unmount(2) won't complete > until the fs says to the kernel "OK, I'm done". > > This was introduced in the following commit (as seen in my HG mirror): > > http://mercurial.creo.hu/repos/fuse-hg/?rev/a5df6fb4a0e6 > > and it's already included in the current sysutils/fusefs-libs port. > > And it wouldn't be hard to add kernel side support for FreeBSD. There > are some questions though: > > - Do you think it could be actually useful for solving the shutdown >issue on FreeBSD? > Hmm, I don't know much of this, but isn't the Linux problem related to flushing its own block device cache? In FreeBSD it doesn't exist (i.e. ublio is only user-space), so I wonder if just unmounting before shutdown solves the issue. I mean, does the kernel still keep information after a FUSE filesystem is unmounted? Please correct me if I'm wrong. 
At least the currently discussed trick only works because it waits a few seconds after unmounting to let it flush the cache (but I think it's a common fact that filesystems get registered/unregistered with a small delay, and may not be related to that). > - Some "got hung in unmount" issues are to be sorted out (these >appeared on Linux, and they might or might not appear on FreeBSD). > Never seen this, but also never unmounted at shutdown before. I have a patch for it (see thread). Then we could easily see if it get stalled at shutdown (or when manually stopping the rc.d script). > - Security issue: with synch unmount, any user who can mount (w/ synch >unmount), is capable of making the unmount stuck (which is easy to >fix when the system is up -- just kill the fs daemon -- but can >make the shutdown process hopelessly stuck). So we'd have to >decide who/when shall be able to do mounts for which the unmount is >synchronous. (The current criteria for this on Linux -- ie., >is the fuseblk fs variant being used? -- is N/A to FreeBSD for >reasons which are OT here. However, Miklos decided to >change this so that sych unmount will be tied to the "allow_other" >option, which is tied to root privileges, and does make sense >on FreeBSD, too. I'd be happy to hear more suitable criteria. > This would depend on the previous point. Please CC me as I'm not (yet) subscribed. Best Regards, Ale ? 
../fusefs-kmod/fusefs-kmod-0.3.9.p1_3.tbz
Index: ../fusefs-kmod/Makefile
===
RCS file: /home/pcvs/ports/sysutils/fusefs-kmod/Makefile,v
retrieving revision 1.16
diff -u -r1.16 Makefile
--- ../fusefs-kmod/Makefile 15 Nov 2007 19:46:42 - 1.16
+++ ../fusefs-kmod/Makefile 12 Dec 2007 03:12:14 -
@@ -7,7 +7,7 @@
 PORTNAME= fusefs
 DISTVERSION= 0.3.9-pre1
-PORTREVISION= 2
+PORTREVISION= 3
 CATEGORIES= sysutils kld
 MASTER_SITES= http://fuse4bsd.creo.hu/downloads/ \
     http://am-productions.biz/docs/
Index: ../fusefs-kmod/files/fusefs.in
===
RCS file: /home/pcvs/ports/sysutils/fusefs-kmod/files/fusefs.in,v
retrieving revision 1.4
diff -u -r1.4 fusefs.in
--- ../fusefs-kmod/files/fusefs.in 30 Oct 2007 03:10:09 - 1.4
+++ ../fusefs-kmod/files/fusefs.in 12 Dec 2007 03:12:14 -
@@ -25,13 +25,29 @@
 fusefs_start()
 {
+    if kldstat | grep -q fuse\\.ko; then
+        echo "${name} is already running."
+        return 0
+    fi
     echo "Starting ${name}."
     kldload $kmod
 }
 fusefs_stop()
 {
+    if ! kldstat | grep -q fuse\\.ko; then
+        echo "${name} is not running."
+        return 1
+    fi
     echo "Stopping ${name}."
+    mount | while read dev d1 mountpoint d2; do
+        case "$dev&q
Re: Disk sync at shutdown and fusefs filesystems
On Tue, 11 Dec 2007 12:22:35 -0800 (PST) Doug Barton <[EMAIL PROTECTED]> wrote:

> On Tue, 11 Dec 2007, Alejandro Pulver wrote:
>
> > Thanks, here is what I've got so far: it seems /dev/fuse[0-9]* devices
> > aren't removed after the corresponding filesystem is unmounted (I guess
> > they are reused), so instead of listing /dev the list has to be taken
> > from 'mount'.
>
> Yeah, I think that's better than using fstab anyway, since this way we get
> them all with limited processing. Wish I'd thought of it. :)
>

Actually, I first tried "umount -a -t {fusefs,ntfs-3g,fuse,...}" but it didn't work.

> > Also there should be a delay between the 'umount' and
> > 'kldunload' commands. What do you think about the following
> > (replacement for fusefs_stop function)?
>
> I suppose this is mostly a style difference, but I like to avoid all those
> subshells if we can. I also think it might be a good idea to wait a second
> between unmounts, just to be paranoid. How about:
>
> mount | while read dev d1 mountpoint d2; do
>     case "$dev" in
>     /dev/fuse[0-9]*) umount $mountpoint ; sleep 1 ;;
>     esac
> done
> sleep 1
>

It looks fine to me. And what about echoing the mountpoints as they are unmounted?

    mount | while read dev d1 mountpoint d2; do
        case "$dev" in
        /dev/fuse[0-9]*)
            echo "fusefs: unmounting ${mountpoint}."
            umount $mountpoint ; sleep 1
            ;;
        esac
    done

Also, these checks would avoid kldload/kldunload errors:

In fusefs_start:

    if kldstat | grep -q fuse\\.ko; then
        echo "${name} is already running."
        return 0
    fi

In fusefs_stop:

    if ! kldstat | grep -q fuse\\.ko; then
        echo "${name} is not running."
        return 1
    fi

Well, the word "loaded" instead of "running" would be better. Also a status command could be added, but I don't think it's needed.
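The mount-parsing loop discussed in this message can be tried without touching real mounts by reading the listing from standard input. The sketch below assumes the "device on mountpoint (options)" layout that mount(8) prints; the function name `fuse_mountpoints` is made up for the example:

```sh
#!/bin/sh
# Print the mount point of every line whose device looks like /dev/fuseN.
# Input is read from stdin in the "device on mountpoint (options)" layout,
# so the function can be fed either real mount output or sample text.
fuse_mountpoints() {
    while read -r dev _on mountpoint _rest; do
        case "$dev" in
        /dev/fuse[0-9]*) printf '%s\n' "$mountpoint" ;;
        esac
    done
}
```

In an rc.d script it might be used as `mount | fuse_mountpoints | while read -r mp; do umount "$mp"; sleep 1; done`. Like the loops in the thread, it splits on whitespace, so a mount point containing spaces would not be handled.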
Re: Disk sync at shutdown and fusefs filesystems
On Mon, 10 Dec 2007 20:18:26 -0800 Doug Barton <[EMAIL PROTECTED]> wrote:

> Alejandro Pulver wrote:
>
> > Then I have to look for some way to manually
> > unmount FUSE filesystems at shutdown, because they are already mounted
> > at startup. I thought about instructing the fusefs-kmod rc.d script to
> > unmount FUSE filesystems before attempting to unload the kernel module
> > (currently it only loads/unloads fuse.ko).
>
> Yes, I think that given what we're working with here, that would be a
> good idea regardless. It should be pretty easy to do, you can find a
> sample of something like what you would want in /etc/rc.d/dumpon. Let
> me know if you need help, I'm more than a little interested in getting
> fuse-ntfs set up here.
>

Thanks, here is what I've got so far: it seems /dev/fuse[0-9]* devices aren't removed after the corresponding filesystem is unmounted (I guess they are reused), so instead of listing /dev the list has to be taken from 'mount'. Also there should be a delay between the 'umount' and 'kldunload' commands. What do you think about the following (replacement for fusefs_stop function)?

    echo "Stopping ${name}."
    for fs in `mount | grep '^/dev/fuse[0-9]*' | cut -d ' ' -f 1`; do
        umount $fs
    done
    sleep 2
    kldunload $kmod

Unfortunately the script doesn't have a status function to avoid loading the module when it is already loaded (and the other way around), but one can easily be added.

Best Regards,
Ale
Disk sync at shutdown and fusefs filesystems
Hello.

The port fusefs-ntfs (NTFS-3G is the official name) is an NTFS read/write driver using FUSE (a user-space, kernel-independent API for writing filesystem drivers). The latter uses a (user-space) cache to improve performance, as there isn't a block device cache in the kernel, and it was originally made for Linux with that assumption.

The problem with NTFS-3G (and maybe all other FUSE-based drivers) is that it doesn't flush the cached data to the disk at shutdown, but it does when unmounted (and I guess this doesn't happen automatically). I noticed this because files I write before manually unmounting persist, and otherwise sometimes they don't.

So I guess that with native FreeBSD filesystems (here I mean written directly for the system kernel, using the kernel cache) the kernel flushes the cache at shutdown, but they aren't unmounted. Generally this isn't a problem since most FUSE filesystems are "virtual" (for example: over SSH, FTP, HTTP, etc.) and neither use a cache nor need flushing. But this isn't the case with NTFS-3G.

Are my assumptions right? If so, I have to look for some way to manually unmount FUSE filesystems at shutdown, because they are already mounted at startup. I thought about instructing the fusefs-kmod rc.d script to unmount FUSE filesystems before attempting to unload the kernel module (currently it only loads/unloads fuse.ko).

Thanks and Best Regards,
Ale
Re: dlopen: resolving external library symbols to calling program
On Fri, 30 Nov 2007 19:02:01 +0200 Kostik Belousov <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 30, 2007 at 01:28:58PM -0300, Alejandro Pulver wrote:
> > Hello.
> >
> > When I was updating the games/deng port, I found it failed at runtime
> > with the following error:
> >
> > % doomsday
> > While opening dynamic library
> > /usr/local/lib/libdropengl.so:
> > /usr/local/lib/libdropengl.so: Undefined symbol "ArgExists"
> > DD_InitDGL: Loading of libdropengl.so failed.
> > (null).
> >
> > The function is defined in m_args.c which is included in both
> > "doomsday" and "libdropengl.so". But nm(1) reports it as undefined for
> > "libdropengl.so". Also, it is loaded with RTLD_NOW.
> >
> > % nm `which doomsday` | grep ArgExists
> > 080d9ef0 T ArgExists
> You are looking at the wrong symbol table. ELF objects have the dynamic
> symbol table that is used during run-time linking, and symbol table used
> by the static linker ld. The former table is shown by nm -D.
>
> I suspect that you need to link the doomsday binary with the
> --export-dynamic flag. See the info ld for details.
>

It worked, thank you very much.

I am reading some books that explain the basics of the COFF/ELF formats (like Write Great Code Volume 2: Thinking Low-Level, Writing High-Level), but I didn't know about the dynamic symbol table. I found the following article, which briefly describes it (though it's for Solaris):

http://blogs.sun.com/ali/entry/inside_elf_symbol_tables

Now that I remember, the games/quakeforge port had the same problem. But someone fixed it by referencing the symbol (it was only one function) with a function pointer so it got exported in the dynamic table. In this case, could that be done with "-u symbol" when linking the executable, or isn't it possible to export a symbol with linker parameters?

Thanks and Best Regards,
Ale
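To check for the situation described above, the dynamic symbol table (as listed by nm -D) can be inspected for a defined entry. The following is a minimal sketch that works on nm-style text from standard input; the function name `is_exported` is an assumption for the example, and it simply treats every symbol type other than "U" as defined:

```sh
#!/bin/sh
# Succeed if SYMBOL appears as a defined entry (any type other than "U")
# in nm-style output read from stdin. The last field is the symbol name
# and the field before it is the type letter.
is_exported() {
    awk -v sym="$1" '
        $NF == sym && $(NF - 1) != "U" { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

For the case in this thread, a check could look like `nm -D /usr/local/bin/doomsday | is_exported ArgExists`. When the executable is linked through the compiler driver rather than by calling ld directly, the flag is normally passed as `-Wl,--export-dynamic`.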
dlopen: resolving external library symbols to calling program
Hello.

When I was updating the games/deng port, I found it failed at runtime with the following error:

% doomsday
While opening dynamic library
/usr/local/lib/libdropengl.so:
/usr/local/lib/libdropengl.so: Undefined symbol "ArgExists"
DD_InitDGL: Loading of libdropengl.so failed.
(null).

The function is defined in m_args.c, which is included in both "doomsday" and "libdropengl.so". But nm(1) reports it as undefined for "libdropengl.so". Also, the library is loaded with RTLD_NOW.

% nm `which doomsday` | grep ArgExists
080d9ef0 T ArgExists
% nm /usr/local/lib/libdropengl.so | grep ArgExists
         U ArgExists

The files are linked with the "-flat_namespace" and "-undefined suppress" flags on Mac OS X (I don't know if it's relevant here).

I think the simplest solution (if possible, of course) would be to make dlopen() resolve these symbols to the main executable. I tried to do this with RTLD_GLOBAL without success.

The port is available here (note that the application uses cmake to build):

ftp://ftp.alepulver.com.ar/deng.tar.bz2

If you need any other information just ask me. I will appreciate any help.

Thanks and Best Regards,
Ale
Re: High disk load +mount/atacontrol/NFS/SMBFS crashes the system
On Mon, 23 Apr 2007 22:59:44 -0500 "Rick C. Petty" <[EMAIL PROTECTED]> wrote:

> On Mon, Apr 23, 2007 at 09:58:58PM -0300, Alejandro Pulver wrote:
> >
> > In the machine which was recently upgraded to 6.2 using "atacontrol"
> > when the disk is reading/writing crashes the system half or more of the
> > times.
>
> Which atacontrol command were you doing? Just plain "atacontrol" shouldn't
> do anything useful.
>

To change DMA modes, like:

# atacontrol mode ad0 UDMA100

Best Regards,
Ale
Re: High disk load +mount/atacontrol/NFS/SMBFS crashes the system
On Sun, 22 Apr 2007 23:26:33 -0300 Alejandro Pulver <[EMAIL PROTECTED]> wrote:

[...]
> The strange crash in the new 6.2 machine when using atacontrol is still
> unexplained and I couldn't make it happen again (it now refuses to
> switch to UDMA100 mode when it is SATA300, maybe they aren't supported
> in SATA drives, but the other time it just crashed without advise).
>

On the machine that was recently upgraded to 6.2, using "atacontrol" while the disk is reading/writing crashes the system half of the time or more. Maybe it's a BIOS or hard disk issue, but I can't try it on the new machine because it uses "SATA300" and other modes aren't documented in the manual page. The other machine supports UDMA* modes. Otherwise there is a problem in "atacontrol".

When this happens, the disk light keeps blinking but the system freezes, and keeps waiting forever for the disk to respond.

Best Regards,
Ale
Re: High disk load +mount/atacontrol/NFS/SMBFS crashes the system
On Sun, 15 Apr 2007 23:33:47 -0700 Garrett Cooper <[EMAIL PROTECTED]> wrote: > Ale, > I'm not sure what's going on exactly based on the information you > provided, but I would try the following steps to isolate the issue: > > 1) See if you can upgrade the first machine to a later version of > FreeBSD, say 6.2. I believe that there were related issues resolved in > 6.2, but my memory could be incorrect. See if your problems occur after > that. I did that. > 2) Try grabbing a different machine if possible and see if the same > issue occurs when you put the new machine as server and client with one > of the other machines. I used a Win XP machine as client / server. > 3) Try switching roles with the 2 machines. If machine 1 is usually > server, let it play client and vice versa with machine 2. Also did this. > 4) Remove the new drive if possible, see if issue goes away. If it does, > try acquiring a cheap(er) drive and put it > It's the only drive it has, I meant the second machine is all new, not just the disk. > Also, it appears that another FreeBSD team member had a similar issue > (see: http://people.freebsd.org/~pho/stress/log/cons205.html and > http://people.freebsd.org/~pho/stress/log/cons225.html). I dunno how but > it showed up as one of the leading searches on Google. > > It looks like a (localized) filesystem issue, but I'm not sure what it > is exactly. > The fsync() problem seems to be related to that, but the rest could be be a different thing. Also I only got it twice. Maybe the filesystem issues were only derived from the crashes. I was unable to reproduce the problem in the first machine, maybe it was fixed on FreeBSD 6.2 as you said. The only things I also did when testing was unloading fuse.ko (unused) and linprocfs.ko (after umounting it). However I will test it a few times more, and let you know the results. 
The strange crash in the new 6.2 machine when using atacontrol is still unexplained and I couldn't make it happen again (it now refuses to switch to UDMA100 mode when it is SATA300, maybe they aren't supported in SATA drives, but the other time it just crashed without advise). Thank you for your help with this. Best Regards, Ale signature.asc Description: PGP signature
Re: Gaim log writing delays the system
On Sat, 14 Apr 2007 19:18:42 -0300 Alejandro Pulver <[EMAIL PROTECTED]> wrote: > Hello. > > I have enabled logging in Gaim, and when a chat message arrives and it > is logged the disk writing delays (freezes) the system for less than a > second, it can be noticed for example with XMMS which does a strange > sound during that period. > > I think this problem is related to the system and not the port, that's > why I asked here. Also I guess more information is needed about this, > like ktrace/truss output of Gaim together with kernel statistics > (vmstat/iostat). But other than ktrace, I don't use them. What would be > the commands to get the most relevant information about this? > > I am using FreeBSD 6.2-RELEASE, and the boot message is here (the file > dmesg_machine_2.txt): > > http://people.freebsd.org/~alepulver/disk-crash.tar.bz2 > > Thanks and Best Regards, > Ale > > P.S.: please CC me as I'm not subscribed. Hello. The issue only happens with gaim-devel, and seems to be related to the sound system (doesn't happen when disabling sounds). Sorry for the noise, I've been somewhat paranoid with this because it happened when I had a broken NVidia card (or incompatible with my motherboard), and some disk problems (these were apparently fixed in FreeBSD 6.2). So I thought it was a kernel problem (it happened on a machine with an old disk and on a new one so I discarded the possibility of hardware failing). Anyways the system does freeze for less than a second when that happens. I will make a more descriptive thread about this. Best Regards, Ale P.S.: I've also sent this to freebsd-performance@, because I thought that was more appropiate when I thought it was a disk driver problem. signature.asc Description: PGP signature
Re: High disk load +mount/atacontrol/NFS/SMBFS crashes the system
On Sat, 14 Apr 2007 17:40:38 -0700 Garrett Cooper <[EMAIL PROTECTED]> wrote: > Alejandro Pulver wrote: > > Hello. > > > > I have experienced the following problem a couple of times in 2 > > different machines and FreeBSD versions (see below): when the disk is > > continuously reading/writing (like when copying/extracting a file, > > checking the filesystem in the background, etc.) my system crashes > > sometimes (it's not an everyday thing, but quite frustrating when it > > happens). > > > > When copying from another machine by NFS/SMBFS more than one file at > > the same time (or when using the disk, like described above) often > > crashes (and the disk light indicator turns off). Running "atacontrol > > ad0 mode UDMA100" when it was UDMA133 crashed the system (the disk > > activity indicator was always on) when I tried to solve the problem > > that way. Also when I was installing a port which installs many files > > on the second machine without using NFS/SMBFS, trying to mount a local > > NTFS filesystem (with kernel driver) crashed. > > [...] > > Ale, Hello. Thank you for your reply. > Could you provide more information about your machine, in particular > the devices attached (lspci -vv from sysutils/pciutils does the trick) > and the options enabled in your custom kernel please? Sure. I have updated the file (added pci_machine_1.txt and pci_machine_2.txt). The kernel configuration is already there (named ATHLON-PHOBOS), the second machine has a default SMP kernel. http://people.freebsd.org/~alepulver/disk-crash.tar.bz2 > Also, could you provide more information about what the settings are > that you are using for NFS and SMBFS (-rsize, -wsize, special > mountd/rpcbind options, etc). > -Garrett I am not using nothing special here. 
In rc.conf:

rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_client_enable="YES"

And the commands (at different times):

# mount deimos:/wxp /mnt
# mount -t smbfs //[EMAIL PROTECTED]/c /mnt

After both FreeBSD machines crashed when the problem happened (because of NFS waiting indefinitely), I started using "-i". The second command was to copy some data from a Windows machine.

BTW I don't think the problem is related to NFS/SMBFS but to the disk drivers, since it happens without them too. One disk is ATA (a year old) and the other is SATA (new). However, I am not experienced enough in this to tell.

Thanks and Best Regards,
Ale
Gaim log writing delays the system
Hello.

I have enabled logging in Gaim, and when a chat message arrives and is logged, the disk write delays (freezes) the system for less than a second. It can be noticed, for example, with XMMS, which makes a strange sound during that period.

I think this problem is related to the system and not the port, that's why I asked here. Also I guess more information is needed about this, like ktrace/truss output of Gaim together with kernel statistics (vmstat/iostat). But other than ktrace, I don't use them. What would be the commands to get the most relevant information about this?

I am using FreeBSD 6.2-RELEASE, and the boot message is here (the file dmesg_machine_2.txt):

http://people.freebsd.org/~alepulver/disk-crash.tar.bz2

Thanks and Best Regards,
Ale

P.S.: please CC me as I'm not subscribed.
High disk load +mount/atacontrol/NFS/SMBFS crashes the system
Hello.

I have experienced the following problem a couple of times on 2 different machines and FreeBSD versions (see below): when the disk is continuously reading/writing (like when copying/extracting a file, checking the filesystem in the background, etc.) my system sometimes crashes (it's not an everyday thing, but quite frustrating when it happens).

When copying more than one file at the same time from another machine by NFS/SMBFS (or when using the disk as described above), it often crashes (and the disk light indicator turns off). Running "atacontrol mode ad0 UDMA100" when it was UDMA133 crashed the system (the disk activity indicator was always on) when I tried to solve the problem that way. Also, when I was installing a port which installs many files on the second machine without using NFS/SMBFS, trying to mount a local NTFS filesystem (with the kernel driver) crashed it.

The first machine is an Athlon XP 2400+ with FreeBSD 6.1-RELEASE and a custom kernel (see below), and the second one is a new Athlon64 X2 3500 with FreeBSD 6.2-RELEASE running in i386 mode, with the generic SMP kernel. See the boot messages and kernel config here:

http://people.freebsd.org/~alepulver/disk-crash.tar.bz2

Also I got (only twice, when checking the filesystem after one of these crashes) the following error on the first machine; I don't know whether it's related to the previous problems:

fsync: giving up on dirty
0xc51d6990: tag devfs, type VCHR
    usecount 1, writecount 0, refcount 806 mountedhere 0xc51a4000
    flags ()
    v_object 0xc144cb58 ref 0 pages 3232
    lock type devfs: EXCL (count 1) by thread 0xc54e2c00 (pid 837)
    dev ad2s1f

I would appreciate any help. If you need more information just ask.

Thanks and Best Regards,
Ale

P.S.: please CC me as I'm not subscribed.
sysutils/fusefs-ntfs: slow reading/writing speed
Hello.

I have tried sysutils/fusefs-ntfs (version 1.0) and got a maximum write speed of 1.2MB/sec. Reading is a little faster: 2MB/sec.

There were some discussions about this in the ntfs-3g forums, and they said it was fixed in the new beta version (which is now stable, see the official site). Note that by default it uses an option that is not available in FreeBSD's mount_fusefs, so try with "-o no_def_opts".

http://forum.ntfs-3g.org/viewtopic.php?p=1330&sid=8e59dcb7050a15378eb93d5659c04409

It makes no difference for me. However, I found the following post, which says "the reason for the slow copy on FreeBSD is the lack of buffer cache for block devices which should be solved in FreeBSD 7.0", and mentions "ublio", a library for user space caching which can be used with it, made by the author of fuse4bsd. The rest of the thread is irrelevant to the matter.

http://forum.ntfs-3g.org/viewtopic.php?p=1153&sid=cde9378447762e86345a89130fd267d5

Unfortunately I couldn't make it work. If someone has time, please take a look; it would be really appreciated. I tried contacting the port maintainer but got no response.

Thanks and Best Regards,
Ale

P.S.: please CC me as I'm not subscribed.
Re: Program not being executed at all
On Sat, 30 Dec 2006 16:31:03 +0200
Kostik Belousov <[EMAIL PROTECTED]> wrote:

[...]
> > > > Interestingly 'ldd' also crashes when examining it, outputting the
> > > > following (however 'ktrace' has more information):
> > > >
> > > > /usr/local/bin/quake2max:
> > > > /usr/local/bin/quake2max: signal 6
> > > >
[...]
> > > Please, show the output of the commands
> > > file /usr/local/bin/quake2max
> > > readelf -ld /usr/local/bin/quake2max
> > >
[...]
> Signal 6 is sent by elf image activator upon exec() when old address space
> is destroyed, but new image cannot be loaded. In your case, I guess that
> extra large bss section size (where uninitialized global/static variables
> are placed) causes loader to fail:
>
> >   Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz     Flg Align
> >   LOAD   0x073000 0x080bb000 0x080bb000 0x02cc4 0x28a20e34 RW  0x1000
>
> Look at MemSiz column. VirtAddr + MemSiz >= 0x3000, and elf interpreter
> (/libexec/ld-elf.so.1) is usually mmapped at 0x2800.
>
> Look at the source for huge global arrays/objects.

Hello.

Thank you very much for your help, I have found the array; see below.

I searched the diff for increments in the macros (the code has many global arrays whose sizes are set with '#define'), and the only thing I could find is the following:

-#define MAX_DECAL_FRAGMENTS 32
+#define MAX_DECAL_FRAGMENTS 64

But the problem is here:

#define MAX_PARTICLES 4096

typedef struct particle_s
{
	/* skip */
	decalpolys_t decal[MAX_DECAL_FRAGMENTS];
	/* skip */
} cparticle_t;

cparticle_t particles[MAX_PARTICLES];

The size of the cparticle_t type is 68 on my machine. So 68*32*4096 = 8912896, and in the new version it was doubled to 17825792.

I have changed the definition back to 32, and now 'readelf' reports that the size has been reduced considerably:

  LOAD 0x07 0x080b8000 0x080b8000 0x03010 0x149a1954 RW 0x1000

BTW this works in Linux (I haven't tried it myself, but someone else told me so), so just out of curiosity, does Linux allocate more memory for loading programs?

Best Regards,
Ale
Re: Program not being executed at all
On Sat, 30 Dec 2006 14:21:50 +0200
Kostik Belousov <[EMAIL PROTECTED]> wrote:

> On Sat, Dec 30, 2006 at 02:47:18AM -0300, Alejandro Pulver wrote:
> > Hello.
> >
> > I tried to update the port I maintain "games/quake2max", a Quake II
> > engine, but when I try to run the compiled executables, except for
> > the dedicated server (quake2max-ded) they output "Abort" and quit.
> >
> > The output of 'ktrace' is the following (it just stops before running
> > it):
> >
> >  82753 ktrace   RET   ktrace 0
> >  82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
> >  82753 ktrace   NAMI  "/sbin/quake2max"
> >  82753 ktrace   RET   execve -1 errno 2 No such file or directory
> >  82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
> >  82753 ktrace   NAMI  "/bin/quake2max"
> >  82753 ktrace   RET   execve -1 errno 2 No such file or directory
> >  82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
> >  82753 ktrace   NAMI  "/usr/sbin/quake2max"
> >  82753 ktrace   RET   execve -1 errno 2 No such file or directory
> >  82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
> >  82753 ktrace   NAMI  "/usr/bin/quake2max"
> >  82753 ktrace   RET   execve -1 errno 2 No such file or directory
> >  82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
> >  82753 ktrace   NAMI  "/usr/games/quake2max"
> >  82753 ktrace   RET   execve -1 errno 2 No such file or directory
> >  82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
> >  82753 ktrace   NAMI  "/usr/local/sbin/quake2max"
> >  82753 ktrace   RET   execve -1 errno 2 No such file or directory
> >  82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
> >  82753 ktrace   NAMI  "/usr/local/bin/quake2max"
> >
> > Interestingly 'ldd' also crashes when examining it, outputting the
> > following (however 'ktrace' has more information):
> >
> > /usr/local/bin/quake2max:
> > /usr/local/bin/quake2max: signal 6
> >
> > My first thought was that it was a GCC bug, so I tried compiling it
> > with 4.1 (my system is a FreeBSD 6.1-RELEASE-p1 with GCC 3.4.4
> > 20050518) but it made no difference.
> >
> > Interestingly the actual "games/quake2max" port works just fine
> > (version 0.44), and I couldn't see something suspicious with a quick
> > look to the 'diff' output. I have attached a patch to update the port
> > in the tree to the 0.45 version.
> >
> > Could someone investigate this please?
> >
> > Thanks and Best Regards,
> > Ale
> >
> > P.S.: please CC me since I am not subscribed to the list.
>
> Please, show the output of the commands
> file /usr/local/bin/quake2max
> readelf -ld /usr/local/bin/quake2max
>

Hello.

Thank you for your reply. Here is the output:

% file /usr/local/bin/quake2max
/usr/local/bin/quake2max: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), for FreeBSD 6.1, dynamically linked (uses shared libs), stripped

% readelf -ld /usr/local/bin/quake2max

Elf file type is EXEC (Executable file)
Entry point 0x80497d0
There are 6 program headers, starting at offset 52

Program Headers:
  Type     Offset   VirtAddr   PhysAddr   FileSiz MemSiz     Flg Align
  PHDR     0x34     0x08048034 0x08048034 0x000c0 0x000c0    R E 0x4
  INTERP   0xf4     0x080480f4 0x080480f4 0x00015 0x00015    R   0x1
      [Requesting program interpreter: /libexec/ld-elf.so.1]
  LOAD     0x00     0x08048000 0x08048000 0x72875 0x72875    R E 0x1000
  LOAD     0x073000 0x080bb000 0x080bb000 0x02cc4 0x28a20e34 RW  0x1000
  DYNAMIC  0x075a70 0x080bda70 0x080bda70 0x000c0 0x000c0    RW  0x4
  NOTE     0x00010c 0x0804810c 0x0804810c 0x00018 0x00018    R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .rel.dyn .rel.plt .init .plt .text .fini .rodata
   03     .data .eh_frame .dynamic .ctors .dtors .jcr .got .bss
   04     .dynamic
   05     .note.ABI-tag

Dynamic segment at offset 0x75a70 contains 19 entries:
  Tag    Type      Name/Value
 0x0001 (NEEDED)   Shared library: [libm.so.4]
 0x0001 (NEEDED)   Shared library: [libz.so.3]
 0x0001 (NEEDED)   Shared library: [libc.so.6]
 0x000c (INIT)     0x80491dc
 0x000d (FINI)     0x80b1748
 0x0004 (HASH)     0x8048124
 0x0005 (STRTAB)   0x8048b5c
 0x0006 (SYMTAB)   0x804846c
 0x000a (STRSZ
Re: Program receiving SIGSEGV after exit()
On Sat, 30 Dec 2006 14:15:57 +0200
Kostik Belousov <[EMAIL PROTECTED]> wrote:

> On Sat, Dec 30, 2006 at 03:10:35AM -0300, Alejandro Pulver wrote:
> > Hello.
> >
> > The port "games/qudos" keeps running in a loop after exiting from the
> > main menu. This is because after calling exit() the program receives a
> > SIGSEGV signal, and the signal handler, after intercepting it, calls
> > exit() again.
> >
> > I think it is a problem with the application itself, but I don't know
> > what to look for in the source code.
> >
> > I have attached a 'gdb' backtrace.
> >
> > Could someone please point me in the right direction?
> >
> > Thanks and Best Regards,
> > Ale
> Use _exit() instead of exit() in the handler. Aside question is whether
> this handler is needed at all.
>

Hello.

Thank you for your reply. I searched the SVN repository and found that the handler originally called _exit(), that this was changed to exit() just before the release, and that the change was backed out afterwards.

Yes, I guess the handler is not very useful (it also doesn't produce a core file like the default one does).

Best Regards,
Ale
Program not being executed at all
Hello.

I tried to update the port I maintain, "games/quake2max", a Quake II engine, but when I try to run the compiled executables, all of them except the dedicated server (quake2max-ded) output "Abort" and quit.

The output of 'ktrace' is the following (it just stops before running it):

 82753 ktrace   RET   ktrace 0
 82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
 82753 ktrace   NAMI  "/sbin/quake2max"
 82753 ktrace   RET   execve -1 errno 2 No such file or directory
 82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
 82753 ktrace   NAMI  "/bin/quake2max"
 82753 ktrace   RET   execve -1 errno 2 No such file or directory
 82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
 82753 ktrace   NAMI  "/usr/sbin/quake2max"
 82753 ktrace   RET   execve -1 errno 2 No such file or directory
 82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
 82753 ktrace   NAMI  "/usr/bin/quake2max"
 82753 ktrace   RET   execve -1 errno 2 No such file or directory
 82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
 82753 ktrace   NAMI  "/usr/games/quake2max"
 82753 ktrace   RET   execve -1 errno 2 No such file or directory
 82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
 82753 ktrace   NAMI  "/usr/local/sbin/quake2max"
 82753 ktrace   RET   execve -1 errno 2 No such file or directory
 82753 ktrace   CALL  execve(0xbfbfe320,0xbfbfe844,0xbfbfe84c)
 82753 ktrace   NAMI  "/usr/local/bin/quake2max"

Interestingly 'ldd' also crashes when examining it, outputting the following (however 'ktrace' has more information):

/usr/local/bin/quake2max:
/usr/local/bin/quake2max: signal 6

My first thought was that it was a GCC bug, so I tried compiling it with 4.1 (my system is FreeBSD 6.1-RELEASE-p1 with GCC 3.4.4 20050518), but it made no difference.

Interestingly, the current "games/quake2max" port (version 0.44) works just fine, and I couldn't see anything suspicious in a quick look at the 'diff' output. I have attached a patch to update the port in the tree to version 0.45.

Could someone investigate this please?

Thanks and Best Regards,
Ale

P.S.: please CC me since I am not subscribed to the list.

Index: quake2max/Makefile
===
RCS file: /home/pcvs/ports/games/quake2max/Makefile,v
retrieving revision 1.4
diff -u -r1.4 Makefile
--- quake2max/Makefile 28 Dec 2006 13:55:46 - 1.4
+++ quake2max/Makefile 30 Dec 2006 05:46:07 -
@@ -6,8 +6,7 @@
 #

 PORTNAME= quake2max
-PORTVERSION= 0.44
-PORTREVISION= 1
+PORTVERSION= 0.45
 CATEGORIES= games
 MASTER_SITES= http://qudos.quakedev.com/linux/quake2/engines/Quake2MaX/:src \
  ${MASTER_SITE_LOCAL:S/$/:data/}
@@ -24,9 +23,9 @@

 USE_BZIP2= yes
 USE_GMAKE= yes
-USE_GCC= 3.2+
+USE_GCC= 3.4+
 ALL_TARGET= release
-WRKSRC= ${WRKDIR}/Quake2maX-44-src_unix
+WRKSRC= ${WRKDIR}/${DISTNAME:S/quake2max/Quake2maX/}

 OPTIONS= CLIENT "Build client" on \
  DEDICATED "Build dedicated server" on \
@@ -40,10 +39,7 @@
 PLIST_SUB= LIBDIR="${LIBDIR:S/${PREFIX}\///}"
 LIBDIR= ${PREFIX}/lib/${PORTNAME}

-Q2MAX_DATA= ${PORTNAME}.${PORTVERSION:S/.//}.rar
-
-# The data is not available and compiled executables do not work for 0.45.
-PORTSCOUT= skipv:0.45
+Q2MAX_DATA= ${PORTNAME}.044.rar

 .include "${.CURDIR}/../quake2-data/Makefile.include"
Index: quake2max/distinfo
===
RCS file: /home/pcvs/ports/games/quake2max/distinfo,v
retrieving revision 1.1
diff -u -r1.1 distinfo
--- quake2max/distinfo 28 Jul 2006 22:05:00 - 1.1
+++ quake2max/distinfo 30 Dec 2006 05:46:07 -
@@ -1,6 +1,6 @@
-MD5 (Quake2maX_0.44-src_unix.tar.bz2) = 862d114541a49df2ef78f2700fde636b
-SHA256 (Quake2maX_0.44-src_unix.tar.bz2) = 579aa80b1f26ebb5e7cd4dff4096504c378c7b225dd6c05fd5f076e3a4b5c8b7
-SIZE (Quake2maX_0.44-src_unix.tar.bz2) = 535440
+MD5 (Quake2maX_0.45-src_unix.tar.bz2) = 1bbc2611a8d84711f6a2416d04480430
+SHA256 (Quake2maX_0.45-src_unix.tar.bz2) = daca65e62a359f4ec526d85e809f9f22e66f7d2e70e6b8e0047daa4434499942
+SIZE (Quake2maX_0.45-src_unix.tar.bz2) = 528143
 MD5 (quake2max.044.rar) = 8a18fa4a431acbe1891a9666abb210e7
 SHA256 (quake2max.044.rar) = a8fd147c747e283438780bc8a4700df9c6173f4417e7ace0c67975036a08bce1
 SIZE (quake2max.044.rar) = 2071329
Index: quake2max/files/patch-qcommon__files.c
===
RCS file: /home/pcvs/ports/games/quake2max/files/patch-qcommon__files.c,v
retrieving revision 1.1
diff -u -r1.1 patch-qcommon__files.c
--- quake2max/files/patch-qcommon__files.c 28 Jul 2006 22:05:00 - 1.1
+++ quake2max/files/patch-qcommon__files.c 30 Dec 2006 05:46:07 -
@@ -1,16 +1,16 @@
 ./qcommon/files.c.orig Wed Jan 4 07:14:49 2006
-+++ ./qcommon/files.c Fri Jul 28 13:30:29 2006
-@@ -778,6 +778,9 @@
- Cvar_FullSet ("gamedir", dir, CVAR_SERVERINFO|CVAR_NOSET);
+--- qcommon/files.c.orig Wed Jan 4 07:33:05 2006
 qcommon/files.c Sat Dec 30 02:02:16 2006
+@@ -775,6 +775,9
Program receiving SIGSEGV after exit()
Hello.

The port "games/qudos" keeps running in a loop after exiting from the main menu. This is because after calling exit() the program receives a SIGSEGV signal, and the signal handler, after intercepting it, calls exit() again.

I think it is a problem with the application itself, but I don't know what to look for in the source code.

I have attached a 'gdb' backtrace.

Could someone please point me in the right direction?

Thanks and Best Regards,
Ale