Hi All, I've been working with Anthony on this gsort issue for a while and think I have come up with something. Firstly though, here is some info on what had been occurring.
Unsorted file:

    -rw-r--r-- 1 ctmmedw medwusr 11193871748 May 8 16:27 ./rfx63737_unsorted.dat.bu

Filesystem:

    mkfs -F vxfs -o ninode=unlimited,bsize=1024,version=4,inosize=256,logsize=16384,largefiles /dev/vgdata/lvol4 558268416

Mount options:

    /opt/dev/edw/medw on /dev/vgdata/lvol4 ioerror=mwdisable,delaylog on Sun Apr 30 13:33:21 2006

User privileges: root
ulimit: unlimited

The original problem was found when using the sort utility bundled with GNU Coreutils 5.0. We received this as a 32-bit binary, as shown here...

    [udmdwdev]-root > file gsort.HP-UX
    gsort.HP-UX: PA-RISC2.0 shared executable dynamically linked -not stripped
    [udmdwdev]-root >

The system is an HP SuperDome running HP-UX 11.11 (11i) with 8 CPUs and 30GB RAM.

This binary would correctly sort data files over 2GB in size, but failed once we got up to around 10GB. If we used the native HP-UX supplied sort we got no errors, but the sort process was much slower.

I was thinking along the lines you mentioned below, so I downloaded the 5.94 version of Coreutils and compiled it in 32-bit mode using gcc. The new binary also failed with the original error, as shown here...

    [udmdwdev]-root > timex ../../rfx/bin/gsort_32 -T . -k 1,1 -k 2,2 -t "|" rfx63737_unsorted.dat.bu -o sorted.2
    ../../rfx/bin/gsort_32: write failed: ./sortPYsSm3: File too large

    real 54:19.61
    user 48:07.21
    sys 3:14.60
    [udmdwdev]-root >

I also tried compiling with HP's ANSI C compiler in 32-bit mode and found it no better. However, when I compiled it with the ANSI C compiler in 64-bit mode, the sort worked correctly.

This started to look like the normal 32-bit / 2GB limit issue, except that we had used the 32-bit version to sort files over 2GB with no errors. It took a while, but I think what is happening is that the routine within sort that creates the temporary files is not "largefile" aware, while the final output file routine is.
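As a rough illustration of the limit in question, here is a small sketch of my own (it is not code from coreutils, and needs_largefile and MAX_OFFSET_32BIT are made-up names). It assumes the usual case where a non-largefile descriptor in a 32-bit build uses a signed 32-bit file offset, so the largest addressable offset is 2^31 - 1 bytes, and a write that would go past it fails with EFBIG ("File too large").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest file offset a signed 32-bit off_t can hold: 2^31 - 1 bytes. */
#define MAX_OFFSET_32BIT ((int64_t) INT32_MAX)

/*
 * Hypothetical helper (not part of coreutils): returns true when a file
 * of `nbytes` bytes is larger than a non-largefile descriptor can
 * address, i.e. when writing it would be expected to hit EFBIG
 * ("File too large") under the assumption stated above.
 */
static bool
needs_largefile (int64_t nbytes)
{
  assert (nbytes >= 0);   /* a file size is never negative */
  return nbytes > MAX_OFFSET_32BIT;
}
```

For the 11193871748-byte input file above, needs_largefile returns true, while anything up to 2147483647 bytes stays within the 32-bit limit. On platforms that support it, building with -D_FILE_OFFSET_BITS=64 is the usual way to get a 64-bit off_t in a 32-bit program; whether that macro reaches every open call in a given build is exactly the kind of thing worth checking.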
To test this I ran the 32-bit version and altered the buffer size (with -S) so that the temp files being created would never be larger than 2GB. I used 100M initially, so the sort proceeded like this...

    - read data from source file - sort data - write temp file of 85MB (resulted in 132 temp files)
    - merge sort 16 temp files - write temp file of 1.36GB (resulted in 9 files)
    - merge sort 9 temp files - write output file

This worked correctly.

What was occurring before was that if a buffer size was not specified, or was specified within a particular range, the temp files would grow over 2GB and the sort would fail. This was tested with a buffer size of 500M...

    - read data from source file - sort data - write temp file of 460MB (resulted in 26 files)
    - merge sort 16 temp files - write temp file of 1.3GB (resulted in the write error)

As a further test I set the buffer size to 895M and the sort again worked correctly.

From this I'm guessing that a different routine is used to create the temp files than the one used for the final output file. It is this routine that isn't "largefile" aware, and it causes the sort to fail only if the temp files grow over 2GB in size.

Finally, I've just completed a test after altering one part of the source code for tempname.c (lib/tempname.c) as follows....
Original:

    switch (kind)
      {
      case __GT_FILE:
        fd = __open (tmpl, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
        break;

      case __GT_BIGFILE:
        fd = __open64 (tmpl, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
        break;

      case __GT_DIR:
        fd = __mkdir (tmpl, S_IRUSR | S_IWUSR | S_IXUSR);
        break;

Modified:

    switch (kind)
      {
      case __GT_FILE:
        /* was __open: use the 64-bit open for ordinary temp files too */
        fd = __open64 (tmpl, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
        break;

      case __GT_BIGFILE:
        fd = __open64 (tmpl, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
        break;

      case __GT_DIR:
        fd = __mkdir (tmpl, S_IRUSR | S_IWUSR | S_IXUSR);
        break;

Note that I've just used the 64-bit open for "non-largefiles". After a recompile the sort worked with the default buffer size (506MB in our case) and it successfully created a 7GB temp file.

This leads me to think that "tempname.c" is setting the file open options once at 32 bit and not checking later to see whether any new files should be 64 bit.

Regards,
Simon.

Simon Wing-Tang
Senior Systems Engineer IT
[EMAIL PROTECTED]
Unix Systems CML IT
(03) 9635-4208

-----Original Message-----
From: Bob Proulx [mailto:[EMAIL PROTECTED]
Sent: Thursday, 6 April 2006 5:36 PM
To: Anthony Tiemens
Cc: bug-coreutils@gnu.org
Subject: Re: gsort problem

Anthony Tiemens wrote:
> My client is using gsort (32bit, HPUX), input file is 10GB.

What version of GNU sort are you using? You can find this using the sort --version output.

    sort --version

> gsort fails when writing the output file with the following error
> "../../rfx/bin/gsort: write failed: ./sortIjyaoF: File too large"
>
> file details are "-rw------- 1 root medwusr 2094530560 Apr 6 08:27 sortIjyaoF"
>
> It appears to be having trouble writing an output file > 2GB.
>
> Could you please help.

Your compilation of GNU sort on your platform was apparently compiled without large file support. This was common with older versions.
On older versions compiled on HP-UX large file support needed to be forced at configure time. This leads me to believe you are using a very old version. The current stable version is 5.94. I am sure your problem is resolved in the current version.

    ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.94.tar.gz (7.6MB)
    ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.94.tar.bz2 (4.9MB)

Bob