Re: A performance question (patch included)
John Beckett wrote:
> A.J.Mechelynck wrote:
>> What about a different function to return, say, the number of 1K
>> blocks (or the number of times 2^n bytes, with a parameter passed
>> to the function) that a file uses?
>
> Yes, that's a much more general and better idea. Since there's
> probably not much need for this, I think that simplicity would be
> good. That is, have the function work in a fixed way with no options.
>
> Re Dr.Chip's LargeFile script: It occurs to me that another
> workaround would be to use system() to capture the output of
> 'ls -l file' or 'dir file' (need an option for which). Then do some
> funky editing to calculate the number of digits in the file length.
> If more than 9, treat file as large.
>
> I'm playing with a tiny utility to help the LargeFile script.
> Bluesky: Its code (64-bit file size) could potentially be
> incorporated in Vim. I'll post results in vim-dev.

(I've moved this over to vim-dev)

I've attached a patch to vim 7.1 which extends getfsize(); with the patch, getfsize() takes an optional second parameter which gives one the ability to specify a unitsize. In other words,

  getfsize("eval.c")      -> 478347 (after the patch)
  getfsize("eval.c",1000) -> 479 (truncated upwards)

I'll be awaiting Bram's input before making use of this in LargeFile.vim !

Regards,
Chip Campbell

*** src/o_eval.c 2007-05-25 08:52:12.0 -0400
--- src/eval.c 2007-05-25 09:04:43.0 -0400
***************
*** 7094,7100 ****
      {"getcwd", 0, 0, f_getcwd},
      {"getfontname", 0, 1, f_getfontname},
      {"getfperm", 1, 1, f_getfperm},
!     {"getfsize", 1, 1, f_getfsize},
      {"getftime", 1, 1, f_getftime},
      {"getftype", 1, 1, f_getftype},
      {"getline", 1, 2, f_getline},
--- 7094,7100 ----
      {"getcwd", 0, 0, f_getcwd},
      {"getfontname", 0, 1, f_getfontname},
      {"getfperm", 1, 1, f_getfperm},
!     {"getfsize", 1, 2, f_getfsize},
      {"getftime", 1, 1, f_getftime},
      {"getftype", 1, 1, f_getftype},
      {"getline", 1, 2, f_getline},
***************
*** 10135,10142 ****
      {
          if (mch_isdir(fname))
              rettv->vval.v_number = 0;
!         else
              rettv->vval.v_number = (varnumber_T)st.st_size;
      }
      else
          rettv->vval.v_number = -1;
--- 10135,10151 ----
      {
          if (mch_isdir(fname))
              rettv->vval.v_number = 0;
!         else if (argvars[1].v_type == VAR_UNKNOWN)
              rettv->vval.v_number = (varnumber_T)st.st_size;
+         else
+         {
+             unsigned long unitsize;
+             unsigned long stsize;
+             unitsize= get_tv_number(&argvars[1]);
+             stsize= st.st_size/unitsize;
+             if(stsize*unitsize < st.st_size) ++stsize;
+             rettv->vval.v_number = (varnumber_T) stsize;
+         }
      }
      else
          rettv->vval.v_number = -1;
*** runtime/doc/o_eval.txt 2007-05-25 09:00:08.0 -0400
--- runtime/doc/eval.txt 2007-05-25 09:06:19.0 -0400
***************
*** 1615,1621 ****
  getcmdtype()                  String  return the current command-line type
  getcwd()                      String  the current working directory
  getfperm( {fname})            String  file permissions of file {fname}
! getfsize( {fname})            Number  size in bytes of file {fname}
  getfontname( [{name}])        String  name of font being used
  getftime( {fname})            Number  last modification time of file
  getftype( {fname})            String  description of type of file {fname}
--- 1615,1621 ----
  getcmdtype()                  String  return the current command-line type
  getcwd()                      String  the current working directory
  getfperm( {fname})            String  file permissions of file {fname}
! getfsize( {fname} [,unitsize])  Number  size in bytes of file {fname}
  getfontname( [{name}])        String  name of font being used
  getftime( {fname})            Number  last modification time of file
  getftype( {fname})            String  description of type of file {fname}
***************
*** 2819,2827 ****
  getcwd()      The result is a String, which is the name of the current working
                directory.

! getfsize({fname})                                     *getfsize()*
                The result is a Number, which is the size in bytes of the
                given file {fname}.
                If {fname} is a directory, 0 is returned.
                If the file {fname} can't be found, -1 is returned.
--- 2819,2829 ----
  getcwd()      The result is a String, which is the name of the current working
                directory.

! getfsize({fname} [,unitsize])                         *getfsize()*
                The result is a Number, which is the size in bytes of the
                given file {fname}.
+               If unitsize is given, then the file {fname}'s size will be
+               returned in units of size unitsize
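
The rounding step in the patch can also be looked at in isolation. What follows is a minimal sketch, not part of the posted patch: `size_in_units` is a hypothetical name, and it uses `int64_t` throughout on the assumption that the file size is already available as a 64-bit value. It also rejects a unit size of zero, a case the patch as posted does not check before dividing.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): number of units of `unitsize`
 * bytes needed to hold `size_bytes`, rounded up.
 * Returns -1 if unitsize is not positive or size_bytes is negative. */
int64_t size_in_units(int64_t size_bytes, int64_t unitsize)
{
    int64_t units;

    if (unitsize <= 0 || size_bytes < 0)
        return -1;

    units = size_bytes / unitsize;          /* truncating division */
    if (units * unitsize < size_bytes)      /* a remainder is left: round up */
        ++units;
    return units;
}
```

With the figures from the message, `size_in_units(478347, 1000)` gives 479, matching the patched getfsize() example.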
Re: A performance question (patch included)
A.J.Mechelynck wrote:
> I'm not sure what varnumber_T means: will st.stsize (the dividend)
> be wide enough to avoid losing bits on the left?

varnumber_T is int (long if sizeof(int) <= 3). st.stsize's size depends on whether 32-bit or 64-bit integers are available. So, it's possible to lose bits: pick a small enough unitsize and a large enough file, and the result will end up not being able to fit into a varnumber_T. After all, unitsize could be 1, and getfsize() will behave no differently than it does now. However, unitsize could be 100. My patch divides st.stsize by the unitsize first; presumably in whatever arithmetic is appropriate for working with st.stsize.

Regards,
Chip Campbell
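
To make the "losing bits" concern concrete, here is a small sketch under stated assumptions: `fits_in_int` is a hypothetical helper, the file size is assumed to be held in an `int64_t`, and `int` is assumed to be 32 bits wide (as on the x86-32 systems discussed in this thread). It only checks whether a value would survive the narrowing cast that the patch performs when storing into a varnumber_T.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `value` can be stored in an int
 * without losing bits, 0 otherwise. */
int fits_in_int(int64_t value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

Under those assumptions a 5,000,000,000-byte size does not fit, while the same size divided by a unitsize of 1000 does, which is why dividing before narrowing matters.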
Re: A performance question (patch included)
A.J.Mechelynck wrote:
> Yes, yes, but before the division, will it be able to hold the file
> size? (sorry, I meant st.st_size) Will mch_stat (at line 10134, one
> line before the context of your patch) be able to return huge file
> sizes?

mch_stat is variously defined, depending on o/s. Under unix, that's the fstat function. This function returns a pointer to a struct stat; the member in question is: st_size.

  off_t st_size;    /* total size, in bytes */

So, st_size is an off_t. Under linux, an off_t is

  typedef __kernel_off_t off_t;

So, I suspect that st_size will be sized by the o/s to handle whatever size files it can handle. Someone with a 64-bit machine, perhaps, could examine this further?

BTW, I'm also under the impression that ls itself uses fstat(), so it's not likely to be any more informative.

Regards,
Chip Campbell
Re: A performance question (patch included)
Charles E Campbell Jr wrote:
> A.J.Mechelynck wrote:
>> I'm not sure what varnumber_T means: will st.stsize (the dividend)
>> be wide enough to avoid losing bits on the left?
>
> varnumber_T is int (long if sizeof(int) <= 3). st.stsize's size
> depends on whether 32bit or 64bit integers are available. So, its
> possible to lose bits: pick a small enough unitsize and a large
> enough file, st.stsize will end up not being able to fit into a
> varnumber_T. After all, unitsize could be 1, and getfsize() will
> behave no differently than it does now. However, unitsize could be
> 100. My patch divides st.stsize by the unitsize first; presumably in
> whatever arithmetic is appropriate for working with st.stsize.
>
> Regards,
> Chip Campbell

Yes, yes, but before the division, will it be able to hold the file size? (sorry, I meant st.st_size) Will mch_stat (at line 10134, one line before the context of your patch) be able to return huge file sizes?

Best regards,
Tony.
-- 
Real Programmers don't play tennis, or any other sport that requires you to change clothes. Mountain climbing is OK, and real programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the machine room.
Re: A performance question (patch included)
Yakov Lerner wrote:
> [...]
> stat() on Linux has 32-bit st_size field (off_t is 32-bit). There is
> stat64() syscall which uses 'struct stat64' structure where st_size
> is 64-bit. By defining __USE_LARGEFILE64 at compile-time, stat() is
> redirected to stat64(). I don't know whether default Linux vim build
> defines __USE_LARGEFILE64 or not.
>
> Yakov

:version says:

[...]
Compilation: gcc -c -I. -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK -I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng12 -I/opt/gnome/include/gtk-2.0 -I/opt/gnome/lib/gtk-2.0/include -I/opt/gnome/include/atk-1.0 -I/opt/gnome/include/pango-1.0 -I/opt/gnome/include/glib-2.0 -I/opt/gnome/lib/glib-2.0/include -DORBIT2=1 -pthread -I/usr/include/libart-2.0 -I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng12 -I/usr/include/libxml2 -I/opt/gnome/include/libgnomeui-2.0 -I/opt/gnome/include/libgnome-2.0 -I/opt/gnome/include/libgnomecanvas-2.0 -I/opt/gnome/include/gtk-2.0 -I/opt/gnome/include/gconf/2 -I/opt/gnome/include/libbonoboui-2.0 -I/opt/gnome/include/gnome-vfs-2.0 -I/opt/gnome/lib/gnome-vfs-2.0/include -I/opt/gnome/include/gnome-keyring-1 -I/opt/gnome/include/glib-2.0 -I/opt/gnome/lib/glib-2.0/include -I/opt/gnome/include/orbit-2.0 -I/opt/gnome/include/libbonobo-2.0 -I/opt/gnome/include/bonobo-activation-2.0 -I/opt/gnome/include/pango-1.0 -I/opt/gnome/lib/gtk-2.0/include -I/opt/gnome/include/atk-1.0 -O2 -fno-strength-reduce -Wall -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/lib/perl5/5.8.8/i586-linux-thread-multi/CORE -I/usr/include/python2.5 -pthread -I/usr/include -D_LARGEFILE64_SOURCE=1 -I/usr/lib/ruby/1.8/i586-linux
Linking: [...]

so, maybe we'll have to check what happens when _LARGEFILE64_SOURCE is defined? I don't find a match in src/ or src/auto/.

Best regards,
Tony.
-- 
If all these sweet young things were laid end-to-end, I wouldn't be a bit surprised.
		-- Dorothy Parker
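
The compile line above already carries -D_FILE_OFFSET_BITS=64, which on glibc systems asks for a 64-bit off_t even on 32-bit platforms. One way to see what a given build actually gets is to measure off_t directly. This is only a sketch: `off_t_bits` is a hypothetical name, and the macro is defined in the source here purely for illustration, whereas the build in question passes it on the command line.

```c
/* Must be defined before any system header is included to have an effect. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t, in bits, as seen by this
 * translation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

If this reports 64 in a build that uses the same flags as Vim, then st_size in struct stat should be able to hold sizes beyond 4GB in that build; if it reports 32, the size is cut down before getfsize() ever sees it.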
Re: A performance question (utility included)
Charles E Campbell Jr wrote:
> I've attached a patch to vim 7.1 which extends getfsize()

As I've mentioned, I think further testing will be needed before patching Vim for 64-bit file lengths.

Here is a possible interim workaround to allow Dr.Chip's LargeFile.vim script to accurately detect large files on many platforms.

Attached is a tiny C program to build a tool called filemeg.

Example usage:
  filemeg /path/to/file
gives output:
  42
which means that the specified file is 42 megabytes (actually, any value from 42 to nearly 43).

I was going to work out how to adapt the LargeFile script to use this tool, if the user sets an option to invoke it. But it's taking too long because I don't know enough about Vim, so I'm just presenting the tool at this stage.

People may like to check how filemeg works on various systems and report back. I have tried it on files over 4GB on Fedora Core 6 and Windows XP (x86-32 platform).

Putting something like this inside Vim would be a bit of a nightmare IMHO because of the extraordinary range of supported compilers, operating systems and hardware.

Adapting LargeFile.vim to work with filemeg:
- Compile the source and test running at command line.
- Put the executable in your path (better: in a Vim directory which the script could invoke somehow).
- Set a new script option to use filemeg.
- The script BufReadPre would call a new script function.
- That function would check the file size with:
    let result = system('filemeg /path/to/file')
- If result is a number, it is the file size in megabytes.
- Otherwise, result is "Error..." and the script should treat the file as large (or maybe not...).

I've attached the C source, and included it below for those who don't mind a little wrapping.

John

/* Output length of specified file in megabytes.
 * John Beckett 2007/05/25
 * For Linux with LFS (large file support), and Win32.
 *
 * Output is suitable for reading by a script.
 * Output is always one line.
 * If any problem occurs, line starts with "Error".
 * Otherwise, line is the size of the specified file in megabytes.
 * Size is a truncated integer (file of 3.9MB would give result 3).
 * The size won't overflow a 32-bit signed integer (Error if it does).
 * If argument is a directory, result is 0 (done by stat64()).
 */
#if defined(__linux)
# define _LARGEFILE64_SOURCE
#elif defined(_WIN32)
# define off64_t __int64
# define stat64 _stati64
#endif
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    off64_t size, overflowmask;
    struct stat64 sb;
    if ( argc != 2 ) {
        puts("Error: Need path of file to report its size in megabytes.");
        return 1;
    }
    if ( stat64(argv[1], &sb) != 0 ) {
        puts("Error: Could not get file information.");
        return 1;
    }
    size = sb.st_size >> 20;  /* 2^20 = 1 meg */
    overflowmask = 0x7fffffff;  /* ensure 64-bit calculation */
    if ( (size & ~overflowmask) != 0 ) {
        puts("Error: File size in megabytes overflows 32-bit signed integer.");
        return 1;
    }
    printf("%d\n", (int)size);
    return 0;
}
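
The output contract described in the header comment (one line, either a number of megabytes or a line starting with "Error") is simple enough that a caller can check it mechanically. The sketch below shows one way to do that in C. It is an illustration only: `parse_filemeg_line` is a hypothetical name, and the LargeFile script itself would do the equivalent test in Vim script on the result of system().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: interpret one line of filemeg output.
 * Returns the size in megabytes, or -1 if the line is not a plain
 * non-negative decimal number (for example, if it starts with "Error"). */
long parse_filemeg_line(const char *line)
{
    char *end;
    long mb;

    /* A valid result starts with a digit; "Error..." and "" do not. */
    if (line == NULL || *line < '0' || *line > '9')
        return -1;

    errno = 0;
    mb = strtol(line, &end, 10);
    if (errno == ERANGE)
        return -1;

    /* Allow only a trailing line ending after the digits. */
    while (*end == '\r' || *end == '\n')
        ++end;
    if (*end != '\0')
        return -1;

    return mb;
}
```

A caller that gets -1 back can then decide, as the message suggests, whether to treat the file as large or to fall back to the existing check.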
Re: A performance question (patch included)
Charles E Campbell Jr wrote:
> I'm also under the impression that ls itself uses fstat(), so its
> not likely to be any more informative.

That's likely on some systems, but 'ls -l' gives correct results for files over 4GB on Fedora Core 6 using x86-32.

John