Re: A performance question (patch included)

2007-05-25 Thread Charles E Campbell Jr

John Beckett wrote:


A.J.Mechelynck wrote:


What about a different function to return, say, the number of
1K blocks (or the number of times 2^n bytes, with a parameter
passed to the function) that a file uses?



Yes, that's a much more general and better idea.

Since there's probably not much need for this, I think that
simplicity would be good. That is, have the function work in a
fixed way with no options.

Re Dr.Chip's LargeFile script: It occurs to me that another
workaround would be to use system() to capture the output of
'ls -l file' or 'dir file' (need an option for which).

Then do some funky editing to calculate the number of digits in
the file length. If more than 9, treat file as large.
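The digit-counting idea could look like this in C. This is a sketch that starts from the size field already cut out of the 'ls -l' or 'dir' line; extracting that field, and choosing between the two commands, are left out. The function name is hypothetical. Counting digits instead of converting the text to a number avoids overflow entirely. The cost is a conservative threshold: 10 digits means at least 1,000,000,000 bytes, while a 32-bit signed Number only overflows above 2,147,483,647.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether a listed file is "large" from the
 * size column of 'ls -l' or 'dir' output, without converting it to a
 * number that might overflow. Thousands separators (',' or '.') are
 * skipped; any other character ends the field. */
bool size_field_is_large(const char *field)
{
    int digits = 0;

    /* Column-aligned listings usually pad the field with blanks. */
    while (*field == ' ')
        ++field;

    for (; *field != '\0'; ++field) {
        if (isdigit((unsigned char)*field))
            ++digits;
        else if (*field != ',' && *field != '.')
            break;
    }

    /* More than 9 digits means at least 1,000,000,000 bytes. */
    return digits > 9;
}
```

For example, "999999999" is not large, while "1000000000" and a 'dir'-style "4,294,967,296" both are.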

I'm playing with a tiny utility to help the LargeFile script.
Bluesky: Its code (64-bit file size) could potentially be
incorporated in Vim. I'll post results in vim-dev.



(I've moved this over to vim-dev)

I've attached a patch to vim 7.1 which extends getfsize(); with the
patch, getfsize() takes an optional second parameter which gives one
the ability to specify a unitsize. In other words,


getfsize("eval.c")       -> 478347  (after the patch)

getfsize("eval.c",1000)  -> 479     (rounded up)
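The rounding in the second example is ordinary ceiling division: 478347 / 1000 is 478 with a remainder of 347, so one more unit is added. Here is a minimal sketch, assuming a 64-bit signed type for the byte count. The function name and the -1 return for an invalid unit size are illustrative choices, not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper: how many units of unitsize bytes are needed to hold
 * size bytes, rounded up. Returns -1 for a negative size or a unitsize
 * that is not positive, so a unit size of 0 cannot divide by zero. */
int64_t size_in_units(int64_t size, int64_t unitsize)
{
    int64_t units;

    if (size < 0 || unitsize <= 0)
        return -1;
    units = size / unitsize;        /* truncating division */
    if (size % unitsize != 0)       /* a partial unit still counts as one */
        ++units;
    return units;
}
```

size_in_units(478347, 1000) gives 479 and size_in_units(478347, 1) gives 478347. The patch multiplies the quotient back by unitsize. This sketch tests the remainder instead, which keeps the rounding step free of any intermediate product that could overflow.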

I'll be awaiting Bram's input before making use of this in LargeFile.vim !

Regards,
Chip Campbell



*** src/o_eval.c  2007-05-25 08:52:12.0 -0400
--- src/eval.c    2007-05-25 09:04:43.0 -0400
***************
*** 7094,7100 ****
      {"getcwd",          0, 0, f_getcwd},
      {"getfontname",     0, 1, f_getfontname},
      {"getfperm",        1, 1, f_getfperm},
!     {"getfsize",        1, 1, f_getfsize},
      {"getftime",        1, 1, f_getftime},
      {"getftype",        1, 1, f_getftype},
      {"getline",         1, 2, f_getline},
--- 7094,7100 ----
      {"getcwd",          0, 0, f_getcwd},
      {"getfontname",     0, 1, f_getfontname},
      {"getfperm",        1, 1, f_getfperm},
!     {"getfsize",        1, 2, f_getfsize},
      {"getftime",        1, 1, f_getftime},
      {"getftype",        1, 1, f_getftype},
      {"getline",         1, 2, f_getline},
***************
*** 10135,10142 ****
      {
          if (mch_isdir(fname))
              rettv->vval.v_number = 0;
!         else
              rettv->vval.v_number = (varnumber_T)st.st_size;
      }
      else
          rettv->vval.v_number = -1;
--- 10135,10151 ----
      {
          if (mch_isdir(fname))
              rettv->vval.v_number = 0;
!         else if (argvars[1].v_type == VAR_UNKNOWN)
              rettv->vval.v_number = (varnumber_T)st.st_size;
+         else
+         {
+             unsigned long unitsize;
+             unsigned long stsize;
+             unitsize= get_tv_number(&argvars[1]);
+             stsize= st.st_size/unitsize;
+             if(stsize*unitsize < st.st_size) ++stsize;
+             rettv->vval.v_number = (varnumber_T) stsize;
+         }
      }
      else
          rettv->vval.v_number = -1;
*** runtime/doc/o_eval.txt  2007-05-25 09:00:08.0 -0400
--- runtime/doc/eval.txt    2007-05-25 09:06:19.0 -0400
***************
*** 1615,1621 ****
  getcmdtype()                    String  return the current command-line type
  getcwd()                        String  the current working directory
  getfperm( {fname})              String  file permissions of file {fname}
! getfsize( {fname})              Number  size in bytes of file {fname}
  getfontname( [{name}])          String  name of font being used
  getftime( {fname})              Number  last modification time of file
  getftype( {fname})              String  description of type of file {fname}
--- 1615,1621 ----
  getcmdtype()                    String  return the current command-line type
  getcwd()                        String  the current working directory
  getfperm( {fname})              String  file permissions of file {fname}
! getfsize( {fname} [,unitsize])  Number  size in bytes of file {fname}
  getfontname( [{name}])          String  name of font being used
  getftime( {fname})              Number  last modification time of file
  getftype( {fname})              String  description of type of file {fname}
***************
*** 2819,2827 ****
  getcwd()        The result is a String, which is the name of the current
                  working directory.
  
! getfsize({fname})                                       *getfsize()*
                  The result is a Number, which is the size in bytes of the
                  given file {fname}.
                  If {fname} is a directory, 0 is returned.
                  If the file {fname} can't be found, -1 is returned.
  
--- 2819,2829 ----
  getcwd()        The result is a String, which is the name of the current
                  working directory.
  
! getfsize({fname} [,unitsize])                           *getfsize()*
                  The result is a Number, which is the size in bytes of the
                  given file {fname}.
+                 If unitsize is given, then the file {fname}'s size will be
+                 returned in units of size unitsize

Re: A performance question (patch included)

2007-05-25 Thread Charles E Campbell Jr

A.J.Mechelynck wrote:

I'm not sure what varnumber_T means: will st.stsize (the dividend) be 
wide enough to avoid losing bits on the left?


varnumber_T is int (long if sizeof(int) <= 3).
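That choice amounts to a compile-time typedef. Vim appears to make it at configure time (via SIZEOF_INT). This sketch substitutes INT_MAX from <limits.h> so it compiles on its own, and the names demo_number_T and demo_number_max() are hypothetical.

```c
#include <limits.h>

/* Hypothetical stand-in for varnumber_T: use long only when int is
 * narrower than 32 bits, so a script Number always has at least 32 bits.
 * INT_MAX replaces Vim's configure-generated SIZEOF_INT test here. */
#if INT_MAX < 2147483647
typedef long demo_number_T;
#define DEMO_NUMBER_MAX LONG_MAX
#else
typedef int demo_number_T;
#define DEMO_NUMBER_MAX INT_MAX
#endif

/* Largest value a demo_number_T can hold: the ceiling any getfsize()
 * result has to fit under. */
demo_number_T demo_number_max(void)
{
    return DEMO_NUMBER_MAX;
}
```

On the usual 32- and 64-bit platforms int is 32 bits, so demo_number_max() returns 2,147,483,647.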

st.stsize's size depends on whether 32-bit or 64-bit integers are available.

So, it's possible to lose bits: pick a small enough unitsize and a large
enough file, and the scaled size will not fit into a varnumber_T. After
all, unitsize could be 1, and getfsize() will behave no differently than
it does now. However, unitsize could be 100. My patch divides st.stsize
by the unitsize first, presumably in whatever arithmetic is appropriate
for working with st.stsize.
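Dividing first only helps if the dividend itself is wide enough. Assuming it is, the remaining risk is narrowing the quotient. This is a sketch of a range-checked version. It assumes int64_t stands in for a 64-bit off_t and int32_t for varnumber_T. The function name and the false-on-overflow convention are hypothetical; this is not what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert a byte count to units of unitsize, rounded
 * up like the patch, dividing in 64-bit arithmetic and narrowing to a
 * 32-bit Number only after a range check. Returns false (leaving *out
 * unchanged) on overflow or when unitsize is not positive. */
bool size_to_number(int64_t st_size, int64_t unitsize, int32_t *out)
{
    int64_t units;

    if (st_size < 0 || unitsize <= 0)
        return false;

    /* Exact: no bits are lost while the value is still 64 bits wide. */
    units = st_size / unitsize + (st_size % unitsize != 0);

    /* This is the only place bits could be lost, so check first. */
    if (units > INT32_MAX)
        return false;

    *out = (int32_t)units;
    return true;
}
```

For a 5 GiB file (5,368,709,120 bytes), unitsize 1 fails the range check, while unitsize 1000000 succeeds with 5369. This is the trade-off described above: a small enough unitsize and a large enough file cannot be represented.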

Regards,
Chip Campbell



Re: A performance question (patch included)

2007-05-25 Thread Charles E Campbell Jr

A.J.Mechelynck wrote:

Yes, yes, but before the division, will it be able to hold the file 
size? (sorry, I meant st.st_size) Will mch_stat (at line 10134, one 
line before the context of your patch) be able to return huge file 
sizes?


mch_stat is defined differently depending on the o/s.
Under Unix, it maps to the stat() function, which fills in a
struct stat; the member in question is st_size:

    off_t st_size;    /* total size, in bytes */

So, st_size is an off_t.

Under Linux, off_t is declared as:  typedef __kernel_off_t off_t;

So, I suspect that st_size will be sized by the o/s to handle whatever 
size files it can handle.

Someone with a 64-bit machine, perhaps, could examine this further?
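To examine this on a particular machine, a probe along these lines reports how wide off_t is in a given build and what stat() returns for a file. It is a sketch assuming a POSIX system, and the function names are hypothetical. On 32-bit Linux, calling it from a tiny main() on a file over 4 GB twice, once after a normal build and once after building with -D_FILE_OFFSET_BITS=64, should show the difference.

```c
#include <limits.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical probe: width of off_t, in bits, for this build. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical probe: size that stat() reports for path, or -1 if stat()
 * fails. POSIX has stat() fail with EOVERFLOW when the size does not fit
 * in off_t, so a build with a 32-bit off_t cannot report the size of a
 * file of 2 GiB or more. */
int64_t stat_size(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (int64_t)sb.st_size;
}
```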

BTW, I'm also under the impression that ls itself uses fstat(), so it's
not likely to be any more informative.

Regards,
Chip Campbell



Re: A performance question (patch included)

2007-05-25 Thread A.J.Mechelynck

Charles E Campbell Jr wrote:

[...]
My patch divides st.stsize by the unitsize first; presumably in whatever
arithmetic is appropriate for working with st.stsize.



Yes, yes, but before the division, will it be able to hold the file size? 
(sorry, I meant st.st_size) Will mch_stat (at line 10134, one line before the 
context of your patch) be able to return huge file sizes?



Best regards,
Tony.
--
Real Programmers don't play tennis, or any other sport that requires
you to change clothes.  Mountain climbing is OK, and real programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the machine room.


Re: A performance question (patch included)

2007-05-25 Thread A.J.Mechelynck

Yakov Lerner wrote:
[...]
stat() on 32-bit Linux has a 32-bit st_size field by default (off_t is
32-bit). There is a stat64() syscall which uses a 'struct stat64'
structure where st_size is 64-bit. By defining __USE_LARGEFILE64 at
compile time, stat() is redirected to stat64(). I don't know whether the
default Linux vim build defines __USE_LARGEFILE64 or not.

Yakov



:version says:

[...]
Compilation: gcc -c -I. -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK 
-I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng12 
-I/opt/gnome/include/gtk-2.0 -I/opt/gnome/lib/gtk-2.0/include 
-I/opt/gnome/include/atk-1.0 -I/opt/gnome/include/pango-1.0 
-I/opt/gnome/include/glib-2.0 -I/opt/gnome/lib/glib-2.0/include   -DORBIT2=1 
-pthread -I/usr/include/libart-2.0 -I/usr/include/cairo 
-I/usr/include/freetype2 -I/usr/include/libpng12 -I/usr/include/libxml2 
-I/opt/gnome/include/libgnomeui-2.0 -I/opt/gnome/include/libgnome-2.0 
-I/opt/gnome/include/libgnomecanvas-2.0 -I/opt/gnome/include/gtk-2.0 
-I/opt/gnome/include/gconf/2 -I/opt/gnome/include/libbonoboui-2.0 
-I/opt/gnome/include/gnome-vfs-2.0 -I/opt/gnome/lib/gnome-vfs-2.0/include 
-I/opt/gnome/include/gnome-keyring-1 -I/opt/gnome/include/glib-2.0 
-I/opt/gnome/lib/glib-2.0/include -I/opt/gnome/include/orbit-2.0 
-I/opt/gnome/include/libbonobo-2.0 -I/opt/gnome/include/bonobo-activation-2.0 
-I/opt/gnome/include/pango-1.0 -I/opt/gnome/lib/gtk-2.0/include 
-I/opt/gnome/include/atk-1.0 -O2 -fno-strength-reduce -Wall 
-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING 
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 
-I/usr/lib/perl5/5.8.8/i586-linux-thread-multi/CORE  -I/usr/include/python2.5 
-pthread -I/usr/include  -D_LARGEFILE64_SOURCE=1  -I/usr/lib/ruby/1.8/i586-linux

Linking: [...]

So, maybe we'll have to check what happens when _LARGEFILE64_SOURCE is
defined? I don't find a match in src/ or src/auto/.
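On glibc the two macros do different jobs. _LARGEFILE64_SOURCE only exposes the separate stat64()/off64_t interface, which is what filemeg.c below relies on. _FILE_OFFSET_BITS=64 changes plain off_t and stat() to 64 bits, so existing code that calls stat() sees large sizes without modification. That second macro already appears in the :version output above, which suggests st_size is 64 bits wide in that build. If so, the open question would be varnumber_T rather than mch_stat(). This sketch assumes glibc, and it deliberately fails to compile if off_t is not 64 bits. The name plain_stat_size() is hypothetical.

```c
/* Must come before any system header to take effect (glibc). */
#define _FILE_OFFSET_BITS 64

#include <sys/stat.h>
#include <sys/types.h>

/* Compile-time check: with _FILE_OFFSET_BITS=64, off_t is 64 bits even on
 * 32-bit Linux. The array size becomes -1 (a compile error) if not. */
typedef char off_t_is_64_bits[(sizeof(off_t) == 8) ? 1 : -1];

/* Hypothetical helper: plain stat() now fills a struct whose st_size is a
 * 64-bit off_t; returns -1 if stat() fails. */
long long plain_stat_size(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (long long)sb.st_size;
}
```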



Best regards,
Tony.
--
If all these sweet young things were laid end-to-end, I wouldn't be a
bit surprised.
-- Dorothy Parker


Re: A performance question (utility included)

2007-05-25 Thread John Beckett

Charles E Campbell Jr wrote:

I've attached a patch to vim 7.1 which extends getfsize()


As I've mentioned, I think further testing will be needed before
patching Vim for 64-bit file lengths.

Here is a possible interim workaround to allow Dr.Chip's
LargeFile.vim script to accurately detect large files on many
platforms.

Attached is a tiny C program to build a tool called filemeg.
 Example usage:  filemeg /path/to/file
 gives output:   42

which means that the specified file is 42 megabytes (actually,
any value from 42 to nearly 43).
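The "42 to nearly 43" range follows from how filemeg computes the value. It shifts the byte count right by 20 bits, which divides by 2^20 = 1,048,576 and drops the remainder. Here is just that step, as a sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Hypothetical helper: whole megabytes (2^20 bytes) in size. The remainder
 * is discarded, so anything from 42 MB up to one byte short of 43 MB
 * gives 42. */
uint64_t whole_megabytes(uint64_t size)
{
    return size >> 20;   /* same as size / 1048576 for unsigned values */
}
```

whole_megabytes(44040192) and whole_megabytes(45088767) both return 42. 45088768 bytes (exactly 43 * 2^20) is the first size that returns 43.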

I was going to work out how to adapt the LargeFile script to use
this tool, if the user sets an option to invoke it. But it's
taking too long because I don't know enough about Vim, so I'm
just presenting the tool at this stage.

People may like to check how filemeg works on various systems
and report back. I have tried it on files over 4GB on Fedora
Core 6 and Windows XP (x86-32 platform).

Putting something like this inside Vim would be a bit of a
nightmare IMHO because of the extraordinary range of supported
compilers, operating systems and hardware.

Adapting LargeFile.vim to work with filemeg:
- Compile the source and test running at command line.
- Put the executable in your path (better: in a Vim
 directory which the script could invoke somehow).
- Set a new script option to use filemeg.
- The script BufReadPre would call a new script function.
- That function would check the file size with:
   let result = system('filemeg /path/to/file')
- If result is a number, it is the file size in megabytes.
- Otherwise, result is "Error..." and the script should
 treat the file as large (or maybe not...).
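The decision in the last three steps can also be sketched in C. In the actual script it would be a few lines of Vim script around system(). The function name and the limit parameter are assumptions for illustration. So is the choice to treat any unparsable output as large, since the list above leaves that point open.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: interpret the one line that filemeg prints.
 * A plain number is the size in megabytes; anything else, including a
 * line starting with "Error", is treated as a large file. */
bool filemeg_says_large(const char *output, long limit_mb)
{
    char *end;
    long mb;

    if (strncmp(output, "Error", 5) == 0)
        return true;                 /* filemeg could not size the file */
    if (!isdigit((unsigned char)output[0]))
        return true;                 /* unexpected output: play it safe */

    mb = strtol(output, &end, 10);
    /* Allow only trailing whitespace, such as the newline system() keeps. */
    while (isspace((unsigned char)*end))
        ++end;
    if (*end != '\0')
        return true;

    return mb >= limit_mb;
}
```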

I've attached the C source, and included it below for those who
don't mind a little wrapping.

John

/* Output length of specified file in megabytes.
* John Beckett 2007/05/25
* For Linux with LFS (large file support), and Win32.
*
* Output is suitable for reading by a script.
* Output is always one line.
* If any problem occurs, line starts with Error.
* Otherwise, line is the size of the specified file in megabytes.
* Size is a truncated integer (file of 3.9MB would give result 3).
* The size won't overflow a 32-bit signed integer (Error if it does).
* If argument is a directory, result is 0 (done by stat64()).
*/

#if defined(__linux)
# define _LARGEFILE64_SOURCE
#elif defined(_WIN32)
# define off64_t __int64
# define stat64 _stati64
#endif

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    off64_t size, overflowmask;
    struct stat64 sb;
    if ( argc != 2 ) {
        puts("Error: Need path of file to report its size in megabytes.");
        return 1;
    }
    if ( stat64(argv[1], &sb) != 0 ) {
        puts("Error: Could not get file information.");
        return 1;
    }
    size = sb.st_size >> 20;  /* 2^20 = 1 meg */
    overflowmask = 0x7fffffff;  /* 32-bit signed max; off64_t keeps ~mask 64-bit */
    if ( (size & ~overflowmask) != 0 ) {
        puts("Error: File size in megabytes overflows 32-bit signed integer.");
        return 1;
    }
    printf("%d\n", (int)size);
    return 0;
}




Re: A performance question (patch included)

2007-05-25 Thread John Beckett

Charles E Campbell Jr wrote:

I'm also under the impression that ls itself uses fstat(),
so it's not likely to be any more informative.


That's likely on some systems, but 'ls -l' gives correct results
for files over 4GB on Fedora Core 6 using x86-32.

John