[Factor-talk] to get total file size for millions of files

2015-09-28 Thread HP wei
In our environment, we sometimes have a folder with
as many as a couple million files in it.

(1) the word: each-file ( path bfs? quot -- )
 Does it handle a file successively by quot without first gathering all
 the file-paths in the path ?

(2) what is the idiomatic way to get the total file size for all files
 in a folder (and its sub-folders) ?
 Using each-file in (1), I am forced to set up a global 'variable'
 called total-size.

 --

 If I were to process a big file to collect some info, I could write:

 "path-to-file" ascii [ V{ } [ quot ] each-line ] with-file-reader

 The collected info is on the stack after the above finishes.

 

 To go through a huge directory (folder),
 do you know if the current factor can set up something similar ?


Thanks
HP Wei
--
___
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk


Re: [Factor-talk] to get total file size for millions of files

2015-09-28 Thread John Benediktsson
Answers below:

> (1) the word: each-file ( path bfs? quot -- )
>  Does it handle a file successively by quot without first gathering all
>  the file-paths in the path ?
>

It keeps a queue of paths to process. For each directory, it pushes all of
that directory's paths into the queue; then, for each path, it calls the
quotation and, if the path is a directory, recurses to handle that
directory's contents (depth-first or breadth-first).  So, in your case, with
a couple million files in a single directory, it would build a queue of a
couple million paths and then process it.  If your million files are spread
over a tree of directories, it would be more efficient, because it lists
one directory into the queue at a time and then continues processing.


> (2) what is the idiomatic way to get the total file size for all files
>  in a folder (and its sub-folders) ?
>  Using each-file in (1), I am forced to set up a global 'variable'
>  called total-size.
>

Does this not work?

0 "/path/to/directory" t [ link-info size>> + ] each-file
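
The one-liner above needs no global variable because the running total
stays on the stack: each-file leaves it beneath the path it hands to the
quotation, and the quotation replaces it with the new total. A minimal
sketch of the same idea wrapped in a word follows. The word name
total-file-size is hypothetical, and the sketch assumes the
each-file ( path bfs? quot -- ) signature quoted in this thread.

```factor
USING: accessors io.directories.search io.files.info kernel
math ;
IN: scratchpad

! Hypothetical helper: sum size>> over every path that
! each-file visits under the given directory.
! Stack inside the quotation: ( total path -- total' )
: total-file-size ( path -- n )
    0 swap                       ! put the accumulator under the path
    t                            ! bfs? flag, as in the one-liner above
    [ link-info size>> + ]       ! add this path's size to the total
    each-file ;
```

Calling "/path/to/directory" total-file-size leaves the total on the stack,
the same way each-line leaves the collected info in the with-file-reader
example.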

>  --
>
>  If I were to process a big file to collect some info, I could write:
>
>  "path-to-file" ascii [ V{ } [ quot ] each-line ] with-file-reader
>
>  The collected info is on the stack after the above finishes.
>
>  
>
>  To go through a huge directory (folder),
>  do you know if the current factor can set up something similar ?
>

I'm not sure what you're asking, unless you're wondering whether a directory
with millions of files is handled efficiently and iteratively. The answer
is "somewhat", since we currently list all the entries and then process them.

If you look at how ``(directory-entries)`` is implemented, we could easily
build a word that applies a quotation to each entry in an iterative fashion
without making the sequence of all entries first.  If that's important for
your performance...

Thanks,
John.