2015-10-02 20:12 GMT+02:00 HP wei <hpwe...@gmail.com>:
> First,
> In factor's listener terminal (not in the gui window, though),
> Jon Harper suggested to hit Control-C and t to terminate
> a long running code.
> I hit Control-C in below case (1), it brings out a low level debugger (what
> a pleasant surprise).
>
> Let me ask a question first before I write more about investigating the
> issue.
> *** in the low-level debugger, one of the commands is 'data' to dump data
> heap.
>      Is there any way to dump the result to a file ??

No. But you can easily log the console output:

    ./factor -run=readline-listener |& tee -i out.log


> Summary of further investigation.
>
> The code
> 0 "a_path_to_big_folder" x [ link-info dup symbolic-link? [ drop ] [ size>>
> + ] if  ] each-file

I believe this code is a rough example of how to do it. Counting disk
usage in a real Linux directory tree is much more involved than
that: you need to account for hard links, virtual file systems, volatile
files and much more. Look at all the switches "man du" lists -- it is
complicated.


> (1) when x = t  (breadth-first  BFS)
>      the memory usage reported by linux's  'top' shows steady increase
>      from around 190M to as high as 2GB before either I killed it or it hit
> the
>      missing file issue.

I don't think you are hitting a genuine missing-file issue. In
/proc/<factor-pid>/fd an extra ephemeral entry shows up, because
listing the contents of a directory means opening the directory
itself, which creates a file descriptor; that descriptor is gone again
by the time the entry is stat'ed. You can trigger the same problem in
Python using:

    import os
    [os.stat('/proc/%d/fd/%s' % (os.getpid(), f)) for f in os.listdir('/proc/%d/fd' % os.getpid())]


>       But the total-file-size of about 280GB is incorrect.  It should be
> around 74GB.

This could be because the sizes of /proc files are counted. The
/proc/kcore file in particular is enormous.
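For example, reusing the link-info and size>> words from your snippet,
this should print the size /proc advertises for it (assuming
io.files.info and accessors are in scope in the listener):

    "/proc/kcore" link-info size>> .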


> For the above disk,  DFS appears to consume much less memory !
> But the resulting file size is incorrect (280GB instead of 70GB).
> This is presumably due to (NOTE-A) and the code must have scanned through
> those
> OTHER disks.  But then the extra scanning appears to be incomplete!

It's hard to say what might be up. But if those disks are mounted under
the directory you supplied to each-file, then the files on them will be
counted too -- each-file does not stop at file system boundaries the
way "du -x" (--one-file-system) does.

> In closing,  the simple code (with DFS)
>     0 "a_path_to_big_folder" f [ link-info dup symbolic-link? [ drop ] [
> size>> + ] if  ] each-file
> could NOT achieve the intended action --- to sum up the file-size for files
> residing in a
> disk (as pointed to by a_path_to_big_folder).

That is not surprising. Here is a better method to do it:

USING: accessors combinators.short-circuit continuations
io.directories.search io.files.info io.files.types kernel math
math.order namespaces sets ;

! Filter hardlinks
SYMBOL: seen-inos

: regular-file-size ( file-info -- s )
    ! In case it's one of the fake huge /proc files
    [ size>> ] [ size-on-disk>> ] bi min ;

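! Count only regular files; a file with multiple hard links is counted
! the first time its inode is seen (?adjoin outputs t only then)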
: count-file-info? ( link -- ? )
    {
        [ type>> +regular-file+ = ]
        [
            { [ nlink>> 1 = ] [ ino>> seen-inos get ?adjoin ] } 1||
        ]
    } 1&& ;

: file-info-size ( link -- s )
    dup count-file-info? [ regular-file-size ] [ drop 0 ] if ;

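! Unreadable or vanished files count as size 0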
: file-size ( path -- s )
    [ link-info file-info-size ] [ 2drop 0 ] recover ;

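! Sum the sizes of all files under path (t = breadth-first traversal)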
: du-tree ( path -- s )
    HS{ } clone seen-inos set
    0 swap t [ file-size + ] each-file ;
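
With those definitions loaded you can call it from the listener,
keeping the placeholder path from your example:

    "a_path_to_big_folder" du-tree .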

It gives decent disk usage counts for me. It underreports the total
compared with "du -s --si" because I excluded directory sizes.


--
mvh/best regards Björn Lindqvist
