First,
in Factor's listener terminal (not in the GUI window, though),
Jon Harper suggested hitting Control-C and then t to terminate
long-running code.
When I hit Control-C in case (1) below, it brought up a low-level debugger (what
a pleasant surprise).

Let me ask a question first, before I write more about investigating the
issue.
*** In the low-level debugger, one of the commands is 'data', which dumps the
data heap.
     Is there any way to dump that output to a file??

--------------------------------------------------------------------------------------------

Summary of further investigation.

The code

0 "a_path_to_big_folder" x [ link-info dup symbolic-link? [ drop ] [ size>> + ] if ] each-file
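
For reference, my guess at a minimal USING: line for this snippet (the
listener may already have some of these loaded):

USING: accessors io.directories.search io.files.info kernel math ;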

(1) when x = t  (breadth-first, BFS)
     The memory usage reported by Linux's 'top' showed a steady increase from
     around 190 MB to as high as 2 GB before I either killed the run or it hit
     the missing-file issue.

(2) when x = f  (depth-first, DFS)
     Watching RES in 'top', I noticed that the memory usage actually dropped
     from 190 MB to around 94 MB before I went home and left the code running
     in the office.  The next morning, I found that it had finished OK, with a
     total file size on the data stack.

     But that total file size, about 280 GB, is incorrect.  It should be
     around 74 GB.

-------------------------------------------------

Just a reminder: our disk has the following properties.

   it is a disk with a tree of directories
   directory count: ~6000
   total number of files as of now: ~1.1 million
   total number of softlinks: ~570,000
   total file size: ~70 GB

   The number of files in each sub-directory (not counting the files in the
   sub-directories inside it) ranges from a few hundred up to on the order
   of ~10K.

   (NOTE-A) Some of the folders are in fact softlinks that point to OTHER
   disk locations.

--------------------------------------------------------

For the above disk, DFS appears to consume much less memory!
But the resulting file size is incorrect (280 GB instead of 70 GB).
This is presumably due to (NOTE-A): the code must have scanned through those
OTHER disks.  But then the extra scanning appears to be incomplete,
because 280 GB is too small.  A complete traversal of the above disk plus all
those OTHER disks would amount to a few terabytes.

So, somewhere the traversal went wrong.  That may be an investigation for
another day.

------------------------------------------------------------------

For case (1), I hit Control-C to bring up the low-level debugger
and typed 'data' to look at the data heap contents.
It is a LONG, LONG list of stuff containing many tuples describing the
directory entries.

I typed 'c' to let the code continue for a while,
hit Control-C again,
then typed 'data' to look at the data heap once more.
Since the list is far too long to fit on the screen, I could not see any
significant difference between the last few lines of this 'data' output and
the previous one.

It would be nice to be able to dump the 'data' output to a file.
Then a more comprehensive comparison could be done.
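
As an aside: if the scan were kicked off in a background thread so the
listener stays responsive, I believe one could periodically count the live
directory-entry tuples from the listener and watch that number grow.  A
minimal sketch, assuming 'instances' from the memory vocabulary behaves as I
remember:

USING: io.directories memory prettyprint sequences ;
[ directory-entry? ] instances length .   ! count of live directory-entry tuples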

I also tried typing 'gc' to invoke a round of garbage collection,
but nothing seemed to be affected; the memory usage as monitored by 'top'
remained unchanged.
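
Worth noting (this is my assumption, not something I verified here): 'top'
reports the RSS of the whole VM process, and the VM does not necessarily hand
freed heap back to the OS even after a successful collection.  From the
listener, I believe the in-image picture can be checked with gc and room.
from the memory and tools.memory vocabularies:

USING: memory tools.memory ;
room.   ! print data/code heap statistics
gc      ! force a garbage collection
room.   ! print the statistics again to compare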

----------------------------------------------------------------------------

In closing, the simple code (with DFS)
    0 "a_path_to_big_folder" f [ link-info dup symbolic-link? [ drop ] [ size>> + ] if ] each-file
could NOT achieve the intended result: summing up the sizes of the files
residing on the disk (as pointed to by a_path_to_big_folder).

A custom iterator needs to be coded, after all.
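
Something like the following is what I have in mind: a depth-first walk that
reads each directory with directory-entries, skips softlinks entirely (so the
OTHER disks are never followed), and adds up the sizes of everything else.
This is only a sketch under those assumptions; I have not run it against the
big folder, and the exact word names (directory?, append-path, ...) may need
adjusting.

USING: accessors combinators io.directories io.files.info
io.pathnames kernel sequences ;

: sum-file-sizes ( path -- n )
    dup directory-entries [
        name>> append-path dup link-info {
            { [ dup symbolic-link? ] [ 2drop 0 ] }          ! skip softlinks entirely
            { [ dup directory? ] [ drop sum-file-sizes ] }  ! recurse into real sub-directories
            [ nip size>> ]                                  ! ordinary file: take its size
        } cond
    ] with map sum ;

Then "a_path_to_big_folder" sum-file-sizes . should print the total without
ever leaving the disk.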

Finally, the memory issue with BFS may simply be due to the algorithm needing
a LOT of memory to store all the directory entries at a given depth of the
tree.  If the 'data' content in the debugger could be dumped to a file, I
could see this more clearly by comparing the contents at two distinct moments
(say, when RES in 'top' reaches 1 GB and when it reaches 2 GB).

--HP