First, Jon Harper suggested hitting Control-C and then typing t in Factor's listener terminal (not in the GUI window, though) to terminate long-running code. When I hit Control-C in case (1) below, it brought up a low-level debugger (what a pleasant surprise).
Let me ask a question first, before I write more about investigating the issue. *** In the low-level debugger, one of the commands is 'data', which dumps the data heap. Is there any way to dump its output to a file?

--------------------------------------------------------------------------------------------

Summary of further investigation. The code

0 "a_path_to_big_folder" x [ link-info dup symbolic-link? [ drop ] [ size>> + ] if ] each-file

(1) When x = t (breadth-first, BFS), the memory usage reported by Linux's 'top' shows a steady increase from around 190 MB to as high as 2 GB, before either I killed it or it hit the missing-file issue.

(2) When x = f (depth-first, DFS), watching RES in 'top', I noticed that the memory usage even drops from 190 MB to around 94 MB. At that point I went home and let the code run in the office. The next morning I found that it had finished OK, with a total file size on the data stack. But that total, about 280 GB, is incorrect; it should be around 74 GB.

-------------------------------------------------

Just a reminder: our disk has the following properties.

It is a disk with a tree of directories.
directory count ~ 6000
total number of files as of now ~ 1.1 million
total number of soft links ~ 570000
total file size ~ 70 GB
The number of files in each sub-directory (not including the files in sub-directories inside it) ranges from a few hundred to on the order of ~10K.

(NOTE-A) Some of the folders are in fact soft links that point to OTHER disk locations.

--------------------------------------------------------

For the above disk, DFS appears to consume much less memory! But the resulting file size is incorrect (280 GB instead of 70 GB). This is presumably due to (NOTE-A): the code must have scanned through those OTHER disks. But then the extra scanning appears to be incomplete, because 280 GB is too small; completely traversing the above disk plus all those OTHER disks would amount to a few terabytes. So somewhere the traversal went wrong. That may be another investigation for another day.

------------------------------------------------------------------

For case (1), I did Control-C to bring up the low-level debugger and typed 'data' to look at the data-heap contents. It is a LONG, LONG list of stuff containing many tuples describing the directory entries. I typed 'c' to let the code continue for a while, hit Control-C again, then 'data' again to look at the data heap. Since the list is far too long to fit on the screen, I could not see any significant difference between the last few lines of this 'data' output and the previous one. It would be nice to be able to dump the 'data' result to a file; then a more comprehensive comparison could be done.

I also tried typing 'gc' to invoke a round of garbage collection, but nothing seemed to be affected: the memory as monitored by 'top' remained unchanged. (Presumably the VM keeps the freed pages mapped even after a GC, so RES would not shrink.)

----------------------------------------------------------------------------

In closing, the simple code (with DFS)

0 "a_path_to_big_folder" f [ link-info dup symbolic-link? [ drop ] [ size>> + ] if ] each-file

could NOT achieve the intended action: to sum up the sizes of the files residing on a disk (as pointed to by a_path_to_big_folder). A custom iterator needs to be coded after all; a sketch of one follows below.

Finally, the memory issue in the BFS case may just be that the algorithm requires a LOT of memory to store all the directory entries at a certain depth of the tree.
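Here is a minimal sketch of what such a custom iterator might look like: a depth-first walk that consults link-info first and refuses to descend into (or count) anything that is a symbolic link, so the symlinked folders pointing at OTHER disk locations are never followed. The word name tree-size is my own, and I am assuming the stock io.files.info / io.directories words (link-info, symbolic-link?, directory-files, append-path) behave as I remember; I have not run this against a disk this size.

USING: accessors io.directories io.files.info io.files.types
io.pathnames kernel sequences ;

! Sum file sizes under path, depth-first, skipping every
! symbolic link (files AND folders), so links to OTHER
! disks are never followed.
: tree-size ( path -- n )
    dup link-info dup symbolic-link?
    [ 2drop 0 ] [
        dup type>> +directory+ =
        [ drop dup directory-files [ append-path tree-size ] with map-sum ]
        [ nip size>> ] if
    ] if ;

"a_path_to_big_folder" tree-size

One caveat: this is plainly recursive, so the call stack grows with the depth of the tree, and a file that disappears between directory-files and link-info (the missing-file issue) would still throw; wrapping the per-entry work with recover from continuations could paper over that.

--------------------------------------------------------------------------------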
If we could dump the 'data' content from the debugger to a file, I could compare the contents at two distinct moments (say, when RES in 'top' reaches 1 GB and when it reaches 2 GB) and see more clearly what is accumulating.
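Regarding dumping to a file: I do not know of a VM-debugger command for that (on Unix, running the whole session under script(1) would at least capture the terminal output). But from the listener itself one can write heap statistics to a file and diff two snapshots. A rough sketch, assuming tools.memory's room. / heap-stats. and memory's instances work the way I remember; dump-heap-stats is just a name I made up:

USING: io io.directories io.encodings.utf8 io.files memory
prettyprint sequences tools.memory ;

! Write a snapshot of heap statistics to a file, plus a
! count of live directory-entry tuples, which is what the
! traversal seems to be piling up.
: dump-heap-stats ( path -- )
    utf8 [
        room.        ! overall data/code heap usage
        heap-stats.  ! per-class instance counts and sizes
        [ directory-entry? ] instances length
        "directory-entries: " write .
    ] with-file-writer ;

"snapshot-1gb.txt" dump-heap-stats

Diffing a snapshot taken at 1 GB against one taken at 2 GB should show which class of objects accounts for the growth. Of course this only works while the listener is responsive; during a long traversal one would have to call it from inside the quotation passed to each-file, say every N files.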
--HP