I am running a crawl on about 1 million web domains. After 30% of the map phase is done, I see the usage below. The Non-DFS usage seems very high, around 31 GB. Does this mean Nutch is creating too many temporary files locally on that node? Is this correct? Hoping someone will answer this post with at least an OK / not OK. This is the first crawl on this Hadoop cluster, no other jobs are running, and DFS held about 10 GB of data before this job started.

Cluster Summary
314 files and directories, 460 blocks = 774 total. Heap Size is 14.82 MB / 966.69 MB (1%)
Configured Capacity : 377.91 GB
DFS Used : 60.31 GB
Non DFS Used : 31.58 GB
DFS Remaining : 286.02 GB
DFS Used% : 15.96 %
DFS Remaining% : 75.69 %
Live Nodes : 8
Dead Nodes : 0

--
View this message in context: http://www.nabble.com/Nutch1.0-hadoop-dfs-usage-doesnt-seem-right-.-experience-users-please-comment-tp23454975p23454975.html
Sent from the Nutch - User mailing list archive at Nabble.com.
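For what it's worth, my understanding is that "Non DFS Used" in that summary is simply the leftover space: Configured Capacity minus DFS Used minus DFS Remaining. The figures above are at least internally consistent:

```shell
# "Non DFS Used" as derived from the Cluster Summary figures (in GB):
# Configured Capacity - DFS Used - DFS Remaining
awk 'BEGIN { printf "%.2f\n", 377.91 - 60.31 - 286.02 }'
# prints 31.58
```

So the 31.58 GB is whatever is on those disks outside of HDFS block storage, which is why I suspect local MapReduce temp files.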
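To see whether the space really is going to map-phase temp files, this is roughly what I plan to check on each datanode. The paths assume the default hadoop.tmp.dir layout under /tmp; substitute your own mapred.local.dir if you have configured one:

```shell
# Measure how much local (non-DFS) disk the MapReduce temp/spill files
# occupy on this node. Paths assume the default hadoop.tmp.dir layout
# (/tmp/hadoop-<user>); adjust if mapred.local.dir points elsewhere.
for d in /tmp/hadoop-"$USER" /tmp/hadoop-"$USER"/mapred/local; do
  if [ -d "$d" ]; then
    du -sh "$d"
  else
    echo "not present: $d"
  fi
done
```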
