If I'm reading the NDFS.java code correctly, it should be possible to have different data nodes with different capacities, is that right? All that needs to be done is to read the capacity value (currently hardcoded to "60 * GIGABYTE") from each node's NutchConf. (BTW: while I'm here I could change it to do the right thing, per the comment.)
Yes, it should be able to do that. Another option that we'd like to add is to be able to configure it to use all space available on a device. For that we need to use Runtime.getRuntime().exec("df") and parse the output.
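A rough sketch of what that parsing could look like, not the actual NDFS code. It assumes the POSIX `df` utility is on the path and is invoked as `df -kP <dir>`, which prints one header line followed by one line per file system, with the fourth column giving available space in 1024-byte blocks. The class and method names (`DeviceSpace`, `parseAvailableBytes`, `availableBytes`) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper, not part of NDFS.
class DeviceSpace {

    /**
     * Extracts the "Available" column from `df -kP` output and converts it
     * from 1024-byte blocks to bytes. Assumes the last line describes the
     * file system of interest and that its name contains no whitespace.
     */
    static long parseAvailableBytes(String dfOutput) {
        String[] lines = dfOutput.trim().split("\\r?\\n");
        if (lines.length < 2) {
            throw new IllegalArgumentException("unexpected df output: " + dfOutput);
        }
        String[] fields = lines[lines.length - 1].trim().split("\\s+");
        if (fields.length < 6) {
            throw new IllegalArgumentException("unexpected df line: " + lines[lines.length - 1]);
        }
        return Long.parseLong(fields[3]) * 1024L;
    }

    /** Runs `df -kP dir` and returns the available space in bytes. */
    static long availableBytes(String dir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("df", "-kP", dir).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        if (p.waitFor() != 0) {
            throw new IOException("df failed for " + dir);
        }
        return parseAvailableBytes(out.toString());
    }
}
```

A data node could call something like `DeviceSpace.availableBytes(dataDir)` at startup when no explicit capacity is configured, and fall back to the configured value otherwise.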
I'm curious though how the block replication works then...
Only Mike knows for sure, but I'll give you my understanding. The name node is the only node that knows block replication counts. When it detects that a data node is down (no recent heartbeat), the replication counts of that node's blocks are decremented. Replication requests are then issued for all blocks that are now under-replicated. (The throttling of replication requests is not yet right, I think.) When a node comes online, it reports which blocks it has, and the counts for those blocks are incremented. Any block whose replication count is then too high causes the name node to issue a block deletion request. Is that right, Mike?
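To make the bookkeeping concrete, here is a minimal sketch of that understanding, not the real name node code. Everything in it is hypothetical: the class `ReplicationTracker`, its method names, and the choice to track the set of nodes holding each block (so the replication count is simply the size of that set). Heartbeat timing and the throttling of replication requests are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of name-node replication bookkeeping.
class ReplicationTracker {
    private final int targetReplication;
    // block id -> data nodes currently known to hold that block
    private final Map<String, Set<String>> holders = new HashMap<>();
    // data node -> blocks it has reported
    private final Map<String, Set<String>> blocksByNode = new HashMap<>();

    ReplicationTracker(int targetReplication) {
        this.targetReplication = targetReplication;
    }

    /**
     * A node came online and reported its blocks. Returns the blocks that
     * are now over-replicated, i.e. candidates for deletion requests.
     */
    List<String> nodeReported(String node, List<String> blocks) {
        Set<String> nodeBlocks = blocksByNode.computeIfAbsent(node, n -> new HashSet<>());
        List<String> overReplicated = new ArrayList<>();
        for (String block : blocks) {
            nodeBlocks.add(block);
            Set<String> h = holders.computeIfAbsent(block, b -> new HashSet<>());
            // Only count a node once per block, even if it reports twice.
            if (h.add(node) && h.size() > targetReplication) {
                overReplicated.add(block);
            }
        }
        return overReplicated;
    }

    /**
     * A node stopped sending heartbeats. Returns the blocks that are now
     * under-replicated, i.e. candidates for replication requests.
     */
    List<String> nodeLost(String node) {
        List<String> underReplicated = new ArrayList<>();
        Set<String> lost = blocksByNode.remove(node);
        if (lost == null) {
            return underReplicated; // unknown node, nothing to update
        }
        for (String block : lost) {
            Set<String> h = holders.get(block);
            h.remove(node);
            if (h.size() < targetReplication) {
                underReplicated.add(block);
            }
        }
        return underReplicated;
    }

    /** Current number of nodes known to hold the block. */
    int replicationCount(String block) {
        Set<String> h = holders.get(block);
        return h == null ? 0 : h.size();
    }
}
```

With a target of 2, losing one of two nodes that hold a block makes `nodeLost` return that block, and a third node reporting a block already held by two others makes `nodeReported` return it.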
Is that what you were asking about, Andrzej?
Doug
