Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "NutchFileFormats" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/NutchFileFormats?action=diff&rev1=6&rev2=7

  
  To economize the handling of large data volumes, 
[[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/MapFile.html|MapFile]]
 manages a mapping as two separate files in a subdirectory of its own. The 
large "data" file stores all keys and values, sorted by the key. The much 
smaller "index" file points to byte offsets in the data file for a small sample 
of keys. Only the index file is read into memory.
  
- 
[[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/ArrayFile.html|ArrayFile]]
 is a specialization of MapFile, specifically a dense file-based mapping from 
integers to values where the keys are long integers. Finally you can also see 
[[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/SetFile.html|SetFile]
 which is a file representing a file-based set of keys.
+ 
[[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/ArrayFile.html|ArrayFile]]
 is a specialization of MapFile, specifically a dense file-based mapping from 
integers to values where the keys are long integers. Finally you can also see 
[[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/SetFile.html|SetFile]]
 which is a file representing a file-based set of keys.
  
  Additional files in the 
[[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/package-summary.html|org.apache.hadoop.io.*]]
 package contain the actual Writer, Reader and Sorter implementations as well.
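
The data/index split described for MapFile above can be illustrated with a small in-memory sketch. This is plain Java and does not use or reproduce the Hadoop API: the class name SparseIndexSketch, its methods and the interval parameter are all hypothetical, and list positions stand in for the byte offsets a real index file would hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: NOT the Hadoop MapFile implementation.
final class SparseIndexSketch {
    // Stand-in for the large "data" file: all keys and values, sorted by key.
    private final List<String> dataKeys = new ArrayList<>();
    private final List<String> dataValues = new ArrayList<>();

    // Stand-in for the small "index" file: sampled key -> position in the data.
    // A real index would store byte offsets; list positions are used here.
    private final TreeMap<String, Integer> index = new TreeMap<>();

    // Assumed sampling rate: one index entry per `interval` records.
    private final int interval;

    SparseIndexSketch(int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
    }

    // Appends a record; keys must arrive in strictly increasing order
    // so that the data stays sorted.
    void append(String key, String value) {
        if (!dataKeys.isEmpty()
                && key.compareTo(dataKeys.get(dataKeys.size() - 1)) <= 0) {
            throw new IllegalArgumentException("keys must be appended in increasing order");
        }
        if (dataKeys.size() % interval == 0) {
            index.put(key, dataKeys.size()); // sample this key into the index
        }
        dataKeys.add(key);
        dataValues.add(value);
    }

    // Looks up a key: find the closest sampled key at or before it in the
    // index, then scan the data forward from that position.
    String get(String key) {
        Map.Entry<String, Integer> start = index.floorEntry(key);
        if (start == null) {
            return null; // key sorts before the first record
        }
        for (int i = start.getValue(); i < dataKeys.size(); i++) {
            int cmp = dataKeys.get(i).compareTo(key);
            if (cmp == 0) {
                return dataValues.get(i);
            }
            if (cmp > 0) {
                break; // passed the spot where the key would be
            }
        }
        return null;
    }

    // Number of sampled keys held in the index.
    int indexSize() {
        return index.size();
    }
}
```

With an interval of 2 and five appended records, only three keys end up in the index, yet every record remains reachable by a short forward scan. A larger interval shrinks the index further at the cost of longer scans.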
  
