Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchFileFormats" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/NutchFileFormats?action=diff&rev1=6&rev2=7

  To economize the handling of large data volumes, [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/MapFile.html|MapFile]] manages a mapping as two separate files in a subdirectory of its own. The large "data" file stores all keys and values, sorted by the key. The much smaller "index" file points to byte offsets in the data file for a small sample of keys. Only the index file is read into memory.

- [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/ArrayFile.html|ArrayFile]] is a specialization of MapFile, specifically a dense file-based mapping from integers to values where the keys are long integers. Finally you can also see [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/SetFile.html|SetFile] which is a file representing a file-based set of keys.
+ [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/ArrayFile.html|ArrayFile]] is a specialization of MapFile, specifically a dense file-based mapping from integers to values where the keys are long integers. Finally you can also see [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/SetFile.html|SetFile]] which is a file representing a file-based set of keys.

  Additional files in [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/package-summary.html|org.apache.hadoop.io.*]] package contains the actual Writer, Reader and Sorter implementations as well.
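
The data/index split described for MapFile can be illustrated with a minimal plain-Java sketch. This is not Hadoop's implementation and does not use the Hadoop API: the class name `SparseIndexSketch`, the `INDEX_INTERVAL` constant, the record encoding via `writeUTF`, and keeping the index in an in-memory `TreeMap` rather than in a separate "index" file are all assumptions made for illustration. It only shows the idea of a sorted data file plus a small sample of keys that point to byte offsets.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustration only: a sorted data file with a sparse key-to-offset index. */
class SparseIndexSketch {
    // Hypothetical sampling rate: every 4th key gets an index entry.
    static final int INDEX_INTERVAL = 4;

    /** Writes the sorted entries to dataFile and returns sampled key -> byte offset. */
    static TreeMap<String, Long> write(Path dataFile, SortedMap<String, String> entries)
            throws IOException {
        TreeMap<String, Long> index = new TreeMap<>();
        try (RandomAccessFile out = new RandomAccessFile(dataFile.toFile(), "rw")) {
            out.setLength(0); // start from an empty data file
            int n = 0;
            for (Map.Entry<String, String> e : entries.entrySet()) {
                if (n++ % INDEX_INTERVAL == 0) {
                    // Remember where this record starts in the data file.
                    index.put(e.getKey(), out.getFilePointer());
                }
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
        return index;
    }

    /** Looks up key: find the nearest indexed key at or before it, seek, then scan forward. */
    static String get(Path dataFile, TreeMap<String, Long> index, String key)
            throws IOException {
        Map.Entry<String, Long> start = index.floorEntry(key);
        if (start == null) {
            return null; // key sorts before the first record
        }
        try (RandomAccessFile in = new RandomAccessFile(dataFile.toFile(), "r")) {
            in.seek(start.getValue());
            while (in.getFilePointer() < in.length()) {
                String k = in.readUTF();
                String v = in.readUTF();
                int cmp = k.compareTo(key);
                if (cmp == 0) {
                    return v;
                }
                if (cmp > 0) {
                    return null; // records are sorted, so the key is absent
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        SortedMap<String, String> entries = new TreeMap<>();
        for (int i = 0; i < 20; i++) {
            entries.put(String.format("key%03d", i), "value" + i);
        }
        Path data = Files.createTempFile("sketch", ".data");
        TreeMap<String, Long> index = write(data, entries);
        System.out.println(index.size() + " index entries for " + entries.size() + " records");
        System.out.println(get(data, index, "key013"));
        Files.delete(data);
    }
}
```

Because only the sampled keys are held in memory, a lookup costs one seek plus a scan of at most `INDEX_INTERVAL` records, which is the trade-off the page describes between a large data file and a much smaller index.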