By snapshots, I mean that you can freeze a copy of a portion of the file system for later use as a backup or reference. By mirrors, I mean that a snapshot can be transported to another location in the same cluster, or to another cluster, and the mirrored image will be updated atomically to the new state.
See http://mapr.com/products/why-mapr for more info.

On Tue, Nov 8, 2011 at 3:42 AM, donal0412 <donal0...@gmail.com> wrote:
> Thanks! That's exactly what I want.
> And Ted, what do you mean by "snapshots and mirrors"?
>
> On 2011/11/8 16:21, Harsh J wrote:
>> Block sizes are per-file, not permanently set on the HDFS. So create
>> your files with a sufficiently large block size (2G is OK if it fits
>> your use case well). This way you won't have block splits, as you
>> desire.
>>
>> For example, to upload a file via the shell with a tweaked block size,
>> I'd do:
>>
>> hadoop dfs -Ddfs.block.size=2147483648 -copyFromLocal localFile remoteFile
>>
>> Packet sizes are not what you want to tweak here.
>>
>> On Tue, Nov 8, 2011 at 1:02 PM, donal0412 <donal0...@gmail.com> wrote:
>>> Hi,
>>> I want to store lots of files in HDFS; each file is <= 2G in size.
>>> I don't want the files split into blocks, because I need the whole
>>> file while processing it, and I don't want to transfer blocks to one
>>> node when processing it.
>>> An easy way to do this would be to set dfs.write.packet.size to 2G. I
>>> wonder if someone has similar experience or knows whether this is
>>> practicable. Will there be performance problems when setting the
>>> packet size to a big number?
>>>
>>> Thanks!
>>> donal
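For reference, the same per-file block size can also be set programmatically, which makes Harsh's point concrete: the block size is a parameter of file creation, not a cluster-wide setting. Below is a minimal sketch against the HDFS Java API; the destination path and local file name are placeholders, and it assumes the cluster configuration is on the classpath:

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class LargeBlockUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Block size is passed per file at create time:
        // 2147483648 bytes = 2G, matching the shell example above.
        long blockSize = 2147483648L;
        FSDataOutputStream out = fs.create(
                new Path("/user/donal/remoteFile"),  // placeholder destination
                true,                                // overwrite if present
                conf.getInt("io.file.buffer.size", 4096),
                fs.getDefaultReplication(),
                blockSize);

        InputStream in = new FileInputStream("localFile"); // placeholder source
        IOUtils.copyBytes(in, out, conf, true); // copies and closes both streams
    }
}

A file written this way occupies a single block as long as it is no larger than the given block size, so a task processing it can read the whole file from one datanode without pulling blocks across the network.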