Hello,

It seems that data replication in HDFS is simply a copy of the data among nodes. Has anyone considered using a better encoding to reduce the data size? Say, a block of data is split into N pieces, and as long as M of the N pieces survive in the network, we can regenerate the original data.
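To make the idea concrete, below is a minimal sketch of the simplest such code: one XOR parity piece, so N = M + 1 and any single lost piece can be rebuilt from the survivors. A real scheme would presumably use something like Reed-Solomon to tolerate more than one loss; the class and method names here are just for illustration, not anything that exists in HDFS.

import java.util.Arrays;

public class XorErasureDemo {
    // Split data into M equal pieces (zero-padded) and append one XOR parity piece, so N = M + 1.
    static byte[][] encode(byte[] data, int m) {
        int pieceLen = (data.length + m - 1) / m;          // round up so M pieces cover the block
        byte[][] pieces = new byte[m + 1][pieceLen];
        for (int i = 0; i < data.length; i++) {
            pieces[i / pieceLen][i % pieceLen] = data[i];
        }
        for (int i = 0; i < m; i++) {                      // parity piece = XOR of all data pieces
            for (int j = 0; j < pieceLen; j++) {
                pieces[m][j] ^= pieces[i][j];
            }
        }
        return pieces;
    }

    // Rebuild one missing piece by XOR-ing the surviving N - 1 pieces together.
    static byte[] recover(byte[][] pieces, int lost) {
        byte[] out = new byte[pieces[0].length];
        for (int i = 0; i < pieces.length; i++) {
            if (i == lost) continue;                       // skip the lost piece
            for (int j = 0; j < out.length; j++) {
                out[j] ^= pieces[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] block = "pretend this is a 64MB HDFS block".getBytes();
        byte[][] pieces = encode(block, 4);                // M = 4 data pieces, N = 5 total
        byte[] lostPiece = pieces[2].clone();              // remember piece 2, then pretend it's lost
        byte[] rebuilt = recover(pieces, 2);
        System.out.println(Arrays.equals(lostPiece, rebuilt));  // prints true
    }
}

With this single-parity code the storage cost is (M + 1) / M of the block size, versus 3x for triple replication, which is where the space saving would come from.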
There are many benefits to reducing the data size. It can save network bandwidth and disk space, and thus reduce energy consumption. Computation cost might be a concern, but we could use GPUs to encode and decode. But maybe the idea is stupid, or it is harder to reduce the data size than I think. I would like to hear your comments.

Thanks,
Da