What are the algorithms and codecs Hadoop uses to compress data and pass it around between mappers and reducers? I'm curious to understand what effect, if any, they have on double-precision values.
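For reference, this is the kind of intermediate (map-output) compression setting I have in mind. This is only a minimal sketch using the old mapred API; the codec choice here is illustrative, not what my job actually uses:

import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputCompressionExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();

    // Compress the intermediate map output that gets shuffled to the reducers.
    conf.setCompressMapOutput(true);

    // Codec choice is just an example; DefaultCodec, BZip2Codec, etc. are alternatives.
    conf.setMapOutputCompressorClass(GzipCodec.class);

    // Equivalent raw properties (pre-YARN names):
    //   mapred.compress.map.output = true
    //   mapred.map.output.compression.codec = org.apache.hadoop.io.compress.GzipCodec
  }
}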
So far my trainer (MAHOUT-627) uses unscaled EM training, and I'm about to start work on using log-scaled values to improve accuracy and minimize underflow. It will be interesting to compare the accuracy of the unscaled and log-scaled variants, hence my curiosity.
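By "log-scaled" I mean keeping probabilities in log space and combining them with the standard log-sum-exp trick so tiny values don't underflow to zero. A rough sketch of that idea (illustrative only, not code from MAHOUT-627):

public final class LogSumExp {
  private LogSumExp() {}

  /** Returns log(exp(a) + exp(b)) without leaving log space. */
  public static double logAdd(double a, double b) {
    if (Double.isInfinite(a) && a < 0) return b;   // a represents log(0)
    if (Double.isInfinite(b) && b < 0) return a;   // b represents log(0)
    double max = Math.max(a, b);
    // Factoring out the max keeps both exponentials in a safe range.
    return max + Math.log(Math.exp(a - max) + Math.exp(b - max));
  }

  public static void main(String[] args) {
    // Two tiny probabilities whose product/sum would underflow if handled directly.
    double logP1 = -745.0;
    double logP2 = -746.0;
    System.out.println("log(p1 + p2) = " + logAdd(logP1, logP2));
  }
}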
