What algorithms and codecs does Hadoop use to compress data and pass it
around between mappers and reducers? I'm curious to understand the effects
they have (if any) on double precision values.
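
To make the question concrete, here is a minimal sketch (assuming the
standard org.apache.hadoop.io.compress API and the classic mapred.*
property names) that enables map-output compression and round-trips a
serialized double through DefaultCodec:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CodecRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Classic properties for compressing intermediate map output:
    conf.setBoolean("mapred.compress.map.output", true);
    conf.setClass("mapred.map.output.compression.codec",
        DefaultCodec.class, CompressionCodec.class);

    // Round-trip a double through the codec to check it is bit-exact.
    CompressionCodec codec =
        ReflectionUtils.newInstance(DefaultCodec.class, conf);
    double original = Math.log(Double.MIN_NORMAL);  // typical log-scaled value

    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(codec.createOutputStream(buf));
    out.writeDouble(original);
    out.close();

    DataInputStream in = new DataInputStream(
        codec.createInputStream(new ByteArrayInputStream(buf.toByteArray())));
    double restored = in.readDouble();
    in.close();

    // Deflate is a lossless byte-stream compressor, so the IEEE-754
    // bit pattern should come back unchanged.
    System.out.println(Double.doubleToLongBits(original)
        == Double.doubleToLongBits(restored));  // prints: true
  }
}

Since Deflate, Gzip, and BZip2 are all lossless byte-stream compressors,
I'd expect the bit pattern to survive the shuffle unchanged, but I'd like
to confirm there's nothing else in that path that touches the bytes.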

So far my trainer (MAHOUT-627) uses unscaled EM training, and I'm about to
start work on using log-scaled values to improve accuracy and minimize
underflow. It will be interesting to compare the accuracy of the unscaled
and log-scaled variants, hence the question.
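
For reference, the standard trick for summing probabilities while staying
in log space is log-sum-exp; a minimal sketch (the logSumExp helper below
is hypothetical, not code from MAHOUT-627):

public class LogSumExp {
  // Stable log(sum_i exp(x[i])) for log-scaled forward/backward sums.
  static double logSumExp(double[] logValues) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : logValues) {
      if (v > max) max = v;
    }
    if (max == Double.NEGATIVE_INFINITY) {
      return max;  // all inputs are log(0)
    }
    double sum = 0.0;
    for (double v : logValues) {
      sum += Math.exp(v - max);  // every term is <= 1, so no overflow
    }
    return max + Math.log(sum);
  }

  public static void main(String[] args) {
    // 1e-200 * 1e-200 = 1e-400 underflows to 0.0 in unscaled arithmetic,
    // but its log-space representation stays perfectly finite.
    double[] logs = { Math.log(1e-200) + Math.log(1e-200), Math.log(1e-300) };
    System.out.println(logSumExp(logs));  // finite log-probability
  }
}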
