On Thu, Jul 14, 2011 at 10:29 AM, Sean Owen <[email protected]> wrote:
> Serialization itself has no effect on accuracy; doubles are encoded exactly
> as they are in memory.
> That's not to say that there may be an accuracy issue in how some
> computation proceeds, but it is not a function of serialization.

Interesting, are there factors specific to Hadoop (not just subtleties of
Java or the OS) which can affect accuracy that I should be concerned about?
Also, SequenceFile stores compressed key-value pairs, does it not? Is that
compression lossy?

> On Thu, Jul 14, 2011 at 2:54 PM, Dhruv Kumar <[email protected]> wrote:
>
> > What are the algorithms and codecs used in Hadoop to compress data and
> > pass it around between mappers and reducers? I'm curious to understand
> > the effects it has (if any) on double-precision values.
> >
> > So far my trainer (MAHOUT-627) uses unscaled EM training and I'm soon
> > starting the work on using log-scaled values for improved accuracy and
> > minimizing underflow. It will be interesting to compare the accuracy of
> > the unscaled and log-scaled variants, so I'm curious.
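Sean's point about serialization being exact can be checked directly: Java's DataOutput/DataInput primitives (which Hadoop's DoubleWritable delegates to) write the raw IEEE 754 bit pattern of a double, so the round trip is lossless. A minimal sketch (the class name and values here are illustrative, not from Hadoop itself):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DoubleRoundTrip {

    // Serialize a double with DataOutput and read it back with DataInput,
    // the same primitives DoubleWritable.write()/readFields() use.
    static double roundTrip(double v) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeDouble(v); // raw IEEE 754 bits
        return new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())).readDouble();
    }

    public static void main(String[] args) throws IOException {
        double[] values = {Math.PI, 1e-308, -0.0, Double.MIN_VALUE};
        for (double v : values) {
            double r = roundTrip(v);
            // Compare raw bit patterns: the round trip is exact, bit for bit.
            System.out.println(v + " -> exact: "
                + (Double.doubleToRawLongBits(v) == Double.doubleToRawLongBits(r)));
        }
    }
}
```

Comparing `doubleToRawLongBits` rather than `==` is deliberate: it also verifies edge cases such as -0.0 that numeric equality would gloss over.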
