Thanks Josh, I have a few follow-up questions.
With the default scaleFactor, how accurate should we assume the estimate is? Something like +/- 1%?
How does scaleFactor affect the estimated size of the object?
Could this become part of Crunch as an enhancement to the current join strategy?

Thanks
Jinal


On Mon, Feb 24, 2014 at 1:01 PM, Josh Wills <[email protected]> wrote:

> Ah, cool. The long getSize() method will return Crunch's estimate of the
> size of the object in bytes, but keep in mind that it's a very rough
> approximation. It is based on the size of the file on disk and on whatever
> info we have about the behavior of the DoFns that are applied to the PTable
> when it is processed, which each DoFn communicates via its scaleFactor()
> method.
>
>
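A rough model of the estimate Josh describes, as a sketch only. It assumes the estimate starts from the on-disk size of the input and is multiplied by each applied DoFn's scaleFactor() in turn. The class and method names are hypothetical, this is not Crunch's actual implementation, and the 0.5 factor for a filter is an example value rather than anything Crunch uses by default.

```java
import java.util.List;

/**
 * Hypothetical model (not Crunch's code): estimated size = on-disk size
 * multiplied by the scale factor of every DoFn applied, in order.
 */
public class SizeEstimateSketch {

    /** Returns the estimated output size in bytes after applying each scale factor. */
    static long estimateSize(long onDiskBytes, List<Float> scaleFactors) {
        double estimate = onDiskBytes;
        for (float factor : scaleFactors) {
            // A factor below 1 shrinks the estimate (e.g. a filter), above 1 grows it.
            estimate *= factor;
        }
        return (long) estimate;
    }

    public static void main(String[] args) {
        // Example numbers only: a 1 GB input, a filter expected to keep about
        // half the records, then a map that roughly preserves size.
        long onDisk = 1_000_000_000L;
        long estimate = estimateSize(onDisk, List.of(0.5f, 1.0f));
        System.out.println("Estimated size in bytes: " + estimate);
    }
}
```

Because the starting point is the size on disk, the result can differ a lot from the in-memory footprint: compressed input files and per-object overhead in the JVM can both make the deserialized data considerably larger than the file.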
> On Mon, Feb 24, 2014 at 10:57 AM, Jinal Shah <[email protected]> wrote:
>
> > By size I meant the memory size; sorry for the confusion. That is, how
> > much memory will a PTable object require? Basically, what I'm trying to
> > do is this: if the object is small enough to fit in memory, I want to
> > apply a map-side join to optimize the join, and depending on that I also
> > want to determine which one is smaller in order to use the left join.
> >
> >
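The plan above could be sketched as a small decision function. Everything here is hypothetical (the method name, the memory budget, and the safety factor): it takes two size estimates, such as those returned by getSize(), picks the smaller table as the candidate to hold in memory, and only chooses a map-side join when that table fits within the budget with room to spare, since the estimates are rough.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not part of Crunch): decide whether a map-side join
 * looks feasible from two rough size estimates, and if so which table to
 * hold in memory.
 */
public class JoinPlanSketch {

    /**
     * Returns the name of the smaller table if its estimated size, scaled up by
     * safetyFactor, fits within memoryBudgetBytes; otherwise returns
     * Optional.empty(), meaning fall back to a regular reduce-side join.
     */
    static Optional<String> chooseInMemoryTable(String leftName, long leftBytes,
                                                String rightName, long rightBytes,
                                                long memoryBudgetBytes, double safetyFactor) {
        // The smaller table is the candidate for loading into memory.
        String smallerName = leftBytes <= rightBytes ? leftName : rightName;
        long smallerBytes = Math.min(leftBytes, rightBytes);

        // Estimates are rough, so require headroom before going map-side.
        if (smallerBytes * safetyFactor <= memoryBudgetBytes) {
            return Optional.of(smallerName);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Example numbers only: a 5 GB table, a 200 MB table, and a 1 GB budget.
        Optional<String> plan = chooseInMemoryTable(
                "orders", 5_000_000_000L, "customers", 200_000_000L,
                1_000_000_000L, 2.0);
        System.out.println(plan.map(name -> "map-side join, load " + name + " into memory")
                               .orElse("reduce-side join"));
    }
}
```

The safety factor is the knob that absorbs estimation error: the less the size estimate can be trusted, the larger it should be.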
> > On Mon, Feb 24, 2014 at 12:45 PM, Josh Wills <[email protected]> wrote:
> >
> > > There is the length() method, which will return a PObject<Long> with the
> > > number of elements in the PCollection. It requires running an MR job,
> > > though.
> > >
> > > J
> > >
> > >
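The point about length() is that the count is not available until a job actually runs. The sketch below models that with a hypothetical LazyValue class (a stand-in, not Crunch's PObject): creating it does no work, and the computation happens only the first time getValue() is called. Caching the result after the first call is a choice made in this sketch, not something stated in the thread.

```java
import java.util.List;
import java.util.function.Supplier;

public class LazyCountSketch {

    /** Hypothetical lazy holder; a stand-in for a PObject, not Crunch's class. */
    static final class LazyValue<T> {
        private final Supplier<T> computation;
        private T value;
        private boolean computed = false;
        private int runs = 0; // number of times the computation has executed

        LazyValue(Supplier<T> computation) {
            this.computation = computation;
        }

        /** Runs the computation on the first call (the analogue of the MR job), then reuses the result. */
        T getValue() {
            if (!computed) {
                value = computation.get();
                computed = true;
                runs++;
            }
            return value;
        }

        int runs() {
            return runs;
        }
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");
        // Creating the handle does not count anything yet.
        LazyValue<Long> length = new LazyValue<>(() -> (long) records.size());
        System.out.println("runs before getValue(): " + length.runs());
        System.out.println("length: " + length.getValue());
        System.out.println("runs after getValue(): " + length.runs());
    }
}
```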
> > > On Mon, Feb 24, 2014 at 10:03 AM, Jinal Shah <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Is there a way in Crunch to find the size of a particular
> > > > PCollection or PTable as a whole?
> > > >
> > > > Thanks
> > > > Jinal
> > > >
> > >
> > >
> > >
> > > --
> > > Director of Data Science
> > > Cloudera <http://www.cloudera.com>
> > > Twitter: @josh_wills <http://twitter.com/josh_wills>
> > >
> >
>
>
