Thanks Josh, I have a few follow-up questions. With the default scaleFactor, how much approximation error should we assume, something like +/- 1%? How does scaleFactor affect the estimated size of the object? Could this become part of Crunch as an enhancement to the current Join strategy?
Thanks
Jinal

On Mon, Feb 24, 2014 at 1:01 PM, Josh Wills <[email protected]> wrote:

> Ah, cool. the long getSize() method will return Crunch's estimate of the
> size of the object in bytes, but it's good to keep in mind that it's a very
> rough approximation based on the size of the file on disk and any info we
> have about the behavior of any DoFns that are applied to the PTable when it
> is processed, which is communicated via the scaleFactor() function on each
> DoFn.
>
>
> On Mon, Feb 24, 2014 at 10:57 AM, Jinal Shah <[email protected]> wrote:
>
> > By size I meant the memory size sorry for the confusion. Like how much
> > memory will a PTable object require. Basically what I'm trying to do is if
> > the object is not that large and if it could fit in memory I wanted to
> > apply map-side join to optimize the join and depending on that I also
> > wanted to determine which one is smaller to use the Left join.
> >
> >
> > On Mon, Feb 24, 2014 at 12:45 PM, Josh Wills <[email protected]> wrote:
> >
> > > There is the length() method, which will return a PObject<Long> with the
> > > number of elements in the PCollection. It requires running an MR job
> > > though.
> > >
> > > J
> > >
> > >
> > > On Mon, Feb 24, 2014 at 10:03 AM, Jinal Shah <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Is there a way possible in crunch to find the size of a particular
> > > > PCollection or PTable in whole.
> > > >
> > > > Thanks
> > > > Jinal
> > > >
> > >
> > >
> > > --
> > > Director of Data Science
> > > Cloudera <http://www.cloudera.com>
> > > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
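
To make the getSize() description in the quoted reply concrete, here is a minimal sketch of the idea as I understand it: start from the on-disk size and multiply by the scale factor of each DoFn that was applied. This is plain Java and does not call Crunch at all. The class, the method names, the scale factor values, and the 128 MB threshold are all assumptions for illustration, not Crunch's actual implementation or defaults.

```java
// Hypothetical model of a scale-factor-based size estimate. Not Crunch code.
class SizeEstimateSketch {

    // Assumption: the estimate is the on-disk byte count multiplied by the
    // scale factor of each applied DoFn, in order, truncated to a long.
    static long estimateSize(long bytesOnDisk, float... scaleFactors) {
        long size = bytesOnDisk;
        for (float factor : scaleFactors) {
            size = (long) (size * (double) factor);
        }
        return size;
    }

    // Hypothetical check: treat a table as a map-side join candidate when
    // its estimated size is at or below a caller-chosen threshold.
    static boolean fitsInMemory(long estimatedBytes, long thresholdBytes) {
        return estimatedBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        long threshold = 128L * 1024 * 1024; // illustrative 128 MB budget

        // A 64 MB input passed through one DoFn with an assumed factor of 0.5.
        long small = estimateSize(64L * 1024 * 1024, 0.5f);
        // An 8 GB input passed through one DoFn with an assumed factor of 1.0.
        long large = estimateSize(8L * 1024 * 1024 * 1024, 1.0f);

        System.out.println("small estimate: " + small
                + ", fits in memory: " + fitsInMemory(small, threshold));
        System.out.println("large estimate: " + large
                + ", fits in memory: " + fitsInMemory(large, threshold));
    }
}
```

If the real estimate works roughly like this, the error depends on how well each DoFn's scale factor matches what the function actually does to the data, and on how on-disk bytes relate to in-memory bytes, so a fixed +/- 1% bound seems unlikely to hold in general.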
