> On May 18, 2016, 10:19 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java, line 147
> > <https://reviews.apache.org/r/46701/diff/1/?file=1361707#file1361707line147>
> >
> >     What about clearing the set so we don't have to garbage collect?
>
> Eyal Allweil wrote:
>     I just reassigned it because the clear() method in HashSet uses
>     Arrays.fill and I thought it would be more expensive than just letting it be
>     garbage collected and making a new one.
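
For reference, the two ways of resetting the set discussed above can be sketched as follows. This is a minimal, hypothetical illustration and not the code from CountDistinctUpTo.java: the class name SetResetSketch and its methods are made up for this example, and the timing loop is only a crude comparison (a proper harness such as JMH would be needed for trustworthy numbers).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not taken from DataFu.
class SetResetSketch {

    // Option A: keep the same HashSet instance and empty it in place.
    static Set<Integer> resetByClear(Set<Integer> set) {
        set.clear();
        return set;
    }

    // Option B: allocate a new HashSet; the old one is left for the garbage collector.
    static Set<Integer> resetByReassign(Set<Integer> set) {
        return new HashSet<>();
    }

    // Fills the set with `size` elements and resets it, `rounds` times.
    // Returns the elapsed wall-clock time in nanoseconds.
    static long time(boolean useClear, int size, int rounds) {
        Set<Integer> set = new HashSet<>();
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < size; i++) {
                set.add(i);
            }
            set = useClear ? resetByClear(set) : resetByReassign(set);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int size : new int[] {10, 100_000}) {
            int rounds = size < 1000 ? 200_000 : 50;
            // Warm-up passes so the JIT has compiled both paths before measuring.
            time(true, size, rounds);
            time(false, size, rounds);
            long clearMs = time(true, size, rounds) / 1_000_000;
            long reassignMs = time(false, size, rounds) / 1_000_000;
            System.out.printf("size=%d clear=%dms reassign=%dms%n",
                    size, clearMs, reassignMs);
        }
    }
}
```

One thing to keep in mind when reading the numbers: a cleared HashSet keeps its already-grown backing table, so refilling it skips the resizes that a freshly allocated set goes through. Results will vary with the JVM version and heap settings.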

I would think GCing the hashset would be more expensive than clearing. I did a quick benchmark and it seems that clear is significantly faster for large and small hashsets.


- Matthew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review133803
-----------------------------------------------------------


On April 27, 2016, 7:44 a.m., Eyal Allweil wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46701/
> -----------------------------------------------------------
>
> (Updated April 27, 2016, 7:44 a.m.)
>
>
> Review request for DataFu.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> DATAFU-117 - New UDF - CountDistinctUpTo
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db0c01a1967ea53d9cc3d316e9906d942a8
>
> Diff: https://reviews.apache.org/r/46701/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Eyal Allweil
>
>