Hi all,

This thread concerns GitHub issues #11879, #10297, and #8203, and possibly
others that didn't appear in my search.

The main issue is that histogram(bins='auto') will sometimes raise a
MemoryError if the number of automatically generated bin edges is too
large. In all documented cases, the outsized bin counts arise when the
auto-binning defaults to the 'fd' method.
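
To make the failure mode concrete, here is a minimal sketch of how the bin
count can blow up. It assumes the textbook Freedman-Diaconis rule
(width = 2 * IQR / n**(1/3)); fd_bin_count and the sample data are
hypothetical names and values made up for illustration, not numpy
internals or data from the issues.

```python
import numpy as np


def fd_bin_count(data):
    """Number of equal-width bins implied by the Freedman-Diaconis rule.

    Hypothetical helper for illustration; not part of numpy's API.
    """
    data = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25])
    iqr = q75 - q25
    width = 2.0 * iqr / data.size ** (1.0 / 3.0)
    if width == 0:
        # Degenerate case (zero IQR): this sketch just falls back to one bin.
        return 1
    return int(np.ceil((data.max() - data.min()) / width))


# Made-up data: a very tight cluster plus a single distant outlier.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1e-6, size=1000), [1e6]])

# Only the count is computed here; no edge array is allocated.
print(fd_bin_count(data))
```

The IQR is set by the tight cluster while the range is set by the outlier,
so the implied number of equal-width bins is far more than could be
allocated as an array of edges.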

I have taken a crack at reducing the number of bins produced when bins is
set to 'auto' in numpy's histogram function. Based on suggestions from
eric-weiser, the approach merges empty bins.

The method works for the sample datasets in all the issues related to the
FD estimator (#11879, #10297, #8203).

Note that this method produces unequal bin widths.
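
As a rough illustration of the idea (not the code from the linked comment),
the sketch below collapses each run of consecutive empty bins into one bin.
merge_empty_bins is a hypothetical helper name, and the sketch assumes a
small histogram has already been computed, so it does not by itself avoid
allocating the full set of edges.

```python
import numpy as np


def merge_empty_bins(counts, edges):
    """Collapse each run of consecutive empty bins into a single bin.

    Hypothetical helper for illustration; not part of numpy's API.
    """
    counts = np.asarray(counts)
    edges = np.asarray(edges)
    empty = counts == 0
    # Interior edge i + 1 separates bin i from bin i + 1;
    # drop it when the bins on both sides are empty.
    drop = empty[:-1] & empty[1:]
    # The outermost edges are always kept.
    keep = np.concatenate([[True], ~drop, [True]])
    return edges[keep]


# Made-up data: a small cluster near zero and one point far away.
data = np.array([0.0, 0.1, 0.2, 0.3, 10.0])
counts, edges = np.histogram(data, bins=20)

merged_edges = merge_empty_bins(counts, edges)
merged_counts, _ = np.histogram(data, bins=merged_edges)

print(merged_edges)   # fewer edges, no longer equally spaced
print(merged_counts)  # total count is unchanged
```

Passing the merged edges back to histogram keeps every sample counted
while the 18 empty bins in the middle become a single wide bin, which is
where the unequal widths come from.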

You can see some code I've already written in a comment on issue #11879 at
the link below.

https://github.com/numpy/numpy/issues/11879#issuecomment-516686087

Thanks for reading this far. Would be happy to turn this into a PR if there
is interest.

-areeves87
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
