That sounds like a reasonable extension, but I think there are still cases where you want to treat the data as one uniform set when computing bins (for example, toggling between orthogonal subsets of the data), so it isn't really a useful replacement.
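
To make the "one uniform set" case concrete, here is a minimal sketch. It assumes a NumPy recent enough to provide `np.histogram_bin_edges` (1.15 or later); the two arrays are made-up stand-ins for disjoint subsets of one dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two disjoint subsets of one larger dataset.
a = rng.normal(loc=0.0, scale=1.0, size=500)
b = rng.normal(loc=3.0, scale=1.0, size=200)
merged = np.concatenate([a, b])

# Compute the bin edges once, on the merged data ...
edges = np.histogram_bin_edges(merged, bins='auto')

# ... then reuse them, so every subset is binned on the same grid.
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)
counts_all, _ = np.histogram(merged, bins=edges)
```

Because the edges are fixed up front, the per-subset counts add up to the counts of the merged data, which is what you want when switching subsets on and off in a plot.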
I suppose this becomes relevant when `density` is passed to the individual histogram invocations. Does matplotlib handle that correctly for stacked histograms?

On Thu, Mar 15, 2018, 20:14 Nathaniel Smith <n...@pobox.com> wrote:

> Instead of an nobs argument, maybe we should have a version that accepts
> multiple data sets, so that we have the full information and can improve
> the algorithm over time.
>
> On Mar 15, 2018 7:57 PM, "Thomas Caswell" <tcasw...@gmail.com> wrote:
>
>> Yes I like the name.
>>
>> The primary use-case for Matplotlib is that our `hist` method can take in
>> a list of arrays and produces N histograms in one shot. Currently with
>> 'auto' we only use the first data set to sort out what the bins should be
>> and then re-use those for the rest of the data sets. This will let us get
>> the bins on the merged input, but I take Josef's point that this is not
>> actually what we want....
>>
>> Tom
>>
>> On Mon, Mar 12, 2018 at 11:35 PM <josef.p...@gmail.com> wrote:
>>
>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser
>>> <wieser.eric+nu...@gmail.com> wrote:
>>> >> Given that the bin selection are data driven, transferring them
>>> >> across datasets might not be so useful.
>>> >
>>> > The main application would be to compute bins across the union of all
>>> > datasets. This is already possibly by using `np.histogram` and
>>> > discarding the first result, but that's super wasteful.
>>>
>>> assuming "union" means a combined dataset.
>>>
>>> If you stack datasets, then the number of observations will not be
>>> correct for individual datasets.
>>>
>>> In that case an additional keyword like nobs, or whatever name would
>>> be appropriate for numpy, would be useful, e.g. use the average number
>>> of observations across datasets.
>>> Auxiliary statistic like std could then be computed on the total
>>> dataset (if that makes sense, which would not be the case if the
>>> variance across datasets is larger than the variance within datasets.
>>>
>>> Josef
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion