Re: [Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram

Eric Wieser Thu, 15 Mar 2018 22:11:09 -0700

That sounds like a reasonable extension, but I think there are still
cases where you want to treat the data as one uniform set when computing
bins (for example, toggling between orthogonal subsets of the data), so it
isn't really a useful replacement.
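
To make the "one uniform set" case concrete, here is a minimal sketch. It
assumes the function from the PR is available as `np.histogram_bin_edges`
(NumPy 1.15 and later); the arrays `a` and `b` are made-up stand-ins for two
subsets of one data set. On older versions, `np.histogram(combined,
bins='auto')[1]` should give the same edges, at the cost of also computing
counts that are thrown away.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical subsets of a single data set (illustrative values only).
a = rng.normal(0.0, 1.0, size=300)
b = rng.normal(2.0, 1.0, size=700)
combined = np.concatenate([a, b])

# Compute the edges once, treating all the data as one set.
edges = np.histogram_bin_edges(combined, bins='auto')

# Reuse the same edges for each subset so the histograms are comparable.
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)
counts_all, _ = np.histogram(combined, bins=edges)
```

Because the edges span the full range of the combined data, every sample in
either subset lands in some bin, and the per-subset counts add up to the
counts of the combined set.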


I suppose this becomes relevant when `density` is passed to the individual
histogram invocations. Does matplotlib handle that correctly for stacked
histograms?
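
A small sketch of why `density` matters here, using NumPy only (it says
nothing about what matplotlib actually does). The arrays and the fixed
`bins=20` are assumptions for illustration, and `np.histogram_bin_edges`
is assumed to be available (NumPy 1.15 and later).

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical data sets of different sizes.
a = rng.normal(0.0, 1.0, size=200)
b = rng.normal(1.0, 1.0, size=800)

# Shared edges computed on the combined data.
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=20)
widths = np.diff(edges)

# density=True normalises each call on its own, so each integrates to 1
# and the relative sizes of the data sets are lost.
dens_a, _ = np.histogram(a, bins=edges, density=True)
dens_b, _ = np.histogram(b, bins=edges, density=True)
area_a = (dens_a * widths).sum()
area_b = (dens_b * widths).sum()

# One alternative: normalise the raw counts by the combined sample size,
# so that the stacked total integrates to 1 instead.
total = a.size + b.size
joint_a = np.histogram(a, bins=edges)[0] / (total * widths)
joint_b = np.histogram(b, bins=edges)[0] / (total * widths)
joint_area = ((joint_a + joint_b) * widths).sum()
```

With per-call normalisation, stacking the two results gives a total area of
2; with the joint normalisation, each data set contributes its share of the
samples (0.2 and 0.8 here) to a total area of 1.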

On Thu, Mar 15, 2018, 20:14 Nathaniel Smith <[email protected]> wrote:

> Instead of an nobs argument, maybe we should have a version that accepts
> multiple data sets, so that we have the full information and can improve
> the algorithm over time.
>
> On Mar 15, 2018 7:57 PM, "Thomas Caswell" <[email protected]> wrote:
>
>> Yes I like the name.
>>
>> The primary use-case for Matplotlib is that our `hist` method can take in
>> a list of arrays and produces N histograms in one shot. Currently with
>> 'auto' we only use the first data set to sort out what the bins should be
>> and then re-use those for the rest of the data sets.  This will let us get
>> the bins on the merged input, but I take Josef's point that this is not
>> actually what we want....
>>
>> Tom
>>
>> On Mon, Mar 12, 2018 at 11:35 PM <[email protected]> wrote:
>>
>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser
>>> <[email protected]> wrote:
>>> >> Given that the bin selections are data driven, transferring them
>>> >> across datasets might not be so useful.
>>> >
>>> > The main application would be to compute bins across the union of all
>>> > datasets. This is already possible by using `np.histogram` and
>>> > discarding the first result, but that's super wasteful.
>>>
>>> assuming "union" means a combined dataset.
>>>
>>> If you stack datasets, then the number of observations will not be
>>> correct for individual datasets.
>>>
>>> In that case an additional keyword like nobs, or whatever name would
>>> be appropriate for numpy, would be useful, e.g. use the average number
>>> of observations across datasets.
>>> Auxiliary statistics like std could then be computed on the total
>>> dataset (if that makes sense, which would not be the case if the
>>> variance across datasets is larger than the variance within datasets).
>>>
>>> Josef
>>>
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > [email protected]
>>> > https://mail.python.org/mailman/listinfo/numpy-discussion
