[Numpy-discussion] Adding "weights" option to quantile/percentile

2022-03-10 Thread Chun-Wei Yuan
Dear all,

I have a long-standing PR (#9211, https://github.com/numpy/numpy/pull/9211)
that adds a "weights" option to the quantile/percentile API.
The code is tested against all 13 interpolation methods, and I
welcome any additional testing suggestions.

A couple of remarks:

1.) weights can be any positive real numbers, and there's no distinction
between frequency weights and probability weights.  Internally, along the
data axis the weights apply to, if any weight is less than 1, the code
rescales the weights so that the minimum is 1.

For example, if weights=[0.1, 0.9], then the quantiles will be computed as
if weights=[1, 9].  If weights=[0.25, 0.75], then they'll be treated as if
weights=[1, 3].  In other words, probability weights are converted to
frequency weights for the calculation.

If another function using quantile() wants to make that distinction between
frequency/probability weights in its API, it is free to do so.  But right
now I do not see any reason for it in quantile()/percentile().
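
As a rough illustration of the rescaling described in remark 1, here is a
minimal 1-D sketch.  The helper name renormalize_weights is hypothetical and
is not taken from the PR; it only mirrors the rule stated above (if any
weight is below 1, rescale so that the minimum becomes 1), and it assumes
the rescaling is a division by the smallest weight, which is what the two
examples suggest.

```python
import numpy as np


def renormalize_weights(weights):
    """Hypothetical helper: rescale so the smallest weight is 1 if any weight < 1."""
    w = np.asarray(weights, dtype=float)
    if np.any(w <= 0):
        raise ValueError("weights must be positive")
    w_min = w.min()
    if w_min < 1:
        # Assumed rule: divide by the minimum so that it becomes exactly 1.
        w = w / w_min
    return w


print(renormalize_weights([0.1, 0.9]))    # approximately [1, 9]
print(renormalize_weights([0.25, 0.75]))  # approximately [1, 3]
print(renormalize_weights([2, 5]))        # unchanged, no weight is below 1
```

Weights that are already at least 1 pass through untouched, so frequency
weights keep their original values.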

2.) The use of np.vectorize().  Computing quantiles requires sorting the
data values along the data axis and rearranging the corresponding weights
in the same sort order.  Hence each column down the data axis gets a
different rearrangement.  vectorize() allows such synchronized operations
between equal-shaped arrays.  I don't see how to get that from, say,
apply_along_axis().
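
To make the "synchronized" rearrangement in remark 2 concrete, here is a
small sketch that sorts the data along an axis and reorders the weights with
the same indices.  It uses np.argsort together with np.take_along_axis
(available since NumPy 1.15) purely as an illustration; it is not the
np.vectorize() approach used in the PR, and the array names are made up.

```python
import numpy as np

# Data and weights with the same shape; axis 0 is the data axis.
a = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [2.0, 0.0]])
w = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
axis = 0

# One set of sort indices per column, applied to both arrays.
order = np.argsort(a, axis=axis, kind="stable")
a_sorted = np.take_along_axis(a, order, axis=axis)
w_sorted = np.take_along_axis(w, order, axis=axis)

print(a_sorted)
print(w_sorted)
```

Each column ends up with its own permutation, and every weight stays
attached to the data value it originally belonged to.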

Lastly, I'll be sure to add more docs/comments to better explain the steps
in the code.

Best,

Chun
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


Re: [Numpy-discussion] [JOB] Principal Software Engineer position at IHME

2019-09-09 Thread Chun-Wei Yuan
I see.  Sorry.  I think I misinterpreted "It is okay to post job ads for
work involving NumPy/SciPy and related packages if you put [JOB] in the
subject".  Thanks for the clarification.

On Mon, Sep 9, 2019 at 3:19 PM Ralf Gommers wrote:

> On Mon, Sep 9, 2019 at 2:27 PM Chun-Wei Yuan wrote:
>
>> *The Institute for Health Metrics and Evaluation (IHME)* has an
>> outstanding opportunity for a full-time *Principal Software Engineer* on
>> our Forecasting/Future Health Scenarios (FHS) team. The development
>> arm of the team is responsible for the design and implementation of
>> software to support this effort, and the Principal Software Engineer will
>> lead the development work and supervise engineers on that team. IHME’s aim
>> within the FHS portfolio is to create an analytic engine that can model the
>> impact of a wide array of determinants on the trajectory of health outcomes
>> and risks in different countries, projected 25 years into the future, that
>> will allow decision-makers to assess the impact of their potential actions
>> analytically.  A recent publication can be found here:
>> https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)31694-5/fulltext
>>
>>
>>
>> If you join IHME, you’ll be joining a team of mission-oriented people who
>> are committed to creating a welcoming and diverse workforce that respects
>> and appreciates differences, and embraces collaboration.
>>
>>
>>
>> *Further Information:* See IHME’s website: www.healthdata.org
>>
>> *To Apply and see the whole job description:* Please apply at uw.edu/jobs
>> <https://uwhires.admin.washington.edu/eng/candidates/default.cfm?szCategory=jobprofile&szOrderID=171527&szCandidateID=0&szSearchWords=&szReturnToSearch=1>
>> and search for req 171527
>>
>>
>> Please direct your questions to Megan at mkma...@uw.edu
>>
>
> Hi Chun-Wei, while this seems like an interesting job, it's not clear that
> it provides an opportunity to contribute back to NumPy or other community
> projects (that'd be awesome though, and I would encourage you to make that
> part of this job). For general software job ads (even if they use NumPy),
> we'd prefer to keep those off this list.
>
> Thank you,
> Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [JOB] Principal Software Engineer position at IHME

2019-09-09 Thread Chun-Wei Yuan
*The Institute for Health Metrics and Evaluation (IHME)* has an outstanding
opportunity for a full-time *Principal Software Engineer* on our
Forecasting/Future Health Scenarios (FHS) team. The development arm of
the team is responsible for the design and implementation of software to
support this effort, and the Principal Software Engineer will lead the
development work and supervise engineers on that team. IHME’s aim within
the FHS portfolio is to create an analytic engine that can model the impact
of a wide array of determinants on the trajectory of health outcomes and
risks in different countries, projected 25 years into the future, that will
allow decision-makers to assess the impact of their potential actions
analytically.  A recent publication can be found here:
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)31694-5/fulltext



If you join IHME, you’ll be joining a team of mission-oriented people who
are committed to creating a welcoming and diverse workforce that respects
and appreciates differences, and embraces collaboration.



*Further Information:* See IHME’s website: www.healthdata.org

*To Apply and see the whole job description:* Please apply at uw.edu/jobs
and search for req 171527


Please direct your questions to Megan at mkma...@uw.edu


[Numpy-discussion] PR to add "weights" option to np.quantile

2018-09-13 Thread Chun-Wei Yuan
Hi all,

I have a long-standing PR to add a "weights" option to
np.quantile/percentile:

https://github.com/numpy/numpy/pull/9211

For a little background, there are quite a few ways to define "quantile" to
begin with.  Numpy defines it the same way as R's default (Type 7):

https://www.rdocumentation.org/packages/stats/versions/3.5.0/topics/quantile

What PR 9211 does is introduce a "weights" option while staying
consistent with the Type 7 definition of quantile.
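
For readers unfamiliar with Type 7, here is a minimal unweighted sketch of
that definition: for sorted data x of length n and a probability q, take
h = (n - 1) * q and interpolate linearly between x[floor(h)] and the next
element.  The function name quantile_type7 is hypothetical; the comparison
against np.quantile assumes NumPy 1.15 or later with its default
interpolation.

```python
import math

import numpy as np


def quantile_type7(x, q):
    """Hypothetical reference implementation of the Type 7 quantile (1-D, scalar q)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    h = (n - 1) * q               # fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)       # clamp so q = 1 does not run off the end
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


data = [10.0, 20.0, 40.0, 80.0]
for q in (0.0, 0.25, 0.5, 1.0):
    print(q, quantile_type7(data, q), np.quantile(data, q))
```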

There is certainly support for adding this feature, but as can also be
seen in the PR thread, there are doubts about

1.) what the right way to define a "weighted" quantile is, and
2.) whether Numpy should support other types of quantiles in the first
place, etc.

I've stated my answers to those questions in the thread.  This PR has
fallen off the radar and needs a little popular jolt to get some movement.
Please chime in to keep it alive.

Best,

Chun


Re: [Numpy-discussion] quantile() or percentile()

2017-08-03 Thread Chun-Wei Yuan
Cool.  Just as a heads up, for my algorithm to work, I actually need the
indices, which is why argsort() is so important to me.  I use it to get
both ap_sorted and ws_sorted variables.  If your weighted-quantile algo is
faster and doesn't require those indices, please by all means change my
implementation.  Thanks.
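
To show why the sort indices matter, here is a minimal 1-D sketch in which a
single argsort() result reorders both the data and the weights.  The
quantile rule used here (the inverse of the weighted empirical CDF, with no
interpolation) is a deliberately simple stand-in and is not the Type 7
interpolation implemented in the PR; the function name is hypothetical, and
only the variable names ap_sorted and ws_sorted echo the ones mentioned
above.

```python
import numpy as np


def weighted_quantile_inverted_cdf(a, weights, q):
    """Hypothetical sketch: smallest value whose cumulative weight share reaches q."""
    a = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)

    # The same indices reorder both arrays, keeping values and weights paired.
    order = np.argsort(a, kind="stable")
    ap_sorted = a[order]
    ws_sorted = w[order]

    # Weighted empirical CDF evaluated at each sorted value.
    cdf = np.cumsum(ws_sorted) / ws_sorted.sum()
    idx = np.searchsorted(cdf, q, side="left")
    return ap_sorted[min(idx, ap_sorted.size - 1)]


print(weighted_quantile_inverted_cdf([3.0, 1.0, 2.0], [1.0, 1.0, 8.0], 0.5))
```

A partition-based algorithm that never materializes the full ordering would
need some other way to carry the weights along with the values.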

On Thu, Aug 3, 2017 at 11:10 AM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Not that I know of. The algorithm is very simple, requiring a
> relatively small addition to the current introselect algorithm used
> for `np.partition`. My biggest hurdle is figuring out how the calling
> machinery really works so that I can figure out which input type
> permutations I need to generate, and how to get the right backend
> running for a given function call.
>
> -Joe
>
> On Thu, Aug 3, 2017 at 1:00 PM, Chun-Wei Yuan 
> wrote:
> > Any way I can help expedite this?
> >
> > On Fri, Jul 21, 2017 at 4:42 PM, Chun-Wei Yuan 
> > wrote:
> >>
> >> That would be great.  I just used np.argsort because it was familiar to
> >> me.  Didn't know about the C code.
> >>
> >> On Fri, Jul 21, 2017 at 3:43 PM, Joseph Fox-Rabinovitz
> >>  wrote:
> >>>
> >>> While #9211 is a good start, it is pretty inefficient in terms of the
> >>> fact that it performs an O(nlogn) sort of the array. It is possible to
> >>> reduce the time to O(n) by using a similar partitioning algorithm to
> >>> the one in the C code of percentile. I will look into it as soon as I
> >>> can.
> >>>
> >>> -Joe
> >>>
> >>> On Fri, Jul 21, 2017 at 5:34 PM, Chun-Wei Yuan wrote:
> >>>>
> >>>> Just to provide some context, 9213 actually spawned off of this guy:
> >>>>
> >>>> https://github.com/numpy/numpy/pull/9211
> >>>>
> >>>> which might address the weighted inputs issue Joe brought up.
> >>>>
> >>>> C
> >>>>
> >>>> On Fri, Jul 21, 2017 at 2:21 PM, Joseph Fox-Rabinovitz
> >>>>  wrote:
> >>>>>
> >>>>> I think that there would be a very good reason to have a separate
> >>>>> function if we were to introduce weights to the inputs, similarly to
> >>>>> the way that we have mean and average. This would have some (positive)
> >>>>> repercussions like making weighted histograms with the
> >>>>> Freedman-Diaconis binwidth estimator a possibility. I have had this
> >>>>> change on the back-burner for a long time, mainly because I was too
> >>>>> lazy to figure out how to include it in the C code. However, I will
> >>>>> take a closer look.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> -Joe
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, Jul 21, 2017 at 5:11 PM, Chun-Wei Yuan
> >>>>> <chunwei.y...@gmail.com> wrote:
> >>>>>>
> >>>>>> There's an ongoing effort to introduce quantile() into numpy.  You'd
> >>>>>> use it just like percentile(), but would input your q value in
> >>>>>> probability space (0.5 for 50%):
> >>>>>>
> >>>>>> https://github.com/numpy/numpy/pull/9213
> >>>>>>
> >>>>>> Since there's a great deal of overlap between these two functions,
> >>>>>> we'd like to solicit opinions on how to move forward on this.
> >>>>>>
> >>>>>> The current thinking is to tolerate the redundancy and keep both,
> >>>>>> using one as the engine for the other.  I'm partial to having
> >>>>>> quantile because 1.) I prefer probability space, and 2.) I have a PR
> >>>>>> waiting on quantile().
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> C


Re: [Numpy-discussion] quantile() or percentile()

2017-08-03 Thread Chun-Wei Yuan
Any way I can help expedite this?

On Fri, Jul 21, 2017 at 4:42 PM, Chun-Wei Yuan 
wrote:

> That would be great.  I just used np.argsort because it was familiar to
> me.  Didn't know about the C code.
>
> On Fri, Jul 21, 2017 at 3:43 PM, Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com> wrote:
>
>> While #9211 is a good start, it is pretty inefficient in terms of the
>> fact that it performs an O(nlogn) sort of the array. It is possible to
>> reduce the time to O(n) by using a similar partitioning algorithm to the
>> one in the C code of percentile. I will look into it as soon as I can.
>>
>> -Joe
>>
>> On Fri, Jul 21, 2017 at 5:34 PM, Chun-Wei Yuan 
>> wrote:
>>
>>> Just to provide some context, 9213 actually spawned off of this guy:
>>>
>>> https://github.com/numpy/numpy/pull/9211
>>>
>>> which might address the weighted inputs issue Joe brought up.
>>>
>>> C
>>>
>>> On Fri, Jul 21, 2017 at 2:21 PM, Joseph Fox-Rabinovitz <
>>> jfoxrabinov...@gmail.com> wrote:
>>>
>>>> I think that there would be a very good reason to have a separate
>>>> function if we were to introduce weights to the inputs, similarly to the
>>>> way that we have mean and average. This would have some (positive)
>>>> repercussions like making weighted histograms with the Freedman-Diaconis
>>>> binwidth estimator a possibility. I have had this change on the back-burner
>>>> for a long time, mainly because I was too lazy to figure out how to include
>>>> it in the C code. However, I will take a closer look.
>>>>
>>>> Regards,
>>>>
>>>> -Joe
>>>>
>>>>
>>>>
>>>> On Fri, Jul 21, 2017 at 5:11 PM, Chun-Wei Yuan 
>>>> wrote:
>>>>
>>>>> There's an ongoing effort to introduce quantile() into numpy.  You'd
>>>>> use it just like percentile(), but would input your q value in probability
>>>>> space (0.5 for 50%):
>>>>>
>>>>> https://github.com/numpy/numpy/pull/9213
>>>>>
>>>>> Since there's a great deal of overlap between these two functions,
>>>>> we'd like to solicit opinions on how to move forward on this.
>>>>>
>>>>> The current thinking is to tolerate the redundancy and keep both,
>>>>> using one as the engine for the other.  I'm partial to having quantile
>>>>> because 1.) I prefer probability space, and 2.) I have a PR waiting on
>>>>> quantile().
>>>>>
>>>>> Best,
>>>>>
>>>>> C


Re: [Numpy-discussion] quantile() or percentile()

2017-07-21 Thread Chun-Wei Yuan
That would be great.  I just used np.argsort because it was familiar to
me.  Didn't know about the C code.
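
For context on the sort-versus-partition point in the quoted message, here
is a small unweighted sketch.  np.partition(x, k) only guarantees that the
k-th element ends up in its sorted position, which is enough to read off a
single order statistic without fully sorting the array.  This shows the
general idea only; it says nothing about how weights would be threaded
through the C code.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1001)
k = x.size // 2  # index of the median for an odd-length array

# Full sort: every element is put in order, then one is picked.
via_sort = np.sort(x)[k]

# Partition: only position k is guaranteed to hold its sorted value.
via_partition = np.partition(x, k)[k]

print(via_sort == via_partition)
```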

On Fri, Jul 21, 2017 at 3:43 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> While #9211 is a good start, it is pretty inefficient in terms of the fact
> that it performs an O(nlogn) sort of the array. It is possible to reduce
> the time to O(n) by using a similar partitioning algorithm to the one in
> the C code of percentile. I will look into it as soon as I can.
>
> -Joe
>
> On Fri, Jul 21, 2017 at 5:34 PM, Chun-Wei Yuan 
> wrote:
>
>> Just to provide some context, 9213 actually spawned off of this guy:
>>
>> https://github.com/numpy/numpy/pull/9211
>>
>> which might address the weighted inputs issue Joe brought up.
>>
>> C
>>
>> On Fri, Jul 21, 2017 at 2:21 PM, Joseph Fox-Rabinovitz <
>> jfoxrabinov...@gmail.com> wrote:
>>
>>> I think that there would be a very good reason to have a separate
>>> function if we were to introduce weights to the inputs, similarly to the
>>> way that we have mean and average. This would have some (positive)
>>> repercussions like making weighted histograms with the Freedman-Diaconis
>>> binwidth estimator a possibility. I have had this change on the back-burner
>>> for a long time, mainly because I was too lazy to figure out how to include
>>> it in the C code. However, I will take a closer look.
>>>
>>> Regards,
>>>
>>> -Joe
>>>
>>>
>>>
>>> On Fri, Jul 21, 2017 at 5:11 PM, Chun-Wei Yuan 
>>> wrote:
>>>
>>>> There's an ongoing effort to introduce quantile() into numpy.  You'd
>>>> use it just like percentile(), but would input your q value in probability
>>>> space (0.5 for 50%):
>>>>
>>>> https://github.com/numpy/numpy/pull/9213
>>>>
>>>> Since there's a great deal of overlap between these two functions, we'd
>>>> like to solicit opinions on how to move forward on this.
>>>>
>>>> The current thinking is to tolerate the redundancy and keep both, using
>>>> one as the engine for the other.  I'm partial to having quantile because
>>>> 1.) I prefer probability space, and 2.) I have a PR waiting on quantile().
>>>>
>>>> Best,
>>>>
>>>> C


Re: [Numpy-discussion] quantile() or percentile()

2017-07-21 Thread Chun-Wei Yuan
Just to provide some context, 9213 actually spawned off of this guy:

https://github.com/numpy/numpy/pull/9211

which might address the weighted inputs issue Joe brought up.

C

On Fri, Jul 21, 2017 at 2:21 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> I think that there would be a very good reason to have a separate function
> if we were to introduce weights to the inputs, similarly to the way that we
> have mean and average. This would have some (positive) repercussions like
> making weighted histograms with the Freedman-Diaconis binwidth estimator a
> possibility. I have had this change on the back-burner for a long time,
> mainly because I was too lazy to figure out how to include it in the C
> code. However, I will take a closer look.
>
> Regards,
>
> -Joe
>
>
>
> On Fri, Jul 21, 2017 at 5:11 PM, Chun-Wei Yuan 
> wrote:
>
>> There's an ongoing effort to introduce quantile() into numpy.  You'd use
>> it just like percentile(), but would input your q value in probability
>> space (0.5 for 50%):
>>
>> https://github.com/numpy/numpy/pull/9213
>>
>> Since there's a great deal of overlap between these two functions, we'd
>> like to solicit opinions on how to move forward on this.
>>
>> The current thinking is to tolerate the redundancy and keep both, using
>> one as the engine for the other.  I'm partial to having quantile because
>> 1.) I prefer probability space, and 2.) I have a PR waiting on quantile().
>>
>> Best,
>>
>> C


[Numpy-discussion] quantile() or percentile()

2017-07-21 Thread Chun-Wei Yuan
There's an ongoing effort to introduce quantile() into numpy.  You'd use it
just like percentile(), but would input your q value in probability space
(0.5 for 50%):

https://github.com/numpy/numpy/pull/9213

Since there's a great deal of overlap between these two functions, we'd
like to solicit opinions on how to move forward on this.

The current thinking is to tolerate the redundancy and keep both, using one
as the engine for the other.  I'm partial to having quantile because 1.) I
prefer probability space, and 2.) I have a PR waiting on quantile().
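
A minimal sketch of what "using one as the engine for the other" could look
like: a quantile-style wrapper that rescales q from probability space to
percent and delegates to np.percentile().  The wrapper name
quantile_via_percentile is hypothetical, and this is only one possible
arrangement, not a description of what the PR does.

```python
import numpy as np


def quantile_via_percentile(a, q, **kwargs):
    """Hypothetical wrapper: q in [0, 1] is rescaled to percent and delegated."""
    return np.percentile(a, np.asarray(q) * 100.0, **kwargs)


data = np.array([1.0, 2.0, 3.0, 4.0])
print(quantile_via_percentile(data, 0.5))           # 2.5
print(quantile_via_percentile(data, [0.25, 0.75]))
```

The opposite arrangement (percentile() dividing q by 100 and calling
quantile()) would work just as well; the choice mainly affects which
function carries the documentation and the rounding of q.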

Best,

C