Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-29 Thread Anne Archibald
2009/11/28 Wayne Watson sierra_mtnv...@sbcglobal.net:
 Anne Archibald wrote:
 2009/11/28 Wayne Watson sierra_mtnv...@sbcglobal.net:

 I was only illustrating a way that I would not consider, since the
 hardware has already created the pdf. I've already coded it pretty much
 as you have suggested. As I think I mentioned above, I'm a bit
 surprised numpy doesn't provide the code you suggest as part of some
 function, e.g. CalcSimplefromPDF(xvalues=mydatarray, avg=True,
 minmax=True, ...).


 Feel free to submit an implementation to numpy's issue tracker. I
 suggest modifying mean, std, and var (at least) so that, like average,
 they take an array of weights.

 How would I do that?


Obtain a copy of recent numpy source code - a .tar file from the
website, or using SVN, or git. Then add the feature plus some tests,
confirm that the tests pass, and post a request and your patch to the
bug tracker.

Anne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-29 Thread Charles R Harris
On Sun, Nov 29, 2009 at 6:15 PM, Anne Archibald
peridot.face...@gmail.com wrote:

 2009/11/28 Wayne Watson sierra_mtnv...@sbcglobal.net:
  Anne Archibald wrote:
  2009/11/28 Wayne Watson sierra_mtnv...@sbcglobal.net:
 
  I was only illustrating a way that I would not consider, since the
  hardware has already created the pdf. I've already coded it pretty much
  as you have suggested. As I think I mentioned above, I'm a bit
  surprised numpy doesn't provide the code you suggest as part of some
  function, e.g. CalcSimplefromPDF(xvalues=mydatarray, avg=True,
  minmax=True, ...).
 
 
  Feel free to submit an implementation to numpy's issue tracker. I
  suggest modifying mean, std, and var (at least) so that, like average,
  they take an array of weights.
 
  How would I do that?
 

 Obtain a copy of recent numpy source code - a .tar file from the
 website, or using SVN, or git. Then add the feature plus some tests,
 confirm that the tests pass, and post a request and your patch to the
 bug tracker.


You might also want to use average as a starting point.

Chuck


Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-28 Thread David Goldsmith
On Fri, Nov 27, 2009 at 9:25 PM, Wayne Watson
sierra_mtnv...@sbcglobal.net wrote:

 I actually wrote my own several days ago. When I began getting myself
 more familiar with numpy, I was hoping there would be an easy to use
 version in it for this frequency approach. If not, then I'll just stick
 with what I have. It seems something like this should be common.

 A simple way to do it with the present capabilities would be to unwind
 the frequencies. For example, given [2,1,3] for some corresponding set
 of x, say, [1,2,3], produce [1, 1, 2, 3, 3, 3]. I have no idea if numpy
 does anything like that, but, if so, the typical mean, std, ... could be
 used. In my case, it's sort of pointless. It would produce an array of
 307,200 items from 256 values of x (0,1,2,...,255), and unwinding it in
 software would just slow down the computations. The sub-processor hardware
 already produced the 256 frequencies.

 Basically, this amounts to having a pdf, and values of x.
 Mathematically, the statistics are produced directly from it.

 josef.p...@gmail.com wrote:
  On Fri, Nov 27, 2009 at 9:47 PM, Wayne Watson
  sierra_mtnv...@sbcglobal.net wrote:
 
  How do I compute avg, std dev, min, max and other simple stats if I only
  know the frequency distribution?
 
 
  If you are willing to assign to all observations in a bin the value at
  the bin midpoint, then you could do it with weights in the statistics
  calculations. However, numpy.average is, I think, the only statistic
  that takes weights. min max are independent of weight, but std and var
  need to be calculated indirectly.
 
  If you need more stats with weights, then the attachment in
  http://projects.scipy.org/scipy/ticket/604  is a good start.
 
  Josef


Wayne:

There is no need to unwind. If Y(X) is the (unnormalized) frequency
distribution of the random variable/data X, start by computing
y = Y/Y.sum() (skip this step if Y is already normalized). Then:

    av(X) = np.dot(X, y)
    sd(X) = np.sqrt(np.dot(X*X, y) - av(X)**2)

Higher-moment statistics can be calculated with similar formulae.
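
For readers who want something to paste, here is a minimal sketch of the recipe above. The function name moments_from_freq and the skewness line are illustrative additions, not part of numpy or of the original suggestion; the standard deviation returned is the population value (no n-1 correction).

```python
import numpy as np

def moments_from_freq(X, Y):
    """Mean, population std and skewness from values X and frequencies Y.

    Hypothetical helper for illustration; not a numpy function.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    y = Y / Y.sum()                    # normalize the frequencies to sum to 1

    mean = np.dot(X, y)                # first moment
    var = np.dot(X * X, y) - mean**2   # E[X^2] - E[X]^2
    std = np.sqrt(var)

    # A higher moment by the same pattern: the third standardized moment.
    skew = np.dot((X - mean) ** 3, y) / std**3
    return mean, std, skew

# The small example from earlier in the thread: x = [1, 2, 3], counts = [2, 1, 3]
mean, std, skew = moments_from_freq([1, 2, 3], [2, 1, 3])
```

Note that E[X^2] - E[X]^2 can lose precision when the mean is large relative to the spread; averaging (X - mean)**2 against y instead avoids that at the cost of one more pass.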

DG


Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-28 Thread Wayne Watson


David Goldsmith wrote:
 On Fri, Nov 27, 2009 at 9:25 PM, Wayne Watson
 sierra_mtnv...@sbcglobal.net wrote:

 I actually wrote my own several days ago. When I began getting myself
 more familiar with numpy, I was hoping there would be an easy to use
 version in it for this frequency approach. If not, then I'll just stick
 with what I have. It seems something like this should be common.

 ...

  If you need more stats with weights, then the attachment in
  http://projects.scipy.org/scipy/ticket/604  is a good start.
 
  Josef


 Wayne:

 There is no need to unwind. If Y(X) is the (unnormalized) frequency
 distribution of the random variable/data X, start by computing
 y = Y/Y.sum() (skip this step if Y is already normalized). Then:

     av(X) = np.dot(X, y)
     sd(X) = np.sqrt(np.dot(X*X, y) - av(X)**2)

 Higher-moment statistics can be calculated with similar formulae.

 DG
I was only illustrating a way that I would not consider, since the 
hardware has already created the pdf. I've already coded it pretty much 
as you have suggested. As I think I mentioned above, I'm a bit
surprised numpy doesn't provide the code you suggest as part of some
function, e.g. CalcSimplefromPDF(xvalues=mydatarray, avg=True,
minmax=True, ...).

-- 
   Wayne Watson (Watson Adventures, Prop., Nevada City, CA)

 (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time)
  Obz Site:  39° 15' 7" N, 121° 2' 32" W, 2700 feet

   350 350 350 350 350 350 350 350 350 350
 Make the number famous. See 350.org
The major event has passed, but keep the number alive.
 
Web Page: www.speckledwithstars.net/



Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-28 Thread Anne Archibald
2009/11/28 Wayne Watson sierra_mtnv...@sbcglobal.net:

 I was only illustrating a way that I would not consider, since the
 hardware has already created the pdf. I've already coded it pretty much
 as you have suggested. As I think I mentioned above, I'm a bit
 surprised numpy doesn't provide the code you suggest as part of some
 function, e.g. CalcSimplefromPDF(xvalues=mydatarray, avg=True,
 minmax=True, ...).

Feel free to submit an implementation to numpy's issue tracker. I
suggest modifying mean, std, and var (at least) so that, like average,
they take an array of weights.

Anne


Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-28 Thread Wayne Watson
How would I do that?

Anne Archibald wrote:
 2009/11/28 Wayne Watson sierra_mtnv...@sbcglobal.net:
   
 I was only illustrating a way that I would not consider, since the
 hardware has already created the pdf. I've already coded it pretty much
 as you have suggested. As I think I mentioned above, I'm a bit
 surprised numpy doesn't provide the code you suggest as part of some
 function, e.g. CalcSimplefromPDF(xvalues=mydatarray, avg=True,
 minmax=True, ...).
 

 Feel free to submit an implementation to numpy's issue tracker. I
 suggest modifying mean, std, and var (at least) so that, like average,
 they take an array of weights.

 Anne

   



[Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-27 Thread Wayne Watson
How do I compute avg, std dev, min, max and other simple stats if I only 
know the frequency distribution?



Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-27 Thread josef . pktd
On Fri, Nov 27, 2009 at 9:47 PM, Wayne Watson
sierra_mtnv...@sbcglobal.net wrote:
 How do I compute avg, std dev, min, max and other simple stats if I only
 know the frequency distribution?

If you are willing to assign to all observations in a bin the value at
the bin midpoint, then you could do it with weights in the statistics
calculations. However, numpy.average is, I think, the only statistic
that takes weights. min max are independent of weight, but std and var
need to be calculated indirectly.
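
A rough sketch of what "indirectly" can look like, using only numpy.average and its weights argument. The helper name weighted_stats and the dictionary it returns are made up for illustration. It assumes x holds the bin midpoints (or the exact values, if the data are discrete) and freq the counts; bins with zero count are skipped when taking min and max, since no observation actually fell in them.

```python
import numpy as np

def weighted_stats(x, freq):
    """Simple statistics from values x and their frequencies freq.

    Hypothetical helper for illustration; not part of numpy.
    """
    x = np.asarray(x, dtype=float)
    freq = np.asarray(freq, dtype=float)

    mean = np.average(x, weights=freq)
    # Weighted average of squared deviations gives the population variance.
    var = np.average((x - mean) ** 2, weights=freq)

    occupied = x[freq > 0]             # values that were actually observed
    return {
        "mean": mean,
        "var": var,
        "std": np.sqrt(var),
        "min": occupied.min(),
        "max": occupied.max(),
    }

# Four bins, the first one empty.
stats = weighted_stats([0, 1, 2, 3], [0, 2, 1, 3])
```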

If you need more stats with weights, then the attachment in
http://projects.scipy.org/scipy/ticket/604  is a good start.

Josef





Re: [Numpy-discussion] Computing Simple Statistics When Only they Frequency Distribution is Known

2009-11-27 Thread Wayne Watson
I actually wrote my own several days ago. When I began getting myself 
more familiar with numpy, I was hoping there would be an easy to use 
version in it for this frequency approach. If not, then I'll just stick 
with what I have. It seems something like this should be common.

A simple way to do it with the present capabilities would be to unwind
the frequencies. For example, given [2,1,3] for some corresponding set
of x, say, [1,2,3], produce [1, 1, 2, 3, 3, 3]. I have no idea if numpy
does anything like that, but, if so, the typical mean, std, ... could be
used. In my case, it's sort of pointless. It would produce an array of
307,200 items from 256 values of x (0,1,2,...,255), and unwinding it in
software would just slow down the computations. The sub-processor hardware
already produced the 256 frequencies.
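
For reference, numpy.repeat does this kind of unwinding when the counts are integers. The sketch below uses the small example above and also checks that the weighted route gives the same numbers without building the long array; the variable names are only for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
counts = np.array([2, 1, 3])

# Unwind: repeat each x[i] counts[i] times.
unwound = np.repeat(x, counts)         # [1, 1, 2, 3, 3, 3]
mean_unwound = unwound.mean()
std_unwound = unwound.std()            # population std (ddof=0)

# Same statistics straight from the frequencies, no unwinding.
mean_weighted = np.average(x, weights=counts)
std_weighted = np.sqrt(np.average((x - mean_weighted) ** 2, weights=counts))

agree = np.isclose(mean_unwound, mean_weighted) and np.isclose(std_unwound, std_weighted)
```

With 256 values and 307,200 samples the unwound array is still small by numpy standards, but the weighted form works on 256 elements and skips the allocation entirely.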

Basically, this amounts to having a pdf, and values of x. 
Mathematically, the statistics are produced directly from it.

josef.p...@gmail.com wrote:
 On Fri, Nov 27, 2009 at 9:47 PM, Wayne Watson
 sierra_mtnv...@sbcglobal.net wrote:
   
 How do I compute avg, std dev, min, max and other simple stats if I only
 know the frequency distribution?
 

 If you are willing to assign to all observations in a bin the value at
 the bin midpoint, then you could do it with weights in the statistics
 calculations. However, numpy.average is, I think, the only statistic
 that takes weights. min max are independent of weight, but std and var
 need to be calculated indirectly.

 If you need more stats with weights, then the attachment in
 http://projects.scipy.org/scipy/ticket/604  is a good start.

 Josef


   

   
