On Thu, Sep 3, 2009 at 9:23 AM, Tim Michelsen<timmichel...@gmx-topmail.de> wrote:
>>
> Hello,
> I have checked the snippets you proposed.
> It does what I wanted to achieve.
> Obviously, I had to subtract the values as Robert
> demonstrated. This could also be perceived from
> the figure I posted.
>
> I still have to see how I can optimise the code
> (c.f. below) or modify it to be less complicated.
> It seemed so simple in the spreadsheet...
>
>> eisf_sums = ecdf_sums[-1] - ecdf_sums
>> # empirical inverse survival

this should [not] have "inverse" in it, it was a cut and paste error.
The empirical survival function would be just 1-ecdf; however, as
distributions they would need to be normed to 1.

>> function of weights
> Can you recommend me a (literature) source where
> I can look up this term?
> I learned statistics in my mother tongue and seem
> to need a refresher on distributions...
> I would like to come up with the right terms
> next time.

My first stop is usually Wikipedia:
http://en.wikipedia.org/wiki/Survival_function
http://de.wikipedia.org/wiki/Verteilungsfunktion#.C3.9Cberlebenswahrscheinlichkeit

and the ISI (International Statistical Institute) glossary for terms
in different languages:
http://isi.cbs.nl/glossary/bloken83.htm

>
>> Are you sure you want cumulative weights in
>> the histogram?
> You mean it doesn't make sense at all?

It depends on what you want. ecdf as it is calculated, with the weights
argument in the histogram, gives you the cumulative sum of the values,
not the count. In the case of the weight of pigs, it would be the
cumulative weight of all pigs with a weight less than the given bin
boundary weight. If the values were income, then it would be the
aggregated income of all individuals with an income below the bin
boundary.

So it makes sense, given this is what you want (below).

>
> I need:
> 1) the count of occurrences sorted in each bin
> counts = np.histogram(values,
>                       normed=normed,
>                       bins=bins)
> => here I obtain now the same as in the
> spreadsheet
>
> 2) the sum of all values sorted in each bin
> sums = np.histogram(values, weights=values,
>                     normed=normed,
>                     bins=bins)
>
> => here I still obtain different values for the first
> histogram value (eisf_sums[0]):
> Numpy: eisf_sums
> 335.50026738, 319.21363636, 266.07724942,
> 198.10258741, 126.69270396, 67.98125874,
> 38.47335664, 24.75062937, 13.42121212,
> 2.48636364, 0.
>
> Spreadsheet:
> 335.2351159, 319.2136364, 266.0772494,
> 198.1025874, 126.692704, 67.98125874,
> 38.47335664, 24.75062937, 13.42121212,
> 2.486363636, 0

There might be a mistake in the treatment of a cell when reversing.
When I run your example, the highest value is not equal to values.sum().

This might match the spreadsheet, but I haven't compared:

isf = sums[0][::-1].cumsum()[::-1]

But I'm not sure yet what's going on.

Josef

>
> Additionally, I would like to see these implemented
> as convenience functions in numpy or scipy.
> There should be out of the box functions for all kinds
> of distributions.
> Where is the best place to contribute a final version?
> The scipy.stats?
>
> Thanks again for your input,
> Timmie
>
> ##### below the distilled code #####
> ## histogram settings
> normed = False
> bins = 10
>
> ## counts: gives expected results
> counts = np.histogram(values,
>                       normed=normed,
>                       bins=bins)
>
> ecdf_counts = np.hstack([1.0, counts[0].cumsum() ])
> ecdf_inv_counts = ecdf_counts[::-1]
> # empirical inverse survival function of weights
> eisf_counts = ecdf_counts[-1] - ecdf_counts
>
>
> ### sum: does have deviations
> sums = np.histogram(values, weights=values,
>                     normed=normed,
>                     bins=bins)
> ecdf_sums = np.hstack([1.0, sums[0].cumsum() ])
> ecdf_inv_sums = ecdf_sums[::-1]
> # empirical inverse survival function of weights
> eisf_sums = ecdf_sums[-1] - ecdf_sums
>
> ##
> # configure plot
> xlabel = 'Bins'
> ylabel_left = 'Counts'
> ylabel_right = 'Sum'
>
>
> fig1 = plt.figure()
> ax1 = fig1.add_subplot(111)
>
> # counts
> ax1.plot(counts[1], ecdf_inv_counts, 'r-')
> ax1.set_xlabel(xlabel)
> ax1.set_ylabel(ylabel_left, color='b')
> for tl in ax1.get_yticklabels():
>     tl.set_color('b')
>
> # sums
> ax2 = ax1.twinx()
> ax2.plot(sums[1], eisf_sums, 'b-')
> ax2.set_ylabel(ylabel_right, color='r')
> for tl in ax2.get_yticklabels():
>     tl.set_color('r')
> plt.show()
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
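
A minimal sketch of the reverse cumulative sum mentioned in the reply (sums[0][::-1].cumsum()[::-1]), applied to both the per-bin counts and the per-bin value sums. The sample data and the names rev_counts, rev_sums and surv are made up for illustration, since the values array from the thread is not shown. normed is omitted because the thread uses normed = False, which is the default behaviour.

```python
import numpy as np

# Hypothetical sample data; the thread's `values` array is not available.
values = np.array([1.0, 2.0, 2.5, 4.0, 7.0, 9.5])
bins = 3

# Number of observations falling into each bin.
counts, edges = np.histogram(values, bins=bins)
# Sum of the observed values falling into each bin (weights=values).
sums, _ = np.histogram(values, bins=bins, weights=values)

# Reverse cumulative sums: entry i aggregates bin i and all bins above it.
rev_counts = counts[::-1].cumsum()[::-1]
rev_sums = sums[::-1].cumsum()[::-1]

# Append a trailing 0 so that there is one value per bin edge.
rev_counts = np.append(rev_counts, 0)
rev_sums = np.append(rev_sums, 0.0)

# Normalised variant: fraction of observations at or above each bin edge.
surv = rev_counts / counts.sum()

print(edges)
print(rev_counts)
print(rev_sums)
print(surv)
```

Built this way, the first entry of rev_sums is values.sum() and the first entry of rev_counts is len(values), which gives a quick check against the totals in a spreadsheet.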