On Sun, Nov 21, 2010 at 5:09 PM, Keith Goodman <[email protected]> wrote:
> On Sun, Nov 21, 2010 at 12:30 PM, <[email protected]> wrote:
>> On Sun, Nov 21, 2010 at 2:48 PM, Keith Goodman <[email protected]> wrote:
>>> On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney <[email protected]> wrote:
>>>> On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman <[email protected]> wrote:
>>>>> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney <[email protected]> wrote:
>>>>>
>>>>>> Keith (and others),
>>>>>>
>>>>>> What would you think about creating a library of mostly Cython-based
>>>>>> "domain specific functions"? So stuff like rolling statistical
>>>>>> moments, nan* functions like you have here, and all that-- NumPy-array
>>>>>> only functions that don't necessarily belong in NumPy or SciPy (but
>>>>>> could be included on down the road). You were already talking about
>>>>>> this on the statsmodels mailing list for larry. I spent a lot of time
>>>>>> writing a bunch of these for pandas over the last couple of years, and
>>>>>> I would have relatively few qualms about moving these outside of
>>>>>> pandas and introducing a dependency. You could do the same for larry--
>>>>>> then we'd all be relying on the same well-vetted and tested codebase.
>>>>>
>>>>> I've started working on moving window statistics cython functions. I
>>>>> plan to make it into a package called Roly (for rolling). The
>>>>> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr,
>>>>> window, axis=-1), etc.
>>>>>
>>>>> I think of Nanny and Roly as two separate packages. A narrow focus is
>>>>> good for a new package. But maybe each package could be a subpackage
>>>>> in a super package?
>>>>>
>>>>> Would the function signatures in Nanny (exact duplicates of the
>>>>> corresponding functions in Numpy and Scipy) work for pandas? I plan to
>>>>> use Nanny in larry. I'll try to get the structure of the Nanny package
>>>>> in place. But if it doesn't attract any interest after that then I may
>>>>> fold it into larry.
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> [email protected]
>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>
>>>>
>>>> Why make multiple packages? It seems like all these functions are
>>>> somewhat related: practical tools for real-world data analysis (where
>>>> observations are often missing). I suspect having everything under one
>>>> hood would create more interest than chopping things up-- would be
>>>> very useful to folks in many different disciplines (finance,
>>>> economics, statistics, etc.). In R, for example, NA-handling is just a
>>>> part of every day life. Of course in R there is a special NA value
>>>> which is distinct from NaN-- many folks object to the use of NaN for
>>>> missing values. The alternative is masked arrays, but in my case I
>>>> wasn't willing to sacrifice so much performance for purity's sake.
>>>>
>>>> I could certainly use the nan* functions to replace code in pandas
>>>> where I've handled things in a somewhat ad hoc way.
>>>
>>> A package focused on NaN-aware functions sounds like a good idea. I
>>> think a good plan would be to start by making faster, drop-in
>>> replacements for the NaN functions that are already in numpy and
>>> scipy. That is already a lot of work. After that, one possibility is
>>> to add stuff like nancumsum, nanprod, etc. After that moving window
>>> stuff?
>>
>> and maybe group functions after that?
>
> Yes, group functions are on my list.
>
>> If there is a lot of repetition, you could use templating. Even simple
>> string substitution, if it is only replacing the dtype, works pretty
>> well. It would at least reduce some copy-paste.
>
> Unit test coverage should be good enough to mess around with trying
> templating. What's a good way to go? Write my own script that creates
> the .pyx file and call it from the make file? Or are there packages
> for doing the templating?

Depends on the scale. I once tried simple string templates:
http://codespeak.net/pipermail/cython-dev/2009-August/006614.html

Here is a pastebin of another version by ....(?),
http://pastebin.com/f1a49143d
discussed on the cython-dev mailing list. The topic comes up on the
cython list every once in a while, but I haven't seen any conclusion yet.

For heavier-duty templating, a proper templating package (Jinja?) might
be better.

I'm not an expert.

Josef

>
> I added nanmean (the first scipy function to enter nanny) and nanmin.
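
To make the "simple string substitution" idea concrete, here is a minimal sketch of a generator script that stamps out one Cython function per dtype using the standard library's string.Template. It is only an illustration: the function name nansum_1d_<dtype>, the template body, and the render_pyx helper are all made up for this example and are not taken from Nanny or from the linked cython-dev posts, and the generated Cython has not been compiled here.

```python
from string import Template

# Hypothetical template for a single Cython function. Only the dtype name
# is substituted; everything else is copied verbatim for each dtype.
PYX_TEMPLATE = Template('''\
def nansum_1d_${name}(np.ndarray[np.${name}_t, ndim=1] a):
    "Sum of a 1d ${name} array, ignoring NaNs."
    cdef Py_ssize_t i
    cdef np.${name}_t ai, asum = 0
    for i in range(a.shape[0]):
        ai = a[i]
        if ai == ai:  # NaN is the only value not equal to itself
            asum += ai
    return asum
''')

# Lines that every generated .pyx file needs at the top.
HEADER = "import numpy as np\ncimport numpy as np\n\n"


def render_pyx(dtypes=("float32", "float64")):
    """Return .pyx source text with one copy of the function per dtype."""
    bodies = [PYX_TEMPLATE.substitute(name=d) for d in dtypes]
    return HEADER + "\n\n".join(bodies)


if __name__ == "__main__":
    # Print the generated source; a build step could redirect this to a file.
    print(render_pyx())
```

A make target could run the script, redirect its output to a .pyx file, and then invoke Cython on that file, which is roughly the "write my own script and call it from the make file" option. Once the templates need loops or conditionals, a real templating package is probably less painful than growing this by hand.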
