Hi all, this is the first of a series of posts introducing new features and changes that will be part of the upcoming PyMVPA release 0.5 (expected some time in early 2010).

The new dataset
---------------

In comparison to the old dataset class, the new one is a lot simpler, but at the same time considerably more powerful. The new class is implemented to be a container and just that -- nothing else. All convoluted logic (e.g. setting labels to a permuted state and resetting them later on) has been stripped from the class. The major differences are:

* A dataset can now have an arbitrary number of attributes per sample (i.e. not just labels!) and also _per feature_. This will allow us to have multiple label sets and also to add information for grouping features, as we had for samples before (e.g. to implement distinct ROI sets or add stat maps to a dataset).

* However, while it can have more attributes, it can also have fewer. The new dataset allows for unlabeled data -- no need to invent pointless placeholder labels. In the simplest case a dataset can be created from just a 2D samples matrix/array.

* Datasets will no longer copy data over and over. While the former behavior might be considered safer, the new one is leaner and potentially faster. Datasets can now be sliced just like Numpy arrays (using boolean masks, index sequences or slicing arguments). Whenever Numpy allows for slicing without copying, PyMVPA datasets will offer the same. As a consequence, more and more functions will simply return new datasets instead of modifying existing ones and keeping track of their modifications. For example, label permutation simply produces a shallow copy of a dataset and assigns permuted labels, but uses views for all remaining dataset information. When no longer needed, the permuted dataset can simply be dumped -- no need to restore labels to their previous state.

* __init__ became much simpler. There are no fishy **kwargs anymore. The constructor is now generic and valid for all Dataset subclasses. Additional ways to create datasets are implemented as classmethods to avoid pseudo-overloading of __init__().
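
To make the points above a bit more concrete, here is a minimal sketch of the general idea. Note that this is NOT the actual PyMVPA code or API: the class MiniDataset, its `sa`/`fa` dictionaries and the methods from_labeled() and permuted() are made-up names for illustration only, and the sketch only supports indexing along the samples axis. It merely shows a container-only class with per-sample and per-feature attributes, a classmethod as an alternative constructor, Numpy-style slicing, and label permutation via a shallow copy.

```python
import numpy as np


class MiniDataset:
    """Hypothetical container-only dataset (illustration, not PyMVPA)."""

    def __init__(self, samples, sa=None, fa=None):
        # np.asarray() does not copy if it already gets an ndarray
        samples = np.asarray(samples)
        if samples.ndim != 2:
            raise ValueError("samples must be 2D (samples x features)")
        self.samples = samples
        # any number of per-sample (sa) and per-feature (fa) attributes
        self.sa = {k: np.asarray(v) for k, v in (sa or {}).items()}
        self.fa = {k: np.asarray(v) for k, v in (fa or {}).items()}

    @classmethod
    def from_labeled(cls, samples, labels):
        # alternative constructor instead of overloading __init__
        return cls(samples, sa={"labels": labels})

    def __getitem__(self, index):
        # select samples (rows) exactly the way Numpy would: basic
        # slices give views, masks and index lists give copies
        sa = {k: v[index] for k, v in self.sa.items()}
        return MiniDataset(self.samples[index], sa=sa, fa=self.fa)

    def permuted(self, rng, attr="labels"):
        # shallow copy: only the permuted attribute is a new array,
        # samples and all other attributes are shared with `self`
        sa = dict(self.sa)
        sa[attr] = rng.permutation(self.sa[attr])
        return MiniDataset(self.samples, sa=sa, fa=self.fa)


data = np.arange(12.0).reshape(4, 3)

# simplest case: unlabeled data, just a 2D array
unlabeled = MiniDataset(data)

# labeled data plus a per-feature attribute (e.g. an ROI index)
ds = MiniDataset.from_labeled(data, ["a", "a", "b", "b"])
ds.fa["roi"] = np.array([1, 1, 2])

head = ds[:2]                                       # view on the samples
masked = ds[np.array([True, False, True, False])]   # copy of the samples

# permuted labels, shared samples; the original stays untouched
perm = ds.permuted(np.random.default_rng(0))
```

When `perm` is no longer needed it can simply be thrown away; `ds` still carries its original labels, so there is nothing to restore.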

If you want to take a first look, see

  http://github.com/hanke/PyMVPA/blob/mh/master/mvpa/datasets/base.py

for the code and examples, and

  http://github.com/hanke/PyMVPA/blob/mh/master/mvpa/tests/test_datasetng.py

for a few test cases that show more functionality.

If you spot any problems or bugs, or have recommendations -- please speak up!

The reimplementation of the dataset is just the first (but critical) step to allow for some more necessary improvements coming with PyMVPA 0.5.

Stay tuned...

Michael

-- 
GPG key:  1024D/3144BE0F Michael Hanke
http://mih.voxindeserto.de

