Lars: must have missed your response earlier. i guess i was hoping for
convenient instead of good :-)

i don't concede to some of your points though. that validation is
significantly complicated is not true as presumably you just need to check
for the feature dimension of each class. what's that? a loop and a shape
check? hardly complicated or slow. i also don't know if i accept the memory
issues as machine learning isn't exactly the most optimal in terms of
memory and processing power is it? i would imagine this would add minimal
extra data as you can delete the dict memory after joining it all up.
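
to make the "loop and a shape check" concrete, here is a rough sketch of
what i have in mind. `check_feature_dims` is a name i made up for
illustration, it is not anything that exists in scikit-learn, and i'm
assuming the dict maps each class label to a 2-d array-like of samples.

```python
import numpy as np


def check_feature_dims(data):
    """Return the shared feature count, or raise if the classes disagree.

    `data` is assumed to map class label -> array-like of shape
    (n_samples, n_features). Hypothetical helper, not a scikit-learn API.
    """
    n_features = None
    for label, rows in data.items():
        arr = np.asarray(rows)
        if arr.ndim != 2:
            raise ValueError(
                "class %r: expected a 2-D array, got %d-D" % (label, arr.ndim)
            )
        if n_features is None:
            # the first class seen sets the expected feature count
            n_features = arr.shape[1]
        elif arr.shape[1] != n_features:
            raise ValueError(
                "class %r has %d features, expected %d"
                % (label, arr.shape[1], n_features)
            )
    if n_features is None:
        raise ValueError("no classes given")
    return n_features
```

that is one pass over the classes and no copies of the data beyond what
`np.asarray` needs for plain lists.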

my flow involves keeping my features in separate files for each class of
data, and it was getting a bit annoying having to use a few extra lines
before calling fit. for this process flow, realignment must always be
performed given what the fit methods currently accept. so where's the loss
in wrapping it in an estimator function over doing the alignment myself,
since it has to be done anyway?
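
for reference, the "few extra lines" i keep typing look roughly like the
sketch below. `dict_to_xy` is just a name i picked for this mail, and i'm
assuming every class already has the same number of features.

```python
import numpy as np


def dict_to_xy(data):
    """Stack a {label: samples} dict into (X, y) arrays for fit(X, y).

    Hypothetical helper for illustration; assumes each value is a 2-D
    array-like and that all classes share the same feature count.
    """
    labels = list(data)  # fix one iteration order for both X and y
    X = np.vstack([np.asarray(data[k]) for k in labels])
    # one label per sample, in the same order the rows were stacked
    y = np.concatenate([np.full(len(data[k]), k) for k in labels])
    return X, y


trData = {0: [[-1, 0], [0, -1]], 1: [[1, 0], [0, 1]]}
X, y = dict_to_xy(trData)
```

`np.vstack` copies the rows, so the dict can be dropped with `del trData`
once X and y exist if memory is a concern.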

that symmetry would be broken is a good point so it is probably not
appropriate to do the dict in the fit method, but perhaps it would make
sense as a new method called "fit_dict" or something which might fit (hehe,
get it?) in with the fit_transform and other fit_predict helper methods
that are in kmeans, for example.
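
something like the sketch below is what i mean by "fit_dict". i can't add
a method to the estimators from the outside, so this is written as a free
function that takes any object with a `fit(X, y)` method. `fit_dict` and
`CentroidStandIn` are both made up for this mail; the stand-in class is
only there so the example runs without depending on a particular
classifier.

```python
import numpy as np


def fit_dict(estimator, data):
    """Stack a {label: samples} dict and call estimator.fit(X, y).

    Hypothetical helper, not part of scikit-learn. Returns whatever
    estimator.fit returns (by convention, the fitted estimator).
    """
    labels = list(data)
    X = np.vstack([np.asarray(data[k]) for k in labels])
    y = np.concatenate([np.full(len(data[k]), k) for k in labels])
    return estimator.fit(X, y)


class CentroidStandIn:
    """Toy nearest-centroid classifier; a stand-in for a real estimator."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        X = np.asarray(X)
        # squared distance from every sample to every class centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


model = fit_dict(CentroidStandIn(), {0: [[-1, 0], [0, -1]], 1: [[1, 0], [0, 1]]})
```

this leaves fit(X, y) untouched, so the symmetry with predict is not
affected.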

it is just a minor annoyance that i need to fix before every fit command.
that i needed to merge them myself seemed redundant and something that the
package might be able to handle nicely. i also accept, by the way, that
this is a very minor point in the grand scale of things. i prefer to keep
my feature files separate because it is redundant to load the whole
recording just to access label X when there are N individual labels.

Andreas: this doesn't affect me at all when i am performing classification
as i've already "figured out" the classifier that I need and the process is
wrapped in higher level functions. it just annoys me that i have to do this
when i'm investigating new models from the command line.


On Fri, Apr 5, 2013 at 11:23 AM, Andreas Mueller
<[email protected]> wrote:

> On 04/05/2013 12:19 PM, Bill Power wrote:
> > I think you misunderstood me. I meant something (more efficiently
> > written) along the lines of below.
> >
> > import numpy as np
> >
> > X0 = [[-1, 0], [0,-1]]
> > X1 = [[ 1, 0], [0, 1]]
> >
> > trData = { 0: X0, 1: X1 }
> >
> > X = np.array( [v for v in trData.values()] ).reshape( -1, 2 )
> > Y = np.array( [np.ones( len(v) ) * k for k, v in trData.items()]
> > ).ravel()
> >
> I also misunderstood that. Still Lars's points remain and we won't do that.
> It is very easy to build an estimator and a Pipeline to do the conversion.
>
>
> ------------------------------------------------------------------------------
> Minimize network downtime and maximize team effectiveness.
> Reduce network management and security costs. Learn how to hire
> the most talented Cisco Certified professionals. Visit the
> Employer Resources Portal
> http://www.cisco.com/web/learning/employer_resources/index.html
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>