Why not use numpy arrays of strings from the start? What matters here is
that they support fancy indexing. Alternatively, pass X = np.arange(N) as
the data and use those indices to look up the documents yourself on demand.
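
A rough sketch of the second option, in case it helps. The sample documents and the helpers iter_folds and fetch are made up for illustration; they are not part of scikit-learn, and the fold splitting is hand-rolled with numpy as a stand-in for whatever cross-validation iterator you actually use. The point is that only the small integer array is handed around, and the strings are looked up fold by fold.

```python
import numpy as np

# Hypothetical stand-in for the large list of string documents.
docs = ["first document", "second document", "third one",
        "a fourth", "fifth text", "sixth text"]
y = np.array([0, 1, 0, 1, 0, 1])

# Proxy "data": one integer per document instead of the documents themselves.
X = np.arange(len(docs))


def iter_folds(indices, n_folds):
    """Yield (train_idx, test_idx) pairs over the given index array."""
    folds = np.array_split(indices, n_folds)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx


def fetch(idx):
    """Look up the documents for an index array only when they are needed."""
    return [docs[i] for i in idx]


for train_idx, test_idx in iter_folds(X, n_folds=3):
    train_docs, train_y = fetch(train_idx), y[train_idx]
    test_docs, test_y = fetch(test_idx), y[test_idx]
    # Fit on (train_docs, train_y) and score on (test_docs, test_y) here.
```

Each fold still builds a Python list of its training documents, but that list only holds references to the existing strings, so the document text itself is not duplicated. For the first option, np.array(docs, dtype=object) likewise stores references rather than copying the strings into a fixed-width array.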

-Ken
On Jan 13, 2013 11:04 PM, "Robert Layton" <[email protected]> wrote:

> When using cross_validation.X, all arrays are checked in the normal way --
> using check_arrays.
> I am developing code that uses string documents as input, so I have a list
> of strings as the "data" and a numpy array of class labels, as usual.
> (In case anyone doesn't know, my research area is authorship analysis.)
> I have classes that use the Classifier mixins etc, so they work well with
> cross validation, except that a copy of the data is made to create the
> numpy array.
> Normally this is fine, but I'm now working with a really large dataset
> that fits into memory only once.
> The copy that gets made by check_arrays causes a memory error.
>
> My question: converting to numpy arrays is intended behaviour, and fits
> with the rest of the project. Should there be a way to turn it off, e.g.
> a "respect_input_type=True" argument?
>
>
> - Robert
>
>
> --
>
> Public key at: http://pgp.mit.edu/ Search for this email address and
> select the key from "2011-08-19" (key id: 54BA8735)
>
>
> ------------------------------------------------------------------------------
> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> MVPs and experts. SALE $99.99 this month only -- learn more at:
> http://p.sf.net/sfu/learnmore_122412
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>