Hi,
For an image classification task, I need to extract random patches and
their coordinates from images.
Until now, I used custom code to extract both at the same time. I
recently tested the extract_patches_2d function from scikit-learn and it
seems very fast. To extract the coordinates along with the patches, I
wrote this test script <https://gist.github.com/NicolasTr/5429897>.
Unsurprisingly but unfortunately, it uses 3 times more memory than the
same script without the coordinate extraction. I would like to write a
better solution, but I need your opinion first:
* I could modify extract_patches_2d to return a tuple (patches,
  coordinates):
    o The memory consumption would probably stay the same, since the
      coordinates are already computed inside the function (here
      <https://github.com/scikit-learn/scikit-learn/blob/85ec0fd1ae904f275f608b11044a2476ed4723e6/sklearn/feature_extraction/image.py#L322-L323>).
      If max_patches is not specified, the function could return the
      coordinates as an itertools.product
    o It could break existing code because the return value would
      be different
* I could create a new kind of PatchExtractor:
    o Existing code wouldn't break
    o The random_state would need to be copied before any extraction
      so that the same randint calls can be replayed to recover the
      correct coordinates
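
To make the first option concrete, here is a minimal sketch of what
returning (patches, coordinates) could look like. It is a standalone
NumPy function, not a patch to scikit-learn: the name
extract_patches_with_coords, its seed parameter and the (row, col)
top-left convention are assumptions made for illustration only, and it
does not claim to reproduce the patches that extract_patches_2d would
draw for the same seed.

```python
import itertools

import numpy as np


def extract_patches_with_coords(image, patch_size, max_patches=None, seed=None):
    """Hypothetical helper: return (patches, coordinates).

    coordinates[k] is the (row, col) of the top-left corner of patches[k].
    """
    image = np.asarray(image)
    i_h, i_w = image.shape[:2]
    p_h, p_w = patch_size
    if p_h > i_h or p_w > i_w:
        raise ValueError("patch_size must fit inside the image")

    # Number of valid top-left positions along each axis.
    n_h, n_w = i_h - p_h + 1, i_w - p_w + 1

    if max_patches is None:
        # Every position, in row-major order.
        coords = list(itertools.product(range(n_h), range(n_w)))
    else:
        # Random positions; the same arrays index the image and are returned.
        rng = np.random.RandomState(seed)
        rows = rng.randint(n_h, size=max_patches)
        cols = rng.randint(n_w, size=max_patches)
        coords = list(zip(rows, cols))

    patches = np.stack([image[r:r + p_h, c:c + p_w] for r, c in coords])
    return patches, np.array(coords)


image = np.arange(6 * 8).reshape(6, 8)
patches, coords = extract_patches_with_coords(image, (2, 3), max_patches=5, seed=0)
all_patches, all_coords = extract_patches_with_coords(image, (2, 3))
```

The coordinates cost two integers per patch, which is small next to the
patch data itself, so returning them should not change memory use much.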
What do you think?
Regards,
Nicolas Trésegnie
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general