[
http://jira.nuxeo.org/browse/NXSEM-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=62652#action_62652
]
Olivier Grisel commented on NXSEM-11:
-------------------------------------
The 80 million tiny images corpus is both very large to process and requires MATLAB
to work with. Falling back to Caltech 101 or Caltech 256 instead: both are plain
JPEG collections, and the images can be noised before GIST feature extraction to
further improve the performance of the SDAs (stacked denoising autoencoders).
- http://www.vision.caltech.edu/Image_Datasets/Caltech101/ (131 MB)
- http://www.vision.caltech.edu/Image_Datasets/Caltech256/ (1.2 GB, 30,607 images)
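The comment does not say which kind of corruption "noised" refers to. As a minimal sketch, assuming grayscale images that have already been decoded into rows of 0..255 integers, the snippet below produces corrupted copies of one image using two common choices: masking noise (zeroing a random fraction of pixels, the corruption typically used when training denoising autoencoders) and additive Gaussian noise. The function names and noise levels (25% masking, sigma 20) are hypothetical illustrations, not the project's actual settings; JPEG decoding and GIST extraction are out of scope here.

```python
import random
from typing import List

# Assumed representation: grayscale image as rows of integer pixels in 0..255.
Image = List[List[int]]


def masking_noise(image: Image, fraction: float, rng: random.Random) -> Image:
    """Return a copy of `image` where each pixel is zeroed with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return [[0 if rng.random() < fraction else p for p in row] for row in image]


def gaussian_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Return a copy of `image` with additive Gaussian noise, clamped to 0..255."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


def noisy_variants(image: Image, n: int, seed: int = 0) -> List[Image]:
    """Hypothetical helper: build `n` corrupted copies of one image.

    Even-indexed copies get masking noise, odd-indexed copies get Gaussian
    noise. GIST features would then be extracted from each copy (not shown).
    """
    rng = random.Random(seed)  # seeded so the corpus can be regenerated
    variants = []
    for i in range(n):
        if i % 2 == 0:
            variants.append(masking_noise(image, 0.25, rng))
        else:
            variants.append(gaussian_noise(image, 20.0, rng))
    return variants


if __name__ == "__main__":
    img = [[128] * 4 for _ in range(4)]  # tiny flat-gray test image
    for v in noisy_variants(img, 3):
        print(v)
```

Seeding the generator keeps the noisy corpus reproducible across runs, which matters when comparing SDA models trained on different noise settings.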
> build a training picture corpus for document hashing based on 80 million tiny
> images corpus
> ------------------------------------------------------------------------------------------
>
> Key: NXSEM-11
> URL: http://jira.nuxeo.org/browse/NXSEM-11
> Project: Nuxeo Semantic R&D
> Issue Type: Task
> Reporter: Olivier Grisel
> Assignee: Olivier Grisel
>
> See:
> - http://horatio.cs.nyu.edu/mit/tiny/data/index.html
> - http://people.csail.mit.edu/torralba/tinyimages/
_______________________________________________
ECM-tickets mailing list
[email protected]
http://lists.nuxeo.com/mailman/listinfo/ecm-tickets