[JIRA] Commented: (NXSEM-11) build a training picture corpus for document hashing based on 80 million tiny images corpus

Olivier Grisel (JIRA NUXEO) Fri, 30 Oct 2009 08:53:27 -0700

    [ 
http://jira.nuxeo.org/browse/NXSEM-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=62652#action_62652
 ]


Olivier Grisel commented on NXSEM-11:
-------------------------------------

The 80 million tiny images corpus is both very big to process and requires 
MATLAB to work with. Falling back to Caltech 101 or Caltech 256 instead: both 
are pure JPEG collections whose images can be noised before GIST feature 
extraction to further improve SDA performance.

http://www.vision.caltech.edu/Image_Datasets/Caltech101/  (131MB)

http://www.vision.caltech.edu/Image_Datasets/Caltech256/ (1.2 GB - 30607 images)
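
A minimal sketch of the "noise before feature extraction" idea, assuming each 
decoded image is available as a flat buffer of 8-bit grayscale values. The 
comment does not say which kind of noise is meant, so additive Gaussian noise 
is an assumption here, as are the function names and the sigma value. GIST 
extraction and SDA training are not shown.

```python
import random


def add_gaussian_noise(pixels, sigma=25.0, rng=None):
    """Return a noisy copy of a buffer of 8-bit pixel values.

    Assumption: additive Gaussian noise with standard deviation `sigma`
    (in 0-255 intensity units); results are clipped back to 0..255.
    """
    if rng is None:
        rng = random.Random()
    noisy = bytearray(len(pixels))
    for i, value in enumerate(pixels):
        noisy[i] = min(255, max(0, round(value + rng.gauss(0.0, sigma))))
    return bytes(noisy)


def make_denoising_pairs(images, copies=3, sigma=25.0, seed=0):
    """Build (noisy, clean) pairs, `copies` noisy variants per clean image.

    Hypothetical helper: features would then be extracted from each side
    of a pair before training.
    """
    rng = random.Random(seed)
    return [
        (add_gaussian_noise(image, sigma, rng), image)
        for image in images
        for _ in range(copies)
    ]
```

Generating several noisy variants per clean image also enlarges the training 
set, which may help offset the much smaller size of Caltech 101 / 256 compared 
to the tiny images corpus.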

> build a training picture corpus for document hashing based on 80 million tiny 
> images corpus
> ------------------------------------------------------------------------------------------
>
>                 Key: NXSEM-11
>                 URL: http://jira.nuxeo.org/browse/NXSEM-11
>             Project: Nuxeo Semantic R&D
>          Issue Type: Task
>            Reporter: Olivier Grisel
>            Assignee: Olivier Grisel
>
> See:
> - http://horatio.cs.nyu.edu/mit/tiny/data/index.html
> - http://people.csail.mit.edu/torralba/tinyimages/

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://jira.nuxeo.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
_______________________________________________
ECM-tickets mailing list
[email protected]
http://lists.nuxeo.com/mailman/listinfo/ecm-tickets
