[ https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773053#comment-13773053 ]
Julien Nioche commented on NUTCH-1517:
--------------------------------------

I had another look at the code. It should handle documents marked for deletion and handle fields more robustly (e.g. with a mapping mechanism as in SOLR). It currently fails to remove unsupported characters from any field other than the two you hardcoded. The regex that checks the validity of a field name is also incorrect: it lets through strings starting with a _, which is not allowed.

> CloudSearch indexer
> -------------------
>
>                 Key: NUTCH-1517
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1517
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Julien Nioche
>             Fix For: 1.9
>
>         Attachments: 0023883254_1377197869_indexer-cloudsearch.patch
>
>
> Once we have made the indexers pluggable, we should add a plugin for Amazon
> CloudSearch. See http://aws.amazon.com/cloudsearch/. Apparently it uses a
> JSON-based representation, Search Data Format (SDF), which we could reuse for
> a file-based indexer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
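
To illustrate the last two points of the comment (cleaning every field rather than two hardcoded ones, and rejecting field names that start with an underscore), here is a minimal Java sketch. It is not taken from the attached patch. The class FieldSanitizer and its methods are hypothetical. The naming rule (a lowercase letter first, then lowercase letters, digits or underscores) is an assumption about what CloudSearch accepts. "Unsupported characters" is assumed to mean characters outside the XML 1.0 range.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the attached patch.
final class FieldSanitizer {

    // Assumed rule: a lowercase letter first, then lowercase letters,
    // digits or underscores. A leading underscore is therefore rejected.
    private static final Pattern VALID_FIELD_NAME =
        Pattern.compile("[a-z][a-z0-9_]*");

    // Assumed definition of "unsupported": any character outside the
    // XML 1.0 character range.
    private static final Pattern UNSUPPORTED_CHARS = Pattern.compile(
        "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    private FieldSanitizer() {
    }

    // matches() requires the whole name to match, so "_title" fails
    // on its first character.
    static boolean isValidFieldName(String name) {
        return name != null && VALID_FIELD_NAME.matcher(name).matches();
    }

    // Removes every character that the assumed rule treats as unsupported.
    static String stripUnsupported(String value) {
        return UNSUPPORTED_CHARS.matcher(value).replaceAll("");
    }

    // Cleans the value of every field, not just a fixed subset.
    static Map<String, String> cleanAllFields(Map<String, String> fields) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            cleaned.put(entry.getKey(), stripUnsupported(entry.getValue()));
        }
        return cleaned;
    }
}
```

Iterating over the whole field map means a newly added field is cleaned without touching the code. A mapping mechanism like the one mentioned for SOLR could sit in front of isValidFieldName to rename fields that would otherwise be rejected.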