Steve Blackmon created STREAMS-300:
--------------------------------------

             Summary: processor to fix handling of non-string fields from 
mongoexport
                 Key: STREAMS-300
                 URL: https://issues.apache.org/jira/browse/STREAMS-300
             Project: Streams
          Issue Type: Improvement
            Reporter: Steve Blackmon


mongoexport is useful for producing files full of JSON documents, which can be
read by Streams in lieu of paging through documents in Mongo. However, there
are some artifacts of the export which must be cleaned up to reconstruct the
original document.

Specifically, dates and numbers show up as nested dictionaries instead of plain
field values. For example:

    "created_at": {
        "$date": "2015-02-11T04:24:48.101+0000"
    },
    "id": {
        "$numberLong": "2405068880"
    }

Write a processor that can sit behind WebHdfsPersistReader and clean this up,
such that mongoexport -> WebHdfsPersistReader -> MongoExportCleanup ->
downstream works equivalently to MongoPersistReader -> downstream.
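
The cleanup step could look roughly like the sketch below. It is written in
Python only to illustrate the transformation and is not the Streams processor
API. The function name unwrap_extended_json is hypothetical, only the two
wrappers shown above ($date and $numberLong) are handled, and keeping the date
as its original string is an assumption about what downstream expects.

```python
import json


def unwrap_extended_json(value):
    """Recursively replace mongoexport wrapper objects with plain values.

    Hypothetical helper: handles only the "$date" and "$numberLong"
    wrappers mentioned in this issue.
    """
    if isinstance(value, dict):
        # A wrapper is a dictionary whose single key is the type marker.
        if len(value) == 1:
            if "$numberLong" in value:
                return int(value["$numberLong"])
            if "$date" in value:
                # Assumption: keep the date in whatever form was exported.
                return unwrap_extended_json(value["$date"])
        # Ordinary document: clean each field in turn.
        return {key: unwrap_extended_json(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap_extended_json(item) for item in value]
    return value


# One exported document, as it might appear on a single line of the file.
line = (
    '{"created_at": {"$date": "2015-02-11T04:24:48.101+0000"}, '
    '"id": {"$numberLong": "2405068880"}}'
)
cleaned = unwrap_extended_json(json.loads(line))
print(json.dumps(cleaned))
# prints {"created_at": "2015-02-11T04:24:48.101+0000", "id": 2405068880}
```

An actual MongoExportCleanup processor would apply the same recursive walk to
each document it receives from WebHdfsPersistReader before passing it
downstream.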



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
