Steve Blackmon created STREAMS-300:
--------------------------------------
Summary: processor to fix handling of non-string fields from
mongoexport
Key: STREAMS-300
URL: https://issues.apache.org/jira/browse/STREAMS-300
Project: Streams
Issue Type: Improvement
Reporter: Steve Blackmon
mongoexport is useful for producing files full of json documents which can be
read by streams in lieu of paging through documents in mongo. However, there
are some artifacts of the export which must be cleaned up to reconstruct the
original document.
Specifically, dates and numbers show up as nested dictionaries instead of
plain field values. For example:
"created_at": {
    "$date": "2015-02-11T04:24:48.101+0000"
}
"id": {
    "$numberLong": "2405068880"
}
Write a processor that can sit behind WebHdfsPersistReader and clean this up,
so that mongoexport -> WebHdfsPersistReader -> MongoExportCleanup ->
downstream works equivalently to MongoPersistReader -> downstream.
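
A rough sketch of the cleanup logic, written in Python only for brevity (an
actual Streams processor would be Java). The function name clean_mongoexport
is hypothetical. It assumes that only the two wrappers shown above,
"$date" and "$numberLong", need unwrapping, and that a date should be kept as
its ISO-8601 string; what MongoPersistReader really emits for dates is not
stated here, so that part is a guess.

```python
def clean_mongoexport(value):
    """Recursively replace mongoexport wrapper dicts with plain values.

    Hypothetical helper: handles only the "$date" and "$numberLong"
    wrappers from the example above.
    """
    if isinstance(value, dict):
        # A wrapper is a dict with exactly one key naming the type.
        if len(value) == 1:
            if "$numberLong" in value:
                return int(value["$numberLong"])
            if "$date" in value:
                # Assumption: keep the date as its ISO-8601 string.
                return value["$date"]
        # Ordinary sub-document: clean each field in turn.
        return {key: clean_mongoexport(item) for key, item in value.items()}
    if isinstance(value, list):
        return [clean_mongoexport(item) for item in value]
    # Strings, numbers, booleans and None pass through unchanged.
    return value


exported = {
    "created_at": {"$date": "2015-02-11T04:24:48.101+0000"},
    "id": {"$numberLong": "2405068880"},
}
print(clean_mongoexport(exported))
# {'created_at': '2015-02-11T04:24:48.101+0000', 'id': 2405068880}
```

Walking the whole document recursively means wrappers nested inside
sub-documents or arrays are cleaned as well, not just top-level fields.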
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)