Hi All,

An operator for parsing fixed-width records has to be implemented. This operator will parse fixed-width byte-array tuples based on a JSON schema and emit the parsed byte array on one port, the converted POJO on another port, and the failed byte arrays/tuples on an error port.
The user will provide a JSON schema definition of the following form:

{
  "recordwidthlength": "Integer",
  "recordseparator": "\n",  // blank if there is no record separator; defaults to a newline character
  "fields": [
    {
      "name": "<Name of the Field>",
      "type": "<Data Type of Field>",
      "startCharNum": "<Integer - Starting Character Position>",
      "endCharNum": "<Integer - End Character Position>",
      "constraints": {}
    },
    {
      "name": "adName",
      "type": "String",
      "startCharNum": "Integer",
      "endCharNum": "Integer",
      "constraints": {
        "required": "true",
        "pattern": "[a-z].*[a-z]$"
      }
    }
  ]
}

Below are the options to implement this operator:

1) Write a new custom library for parsing fixed-width records, since the existing libraries for this (e.g. Flatworm, jFFP, etc.) have no mechanism for constraint checking. The challenge in this approach is writing a robust library from scratch that handles all our requirements. (A rough sketch of this approach is appended below my signature.)

2) Extend our already written CsvParser to handle fixed-width records. In this approach we would insert a delimiter character after every field in the incoming tuple; the challenges are selecting a delimiter character and then escaping it whenever it appears in the stream. This approach increases the memory overhead (extra characters are inserted as delimiters) but would be comparatively easier to maintain and operate. (A sketch of the pre-processing step is also appended below.)

Please let me know your thoughts and votes on the above approaches.

Regards,
Hitesh
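P.S. To make option 1 concrete, here is a minimal sketch of slicing one field out of a record by its character positions and applying the constraints from the schema. FieldSpec and parseField are hypothetical names for illustration only, not part of any existing library, and I am assuming the schema's character positions are 1-based and inclusive:

import java.util.regex.Pattern;

public class FixedWidthSketch {

    // Hypothetical per-field descriptor derived from the JSON schema.
    static class FieldSpec {
        String name;
        int startCharNum;   // assumed 1-based, inclusive
        int endCharNum;     // assumed 1-based, inclusive
        boolean required;
        Pattern pattern;    // null if the field has no pattern constraint
    }

    // Extracts one field from a record and checks its constraints;
    // a record that fails here would be emitted on the error port.
    static String parseField(String record, FieldSpec spec) {
        String value = record.substring(spec.startCharNum - 1, spec.endCharNum).trim();
        if (spec.required && value.isEmpty()) {
            throw new RuntimeException(spec.name + " is required but was empty");
        }
        if (spec.pattern != null && !value.isEmpty()
                && !spec.pattern.matcher(value).matches()) {
            throw new RuntimeException(spec.name + " does not match " + spec.pattern);
        }
        return value;
    }
}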
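And for option 2, a sketch of the pre-processing that would turn a fixed-width record into a delimited one the existing CsvParser can consume. The field widths come from the schema; the backslash escaping shown here is an assumption, not a settled design:

public class DelimiterPreprocessSketch {

    // Inserts the delimiter after every field, escaping any occurrence
    // of the delimiter inside the field data itself.
    static String toDelimited(String record, int[] fieldWidths, char delimiter) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int width : fieldWidths) {
            String field = record.substring(pos, pos + width);
            field = field.replace(String.valueOf(delimiter), "\\" + delimiter);
            sb.append(field).append(delimiter);
            pos += width;
        }
        sb.setLength(sb.length() - 1);  // drop the trailing delimiter
        return sb.toString();
    }

    public static void main(String[] args) {
        // Widths {3, 5} with '|' turn "abc12345" into "abc|12345".
        System.out.println(toDelimited("abc12345", new int[]{3, 5}, '|'));
    }
}

This also shows where the memory overhead comes from: one extra character per field, plus any escape characters.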