Approach 2 does not look like a clean solution. -1 for Approach 2.
~ Yogi

On 7 September 2016 at 15:25, Hitesh Kapoor <hit...@datatorrent.com> wrote:

> Hi All,
>
> An operator for parsing fixed-width records has to be implemented.
> This operator will parse fixed-width byte arrays/tuples based on a JSON
> schema and emit the parsed byte array on one port, the converted POJO
> on another port, and the failed byte arrays/tuples on an error port.
>
> The user will provide a JSON schema definition in the format shown
> below.
>
> {
>   "recordwidthlength": "Integer",
>   "recordseparator": "\n", // blank if there is no record separator;
>                            // default is a newline character
>   "fields": [
>     {
>       "name": "<Name of the Field>",
>       "type": "<Data Type of Field>",
>       "startCharNum": "<Integer - Starting Character Position>",
>       "endCharNum": "<Integer - End Character Position>",
>       "constraints": {
>       }
>     },
>     {
>       "name": "adName",
>       "type": "String",
>       "startCharNum": "Integer",
>       "endCharNum": "Integer",
>       "constraints": {
>         "required": "true",
>         "pattern": "[az].*[az]$"
>       }
>     }
>   ]
> }
>
> Below are the options to implement this operator.
>
> 1) Write a new custom library for parsing fixed-width records, since
> the existing libraries for this (e.g. flatworm, jffp) have no mechanism
> for constraint checking.
> The challenge in this approach is writing a robust library from
> scratch that handles all our requirements.
>
> 2) Extend our existing CsvParser to handle fixed-width records. In
> this approach we would have to add a delimiter character after every
> field in each incoming tuple.
> The challenge in this approach is selecting a delimiter character;
> if that character also appears in the stream, we will have to escape
> it.
> This approach increases memory overhead (extra characters are inserted
> as delimiters) but will be comparatively easier to maintain and
> operate.
>
> Please let me know your thoughts and votes on the above approaches.
>
> Regards,
> Hitesh
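
For anyone comparing the two options in the quoted proposal, here is a rough Java sketch of what each one does to a single record. It is only an illustration, not the operator's design: the class `FixedWidthSketch`, the `Field` holder, and both method names are made up for this mail, and I am assuming `startCharNum`/`endCharNum` are 1-based and inclusive, which the schema above does not actually state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedWidthSketch {

    /** Hypothetical field descriptor; positions are ASSUMED 1-based and inclusive. */
    public static final class Field {
        final String name;
        final int startCharNum;
        final int endCharNum;

        public Field(String name, int startCharNum, int endCharNum) {
            this.name = name;
            this.startCharNum = startCharNum;
            this.endCharNum = endCharNum;
        }
    }

    /** Option 1 in miniature: cut each field straight out of the record by position. */
    public static List<String> slice(String record, List<Field> fields) {
        List<String> values = new ArrayList<>();
        for (Field f : fields) {
            if (f.startCharNum < 1 || f.endCharNum < f.startCharNum
                    || f.endCharNum > record.length()) {
                // A real operator would send this record to the error port instead.
                throw new IllegalArgumentException("Bad range for field " + f.name);
            }
            values.add(record.substring(f.startCharNum - 1, f.endCharNum));
        }
        return values;
    }

    /**
     * Option 2 in miniature: rebuild the record as delimited text so that a
     * CSV-style parser could read it. The delimiter is placed between fields,
     * and any delimiter or escape character inside the data must be escaped.
     */
    public static String toDelimited(String record, List<Field> fields,
                                     char delimiter, char escape) {
        StringBuilder out = new StringBuilder();
        List<String> values = slice(record, fields);
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.append(delimiter);
            }
            for (char c : values.get(i).toCharArray()) {
                if (c == delimiter || c == escape) {
                    out.append(escape);
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
                new Field("id", 1, 3),
                new Field("adName", 4, 8));
        String record = "001a,b|c";
        System.out.println(slice(record, fields));                  // [001, a,b|c]
        System.out.println(toDelimited(record, fields, ',', '\\')); // 001,a\,b|c
    }
}
```

Note that the second method still has to slice by position before it can insert delimiters, and it then copies and escapes every field before the CSV parser splits the text apart again. That extra pass is the overhead being weighed against reusing CsvParser.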