Approach 2 does not look like a clean solution.

-1 for Approach 2.

~ Yogi

On 7 September 2016 at 15:25, Hitesh Kapoor <hit...@datatorrent.com> wrote:

> Hi All,
>
> An operator for parsing fixed-width records has to be implemented.
> This operator shall be used to parse fixed-width byte arrays/tuples based on
> a JSON schema and emit the parsed byte array on one port, the converted POJO
> on another port, and the failed byte arrays/tuples on an error port.
>
>
> The user will provide a JSON schema definition in the format shown below.
>
> {
>   "recordwidthlength": "Integer",
>   "recordseparator": "\n", // blank if there is no record separator; defaults to a newline character
>   "fields": [
>     {
>       "name": "<Name of the Field>",
>       "type": "<Data Type of Field>",
>       "startCharNum": "<Integer - Starting Character Position>",
>       "endCharNum": "<Integer - End Character Position>",
>       "constraints": {
>       }
>     },
>     {
>       "name": "adName",
>       "type": "String",
>       "startCharNum": "Integer",
>       "endCharNum": "Integer",
>       "constraints": {
>         "required": "true",
>         "pattern": "[a-z].*[a-z]$"
>       }
>     }
>   ]
> }
>
>
> Below are the options to implement this operator.
>
> 1) Write a new custom library for parsing fixed-width records, as existing
> libraries for this (e.g. flatworm, jffp, etc.) do not have a mechanism for
> constraint checking.
> The challenge in this approach will be writing a robust library from
> scratch that handles all our requirements.
>
> 2) Extend our existing CsvParser to handle fixed-width records. In this
> approach we will have to add a delimiter character after every field of
> each incoming tuple.
> The challenges in this approach would be selecting a delimiter character
> and escaping that character whenever it appears in the stream.
> This approach will increase the memory overhead (as extra characters are
> inserted as delimiters) but will be comparatively easier to maintain and
> operate.
>
> Please let me know your thoughts and votes on the above approaches.
>
> Regards,
> Hitesh
>
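
For illustration only, a minimal Java sketch of the core work the proposed operator (and a custom library under Approach 1) would do: slicing a record by startCharNum/endCharNum and applying the "required" and "pattern" constraints from the schema. The names FixedWidthSketch, FieldDef, parse and the "count" field are hypothetical, not an existing Apex or library API. Positions are assumed to be 1-based and inclusive, padding is assumed to be trimmed, and "pattern" is assumed to have to match the whole trimmed value. A record that fails a constraint throws, which is the point where the operator would emit to the error port.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class FixedWidthSketch {

  /** Hypothetical mirror of one entry in the schema's "fields" array. */
  static final class FieldDef {
    final String name;
    final int startCharNum;  // assumed 1-based, inclusive
    final int endCharNum;    // assumed 1-based, inclusive
    final boolean required;
    final Pattern pattern;   // null when no "pattern" constraint is given

    FieldDef(String name, int startCharNum, int endCharNum, boolean required, String regex) {
      this.name = name;
      this.startCharNum = startCharNum;
      this.endCharNum = endCharNum;
      this.required = required;
      this.pattern = regex == null ? null : Pattern.compile(regex);
    }
  }

  /** Returns field name -> trimmed value, or throws if a constraint fails. */
  static Map<String, String> parse(String record, List<FieldDef> fields) {
    Map<String, String> values = new LinkedHashMap<>();
    for (FieldDef f : fields) {
      if (f.startCharNum < 1 || f.endCharNum > record.length()) {
        throw new IllegalArgumentException("record too short for field " + f.name);
      }
      // 1-based inclusive positions -> 0-based start, exclusive end.
      String value = record.substring(f.startCharNum - 1, f.endCharNum).trim();
      if (f.required && value.isEmpty()) {
        throw new IllegalArgumentException("missing required field " + f.name);
      }
      // Assumption: the pattern must match the entire trimmed value.
      if (f.pattern != null && !value.isEmpty() && !f.pattern.matcher(value).matches()) {
        throw new IllegalArgumentException("field " + f.name + " does not match pattern");
      }
      values.put(f.name, value);
    }
    return values;
  }

  public static void main(String[] args) {
    List<FieldDef> schema = Arrays.asList(
        new FieldDef("adName", 1, 6, true, "[a-z].*[a-z]$"),
        new FieldDef("count", 7, 11, false, null));
    System.out.println(parse("acme  00042", schema));
    try {
      parse("Acme  00042", schema); // uppercase first letter fails the pattern
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Running main prints {adName=acme, count=00042} followed by "rejected: field adName does not match pattern".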

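A rough Java sketch of the preprocessing step Approach 2 implies: cutting the record at each field's endCharNum and inserting a delimiter between fields so a CSV-style parser could consume it, escaping any delimiter already present in the data. The names are hypothetical, and the backslash escape is an assumption (CSV parsers more commonly quote fields instead). Fields are assumed to be contiguous, in order, and to start at position 1. The sketch makes the overhead concrete: each record grows by one character per field boundary plus one per escaped character.

```java
public class DelimiterInsertionSketch {

  /**
   * Hypothetical helper: converts a fixed-width record into a delimited one.
   * endCharNums holds each field's end position (1-based, inclusive), in order.
   * A delimiter goes between consecutive fields; any delimiter or escape
   * character found in the data is prefixed with the escape character.
   */
  static String toDelimited(String record, int[] endCharNums, char delimiter, char escape) {
    StringBuilder out = new StringBuilder(record.length() + endCharNums.length);
    int start = 0; // 0-based index where the current field begins
    for (int f = 0; f < endCharNums.length; f++) {
      int end = endCharNums[f]; // 1-based inclusive end == 0-based exclusive end
      for (int i = start; i < end; i++) {
        char c = record.charAt(i);
        if (c == delimiter || c == escape) {
          out.append(escape); // each escape adds one more character
        }
        out.append(c);
      }
      if (f < endCharNums.length - 1) {
        out.append(delimiter); // one extra character per field boundary
      }
      start = end;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Two hypothetical fields at positions 1-4 and 5-8; the first contains a comma.
    System.out.println(toDelimited("ab,cd123", new int[] {4, 8}, ',', '\\'));
  }
}
```

Running it prints ab\,c,d123: the 8-character record becomes 10 characters, one for the inserted delimiter and one for the escape.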