[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation
[ https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871983#comment-15871983 ]

ASF GitHub Bot commented on NIFI-3495:
--

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1518

    Code looks good to me. Verified fix. +1 Merged to master. Thanks, @olegz !

> TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation
> ---
>
>          Key: NIFI-3495
>          URL: https://issues.apache.org/jira/browse/NIFI-3495
>      Project: Apache NiFi
>   Issue Type: Bug
>     Reporter: Oleg Zhurakousky
>     Assignee: Oleg Zhurakousky
>     Priority: Critical
>      Fix For: 1.2.0
>
>
> This condition is very rare. It only occurs when a read ahead (a call to
> _fill()_) is made inside the _isEol_ operation. _fill()_ sets the new index,
> which is then reset inside the main _nextOffsetInfo_ operation.
> So the fix is to monitor whether _isEol_ had to perform a read ahead and,
> if it did, not reset the index.
> More details.
> While this component is modeled after the standard Java BufferedReader, which
> simply reads and returns lines (delimited by CR or LF or both), this reader
> also holds the information about how each line terminated (i.e., EOF, or CR
> or LF or CR and LF), returning it to the caller as OffsetInfo.
> So for example, if you have a record "foo\r\nbar" and you read it with
> BufferedReader, you will get 'foo' and 'bar'. However, you will not know that
> between the two tokens there was CR and LF and therefore will not be able to
> restore (if need be) the record to its original state. The TextLineDemarcator
> will return OffsetInfo, which holds the delimiter and other information.
> So, to accomplish the above, every time we see CR (13) we need to peek at the
> next byte and see if it is LF (10). When at the end of the buffer, such a peek
> becomes complicated since we need to read more data. We did read more data,
> but didn't handle the index properly, essentially setting it back to the old
> value after the new one had been set inside of fill().

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
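The condition in the issue description can be illustrated with a small, self-contained sketch. This is not the NiFi TextLineDemarcator source: the class name EolPeekSketch, the "length:delimiter" result format, and the simplified fill() that discards the old buffer contents are all assumptions made for illustration. It only shows the idea that, once the peek after CR had to read ahead, the index must be taken from what fill() set rather than from the stale local position.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of CR/LF detection with read-ahead; not NiFi code. */
class EolPeekSketch {
    private static final int CR = 13;
    private static final int LF = 10;

    private final InputStream in;
    private final byte[] buffer;
    private int index;      // position of the next unread byte in buffer
    private int available;  // number of valid bytes in buffer, -1 at EOF

    EolPeekSketch(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    /** Simplified read-ahead: replaces the buffer contents and moves index to 0. */
    private void fill() throws IOException {
        int n;
        do {
            n = in.read(buffer, 0, buffer.length);
        } while (n == 0);
        available = n;
        index = 0;
    }

    /** Returns {lineLength including delimiter, delimiterLength}, or null at EOF. */
    int[] next() throws IOException {
        int lineLength = 0;
        while (available != -1) {
            if (index >= available) {
                fill();
                continue; // re-check for EOF before scanning
            }
            int i = index; // local cursor into the current buffer
            while (i < available) {
                int b = buffer[i];
                lineLength++;
                if (b == LF) {
                    index = i + 1;
                    return new int[] {lineLength, 1};
                }
                if (b == CR) {
                    int next = i + 1;
                    if (next >= available) {
                        // CR is the last buffered byte: peeking requires a read-ahead.
                        fill();
                        next = index; // use the index set by fill(), not the stale i + 1
                    }
                    boolean lf = available != -1 && buffer[next] == LF;
                    if (lf) {
                        lineLength++;
                    }
                    // Writing "index = i + (lf ? 2 : 1)" here would reproduce the kind
                    // of bug described above whenever the read-ahead branch was taken.
                    index = lf ? next + 1 : next;
                    return new int[] {lineLength, lf ? 2 : 1};
                }
                i++;
            }
            index = i;
        }
        return lineLength > 0 ? new int[] {lineLength, 0} : null;
    }

    /** Scans all lines and reports each as "length:delimiterLength". */
    static List<String> scan(byte[] data, int bufferSize) {
        EolPeekSketch s = new EolPeekSketch(new ByteArrayInputStream(data), bufferSize);
        List<String> out = new ArrayList<>();
        try {
            for (int[] r = s.next(); r != null; r = s.next()) {
                out.add(r[0] + ":" + r[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] record = "foo\r\nbar".getBytes(StandardCharsets.US_ASCII);
        // A 4-byte buffer puts CR at the end of the first buffer and LF in the second.
        System.out.println(scan(record, 4));
    }
}
```

With a 4-byte buffer, "foo\r\nbar" splits so that CR is the last byte of one buffer and LF is the first byte of the next, which is the rare case the issue describes. A larger buffer takes the ordinary in-buffer peek and should report the same lines.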
[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation
[ https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871981#comment-15871981 ]

ASF GitHub Bot commented on NIFI-3495:
--

Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/1518
[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation
[ https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871979#comment-15871979 ]

ASF subversion and git services commented on NIFI-3495:
---

Commit ec868362f3317a79b6518c780af1b9debb843f32 in nifi's branch refs/heads/master from [~ozhurakousky]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ec86836 ]

NIFI-3495 fixed the index issue with TextLineDemarcator

This closes #1518.
[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation
[ https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871136#comment-15871136 ]

Joseph Witt commented on NIFI-3495:
---

I have verified that this corrects the issue observed. Before this patch, data from this site would split very incorrectly. After this patch, it appears to split the results perfectly.

http://standards.ieee.org/develop/regauth/oui/oui.csv

It still needs a code review.
[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation
[ https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871056#comment-15871056 ]

ASF GitHub Bot commented on NIFI-3495:
--

GitHub user olegz opened a pull request:

    https://github.com/apache/nifi/pull/1518

    NIFI-3495 fixed the index issue with TextLineDemarcator

    Thank you for submitting a contribution to Apache NiFi.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [ ] Is your initial contribution a single, squashed commit?

    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/olegz/nifi NIFI-3495

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1518

commit d65ceef58adc448cfa321411d5f4651459063e1c
Author: Oleg Zhurakousky
Date:   2017-02-17T02:05:59Z

    NIFI-3495 fixed the index issue with TextLineDemarcator