[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation

2017-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871983#comment-15871983
 ] 

ASF GitHub Bot commented on NIFI-3495:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/1518
  
Code looks good to me. Verified fix. +1. Merged to master. Thanks, @olegz!


> TextLineDemarcator sets the wrong index when read ahead is performed in isEol 
> operation
> ---
>
> Key: NIFI-3495
> URL: https://issues.apache.org/jira/browse/NIFI-3495
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Priority: Critical
> Fix For: 1.2.0
>
>
> This condition is very rare. It occurs only when a read-ahead (a call to 
> _fill()_) happens inside the _isEol_ operation. _fill()_ sets a new index, 
> which the main _nextOffsetInfo_ operation then resets to the old value. 
> The fix is to track whether _isEol_ had to perform a read-ahead and, if it 
> did, not reset the index.
> More details:
> This component is modeled after the standard Java BufferedReader, which 
> simply reads and returns lines (delimited by CR, LF, or both). Unlike 
> BufferedReader, it also records how each line terminated (EOF, CR, LF, or 
> CR followed by LF) and returns that to the caller as OffsetInfo. 
> For example, if you read the record "foo\r\nbar" with BufferedReader, you 
> get 'foo' and 'bar', but you cannot tell that CR and LF separated the two 
> tokens, so you cannot restore the record to its original state if you need 
> to. The TextLineDemarcator returns an OffsetInfo that holds the delimiter 
> and other information.
> To accomplish this, every time we see CR (13) we need to peek at the next 
> byte to see whether it is LF (10). At the end of the buffer that peek is 
> more complicated because more data must be read first. The code did read 
> it, but did not handle the index properly: it set the index back to the old 
> value after the new one had been set inside _fill()_.
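
The peek-after-CR problem described above can be illustrated with a small, self-contained sketch. This is not the NiFi TextLineDemarcator source: the class name LineScanner, the Line holder, the scan() helper and the buffer sizes are all hypothetical and chosen only for illustration. The sketch shows one way to structure the read-ahead so that the index set by fill() is never overwritten afterwards, because no saved copy of the old index is restored once the peek has refilled the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the NiFi TextLineDemarcator source.
class LineScanner {
    // One line: absolute start offset, length including the delimiter,
    // and delimiter length (0 = EOF, 1 = CR or LF, 2 = CR followed by LF).
    static final class Line {
        final long start;
        final int length;
        final int delimLength;

        Line(long start, int length, int delimLength) {
            this.start = start;
            this.length = length;
            this.delimLength = delimLength;
        }
    }

    private final InputStream in;
    private final byte[] buf;
    private int index;   // next unread position in buf
    private int limit;   // number of valid bytes in buf
    private long offset; // absolute offset of the next line start

    LineScanner(InputStream in, int bufSize) {
        this.in = in;
        this.buf = new byte[bufSize];
    }

    // Refills the buffer and moves index to 0; limit becomes 0 at EOF.
    private void fill() {
        try {
            limit = Math.max(in.read(buf, 0, buf.length), 0);
            index = 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Peeks at the byte after a CR. If the CR was the last byte in the
    // buffer this reads ahead; the caller keeps whatever index fill() set.
    private boolean nextIsLf() {
        if (index >= limit) {
            fill();
        }
        return index < limit && buf[index] == '\n';
    }

    // Returns the next line, or null when the stream is exhausted.
    Line nextLine() {
        long start = offset;
        int length = 0;
        int delim = 0;
        while (delim == 0) {
            if (index >= limit) {
                fill();
                if (limit == 0) {
                    break; // EOF: the line (if any) has no delimiter
                }
            }
            byte b = buf[index++];
            length++;
            if (b == '\n') {
                delim = 1;
            } else if (b == '\r') {
                delim = 1;
                if (nextIsLf()) { // consume the LF as part of the delimiter
                    index++;
                    length++;
                    delim = 2;
                }
            }
        }
        offset += length;
        return length == 0 ? null : new Line(start, length, delim);
    }

    // Convenience helper for trying the scanner on an ASCII string.
    static List<Line> scan(String text, int bufSize) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        LineScanner s = new LineScanner(new ByteArrayInputStream(data), bufSize);
        List<Line> lines = new ArrayList<>();
        for (Line l = s.nextLine(); l != null; l = s.nextLine()) {
            lines.add(l);
        }
        return lines;
    }
}
```

Calling LineScanner.scan("foo\r\nbar", 4) forces the CR to land on the last byte of the first buffer, so the peek has to read ahead. The result is two lines: start 0, length 5, delimiter length 2, and start 5, length 3, delimiter length 0. Restoring a previously saved index after nextIsLf() returns would reproduce the kind of bug the issue describes.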



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation

2017-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871981#comment-15871981
 ] 

ASF GitHub Bot commented on NIFI-3495:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/1518




[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation

2017-02-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871979#comment-15871979
 ] 

ASF subversion and git services commented on NIFI-3495:
---

Commit ec868362f3317a79b6518c780af1b9debb843f32 in nifi's branch 
refs/heads/master from [~ozhurakousky]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ec86836 ]

NIFI-3495 fixed the index issue with TextLineDemarcator

This closes #1518.




[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation

2017-02-16 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871136#comment-15871136
 ] 

Joseph Witt commented on NIFI-3495:
---

I have verified that this corrects the issue observed. Before this patch, data 
from this site was split very incorrectly. After this patch, it appears to be 
split perfectly.

http://standards.ieee.org/develop/regauth/oui/oui.csv

It still needs a code review.



[jira] [Commented] (NIFI-3495) TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871056#comment-15871056
 ] 

ASF GitHub Bot commented on NIFI-3495:
--

GitHub user olegz opened a pull request:

https://github.com/apache/nifi/pull/1518

NIFI-3495 fixed the index issue with TextLineDemarcator

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/olegz/nifi NIFI-3495

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/1518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1518


commit d65ceef58adc448cfa321411d5f4651459063e1c
Author: Oleg Zhurakousky 
Date:   2017-02-17T02:05:59Z

NIFI-3495 fixed the index issue with TextLineDemarcator



