nikie commented on a change in pull request #15901:
URL: https://github.com/apache/beam/pull/15901#discussion_r746358197
##########
File path: sdks/python/apache_beam/io/textio.py
##########
@@ -362,6 +391,15 @@ def _is_self_overlapping(delimiter):
         return True
     return False
+  def _is_escaped(self, read_buffer, position):
+    # Returns True if byte at position is preceded with an odd number
+    # of escapechar bytes or False if preceded by 0 or even escapes
+    # (the even number means that all the escapes are escaped themselves).
+    for current_pos in reversed(range(-1, position)):
+      if read_buffer.data[current_pos:current_pos + 1] != self._escapechar:
Review comment:
I have updated the code to use an explicit counter, which makes it easier
to understand.
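
For illustration, a minimal standalone sketch of the counter-based check could look like the code below. It is not the code in the PR: the free function `is_escaped` and its plain `data` and `escapechar` parameters are assumptions made so the example runs on its own.

```python
def is_escaped(data: bytes, position: int, escapechar: bytes) -> bool:
  """Return True if data[position] is preceded by an odd number of escapes.

  Hypothetical helper: `escapechar` is assumed to be a single byte given
  as a bytes object, e.g. b"\\".
  """
  escape_count = 0
  current_pos = position - 1
  # Walk backwards while the preceding bytes are escape characters.
  while current_pos >= 0 and data[current_pos:current_pos + 1] == escapechar:
    escape_count += 1
    current_pos -= 1
  # An even count means every escape is itself escaped.
  return escape_count % 2 == 1
```

With `escapechar=b"\\"`, a newline after one backslash counts as escaped, while a newline after two backslashes does not.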
Should I replace the bytes slicing comparison with
`if read_buffer.data[current_pos] != self._escapechar[0]`, or should we
keep it consistent with `_find_separator_bounds`?
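
To make the trade-off concrete, here is a small sketch of how the two comparison styles behave on Python 3 `bytes`. The helper names `matches_by_slice` and `matches_by_index` are hypothetical and exist only for this comparison.

```python
def matches_by_slice(data: bytes, pos: int, escapechar: bytes) -> bool:
  # Slicing returns bytes and never raises: out-of-range slices give b"".
  return data[pos:pos + 1] == escapechar


def matches_by_index(data: bytes, pos: int, escapechar: bytes) -> bool:
  # Indexing returns an int, so compare against escapechar[0].
  # A negative pos wraps around to the end of the buffer.
  return data[pos] == escapechar[0]


data = b"a\\n\\"  # bytes: 'a', '\', 'n', '\'
escapechar = b"\\"

same_in_range = (
    matches_by_slice(data, 1, escapechar),
    matches_by_index(data, 1, escapechar))
# At pos == -1 the slice data[-1:0] is empty, but data[-1] is the last byte.
differ_at_minus_one = (
    matches_by_slice(data, -1, escapechar),
    matches_by_index(data, -1, escapechar))
```

One consequence: a loop that relies on reaching position -1 to terminate works with slicing, but with indexing it would need an explicit lower bound.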
What about my suggestion in the first comment to have two versions of
`_find_separator_bounds`: one for the default case and another for a custom
delimiter and/or escapechar, to avoid extra `if` checks in the default use
case? I can prepare such a version to show how it would look, if you are
willing to consider this code duplication for performance reasons.
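
A rough sketch of the two-variant idea, choosing the implementation once at construction time, might look like this. Everything here is hypothetical: `SeparatorFinder`, `find`, `_find_default`, and `_find_custom` are illustrative names rather than Beam's API, and the default path is simplified to a plain `b"\n"` search.

```python
class SeparatorFinder:
  """Hypothetical stand-in for illustration, not Beam's _TextSource."""

  def __init__(self, delimiter=None, escapechar=None):
    self._delimiter = delimiter
    self._escapechar = escapechar
    # Pick the implementation once, so the default path has no extra checks.
    if delimiter is None and escapechar is None:
      self.find = self._find_default
    else:
      self.find = self._find_custom

  def _find_default(self, data, start):
    # Simplified default: a record ends at b"\n".
    pos = data.find(b"\n", start)
    return None if pos < 0 else (pos, pos + 1)

  def _find_custom(self, data, start):
    delimiter = self._delimiter or b"\n"
    pos = data.find(delimiter, start)
    while pos >= 0:
      if self._escapechar is None or not self._is_escaped(data, pos):
        return (pos, pos + len(delimiter))
      # Skip the escaped delimiter and keep searching.
      pos = data.find(delimiter, pos + 1)
    return None

  def _is_escaped(self, data, position):
    count = 0
    pos = position - 1
    while pos >= 0 and data[pos:pos + 1] == self._escapechar:
      count += 1
      pos -= 1
    return count % 2 == 1
```

The cost is the duplicated search loop; the benefit is that the default path contains no delimiter or escapechar branches.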