[beam] branch master updated: BEAM-13189 Python TextIO: add escapechar feature. (#15901)

2021-11-11 Thread tvalentyn
This is an automated email from the ASF dual-hosted git repository.

tvalentyn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 6a573e4  BEAM-13189 Python TextIO: add escapechar feature. (#15901)
6a573e4 is described below

commit 6a573e431a2b4e69fdd6a861c6f54517bbfa3175
Author: Eugene Nikolaiev 
AuthorDate: Thu Nov 11 10:32:33 2021 +0200

BEAM-13189 Python TextIO: add escapechar feature. (#15901)
---
 CHANGES.md                                |   1 +
 sdks/python/apache_beam/io/textio.py      |  70 ---
 sdks/python/apache_beam/io/textio_test.py | 139 +-
 3 files changed, 198 insertions(+), 12 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index eab4aec..a25c1e8 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -68,6 +68,7 @@
 
 * X feature added (Java/Python) ([BEAM-X](https://issues.apache.org/jira/browse/BEAM-X)).
 * Add custom delimiters to Python TextIO reads ([BEAM-12730](https://issues.apache.org/jira/browse/BEAM-12730)).
+* Add escapechar parameter to Python TextIO reads ([BEAM-13189](https://issues.apache.org/jira/browse/BEAM-13189)).
 * Splittable reading is enabled by default while reading data with ParquetIO ([BEAM-12070](https://issues.apache.org/jira/browse/BEAM-12070)).
 * DoFn Execution Time metrics added to Go ([BEAM-13001](https://issues.apache.org/jira/browse/BEAM-13001)).
 * Cross-bundle side input caching is now available in the Go SDK for runners that support the feature by setting the EnableSideInputCache hook ([BEAM-11097](https://issues.apache.org/jira/browse/BEAM-11097)).
diff --git a/sdks/python/apache_beam/io/textio.py b/sdks/python/apache_beam/io/textio.py
index 7f9ea6e..f53f9b3 100644
--- a/sdks/python/apache_beam/io/textio.py
+++ b/sdks/python/apache_beam/io/textio.py
@@ -100,7 +100,8 @@ class _TextSource(filebasedsource.FileBasedSource):
       validate=True,
       skip_header_lines=0,
       header_processor_fns=(None, None),
-      delimiter=None):
+      delimiter=None,
+      escapechar=None):
     """Initialize a _TextSource
 
     Args:
@@ -116,6 +117,8 @@ class _TextSource(filebasedsource.FileBasedSource):
       delimiter (bytes) Optional: delimiter to split records.
         Must not self-overlap, because self-overlapping delimiters cause
         ambiguous parsing.
+      escapechar (bytes) Optional: a single byte to escape the records
+        delimiter, can also escape itself.
     Raises:
       ValueError: if skip_lines is negative.
 
@@ -147,6 +150,11 @@ class _TextSource(filebasedsource.FileBasedSource):
       if self._is_self_overlapping(delimiter):
         raise ValueError('Delimiter must not self-overlap.')
     self._delimiter = delimiter
+    if escapechar is not None:
+      if not (isinstance(escapechar, bytes) and len(escapechar) == 1):
+        raise ValueError(
+            "escapechar must be bytes of size 1: '%s'" % escapechar)
+    self._escapechar = escapechar
 
   def display_data(self):
     parent_dd = super().display_data()
@@ -176,7 +184,7 @@ class _TextSource(filebasedsource.FileBasedSource):
       start_offset = max(start_offset, position_after_processing_header_lines)
       if start_offset > position_after_processing_header_lines:
         # Seeking to one delimiter length before the start index and ignoring
-        # the current line. If start_position is at beginning if the line, that
+        # the current line. If start_position is at beginning of the line, that
         # line belongs to the current bundle, hence ignoring that is incorrect.
         # Seeking to one delimiter before prevents that.
 
@@ -185,6 +193,16 @@ class _TextSource(filebasedsource.FileBasedSource):
         else:
           required_position = start_offset - 1
 
+        if self._escapechar is not None:
+          # Need more bytes to check if the delimiter is escaped.
+          # Seek until the first escapechar if any.
+          while required_position > 0:
+            file_to_read.seek(required_position - 1)
+            if file_to_read.read(1) == self._escapechar:
+              required_position -= 1
+            else:
+              break
+
         file_to_read.seek(required_position)
         read_buffer.reset()
         sep_bounds = self._find_separator_bounds(file_to_read, read_buffer)
@@ -277,11 +295,22 @@ class _TextSource(filebasedsource.FileBasedSource):
       if next_delim >= 0:
         if (self._delimiter is None and
             read_buffer.data[next_delim - 1:next_delim] == b'\r'):
-          # Accept both '\r\n' and '\n' as a default delimiter.
-          return (next_delim - 1, next_delim + 1)
+          if self._escapechar is not None and self._is_escaped(read_buffer,
+                                                               next_delim - 1):
+            # Accept '\n' as a default delimiter, because
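
The diff above is cut off before the body of `_is_escaped` appears, so the following is only a rough standalone sketch of the escapechar handling, not the code from the commit. `validate_escapechar` and `rewind_over_escapes` mirror the added hunks using plain functions and `io.BytesIO`; both function names are made up for illustration. `is_escaped` is an assumption: it guesses, from the docstring's note that the escape byte "can also escape itself", that a byte counts as escaped when an odd number of escape bytes precede it.

```python
import io


def validate_escapechar(escapechar):
  """Accept None or a bytes object of length 1, as in the constructor hunk."""
  if escapechar is not None:
    if not (isinstance(escapechar, bytes) and len(escapechar) == 1):
      raise ValueError(
          "escapechar must be bytes of size 1: '%s'" % escapechar)
  return escapechar


def rewind_over_escapes(file_to_read, required_position, escapechar):
  """Move the start position back over any run of escape bytes before it."""
  while required_position > 0:
    file_to_read.seek(required_position - 1)
    if file_to_read.read(1) == escapechar:
      required_position -= 1
    else:
      break
  return required_position


def is_escaped(data, position, escapechar):
  """Assumed rule: escaped if preceded by an odd number of escape bytes."""
  count = 0
  index = position - 1
  while index >= 0 and data[index:index + 1] == escapechar:
    count += 1
    index -= 1
  return count % 2 == 1


# 'a', three backslashes, newline, 'b': the newline sits at index 4.
data = b"a" + b"\\" * 3 + b"\nb"
start = rewind_over_escapes(io.BytesIO(data), 4, b"\\")
escaped = is_escaped(data, 4, b"\\")
```

Rewinding to the first byte of the escape run matters because a reader that starts mid-file cannot tell whether the delimiter it sees is escaped unless it has the whole run of escape bytes in front of it.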

[beam] branch master updated (6a573e4 -> 8fb22a8)

2021-11-11 Thread bhulette

bhulette pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 6a573e4  BEAM-13189 Python TextIO: add escapechar feature. (#15901)
 add 8fb22a8  [BEAM-12550] Parallelizable kurtosis Implementation (#15909)

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/dataframe/frames.py  | 88 +++-
 sdks/python/apache_beam/dataframe/frames_test.py | 38 ++
 2 files changed, 91 insertions(+), 35 deletions(-)


[beam] branch master updated (8fb22a8 -> e9ebaa4)

2021-11-11 Thread tvalentyn

tvalentyn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 8fb22a8  [BEAM-12550] Parallelizable kurtosis Implementation (#15909)
 add e9ebaa4  [BEAM-13016] Remove avro-python3 dependency from Beam (#15900)

No new revisions were added by this update.

Summary of changes:
 CHANGES.md |   1 +
 sdks/python/apache_beam/examples/avro_bitcoin.py   |  36 +--
 .../apache_beam/examples/fastavro_it_test.py   |  97 
 sdks/python/apache_beam/io/avroio.py   | 269 ++---
 sdks/python/apache_beam/io/avroio_test.py  |  93 ++-
 sdks/python/apache_beam/io/gcp/bigquery.py |   2 +-
 .../apache_beam/io/gcp/bigquery_avro_tools_test.py | 215 +---
 .../apache_beam/io/gcp/bigquery_read_internal.py   |   2 +-
 .../runners/dataflow/dataflow_runner.py|   7 -
 .../runners/dataflow/dataflow_runner_test.py   |  10 -
 .../apache_beam/runners/dataflow/internal/names.py |   2 +-
 .../apache_beam/testing/datatype_inference.py  |  13 +-
 .../apache_beam/testing/datatype_inference_test.py |  15 +-
 sdks/python/setup.py   |   1 -
 14 files changed, 182 insertions(+), 581 deletions(-)


[beam] branch master updated (e9ebaa4 -> 0ba6eca)

2021-11-11 Thread bhulette

bhulette pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from e9ebaa4  [BEAM-13016] Remove avro-python3 dependency from Beam (#15900)
 add 0ba6eca  [BEAM-13133] Loosen partitioning requirement for sample (#15818)

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/dataframe/frames.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[beam] branch master updated (0ba6eca -> 7a5b47b)

2021-11-11 Thread ibzib

ibzib pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 0ba6eca  [BEAM-13133] Loosen partitioning requirement for sample (#15818)
 new ff497ff  Update Beam website to release 2.34.0.
 new 3437306  Prepare docs for 2.34.0 release.
 new 7a5b47b  Merge pull request #15834 from ibzib/website-234

The 33538 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.md |  18 +--
 website/www/site/config.toml   |   2 +-
 website/www/site/content/en/blog/beam-2.34.0.md| 152 +
 .../www/site/content/en/get-started/downloads.md   |   8 ++
 website/www/site/static/.htaccess  |   2 +-
 5 files changed, 164 insertions(+), 18 deletions(-)
 create mode 100644 website/www/site/content/en/blog/beam-2.34.0.md


[beam] annotated tag sdks/v2.34.0 updated (b3b1843 -> 28a65ef)

2021-11-11 Thread ibzib

ibzib pushed a change to annotated tag sdks/v2.34.0
in repository https://gitbox.apache.org/repos/asf/beam.git.


*** WARNING: tag sdks/v2.34.0 was modified! ***

from b3b1843  (commit)
  to 28a65ef  (tag)
 tagging 15867770295f4f1e27273984d5188a0bde62e13e (tag)
  length 147 bytes
  by Kyle Weaver
  on Thu Nov 11 11:42:45 2021 -0800

- Log -
Go SDK v2.34.0
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEE8R431/AG0IYjKHZ5e21mc8ea6nIFAmGNcmEACgkQe21mc8ea
6nIuSw//Yc5Cb/2ZS8wwVdByU7oG15ZEFkGj+Ue6/veUdY5yml7v1AUU0pnSM5I5
G34w9bQ9S5J12QbNupM7nI+cbSXzbeL3yl1ramRxlcSe0JRhoC/yOCFAXa0Ih3ad
naaAyr0yuvzrfDL4kWmejhCy7QlzQO3xt3OuctKtJOAp3T4lVBCQvQM/fbG+oO7Q
cpC8jZuw0cjjYYQU/52mLOyuicAF6/E7f1ZsFYHIyVfrX1dcr4tMCo5F/iE5jzLo
z+fLC+RnVD5AKLEXCUcrsgO3TU3YvZzytjEMDYM43uCAu6Q0F91c/YRWLBjAuVeL
b3oIjJm9qqZNSmCu10bjw5ZEKkOJAkHCNuYKXi6Bf6Q/9O+zVR0oCSnK5BgCfesG
q6QQoffM9PRPGlFLasXUSLqWDX7n3fdXFFz0DW5ckdAgmlBQKvBsppqc7cnkZU8E
O3DBRFOq5nko3/rv+TzCyPgBJ6nRcZntXIi6fBeanY/4k5UcQXWX9+6ZHtcXMk+J
8qaN25Npk5eOP9EKJbIunkI3+DtAZbSPM6bYQJ/9XMbAHvgYvzt5mITGT2wWYX/k
qiC04yc/aHwNVOnssuyBJ8nD1iI4NoVnzB2vkJkAzxe0mMSX8ojBbiffwL7DUq3I
QxKau+vJ6+BOt74Q8AT+9+XO3EL3tJ7rY9naV/CfiO7LRELsv+8=
=zkzX
-----END PGP SIGNATURE-----
---


No new revisions were added by this update.

Summary of changes:


[beam] annotated tag v2.34.0 updated (b3b1843 -> 63ea678)

2021-11-11 Thread ibzib

ibzib pushed a change to annotated tag v2.34.0
in repository https://gitbox.apache.org/repos/asf/beam.git.


*** WARNING: tag v2.34.0 was modified! ***

from b3b1843  (commit)
  to 63ea678  (tag)
 tagging 15867770295f4f1e27273984d5188a0bde62e13e (tag)
  length 147 bytes
  by Kyle Weaver
  on Thu Nov 11 11:44:07 2021 -0800

- Log -
Apache Beam 2.34.0 release
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEE8R431/AG0IYjKHZ5e21mc8ea6nIFAmGNcq0ACgkQe21mc8ea
6nIpQg//foycL85HggIg/SfVTLEhU9+IOrnNKI1Z56KN1drKM7RAHj8U1Jy1AR7s
aboDQyVuIGIL4ekORcbEomq6+UyuqvT95U2c3ic6/jgZ47+tnnMVmLPEzV3QVBDo
a4D7MdCLejGC5AnZT4xob/mObCDNGNeCdtAqGILQoNGYfpRnsh0AjoycSIASXyH2
K/YBl3m3ajPwCPjyWQzmoLUwWzuLwuwKDlWwLiRIEFr6Szsl4SWmDMsSiGARbhKx
Gl1z4Dl2ORPi56QChaU6SqY013syPL9v5hdq1BydgsGrzF4VkDNRZOQHu3L16as4
kreLX/tXb8fFn/ElVGUpuJ0zoPPsRyqbBrj1xGJ3WFajnCkfUaJkyVtsrvs+63T0
wuXntzv9xx5ZGbN2XsPD04ID3nEvrgWT8yqD897x+DI0/Y1e2w+Ybeto/cR3qExK
A3T7AGL9ouDPcdud35L2tt4RrgyUEuMcHSJx6Dlb0bsTm3JGuxhgqY726FzqKYM/
B5A6XT8yRzmjCOoEyvcXWG9ksUeE0mMDU4nSLpeOtre+ToSmE7GZgb2bDh4Y7LVC
a119JJIXjTCA2nTmNiiZ1bNUN60A1VZQ4M/OiRakWnXLrBDZu9GKQqhbwru1TbjA
yuQEvDr4SSLuGwlc7EaaWD30+cP91TPDBNDMZMjeMWX+Z1xVKxs=
=6f1o
-----END PGP SIGNATURE-----
---


No new revisions were added by this update.

Summary of changes:


[beam] branch master updated (7a5b47b -> 31af8e5)

2021-11-11 Thread lostluck

lostluck pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 7a5b47b  Merge pull request #15834 from ibzib/website-234
 add 31af8e5  [BEAM-13228] fix data race in metrics.store (#15946)

No new revisions were added by this update.

Summary of changes:
 sdks/go/pkg/beam/core/metrics/metrics.go  | 2 ++
 sdks/go/pkg/beam/core/metrics/sampler_test.go | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)


[beam] branch master updated (31af8e5 -> 9f0ca71)

2021-11-11 Thread lostluck

lostluck pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 31af8e5  [BEAM-13228] fix data race in metrics.store (#15946)
 add 9f0ca71  [BEAM-3293] Add binding cases for MultiMap side inputs (#15943)

No new revisions were added by this update.

Summary of changes:
 sdks/go/pkg/beam/core/graph/bind.go  | 16 +---
 sdks/go/pkg/beam/core/graph/bind_test.go | 10 ++
 2 files changed, 23 insertions(+), 3 deletions(-)


[beam] branch master updated (9f0ca71 -> 17a5c26)

2021-11-11 Thread bhulette

bhulette pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 9f0ca71  [BEAM-3293] Add binding cases for MultiMap side inputs (#15943)
 add 17a5c26  [BEAM-13222] Re-enable Spanner integration tests (#15948)

No new revisions were added by this update.

Summary of changes:
 .test-infra/jenkins/job_PerformanceTests_SpannerIO_Python.groovy| 6 --
 .../test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java | 2 --
 .../java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java | 2 --
 .../apache_beam/io/gcp/experimental/spannerio_read_it_test.py   | 1 -
 .../apache_beam/io/gcp/experimental/spannerio_write_it_test.py  | 1 -
 sdks/python/apache_beam/io/gcp/tests/xlang_spannerio_it_test.py | 1 -
 6 files changed, 4 insertions(+), 9 deletions(-)


[beam] branch master updated (17a5c26 -> 39efcae)

2021-11-11 Thread pabloem

pabloem pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 17a5c26  [BEAM-13222] Re-enable Spanner integration tests (#15948)
 add 39efcae  Merge pull request #15926 from [BEAM-13110][Playground] Playground pipeline cancelation

No new revisions were added by this update.

Summary of changes:
 playground/api/v1/api.proto|  13 +
 playground/backend/cmd/server/controller.go| 125 --
 playground/backend/cmd/server/controller_test.go   | 119 +-
 playground/backend/go.mod  |   1 +
 playground/backend/go.sum  |   9 +-
 playground/backend/internal/api/v1/api.pb.go   | 451 +
 playground/backend/internal/api/v1/api_grpc.pb.go  |  72 +++-
 playground/backend/internal/cache/cache.go |   3 +
 .../backend/internal/cache/redis/redis_cache.go|   3 +
 playground/frontend/lib/api/v1/api.pb.dart |  78 +++-
 playground/frontend/lib/api/v1/api.pbenum.dart |   2 +
 playground/frontend/lib/api/v1/api.pbgrpc.dart |  24 ++
 playground/frontend/lib/api/v1/api.pbjson.dart |  20 +-
 13 files changed, 683 insertions(+), 237 deletions(-)


[beam] branch master updated (39efcae -> 55f66aa)

2021-11-11 Thread bhulette

bhulette pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 39efcae  Merge pull request #15926 from [BEAM-13110][Playground] Playground pipeline cancelation
 add 55f66aa  [BEAM-13025] Disable deduplicating messages, as Dedupe is broken on runner v2 (#15953)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/beam/sdk/io/gcp/pubsublite/ReadWriteIT.java| 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)


[beam] branch master updated (55f66aa -> 4351c49)

2021-11-11 Thread lostluck

lostluck pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 55f66aa  [BEAM-13025] Disable deduplicating messages, as Dedupe is broken on runner v2 (#15953)
 add 4351c49  [BEAM-13001] collect DoFn metrics for Combine (#15911)

No new revisions were added by this update.

Summary of changes:
 sdks/go/pkg/beam/core/runtime/exec/combine.go | 16 
 1 file changed, 16 insertions(+)


[beam] branch nightly-refs/heads/master updated (8d34bd8 -> 4351c49)

2021-11-11 Thread github-bot

github-bot pushed a change to branch nightly-refs/heads/master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 8d34bd8  [BEAM-12651] Exclude packages from jacoco report (#15792)
 add 10d9594  [BEAM-13222] Skip spanner integration tests (#15942)
 add 2bc4913  Fix environment_service_test.go
 add f8a4782  Merge pull request #15930 from [BEAM-13176] [Playground] [BUGFIX] Add new argument to NewNetworkEnv in env service tests
 add 6a573e4  BEAM-13189 Python TextIO: add escapechar feature. (#15901)
 add 8fb22a8  [BEAM-12550] Parallelizable kurtosis Implementation (#15909)
 add e9ebaa4  [BEAM-13016] Remove avro-python3 dependency from Beam (#15900)
 add 0ba6eca  [BEAM-13133] Loosen partitioning requirement for sample (#15818)
 add ff497ff  Update Beam website to release 2.34.0.
 add 3437306  Prepare docs for 2.34.0 release.
 add 7a5b47b  Merge pull request #15834 from ibzib/website-234
 add 31af8e5  [BEAM-13228] fix data race in metrics.store (#15946)
 add 9f0ca71  [BEAM-3293] Add binding cases for MultiMap side inputs (#15943)
 add 17a5c26  [BEAM-13222] Re-enable Spanner integration tests (#15948)
 add 39efcae  Merge pull request #15926 from [BEAM-13110][Playground] Playground pipeline cancelation
 add 55f66aa  [BEAM-13025] Disable deduplicating messages, as Dedupe is broken on runner v2 (#15953)
 add 4351c49  [BEAM-13001] collect DoFn metrics for Combine (#15911)

No new revisions were added by this update.

Summary of changes:
 .../job_PerformanceTests_SpannerIO_Python.groovy   |   6 +-
 CHANGES.md |  20 +-
 playground/api/v1/api.proto|  13 +
 playground/backend/cmd/server/controller.go| 125 --
 playground/backend/cmd/server/controller_test.go   | 119 +-
 playground/backend/go.mod  |   1 +
 playground/backend/go.sum  |   9 +-
 playground/backend/internal/api/v1/api.pb.go   | 451 +
 playground/backend/internal/api/v1/api_grpc.pb.go  |  72 +++-
 playground/backend/internal/cache/cache.go |   3 +
 .../backend/internal/cache/redis/redis_cache.go|   3 +
 .../environment/environment_service_test.go|   4 +-
 playground/frontend/lib/api/v1/api.pb.dart |  78 +++-
 playground/frontend/lib/api/v1/api.pbenum.dart |   2 +
 playground/frontend/lib/api/v1/api.pbgrpc.dart |  24 ++
 playground/frontend/lib/api/v1/api.pbjson.dart |  20 +-
 sdks/go/pkg/beam/core/graph/bind.go|  16 +-
 sdks/go/pkg/beam/core/graph/bind_test.go   |  10 +
 sdks/go/pkg/beam/core/metrics/metrics.go   |   2 +
 sdks/go/pkg/beam/core/metrics/sampler_test.go  |   6 +-
 sdks/go/pkg/beam/core/runtime/exec/combine.go  |  16 +
 .../beam/sdk/io/gcp/pubsublite/ReadWriteIT.java|   7 +-
 sdks/python/apache_beam/dataframe/frames.py|  90 +++-
 sdks/python/apache_beam/dataframe/frames_test.py   |  38 +-
 sdks/python/apache_beam/examples/avro_bitcoin.py   |  36 +-
 .../apache_beam/examples/fastavro_it_test.py   |  97 ++---
 sdks/python/apache_beam/io/avroio.py   | 269 ++--
 sdks/python/apache_beam/io/avroio_test.py  |  93 +
 sdks/python/apache_beam/io/gcp/bigquery.py |   2 +-
 .../apache_beam/io/gcp/bigquery_avro_tools_test.py | 215 +++---
 .../apache_beam/io/gcp/bigquery_read_internal.py   |   2 +-
 sdks/python/apache_beam/io/textio.py   |  70 +++-
 sdks/python/apache_beam/io/textio_test.py  | 139 ++-
 .../runners/dataflow/dataflow_runner.py|   7 -
 .../runners/dataflow/dataflow_runner_test.py   |  10 -
 .../apache_beam/runners/dataflow/internal/names.py |   2 +-
 .../apache_beam/testing/datatype_inference.py  |  13 +-
 .../apache_beam/testing/datatype_inference_test.py |  15 +-
 sdks/python/setup.py   |   1 -
 website/www/site/config.toml   |   2 +-
 website/www/site/content/en/blog/beam-2.34.0.md| 152 +++
 .../www/site/content/en/get-started/downloads.md   |   8 +
 website/www/site/static/.htaccess  |   2 +-
 43 files changed, 1374 insertions(+), 896 deletions(-)
 create mode 100644 website/www/site/content/en/blog/beam-2.34.0.md