This is an automated email from the ASF dual-hosted git repository.
damccorm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 35a0b68c06b Fix small doc issues (#29578)
35a0b68c06b is described below
commit 35a0b68c06b5e446d17c7c7081d2a7f13c85372c
Author: liferoad <[email protected]>
AuthorDate: Fri Dec 1 09:08:21 2023 -0500
Fix small doc issues (#29578)
---
CHANGES.md | 2 +-
sdks/python/apache_beam/io/gcp/bigquery.py | 18 +++----
website/www/site/content/en/blog/beam-2.52.0.md | 69 ++++++++++++++++++++++++-
3 files changed, 78 insertions(+), 11 deletions(-)
diff --git a/CHANGES.md b/CHANGES.md
index 9318e85d477..34a653d75ce 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -107,7 +107,7 @@ should handle this.
([#25252](https://github.com/apache/beam/issues/25252)).
* Add `UseDataStreamForBatch` pipeline option to the Flink runner. When it is set to true, Flink runner will run batch jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed using the DataSet API.
-* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621).
+* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621)).
* State and side input caching has been enabled with a default of 100 MB. Use `--max_cache_memory_usage_mb=X` to provide cache size for the user state API and side inputs. (Python) ([#28770](https://github.com/apache/beam/issues/28770)).
* Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework, which includes a preliminary set of IOs and turnkey transforms. More information can be found in the YAML root folder and in the [README](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md).
diff --git a/sdks/python/apache_beam/io/gcp/bigquery.py b/sdks/python/apache_beam/io/gcp/bigquery.py
index 184138af752..ac06425e95a 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery.py
@@ -72,7 +72,8 @@ When creating a BigQuery input transform, users should
provide either a query or a table. Pipeline construction will fail with a validation error if neither or both are specified.
-When reading via `ReadFromBigQuery`, bytes are returned decoded as bytes.
+When reading via `ReadFromBigQuery` using `EXPORT`,
+bytes are returned decoded as bytes.
This is due to the fact that ReadFromBigQuery uses Avro exports by default.
When reading from BigQuery using `apache_beam.io.BigQuerySource`, bytes are
returned as base64-encoded bytes. To get base64-encoded bytes using
@@ -2597,6 +2598,8 @@ class StorageWriteToBigQuery(PTransform):
class ReadFromBigQuery(PTransform):
+ # pylint: disable=line-too-long,W1401
+
"""Read data from BigQuery.
This PTransform uses a BigQuery export job to take a snapshot of the table
@@ -2653,8 +2656,7 @@ class ReadFromBigQuery(PTransform):
:data:`None`, then the temp_location parameter is used.
bigquery_job_labels (dict): A dictionary with string labels to be passed
to BigQuery export and query jobs created by this transform. See:
- https://cloud.google.com/bigquery/docs/reference/rest/v2/\
- Job#JobConfiguration
+ https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration
use_json_exports (bool): By default, this transform works by exporting
BigQuery data into Avro files, and reading those files. With this
parameter, the transform will instead export to JSON files. JSON files
@@ -2666,11 +2668,10 @@ class ReadFromBigQuery(PTransform):
types (datetime.date, datetime.datetime, datetime.datetime,
and datetime.datetime respectively). Avro exports are recommended.
To learn more about BigQuery types, and Time-related type
- representations, see: https://cloud.google.com/bigquery/docs/reference/\
- standard-sql/data-types
+ representations,
+ see: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
To learn more about type conversions between BigQuery and Avro, see:
- https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro\
- #avro_conversions
+ https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro\#avro_conversions
temp_dataset (``apache_beam.io.gcp.internal.clients.bigquery.\
DatasetReference``):
Temporary dataset reference to use when reading from BigQuery using a
@@ -2690,8 +2691,7 @@ class ReadFromBigQuery(PTransform):
(`PYTHON_DICT`). There is experimental support for producing a
PCollection with a schema and yielding Beam Rows via the option
`BEAM_ROW`. For more information on schemas, see
- https://beam.apache.org/documentation/programming-guide/\
- #what-is-a-schema)
+ https://beam.apache.org/documentation/programming-guide/#what-is-a-schema)
"""
class Method(object):
EXPORT = 'EXPORT' # This is currently the default.
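The docstring change above hinges on the difference between raw bytes (what `ReadFromBigQuery` yields under the default `EXPORT`/Avro method) and base64-encoded bytes (what the older `apache_beam.io.BigQuerySource` yields). A minimal standard-library sketch of that difference, using an illustrative value rather than real BigQuery output:

```python
import base64

# A BYTES value as it might live in a BigQuery table (illustrative only).
raw = b"\x00\x01hello\xff"

# ReadFromBigQuery with the default EXPORT method (Avro export) returns
# the value as raw bytes, so no extra decoding step is needed.
from_read_from_bigquery = raw

# The older apache_beam.io.BigQuerySource returns the same value
# base64-encoded, and callers must decode it themselves.
from_bigquery_source = base64.b64encode(raw)

assert from_read_from_bigquery == raw
assert base64.b64decode(from_bigquery_source) == raw
```

Forgetting the `b64decode` step on `BigQuerySource` output is exactly the confusion the clarified docstring is meant to prevent.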
diff --git a/website/www/site/content/en/blog/beam-2.52.0.md b/website/www/site/content/en/blog/beam-2.52.0.md
index 5654f16ceb3..2e604c8fabf 100644
--- a/website/www/site/content/en/blog/beam-2.52.0.md
+++ b/website/www/site/content/en/blog/beam-2.52.0.md
@@ -41,7 +41,7 @@ should handle this.
([#25252](https://github.com/apache/beam/issues/25252)).
* Add `UseDataStreamForBatch` pipeline option to the Flink runner. When it is set to true, Flink runner will run batch jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed using the DataSet API.
-* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621).
+* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621)).
* State and side input caching has been enabled with a default of 100 MB. Use `--max_cache_memory_usage_mb=X` to provide cache size for the user state API and side inputs. (Python) ([#28770](https://github.com/apache/beam/issues/28770)).
* Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework, which includes a preliminary set of IOs and turnkey transforms. More information can be found in the YAML root folder and in the [README](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md).
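The Beam YAML framework mentioned above lets a pipeline be declared without writing SDK code. As a rough sketch only, this is the general shape such a pipeline takes; the transform names (`ReadFromText`, `PyMap`, `WriteToText`) and config keys follow the linked README as of this release and may differ in later versions, and the paths are hypothetical:

```yaml
# Hypothetical Beam YAML pipeline: read text, upper-case each line, write out.
pipeline:
  type: chain
  transforms:
    - type: ReadFromText
      config:
        path: /tmp/input.txt    # hypothetical input path
    - type: PyMap
      config:
        fn: "str.upper"         # Python callable applied to each element
    - type: WriteToText
      config:
        path: /tmp/output.txt   # hypothetical output prefix
```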
@@ -69,69 +69,136 @@ as a workaround, a copy of "old" `CountingSource` class should be placed into a
According to git shortlog, the following people contributed to the 2.52.0
release. Thank you to all contributors!
Ahmed Abualsaud
+
Ahmet Altay
+
Aleksandr Dudko
+
Alexey Romanenko
+
Anand Inguva
+
Andrei Gurau
+
Andrey Devyatkin
+
BjornPrime
+
Bruno Volpato
+
Bulat
+
Chamikara Jayalath
+
Damon
+
Danny McCormick
+
Devansh Modi
+
Dominik Dębowczyk
+
Ferran Fernández Garrido
+
Hai Joey Tran
+
Israel Herraiz
+
Jack McCluskey
+
Jan Lukavský
+
JayajP
+
Jeff Kinard
+
Jeffrey Kinard
+
Jiangjie Qin
+
Jing
+
Joar Wandborg
+
Johanna Öjeling
+
Julien Tournay
+
Kanishk Karanawat
+
Kenneth Knowles
+
Kerry Donny-Clark
+
Luís Bianchin
+
Minbo Bae
+
Pranav Bhandari
+
Rebecca Szper
+
Reuven Lax
+
Ritesh Ghorse
+
Robert Bradshaw
+
Robert Burke
+
RyuSA
+
Shunping Huang
+
Steven van Rossum
+
Svetak Sundhar
+
Tony Tang
+
Vitaly Terentyev
+
Vivek Sumanth
+
Vlado Djerek
+
Yi Hu
+
aku019
+
brucearctor
+
caneff
+
damccorm
+
ddebowczyk92
+
dependabot[bot]
+
dpcollins-google
+
edman124
+
gabry.wu
+
illoise
+
johnjcasey
+
jonathan-lemos
+
kennknowles
+
liferoad
+
magicgoody
+
martin trieu
+
nancyxu123
+
pablo rodriguez defino
+
tvalentyn
+