[beam] branch asf-site updated: Publishing website 2019/12/11 00:59:41 at commit 11c60b8
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new dcf3676  Publishing website 2019/12/11 00:59:41 at commit 11c60b8
dcf3676 is described below

commit dcf3676a00857826169f08fe153b223ffad65b0e
Author: jenkins
AuthorDate: Wed Dec 11 00:59:42 2019 +

    Publishing website 2019/12/11 00:59:41 at commit 11c60b8
---
 .../extensions/create-external-table/index.html| 31 +++---
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html b/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html
index 2fc6503..c1a1eee 100644
--- a/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html
+++ b/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html
@@ -431,14 +431,26 @@ See the I/O specific sections for tblProperties<
 CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
 TYPE bigquery
 LOCATION '[PROJECT_ID]:[DATASET].[TABLE]'
+TBLPROPERTIES '{"method": "DEFAULT"}'
 
- LOCATION:Location of the table in the BigQuery CLI format.
+ LOCATION: Location of the table in the BigQuery CLI format.
 
- PROJECT_ID: ID of the Google Cloud Project
- DATASET: BigQuery Dataset ID
- TABLE: BigQuery Table ID within the Dataset
+ PROJECT_ID: ID of the Google Cloud Project.
+ DATASET: BigQuery Dataset ID.
+ TABLE: BigQuery Table ID within the Dataset.
+
+
+ TBLPROPERTIES:
+
+ method: Optional. Read method to use. Following options are available:
+
+ DEFAULT: If no property is set, will be used as default. Currently uses EXPORT.
+ DIRECT_READ: Use the BigQuery Storage API.
+ EXPORT: Export data to Google Cloud Storage in Avro format and read data files from that location.
+
+
@@ -448,6 +460,17 @@ LOCATION '[PROJECT_ID]:[DATASET].[TABLE]'
 Beam SQL supports reading columns with simple types (simpleType) and arrays of simple types (ARRAY).
 
+When reading using EXPORT method the following pipeline options should be set:
+
+ project: ID of the Google Cloud Project.
+ tempLocation: Bucket to store intermediate data in. Ex: gs://temp-storage/temp.
+
+
+When reading using DIRECT_READ method, an optimizer will attempt to perform
+project and predicate push-down, potentially reducing the time requited to read the data from BigQuery.
+
+More information about the BigQuery Storage API can be found here: https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-api
+
 Write Mode
 if the table does not exist, Beam creates the table specified in location when
[beam] branch master updated: Update SQL BigQuery doc
This is an automated email from the ASF dual-hosted git repository.

amaliujia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

The following commit(s) were added to refs/heads/master by this push:
     new 92e92bc  Update SQL BigQuery doc
     new 11c60b8  Merge pull request #10260 from 11moon11/UpdateBigQueryDoc
92e92bc is described below

commit 92e92bc0b8fb01b9395e6480480a81832a86111f
Author: kirillkozlov
AuthorDate: Mon Dec 2 16:11:16 2019 -0800

    Update SQL BigQuery doc
---
 .../dsls/sql/extensions/create-external-table.md | 23 ++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/website/src/documentation/dsls/sql/extensions/create-external-table.md b/website/src/documentation/dsls/sql/extensions/create-external-table.md
index 81d7dae..2489bb3 100644
--- a/website/src/documentation/dsls/sql/extensions/create-external-table.md
+++ b/website/src/documentation/dsls/sql/extensions/create-external-table.md
@@ -89,18 +89,33 @@ tableElement: columnName fieldType [ NOT NULL ]
 CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
 TYPE bigquery
 LOCATION '[PROJECT_ID]:[DATASET].[TABLE]'
+TBLPROPERTIES '{"method": "DEFAULT"}'
 ```
 
-* `LOCATION:`Location of the table in the BigQuery CLI format.
-* `PROJECT_ID`: ID of the Google Cloud Project
-* `DATASET`: BigQuery Dataset ID
-* `TABLE`: BigQuery Table ID within the Dataset
+* `LOCATION`: Location of the table in the BigQuery CLI format.
+* `PROJECT_ID`: ID of the Google Cloud Project.
+* `DATASET`: BigQuery Dataset ID.
+* `TABLE`: BigQuery Table ID within the Dataset.
+* `TBLPROPERTIES`:
+* `method`: Optional. Read method to use. Following options are available:
+* `DEFAULT`: If no property is set, will be used as default. Currently uses `EXPORT`.
+* `DIRECT_READ`: Use the BigQuery Storage API.
+* `EXPORT`: Export data to Google Cloud Storage in Avro format and read data files from that location.
 
 ### Read Mode
 
 Beam SQL supports reading columns with simple types (`simpleType`) and arrays of simple types (`ARRAY`).
 
+When reading using `EXPORT` method the following pipeline options should be set:
+* `project`: ID of the Google Cloud Project.
+* `tempLocation`: Bucket to store intermediate data in. Ex: `gs://temp-storage/temp`.
+
+When reading using `DIRECT_READ` method, an optimizer will attempt to perform
+project and predicate push-down, potentially reducing the time requited to read the data from BigQuery.
+
+More information about the BigQuery Storage API can be found [here](https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-api).
+
 ### Write Mode
 
 if the table does not exist, Beam creates the table specified in location when
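
Editor's note: the documentation change above adds a `TBLPROPERTIES` clause whose value is a JSON object with an optional `method` key. The following is only a minimal sketch of how such a statement could be assembled as a string; the helper name `bigquery_external_table_ddl`, its parameters, and the sample column types are hypothetical and are not part of Beam. It only builds text and does not talk to Beam or BigQuery.

```python
import json

# Read methods listed in the documentation change above.
READ_METHODS = ("DEFAULT", "DIRECT_READ", "EXPORT")


def bigquery_external_table_ddl(table_name, columns, project_id, dataset,
                                table, method="DEFAULT"):
    """Return a CREATE EXTERNAL TABLE statement as a string.

    Hypothetical helper for illustration only. `columns` is a list of
    (column_name, sql_type) pairs.
    """
    if method not in READ_METHODS:
        raise ValueError("unknown read method: {!r}".format(method))
    column_sql = ", ".join(
        "{} {}".format(name, sql_type) for name, sql_type in columns)
    # TBLPROPERTIES takes a JSON object serialized inside single quotes.
    properties = json.dumps({"method": method})
    return (
        "CREATE EXTERNAL TABLE {name} ({cols})\n"
        "TYPE bigquery\n"
        "LOCATION '{project}:{dataset}.{table}'\n"
        "TBLPROPERTIES '{props}'"
    ).format(name=table_name, cols=column_sql, project=project_id,
             dataset=dataset, table=table, props=properties)


ddl = bigquery_external_table_ddl(
    "users", [("id", "BIGINT"), ("name", "VARCHAR")],
    "my-project", "my_dataset", "users_table", method="DIRECT_READ")
print(ddl)
```

Serializing the properties with `json.dumps` rather than by hand keeps the quoting consistent with the `'{"method": "DEFAULT"}'` form shown in the diff. Per the documentation text, reading with `EXPORT` additionally expects the `project` and `tempLocation` pipeline options to be set; this sketch does not cover that.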
[beam] branch master updated (98ad0a6 -> 4b92c34)
This is an automated email from the ASF dual-hosted git repository.

apilloud pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.

    from 98ad0a6  Add an ML section to python SDK overview (#10233)
     add c43ca65  Updated the cost model to favor IO with push-down
     add c498c21  BigQueryFilter numSupported method
     add 4b92c34  Merge pull request #10060: [BEAM-8343] [SQL] Updated the cost model to favor IO with push-down.

No new revisions were added by this update.

Summary of changes:
 .../sdk/extensions/sql/impl/rel/BeamCalcRel.java | 25 +-
 .../sql/impl/rel/BeamPushDownIOSourceRel.java | 19 ++--
 .../extensions/sql/meta/BeamSqlTableFilter.java| 24 +
 .../extensions/sql/meta/DefaultTableFilter.java| 5 +
 .../sql/meta/provider/bigquery/BigQueryFilter.java | 5 +
 .../sql/meta/provider/test/TestTableFilter.java| 5 +
 .../sql/meta/CustomTableResolverTest.java | 18 +++-
 .../provider/bigquery/BigQueryReadWriteIT.java | 6 +++---
 8 files changed, 77 insertions(+), 30 deletions(-)
[beam] branch asf-site updated: Publishing website 2019/12/10 23:41:39 at commit 98ad0a6
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 85194d2  Publishing website 2019/12/10 23:41:39 at commit 98ad0a6
85194d2 is described below

commit 85194d2863dffdba07c1c586d2e06cd00ceb6a51
Author: jenkins
AuthorDate: Tue Dec 10 23:41:39 2019 +

    Publishing website 2019/12/10 23:41:39 at commit 98ad0a6
---
 website/generated-content/documentation/sdks/python/index.html | 5 +
 1 file changed, 5 insertions(+)

diff --git a/website/generated-content/documentation/sdks/python/index.html b/website/generated-content/documentation/sdks/python/index.html
index 4eaa12c..07e88fe 100644
--- a/website/generated-content/documentation/sdks/python/index.html
+++ b/website/generated-content/documentation/sdks/python/index.html
@@ -292,6 +292,7 @@
 Python type safety
 Managing Python pipeline dependencies
 Developing new I/O connectors for Python
+ Using Beam Python SDK in your ML pipelines
 
@@ -342,6 +343,10 @@ new I/O connectors. See the D
 for information about developing new I/O connectors and links to
 language-specific implementation guidance.
 
+Using Beam Python SDK in your ML pipelines
+
+TensorFlow Extended (TFX) (https://www.tensorflow.org/tfx) is an end-to-end platform for deploying production ML pipelines. TFX is integrated with Beam. For more information, see TFX user guide (https://www.tensorflow.org/tfx/guide).
+
[beam] branch master updated (d032994 -> 98ad0a6)
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.

    from d032994  Merge pull request #9926 from davidcavazos/groupbykey-code
     add 98ad0a6  Add an ML section to python SDK overview (#10233)

No new revisions were added by this update.

Summary of changes:
 website/src/documentation/sdks/python.md | 4
 1 file changed, 4 insertions(+)
[beam] branch aaltay-patch-2 updated (03fd14f -> 46acfb3)
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch aaltay-patch-2
in repository https://gitbox.apache.org/repos/asf/beam.git.

    from 03fd14f  fixup
     add 46acfb3  Reviewer comments.

No new revisions were added by this update.

Summary of changes:
 website/src/documentation/sdks/python.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[beam] branch master updated: [BEAM-7390] Add code snippet for GroupByKey
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

The following commit(s) were added to refs/heads/master by this push:
     new f51edc1  [BEAM-7390] Add code snippet for GroupByKey
     new d032994  Merge pull request #9926 from davidcavazos/groupbykey-code
f51edc1 is described below

commit f51edc10e1c724bdf113f84ab3b7283b9fabe19c
Author: David Cavazos
AuthorDate: Wed Oct 16 18:36:39 2019 -0700

    [BEAM-7390] Add code snippet for GroupByKey
---
 .../snippets/transforms/aggregation/groupbykey.py | 47 +++
 .../transforms/aggregation/groupbykey_test.py | 54 ++
 2 files changed, 101 insertions(+)

diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey.py
new file mode 100644
index 000..83e4f87
--- /dev/null
+++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey.py
@@ -0,0 +1,47 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+
+def groupbykey(test=None):
+  # [START groupbykey]
+  import apache_beam as beam
+
+  with beam.Pipeline() as pipeline:
+    produce_counts = (
+        pipeline
+        | 'Create produce counts' >> beam.Create([
+            ('spring', '🍓'),
+            ('spring', '🥕'),
+            ('spring', '🍆'),
+            ('spring', '🍅'),
+            ('summer', '🥕'),
+            ('summer', '🍅'),
+            ('summer', '🌽'),
+            ('fall', '🥕'),
+            ('fall', '🍅'),
+            ('winter', '🍆'),
+        ])
+        | 'Group counts per produce' >> beam.GroupByKey()
+        | beam.Map(print)
+    )
+    # [END groupbykey]
+    if test:
+      test(produce_counts)
diff --git a/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey_test.py b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey_test.py
new file mode 100644
index 000..4d8283a
--- /dev/null
+++ b/sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupbykey_test.py
@@ -0,0 +1,54 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+import unittest
+
+import mock
+
+from apache_beam.examples.snippets.util import assert_matches_stdout
+from apache_beam.testing.test_pipeline import TestPipeline
+
+from . import groupbykey
+
+
+def check_produce_counts(actual):
+  expected = '''[START produce_counts]
+('spring', ['🍓', '🥕', '🍆', '🍅'])
+('summer', ['🥕', '🍅', '🌽'])
+('fall', ['🥕', '🍅'])
+('winter', ['🍆'])
+[END produce_counts]'''.splitlines()[1:-1]
+  # The elements order is non-deterministic, so sort them first.
+  assert_matches_stdout(
+      actual, expected, lambda pair: (pair[0], sorted(pair[1])))
+
+
+@mock.patch('apache_beam.Pipeline', TestPipeline)
+@mock.patch(
+    'apache_beam.examples.snippets.transforms.aggregation.groupbykey.print',
+    str)
+class GroupByKeyTest(unittest.TestCase):
+  def test_groupbykey(self):
+    groupbykey.groupbykey(check_produce_counts)
+
+
+if __name__ == '__main__':
+  unittest.main()
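
Editor's note: the snippet in this commit groups (season, produce) pairs by key, and its test sorts each group before comparing because element order is non-deterministic. As a rough illustration of that grouping and of the normalization step, here is a plain-Python sketch that does not use Apache Beam at all. The names `group_by_key` and `normalize` are hypothetical; a single in-memory dictionary stands in for what Beam would do across workers, so the ordering behaviour here is simpler than in a real pipeline.

```python
from collections import defaultdict

# Same input pairs as the snippet in the commit above.
PRODUCE = [
    ('spring', '🍓'),
    ('spring', '🥕'),
    ('spring', '🍆'),
    ('spring', '🍅'),
    ('summer', '🥕'),
    ('summer', '🍅'),
    ('summer', '🌽'),
    ('fall', '🥕'),
    ('fall', '🍅'),
    ('winter', '🍆'),
]


def group_by_key(pairs):
    """Collect every value that shares a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return list(grouped.items())


def normalize(pair):
    """Sort the values of one group so comparisons ignore element order."""
    return (pair[0], sorted(pair[1]))


grouped = group_by_key(PRODUCE)
for key, values in grouped:
    print((key, values))
```

The `normalize` function mirrors the `lambda pair: (pair[0], sorted(pair[1]))` passed to `assert_matches_stdout` in the test: two groups with the same elements in a different order compare equal once both are normalized.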
[beam] branch master updated (bdd70ab -> 095ac4d)
This is an automated email from the ASF dual-hosted git repository.

mikhail pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.

    from bdd70ab  [BEAM-8575] Test DoFn context params (#10130)
     new e58cafa  Strict equality comparision for the version of tensorflow dependency
     new 3d7f7d2  Extract installChicagoTaxiExampleRequirements step
     new 3ea8077  Look up log level values using `getattr`
     new 095ac4d  Merge pull request #10269 from kamilwu/chicago-taxi-dependencies-fix

The 24609 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 ...ommit_Python_Chicago_Taxi_Example_Dataflow.groovy | 20 +---
 .../testing/benchmarks/chicago_taxi/preprocess.py| 2 +-
 .../testing/benchmarks/chicago_taxi/process_tfma.py | 2 +-
 .../testing/benchmarks/chicago_taxi/requirements.txt | 3 +--
 .../testing/benchmarks/chicago_taxi/run_chicago.sh | 6 +++---
 .../testing/benchmarks/chicago_taxi/setup.py | 6 ++
 .../chicago_taxi/tfdv_analyze_and_validate.py| 2 +-
 .../testing/benchmarks/chicago_taxi/trainer/task.py | 4 ++--
 sdks/python/test-suites/dataflow/py2/build.gradle| 18 --
 9 files changed, 36 insertions(+), 27 deletions(-)
[beam] branch master updated (659039e -> bdd70ab)
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.

    from 659039e  Merge pull request #10332: [BEAM-8858] sdks/java/extensions/sql to declare used-but-undeclared dependencies
     add bdd70ab  [BEAM-8575] Test DoFn context params (#10130)

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/pipeline_test.py | 20
 1 file changed, 20 insertions(+)
[beam] branch master updated (dfa2cf5 -> 659039e)
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.

    from dfa2cf5  Merge pull request #10326: [BEAM-8929] Remove obsolete InterruptedException hanlding in FnApiControlClient
     add 2fc88ca  Commons-codec dependency in extensions/sql
     add 131b18d  Adding joda_time dependency
     add b01ac35  Commons_lang3 and jackson_databind dependency
     add 659039e  Merge pull request #10332: [BEAM-8858] sdks/java/extensions/sql to declare used-but-undeclared dependencies

No new revisions were added by this update.

Summary of changes:
 sdks/java/extensions/sql/build.gradle | 4
 1 file changed, 4 insertions(+)
[beam] branch master updated (44d4568 -> dfa2cf5)
This is an automated email from the ASF dual-hosted git repository.

mxm pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.

    from 44d4568  Merge pull request #10312 from apache/aaltay-patch-1
     add 2a3a7f7  [BEAM-8929] Remove unnecessary exception handling in FnApiControlClientPoolService.
     add dfa2cf5  Merge pull request #10326: [BEAM-8929] Remove obsolete InterruptedException hanlding in FnApiControlClient

No new revisions were added by this update.

Summary of changes:
 .../runners/fnexecution/control/FnApiControlClientPoolService.java | 3 ---
 1 file changed, 3 deletions(-)