[jira] [Updated] (BEAM-6910) Beam does not consider BigQuery's processing location when getting query results

2019-03-25 Thread Graham Polley (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Graham Polley updated BEAM-6910:

Description: 
When using the BigQuery source with a SQL query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following, which uses {{BigQuerySource}} to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
{{australia-southeast1}}. The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the {{location}} to be 
{{australia-southeast1}}, which is required by BigQuery:

 
{code:python}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
    use_standard_sql=True,
    query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`')){code}
 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case {{australia-southeast1}}) needs to be 
set/inferred (or exposed via the API), otherwise it fails.
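
To illustrate what the status-polling call is missing, here is a minimal sketch 
using the standalone google-cloud-bigquery client (not Beam's internal 
apitools-based wrapper in {{bigquery_tools.py}}); the project ID, job ID and 
region below are just the placeholders from the error above:
{code:python}
from google.cloud import bigquery

client = bigquery.Client(project='a_project_id')

# jobs.get has to be told which region the query job ran in; without a
# location, BigQuery returns 404 "Not found: Job ..." for jobs running
# outside the default US/EU locations.
job = client.get_job(
    '5ad9cc803baa432290b6cd0203f556d9',
    location='australia-southeast1',
)
print(job.state)  # e.g. 'RUNNING' or 'DONE'
{code}
In other words, the location Beam already infers when submitting the query 
needs to be threaded through to the subsequent jobs.get / getQueryResults 
calls as well.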

 For reference, Airflow had the same bug/problem: 
[https://github.com/apache/airflow/pull/4695]

 

 

  was:
When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 
{code:python}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
    use_standard_sql=True,
    query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`')){code}
 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen/found here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to be 
set/inferred (or exposed via the API), otherwise it fails.

 For reference, Airflow had the same bug/problem: 

[jira] [Updated] (BEAM-6910) Beam does not consider BigQuery's processing location when getting query results

2019-03-25 Thread Graham Polley (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Graham Polley updated BEAM-6910:

Description: 
When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 
{code:python}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
    use_standard_sql=True,
    query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`')){code}
 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen/found here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to be 
set/inferred (or exposed via the API), otherwise it fails.

 For reference, Airflow had the same bug/problem: 
[https://github.com/apache/airflow/pull/4695]

 

 

  was:
When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 

 
{code:python}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
    use_standard_sql=True,
    query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`')){code}
 

 

 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to be 
set/inferred (or exposed via the API), otherwise it fails.

 

For reference, Airflow had the same bug/problem: 
https://github.com/apache/airflow/pull/4695

 


[jira] [Created] (BEAM-6910) Beam does not consider BigQuery's processing location when getting query results

2019-03-25 Thread Graham Polley (JIRA)
Graham Polley created BEAM-6910:
---

 Summary: Beam does not consider BigQuery's processing location 
when getting query results
 Key: BEAM-6910
 URL: https://issues.apache.org/jira/browse/BEAM-6910
 Project: Beam
  Issue Type: Bug
  Components: dependencies, runner-dataflow, sdk-py-core
Affects Versions: 2.11.0
 Environment: Python
Reporter: Graham Polley


When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 

 
{code:python}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
    use_standard_sql=True,
    query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`')){code}
 

 

 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to be 
set/inferred (or exposed via the API), otherwise it fails.

 

For reference, Airflow had the same bug/problem: 
https://github.com/apache/airflow/pull/4695

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6474) Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)

2019-01-21 Thread Graham Polley (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Graham Polley closed BEAM-6474.
---
   Resolution: Not A Problem
Fix Version/s: 2.9.0

Closing. Easy to fix with an alias in the SQL string.

> Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)
> -
>
> Key: BEAM-6474
> URL: https://issues.apache.org/jira/browse/BEAM-6474
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.9.0
> Environment: MacOS
>Reporter: Graham Polley
>Assignee: Graham Polley
>Priority: Minor
> Fix For: 2.9.0
>
>
> Maybe I've done something wrong, but when you try to access a field that has 
> been generated in a SqlTransform, it throws an exception:
>  
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.IllegalArgumentException: Cannot find field views in schema Fields: 
> Field{name=wikimedia_project, description=, type=FieldType{typeName=STRING, 
> collectionElementType=null, collectionElementTypeNullable=null, 
> mapKeyType=null, mapValueType=null, mapValueTypeNullable=null, 
> rowSchema=null, metadata=null}, nullable=false} Field{name=EXPR$1, 
> description=, type=FieldType{typeName=INT32, collectionElementType=null, 
> collectionElementTypeNullable=null, mapKeyType=null, mapValueType=null, 
> mapValueTypeNullable=null, rowSchema=null, metadata=null}, 
> nullable=false}{code}
> Instead of being able to access the `views` field, it has been named `EXPR$1` 
> by Beam/Dataflow. So, to get the value of the field I need to do this:
> {code:java}
> bqRow.set("views", row.getInt32("EXPR$1"));{code}
> instead of:
> {code:java}
> bqRow.set("views", row.getInt32("views"));{code}
>  
> {code:java}
> PCollection<Row> outputStream =
>  sqlRows.setRowSchema(SCHEMA)
>  .apply("sql_transform",
>  SqlTransform.query(
>  "select wikimedia_project, sum(views) " +
>  "from PCOLLECTION " +
>  "group by wikimedia_project"));{code}
>  
> Pipeline is reading a file from GCS, transforming it (using SqlTransform) and 
> writing to BigQuery. Code can be found here:
> [https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6474) Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)

2019-01-21 Thread Graham Polley (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748321#comment-16748321
 ] 

Graham Polley commented on BEAM-6474:
-

For posterity, this fixes it:
{code:java}
SqlTransform.query("SELECT lang, SUM(views) as sum_views FROM PCOLLECTION GROUP 
BY lang"){code}
{code:java}
[..]c.element().getInt32("sum_views"){code}
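
Putting that together with the pipeline snippet from the issue description (a 
minimal sketch, assuming the same {{sqlRows}} PCollection and {{SCHEMA}} from 
the description):
{code:java}
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

// Aliasing the aggregate ("AS sum_views") gives the output field a stable
// name, so it can be read back with row.getInt32("sum_views") instead of
// the generated "EXPR$1".
PCollection<Row> outputStream =
    sqlRows.setRowSchema(SCHEMA)
        .apply("sql_transform",
            SqlTransform.query(
                "SELECT wikimedia_project, SUM(views) AS sum_views "
                    + "FROM PCOLLECTION "
                    + "GROUP BY wikimedia_project"));
{code}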

> Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)
> -
>
> Key: BEAM-6474
> URL: https://issues.apache.org/jira/browse/BEAM-6474
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.9.0
> Environment: MacOS
>Reporter: Graham Polley
>Assignee: Graham Polley
>Priority: Minor
>
> Maybe I've done something wrong, but when you try to access a field that has 
> been generated in a SqlTransform, it throws an exception:
>  
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.IllegalArgumentException: Cannot find field views in schema Fields: 
> Field{name=wikimedia_project, description=, type=FieldType{typeName=STRING, 
> collectionElementType=null, collectionElementTypeNullable=null, 
> mapKeyType=null, mapValueType=null, mapValueTypeNullable=null, 
> rowSchema=null, metadata=null}, nullable=false} Field{name=EXPR$1, 
> description=, type=FieldType{typeName=INT32, collectionElementType=null, 
> collectionElementTypeNullable=null, mapKeyType=null, mapValueType=null, 
> mapValueTypeNullable=null, rowSchema=null, metadata=null}, 
> nullable=false}{code}
> Instead of being able to access the `views` field, it has been named `EXPR$1` 
> by Beam/Dataflow. So, to get the value of the field I need to do this:
> {code:java}
> bqRow.set("views", row.getInt32("EXPR$1"));{code}
> instead of:
> {code:java}
> bqRow.set("views", row.getInt32("views"));{code}
>  
> {code:java}
> PCollection<Row> outputStream =
>  sqlRows.setRowSchema(SCHEMA)
>  .apply("sql_transform",
>  SqlTransform.query(
>  "select wikimedia_project, sum(views) " +
>  "from PCOLLECTION " +
>  "group by wikimedia_project"));{code}
>  
> Pipeline is reading a file from GCS, transforming it (using SqlTransform) and 
> writing to BigQuery. Code can be found here:
> [https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6474) Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)

2019-01-21 Thread Graham Polley (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748320#comment-16748320
 ] 

Graham Polley commented on BEAM-6474:
-

Doh! Of course. I didn't think to alias the computed val/column.

Works now. Thanks Ken.

G

> Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)
> -
>
> Key: BEAM-6474
> URL: https://issues.apache.org/jira/browse/BEAM-6474
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.9.0
> Environment: MacOS
>Reporter: Graham Polley
>Assignee: Graham Polley
>Priority: Minor
>
> Maybe I've done something wrong, but when you try to access a field that has 
> been generated in a SqlTransform, it throws an exception:
>  
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.IllegalArgumentException: Cannot find field views in schema Fields: 
> Field{name=wikimedia_project, description=, type=FieldType{typeName=STRING, 
> collectionElementType=null, collectionElementTypeNullable=null, 
> mapKeyType=null, mapValueType=null, mapValueTypeNullable=null, 
> rowSchema=null, metadata=null}, nullable=false} Field{name=EXPR$1, 
> description=, type=FieldType{typeName=INT32, collectionElementType=null, 
> collectionElementTypeNullable=null, mapKeyType=null, mapValueType=null, 
> mapValueTypeNullable=null, rowSchema=null, metadata=null}, 
> nullable=false}{code}
> Instead of being able to access the `views` field, it has been named `EXPR$1` 
> by Beam/Dataflow. So, to get the value of the field I need to do this:
> {code:java}
> bqRow.set("views", row.getInt32("EXPR$1"));{code}
> instead of:
> {code:java}
> bqRow.set("views", row.getInt32("views"));{code}
>  
> {code:java}
> PCollection<Row> outputStream =
>  sqlRows.setRowSchema(SCHEMA)
>  .apply("sql_transform",
>  SqlTransform.query(
>  "select wikimedia_project, sum(views) " +
>  "from PCOLLECTION " +
>  "group by wikimedia_project"));{code}
>  
> Pipeline is reading a file from GCS, transforming it (using SqlTransform) and 
> writing to BigQuery. Code can be found here:
> [https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6474) Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)

2019-01-20 Thread Graham Polley (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Graham Polley updated BEAM-6474:

Description: 
Maybe I've done something wrong, but when you try to access a field that has 
been generated in a SqlTransform, it throws an exception:

 
{code:java}
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
java.lang.IllegalArgumentException: Cannot find field views in schema Fields: 
Field{name=wikimedia_project, description=, type=FieldType{typeName=STRING, 
collectionElementType=null, collectionElementTypeNullable=null, 
mapKeyType=null, mapValueType=null, mapValueTypeNullable=null, rowSchema=null, 
metadata=null}, nullable=false} Field{name=EXPR$1, description=, 
type=FieldType{typeName=INT32, collectionElementType=null, 
collectionElementTypeNullable=null, mapKeyType=null, mapValueType=null, 
mapValueTypeNullable=null, rowSchema=null, metadata=null}, nullable=false}{code}
Instead of the field being accessible as `views`, it has been named `EXPR$1` 
by Beam/Dataflow. So, to get the value of the field I need to do this:
{code:java}
bqRow.set("views", row.getInt32("EXPR$1"));{code}
instead of:
{code:java}
bqRow.set("views", row.getInt32("views"));{code}
 
{code:java}
PCollection<Row> outputStream =
 sqlRows.setRowSchema(SCHEMA)
 .apply("sql_transform",
 SqlTransform.query(
 "select wikimedia_project, sum(views) " +
 "from PCOLLECTION " +
 "group by wikimedia_project"));{code}
 

The pipeline reads a file from GCS, transforms it (using SqlTransform) and 
writes to BigQuery. The code can be found here:

[https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java]

 

  was:
Maybe I've done something wrong, but when you try to access a field that has 
been generated in a SqlTransform, it throws an exception:

 
{code:java}
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
java.lang.IllegalArgumentException: Cannot find field views in schema Fields: 
Field{name=wikimedia_project, description=, type=FieldType{typeName=STRING, 
collectionElementType=null, collectionElementTypeNullable=null, 
mapKeyType=null, mapValueType=null, mapValueTypeNullable=null, rowSchema=null, 
metadata=null}, nullable=false} Field{name=EXPR$1, description=, 
type=FieldType{typeName=INT32, collectionElementType=null, 
collectionElementTypeNullable=null, mapKeyType=null, mapValueType=null, 
mapValueTypeNullable=null, rowSchema=null, metadata=null}, nullable=false}{code}
Instead of the field being accessible as `views`, it has been named `EXPR$1` 
by Beam/Dataflow. So, to get the value of the field I need to do this:
{code:java}
bqRow.set("views", row.getInt32("EXPR$1"));{code}
instead of:
{code:java}
bqRow.set("views", row.getInt32("views"));{code}
The pipeline reads a file from GCS, transforms it (using SqlTransform) and 
writes to BigQuery. The code can be found here:

https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java

 


> Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)
> -
>
> Key: BEAM-6474
> URL: https://issues.apache.org/jira/browse/BEAM-6474
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, dsl-sql, runner-dataflow
>Affects Versions: 2.9.0
> Environment: MacOS
>Reporter: Graham Polley
>Assignee: Kenneth Knowles
>Priority: Major
>
> Maybe I've done something wrong, but when you try to access a field that has 
> been generated in a SqlTransform, it throws an exception:
>  
> {code:java}
> java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.IllegalArgumentException: Cannot find field views in schema Fields: 
> Field{name=wikimedia_project, description=, type=FieldType{typeName=STRING, 
> collectionElementType=null, collectionElementTypeNullable=null, 
> mapKeyType=null, mapValueType=null, mapValueTypeNullable=null, 
> rowSchema=null, metadata=null}, nullable=false} Field{name=EXPR$1, 
> description=, type=FieldType{typeName=INT32, collectionElementType=null, 
> collectionElementTypeNullable=null, mapKeyType=null, mapValueType=null, 
> mapValueTypeNullable=null, rowSchema=null, metadata=null}, 
> nullable=false}{code}
> Instead of being able to access the `views` field, it has been named `EXPR$1` 
> by Beam/Dataflow. So, to get the value of the field I need to do this:
> {code:java}
> bqRow.set("views", row.getInt32("EXPR$1"));{code}
> instead of:
> {code:java}
> bqRow.set("views", row.getInt32("views"));{code}
>  
> {code:java}
> PCollection<Row> outputStream =
>  sqlRows.setRowSchema(SCHEMA)
>  .apply("sql_transform",
>  SqlTransform.query(
>  "select wikimedia_project, sum(views) " +
>  "from PCOLLECTION " +
>  

[jira] [Created] (BEAM-6474) Cannot reference field when using SqlTransform (need to use "EXPR$N" instead)

2019-01-20 Thread Graham Polley (JIRA)
Graham Polley created BEAM-6474:
---

 Summary: Cannot reference field when using SqlTransform (need to 
use "EXPR$N" instead)
 Key: BEAM-6474
 URL: https://issues.apache.org/jira/browse/BEAM-6474
 Project: Beam
  Issue Type: Bug
  Components: beam-model, dsl-sql, runner-dataflow
Affects Versions: 2.9.0
 Environment: MacOS
Reporter: Graham Polley
Assignee: Kenneth Knowles


Maybe I've done something wrong, but when you try to access a field that has 
been generated in a SqlTransform, it throws an exception:

 
{code:java}
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
java.lang.IllegalArgumentException: Cannot find field views in schema Fields: 
Field{name=wikimedia_project, description=, type=FieldType{typeName=STRING, 
collectionElementType=null, collectionElementTypeNullable=null, 
mapKeyType=null, mapValueType=null, mapValueTypeNullable=null, rowSchema=null, 
metadata=null}, nullable=false} Field{name=EXPR$1, description=, 
type=FieldType{typeName=INT32, collectionElementType=null, 
collectionElementTypeNullable=null, mapKeyType=null, mapValueType=null, 
mapValueTypeNullable=null, rowSchema=null, metadata=null}, nullable=false}{code}
Instead of the field being accessible as `views`, it has been named `EXPR$1` 
by Beam/Dataflow. So, to get the value of the field I need to do this:
{code:java}
bqRow.set("views", row.getInt32("EXPR$1"));{code}
instead of:
{code:java}
bqRow.set("views", row.getInt32("views"));{code}
The pipeline reads a file from GCS, transforms it (using SqlTransform) and 
writes to BigQuery. The code can be found here:

https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)