[jira] [Created] (BEAM-3754) Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes()

2018-02-27 Thread Benjamin BENOIST (JIRA)
Benjamin BENOIST created BEAM-3754:
--

 Summary: Can't have commitOffsetsInFinalizeEnabled set to false 
with KafkaIO.readBytes()
 Key: BEAM-3754
 URL: https://issues.apache.org/jira/browse/BEAM-3754
 Project: Beam
  Issue Type: Bug
  Components: io-java-kafka
Affects Versions: 2.3.0
 Environment: Dataflow pipeline using Kafka as a Sink
Reporter: Benjamin BENOIST
Assignee: Raghu Angadi


Beam v2.3 introduces finalized offsets (_commitOffsetsInFinalizeEnabled_) to 
reduce gaps or duplicate processing of records when a pipeline is restarted.

_read()_ sets this parameter to false [by 
default|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L307], 
but _readBytes()_ 
[doesn't|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L282], 
which causes the following exception:
{noformat}
Exception in thread "main" java.lang.IllegalStateException: Missing required properties: commitOffsetsInFinalizeEnabled
     at org.apache.beam.sdk.io.kafka.AutoValue_KafkaIO_Read$Builder.build(AutoValue_KafkaIO_Read.java:344)
     at org.apache.beam.sdk.io.kafka.KafkaIO.readBytes(KafkaIO.java:291)
{noformat}
The parameter can be set to true with _commitOffsetsInFinalize()_ but never to 
false.

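Defining _readBytes()_ in terms of _read()_ would prevent this kind of error 
in the future:
{code:java}
public static Read<byte[], byte[]> readBytes() {
  // Start from read() so that its defaults, including
  // commitOffsetsInFinalizeEnabled, are inherited.
  return KafkaIO.<byte[], byte[]>read()
      .toBuilder()
      .setKeyDeserializer(ByteArrayDeserializer.class)
      .setValueDeserializer(ByteArrayDeserializer.class)
      .build();
}
{code}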
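
To illustrate the failure mode, here is a minimal, self-contained sketch of an AutoValue-style builder with one required property. It does not use Beam: ReadSketch, its Builder and readBytesFromScratch() are hypothetical stand-ins, and the deserializer is modelled as a plain string. The only behaviour carried over from the report is that build() throws IllegalStateException when a required property was never set.

```java
/** Hypothetical stand-in for an AutoValue-generated class; not the real KafkaIO.Read. */
final class ReadSketch {
    final String keyDeserializer;                 // optional in this sketch, may be null
    final boolean commitOffsetsInFinalizeEnabled; // required

    private ReadSketch(String keyDeserializer, boolean commitOffsetsInFinalizeEnabled) {
        this.keyDeserializer = keyDeserializer;
        this.commitOffsetsInFinalizeEnabled = commitOffsetsInFinalizeEnabled;
    }

    static final class Builder {
        private String keyDeserializer;
        private Boolean commitOffsetsInFinalizeEnabled; // null means "never set"

        Builder setKeyDeserializer(String name) {
            this.keyDeserializer = name;
            return this;
        }

        Builder setCommitOffsetsInFinalizeEnabled(boolean enabled) {
            this.commitOffsetsInFinalizeEnabled = enabled;
            return this;
        }

        ReadSketch build() {
            // Mirrors the check reported in the stack trace above.
            if (commitOffsetsInFinalizeEnabled == null) {
                throw new IllegalStateException(
                    "Missing required properties: commitOffsetsInFinalizeEnabled");
            }
            return new ReadSketch(keyDeserializer, commitOffsetsInFinalizeEnabled);
        }
    }

    /** Copies the current values into a fresh builder. */
    Builder toBuilder() {
        return new Builder()
            .setKeyDeserializer(keyDeserializer)
            .setCommitOffsetsInFinalizeEnabled(commitOffsetsInFinalizeEnabled);
    }

    /** The single place where defaults are set. */
    static ReadSketch read() {
        return new Builder().setCommitOffsetsInFinalizeEnabled(false).build();
    }

    /** Problematic variant: builds from scratch and forgets the default. */
    static ReadSketch readBytesFromScratch() {
        return new Builder().setKeyDeserializer("ByteArrayDeserializer").build();
    }

    /** Suggested variant: starts from read(), so the default is inherited. */
    static ReadSketch readBytes() {
        return read().toBuilder().setKeyDeserializer("ByteArrayDeserializer").build();
    }

    public static void main(String[] args) {
        System.out.println(readBytes().commitOffsetsInFinalizeEnabled);
        try {
            readBytesFromScratch();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, a required property added later only needs a default in read(); every factory derived from it picks the default up automatically.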



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3754) KAFKA - Can't set commitOffsetsInFinalizeEnabled to false with KafkaIO.readBytes()

2018-02-27 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3754:
---
Summary: KAFKA - Can't set commitOffsetsInFinalizeEnabled to false with 
KafkaIO.readBytes()  (was: KAFKA - Can't have commitOffsetsInFinalizeEnabled 
set to false with KafkaIO.readBytes())

> KAFKA - Can't set commitOffsetsInFinalizeEnabled to false with 
> KafkaIO.readBytes()
> --
>
> Key: BEAM-3754
> URL: https://issues.apache.org/jira/browse/BEAM-3754
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.3.0
> Environment: Dataflow pipeline using Kafka as a Sink
>Reporter: Benjamin BENOIST
>Assignee: Raghu Angadi
>Priority: Minor
>  Labels: patch
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Updated] (BEAM-3754) KAFKA - Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes()

2018-02-27 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3754:
---
Summary: KAFKA - Can't have commitOffsetsInFinalizeEnabled set to false 
with KafkaIO.readBytes()  (was: Can't have commitOffsetsInFinalizeEnabled set 
to false with KafkaIO.readBytes())

> KAFKA - Can't have commitOffsetsInFinalizeEnabled set to false with 
> KafkaIO.readBytes()
> ---
>
> Key: BEAM-3754
> URL: https://issues.apache.org/jira/browse/BEAM-3754
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.3.0
> Environment: Dataflow pipeline using Kafka as a Sink
>Reporter: Benjamin BENOIST
>Assignee: Raghu Angadi
>Priority: Minor
>  Labels: patch
>   Original Estimate: 2h
>  Remaining Estimate: 2h


[jira] [Created] (BEAM-3766) BigQueryIO - Must specify numFileShards when using FILE_LOADS with unbounded PCollection

2018-03-01 Thread Benjamin BENOIST (JIRA)
Benjamin BENOIST created BEAM-3766:
--

 Summary: BigQueryIO - Must specify numFileShards when using 
FILE_LOADS with unbounded PCollection
 Key: BEAM-3766
 URL: https://issues.apache.org/jira/browse/BEAM-3766
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.3.0
Reporter: Benjamin BENOIST
Assignee: Chamikara Jayalath


Since Beam v2.2 it's possible to use 
[FILE_LOADS|https://beam.apache.org/documentation/sdks/javadoc/2.2.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#FILE_LOADS].
 The documentation states that _withTriggeringFrequency_ must be specified 
when using it, but doesn't mention _withNumFileShards_. If 
_withNumFileShards_ isn't specified, we get the exception below:

{code:java}
Exception in thread "main" java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expandTriggered(BatchLoads.java:209)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expand(BatchLoads.java:546)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expand(BatchLoads.java:79)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expandTyped(BigQueryIO.java:1550)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expand(BigQueryIO.java:1497)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expand(BigQueryIO.java:980)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
    at com.travelaudience.data.job.rtbtobigquery.Main$.main(Main.scala:74)
    at com.travelaudience.data.job.rtbtobigquery.Main.main(Main.scala)
{code}

Either a default _numFileShards_ should be used, or the documentation should 
state that it has to be set.
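
The two options can be sketched as follows. This is a stdlib-only illustration, not Beam code: the class, both method names and the fallback value of 100 are hypothetical, and it assumes that an unset shard count is represented as 0.

```java
/** Hypothetical illustration of the two options; none of these names exist in Beam. */
final class FileLoadsOptionsSketch {
    /** Assumed fallback value, chosen only for the example. */
    static final int ASSUMED_DEFAULT_SHARDS = 100;

    /** Option 1: fall back to a default when the shard count was left unset (0 or less). */
    static int withDefault(int numFileShards) {
        return numFileShards > 0 ? numFileShards : ASSUMED_DEFAULT_SHARDS;
    }

    /** Option 2: keep the requirement, but fail with a message naming the missing setting. */
    static int requireExplicit(int numFileShards) {
        if (numFileShards <= 0) {
            throw new IllegalArgumentException(
                "FILE_LOADS on an unbounded PCollection requires withNumFileShards(n), n > 0");
        }
        return numFileShards;
    }

    public static void main(String[] args) {
        System.out.println(withDefault(0));
        try {
            requireExplicit(0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Either variant would avoid the message-less IllegalArgumentException shown in the stack trace above.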





[jira] [Created] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2018-03-02 Thread Benjamin BENOIST (JIRA)
Benjamin BENOIST created BEAM-3772:
--

 Summary: BigQueryIO - Can't use DynamicDestination with 
CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
 Key: BEAM-3772
 URL: https://issues.apache.org/jira/browse/BEAM-3772
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.3.0, 2.2.0
 Environment: Dataflow streaming pipeline
Reporter: Benjamin BENOIST
Assignee: Chamikara Jayalath


My workflow: KAFKA -> Dataflow streaming -> BigQuery

Since low latency isn't important in my case, I use FILE_LOADS to reduce 
costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_, which is a 
table with the current hour as a suffix.

This _BigQueryIO.Write_ is configured like this:
{code:java}
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
.withMethod(Method.FILE_LOADS)
.withTriggeringFrequency(triggeringFrequency)
.withNumFileShards(100)
{code}

The first table is successfully created and written to, but the following 
tables are never created and I get these exceptions:

{code:java}
(99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job with id prefix 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_1_00023, reached max retries: 3, last failed load job: {
  "configuration" : {
    "load" : {
      "createDisposition" : "CREATE_NEVER",
      "destinationTable" : {
        "datasetId" : "dev_mydataset",
        "projectId" : "myproject-id",
        "tableId" : "mytable_20180302_16"
      },
{code}

The _CreateDisposition_ used is _CREATE_NEVER_, not _CREATE_IF_NEEDED_ as 
specified.
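
For context, a destination with the current hour as a suffix could be derived roughly as below. This is a stdlib-only sketch, not the reporter's code: the class and method names are hypothetical, the yyyyMMdd_HH pattern is inferred from the table id mytable_20180302_16 in the error above, and UTC is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that derives an hourly table name from a timestamp. */
final class HourlyTableNameSketch {
    // Pattern inferred from "mytable_20180302_16"; the time zone (UTC) is assumed.
    private static final DateTimeFormatter HOUR_SUFFIX =
        DateTimeFormatter.ofPattern("yyyyMMdd_HH").withZone(ZoneOffset.UTC);

    /** Returns baseName followed by "_" and the date and hour of the given instant. */
    static String tableFor(String baseName, Instant eventTime) {
        return baseName + "_" + HOUR_SUFFIX.format(eventTime);
    }

    public static void main(String[] args) {
        // Two records one hour apart land in two different tables.
        System.out.println(tableFor("mytable", Instant.parse("2018-03-02T16:05:00Z")));
        System.out.println(tableFor("mytable", Instant.parse("2018-03-02T17:00:00Z")));
    }
}
```

Each new hour therefore yields a table name that does not exist yet, which is why the create disposition matters for every table after the first.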





[jira] [Created] (BEAM-3224) Add support for path with braces for Google Cloud Storage

2017-11-20 Thread Benjamin BENOIST (JIRA)
Benjamin BENOIST created BEAM-3224:
--

 Summary: Add support for path with braces for Google Cloud Storage
 Key: BEAM-3224
 URL: https://issues.apache.org/jira/browse/BEAM-3224
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-gcp
Reporter: Benjamin BENOIST
Assignee: Chamikara Jayalath
Priority: Minor


At the moment we cannot use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob, which is then 
expanded to a regex by the _wildcardToRegexp_ function in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_.

With brace support, {{gs://bucket/{file1,file2,file3}}} should match 
{{gs://bucket/file1}}, {{gs://bucket/file2}} and {{gs://bucket/file3}}.
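
One possible shape for brace support in a glob-to-regex conversion is sketched below. This is not the existing _wildcardToRegexp_ implementation: the class and method names are hypothetical, and the handling of `*` and `?` is simplified to "anything except a slash".

```java
import java.util.regex.Pattern;

/** Hypothetical glob-to-regex sketch with brace alternation; not GcsUtil code. */
final class GlobBraceSketch {
    /** Translates a glob into a regex, turning {a,b,c} into a non-capturing (?:a|b|c). */
    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        int braceDepth = 0;
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            switch (c) {
                case '*':
                    regex.append("[^/]*"); // any run of characters within one path segment
                    break;
                case '?':
                    regex.append("[^/]");  // exactly one character within a path segment
                    break;
                case '{':
                    regex.append("(?:");
                    braceDepth++;
                    break;
                case '}':
                    if (braceDepth > 0) {
                        regex.append(')');
                        braceDepth--;
                    } else {
                        regex.append("\\}"); // stray closing brace is matched literally
                    }
                    break;
                case ',':
                    // A comma separates alternatives only inside braces.
                    regex.append(braceDepth > 0 ? "|" : ",");
                    break;
                default:
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        if (braceDepth != 0) {
            throw new IllegalArgumentException("Unbalanced '{' in glob: " + glob);
        }
        return regex.toString();
    }

    /** Returns true if the whole path matches the glob. */
    static boolean matches(String glob, String path) {
        return Pattern.matches(globToRegex(glob), path);
    }

    public static void main(String[] args) {
        String glob = "gs://bucket/{file1,file2,file3}";
        System.out.println(matches(glob, "gs://bucket/file2"));
        System.out.println(matches(glob, "gs://bucket/file4"));
    }
}
```

Tracking the brace depth keeps commas outside braces literal, so existing patterns without braces keep their meaning.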



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-3224) Add support for path with braces for Google Cloud Storage

2017-11-20 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3224:
---
Description: 
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/{file1,file2,file3} }} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}

  was:
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/{file1,file2,file3}}} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}


> Add support for path with braces for Google Cloud Storage
> -
>
> Key: BEAM-3224
> URL: https://issues.apache.org/jira/browse/BEAM-3224
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Benjamin BENOIST
>Assignee: Chamikara Jayalath
>Priority: Minor
>   Original Estimate: 3h
>  Remaining Estimate: 3h


[jira] [Updated] (BEAM-3224) Add support for path with braces for Google Cloud Storage

2017-11-20 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3224:
---
Description: 
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/{file1,file2,file3}. }} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}

  was:
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/{file1,file2,file3} }} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}




[jira] [Updated] (BEAM-3224) Add support for path with braces for Google Cloud Storage

2017-11-20 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3224:
---
Description: 
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/\{file1,file2,file3\}}} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}

  was:
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/{file1,file2,file3}. }} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}




[jira] [Updated] (BEAM-3224) Add support for path with braces for Google Cloud Storage

2017-11-20 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3224:
---
Description: 
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/\{file1,file2,file3\\}}} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}

  was:
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/\{file1,file2,file3\} }} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}




[jira] [Updated] (BEAM-3224) Add support for path with braces for Google Cloud Storage

2017-11-20 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3224:
---
Description: 
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/\{file1,file2,file3\} }} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}

  was:
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/\{file1,file2,file3\}}} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}




[jira] [Updated] (BEAM-3224) Add support for path with braces for Google Cloud Storage

2017-11-20 Thread Benjamin BENOIST (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin BENOIST updated BEAM-3224:
---
Description: 
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/{file1,file2,file3}}} should match 
{{gs://bucket/file1}}, {{gs://bucket/file2}} and {{gs://bucket/file3}}

  was:
At the moment we can not use braces in Google Cloud Storage paths, as explained 
[here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob and is then 
expanded to a regex in 
_sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_
 in the _wildcardToRegexp_ function.

{{gs://bucket/\{file1,file2,file3\\}}} should match {{gs://bucket/file1}}, 
{{gs://bucket/file2}} and {{gs://bucket/file3}}




[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2018-03-19 Thread Benjamin BENOIST (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405139#comment-16405139
 ] 

Benjamin BENOIST commented on BEAM-3772:


I had the issue with both Beam 2.2.0 and 2.3.0.

> BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded 
> PCollection and FILE_LOADS
> 
>
> Key: BEAM-3772
> URL: https://issues.apache.org/jira/browse/BEAM-3772
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.2.0, 2.3.0
> Environment: Dataflow streaming pipeline
>Reporter: Benjamin BENOIST
>Assignee: Eugene Kirpichov
>Priority: Major