[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=322092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322092 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 17:47
Start Date: 02/Oct/19 17:47
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9711: 
[release-2.16.0][BEAM-8303] Ensure FileSystems registration code runs in non 
UDF Flink operators
URL: https://github.com/apache/beam/pull/9711
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 322092)
Time Spent: 2h 50m  (was: 2h 40m)

> Filesystems not properly registered using FileIO.write()
> 
>
> Key: BEAM-8303
> URL: https://issues.apache.org/jira/browse/BEAM-8303
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Preston Koprivica
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> I’m getting the following error when attempting to use the FileIO APIs 
> (beam-2.15.0) and integrating with AWS S3.  I have set up the PipelineOptions 
> with all the relevant AWS options, so the filesystem registry **should** be 
> properly seeded by the time the graph is compiled and executed:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found for scheme s3
>     at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
>     at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
>     at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
>     at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
>     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
>     at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
>     at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
>     at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:543)
>     at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:534)
>     at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:480)
>     at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
>     at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>     at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
>     at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
>     at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
>     at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
>     at org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
>     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
>     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> For reference, the write code resembles this:
> {code:java}
>  FileIO.Write write = FileIO.write()
>     .via(ParquetIO.sink(schema))
>     .to(options.getOutputDir()) // will be something like: s3:///
>     .withSuffix(".parquet");
> records.apply(String.format("Write(%s)", options.getOutputDir()), write);{code}
> The issue does not appear to be related to ParquetIO.sink().  I am able to 
> reliably reproduce the issue using JSON-formatted records and TextIO.sink() 
> as well.  Moreover, AvroIO is affected if the withWindowedWrites() option is 
> added.
> Just trying some different knobs, I went ahead and set the following option:
> {code:java}
> write = write.withNoSpilling();{code}
> This actually seemed to fix the issue, only to have it reemerge as I scaled 
> up the data set size.  The stack trace, while very similar, reads:
> {code:java}
>  java.lang.IllegalArgumentException: No filesystem found 
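
For context on the report above: in the Beam Java SDK the scheme-to-filesystem registry is populated from PipelineOptions, normally when the runner calls FileSystems.setDefaultPipelineOptions(...) during job submission. The following is only a minimal driver-side sketch of that registration step, assuming the beam-sdks-java-io-amazon-web-services module is on the classpath; the class name and flags are illustrative, not part of the report.

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.aws.options.AwsOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DriverSideFileSystemRegistration {
  public static void main(String[] args) {
    // Hypothetical flags such as --awsRegion=us-east-1, parsed into AwsOptions
    // (provided by beam-sdks-java-io-amazon-web-services).
    AwsOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(AwsOptions.class);

    // This is the registration step the reporter relies on: it loads every
    // FileSystemRegistrar on the classpath (S3, GCS, HDFS, local, ...) and maps
    // their schemes, but only for the class loader of the JVM that executes it.
    // On the driver this happens as part of normal job submission; inside a
    // Flink task it has to be repeated, which is what BEAM-8303 is about.
    FileSystems.setDefaultPipelineOptions(options);

    Pipeline pipeline = Pipeline.create(options);
    // ... apply FileIO.write().via(ParquetIO.sink(schema)) ... as in the report ...
    pipeline.run().waitUntilFinish();
  }
}
{code}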

[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=322091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322091 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 17:46
Start Date: 02/Oct/19 17:46
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9711: 
[release-2.16.0][BEAM-8303] Ensure FileSystems registration code runs in non 
UDF Flink operators
URL: https://github.com/apache/beam/pull/9711#issuecomment-537604995
 
 
   CC @tweise 
 



Issue Time Tracking
---

Worklog Id: (was: 322091)
Time Spent: 2h 40m  (was: 2.5h)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=322090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322090 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 17:45
Start Date: 02/Oct/19 17:45
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9711: 
[release-2.16.0][BEAM-8303] Ensure FileSystems registration code runs in non 
UDF Flink operators
URL: https://github.com/apache/beam/pull/9711#issuecomment-537604587
 
 
   Unrelated test failures: 
https://builds.apache.org/job/beam_PreCommit_Java_Commit/7944/testReport/
   ```
   org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful
   org.apache.beam.sdk.io.TextIOWriteTest.testWriteViaSink
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 322090)
Time Spent: 2.5h  (was: 2h 20m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321899 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 14:07
Start Date: 02/Oct/19 14:07
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9711: 
[release-2.16.0][BEAM-8303] Ensure FileSystems registration code runs in non 
UDF Flink operators
URL: https://github.com/apache/beam/pull/9711
 
 
   Backport of #9688.
   
   The FileBasedSink$FileResultCoder depends on the FileSystems code being
   initialized. We had previously assumed that this would only be necessary for
   user-defined code, but, as it stands, coders may also access the file system.
   
   Without this, the coder may fail during decoding with the following:
   
   ```
   Caused by: java.lang.IllegalArgumentException: No filesystem found for scheme s3
     at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:463)
     at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
     at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
     at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
     at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
     at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
     at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
     at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:592)
     at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:583)
     at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:529)
     at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
     at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
     at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
     at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
     at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
     at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
     at org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
     at java.lang.Thread.run(Thread.java:745)
   
   ```
   
   To mitigate such failures, we should always make sure the FileSystems code is
   initialized in the current Task. The class loaders of the individual Tasks are
   isolated from each other.
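
   As a rough illustration of the idea only (not the actual patch, which lives in
   the Flink runner's operator setup code): any function running inside a Flink
   Task can re-register the Beam file systems in its open() method before any
   element, and therefore any coder, is touched. The restorePipelineOptions()
   helper below is a hypothetical stand-in for however the serialized options
   reach the task.

   ```java
   import org.apache.beam.sdk.io.FileSystems;
   import org.apache.beam.sdk.options.PipelineOptions;
   import org.apache.beam.sdk.options.PipelineOptionsFactory;
   import org.apache.flink.api.common.functions.RichFlatMapFunction;
   import org.apache.flink.configuration.Configuration;
   import org.apache.flink.util.Collector;

   /** Illustrative sketch only: re-register Beam file systems inside the Flink Task. */
   public class FileSystemsRegisteringFn extends RichFlatMapFunction<String, String> {

     @Override
     public void open(Configuration parameters) {
       // Runs in the Task's own class loader, before any element (and therefore
       // before any coder such as FileResultCoder) is processed by this operator.
       PipelineOptions options = restorePipelineOptions();
       FileSystems.setDefaultPipelineOptions(options);
     }

     @Override
     public void flatMap(String value, Collector<String> out) {
       out.collect(value);
     }

     private PipelineOptions restorePipelineOptions() {
       // Hypothetical stand-in: a real runner deserializes the user's options
       // (including the AWS/S3 settings) from the job configuration instead.
       return PipelineOptionsFactory.create();
     }
   }
   ```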
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321898 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 14:05
Start Date: 02/Oct/19 14:05
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 321898)
Time Spent: 2h 10m  (was: 2h)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321888 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 14:04
Start Date: 02/Oct/19 14:04
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-537507816
 
 
   ```
   15:47:20 FAILURE: Build failed with an exception.
   15:47:20 
   15:47:20 * What went wrong:
   15:47:20 Execution failed for task ':runners:samza:test'.
   ```
   
   No other failures.
 



Issue Time Tracking
---

Worklog Id: (was: 321888)
Time Spent: 2h  (was: 1h 50m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321856 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 12:50
Start Date: 02/Oct/19 12:50
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-537453495
 
 
   Java PreCommit broken on master: 
https://builds.apache.org/job/beam_PreCommit_Java_Cron/1862/console
   
   ```
   13:16:26 FAILURE: Build failed with an exception.
   13:16:26 
   13:16:26 * What went wrong:
   13:16:26 Execution failed for task ':runners:java-fn-execution:spotbugsMain'.
   ```
   
   Have to fix this on master first.
   
   edit: Fixed via #9710.
 



Issue Time Tracking
---

Worklog Id: (was: 321856)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321833 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 11:48
Start Date: 02/Oct/19 11:48
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-537453495
 
 
   Java PreCommit broken on master: 
https://builds.apache.org/job/beam_PreCommit_Java_Cron/1862/console
   
   ```
   13:16:26 FAILURE: Build failed with an exception.
   13:16:26 
   13:16:26 * What went wrong:
   13:16:26 Execution failed for task ':runners:java-fn-execution:spotbugsMain'.
   ```
   
   Have to fix this on master first.
 



Issue Time Tracking
---

Worklog Id: (was: 321833)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321830 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 11:40
Start Date: 02/Oct/19 11:40
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-537453495
 
 
   Java PreCommit broken on master: 
https://builds.apache.org/job/beam_PreCommit_Java_Cron/1862/console
   
   ```
   
   ```
   
   Have to fix this on master first.
 



Issue Time Tracking
---

Worklog Id: (was: 321830)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321811 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 02/Oct/19 10:46
Start Date: 02/Oct/19 10:46
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-537438527
 
 
   I added comments in all the places to clarify why this is necessary. I'll 
wait until the tests pass again, then I'll merge because the fix is 
straightforward and it is currently blocking a user. Still happy to hear your 
comments on this. 
 



Issue Time Tracking
---

Worklog Id: (was: 321811)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-10-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=321139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321139 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 01/Oct/19 09:25
Start Date: 01/Oct/19 09:25
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-536950982
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 321139)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-09-30 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=320948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320948 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 30/Sep/19 22:48
Start Date: 30/Sep/19 22:48
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9688: [BEAM-8303] 
Ensure FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-536783742
 
 
   Fix to Portable_Python failure is merged in 
https://issues.apache.org/jira/browse/BEAM-8324.
 



Issue Time Tracking
---

Worklog Id: (was: 320948)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-09-30 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=320852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320852 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 30/Sep/19 20:46
Start Date: 30/Sep/19 20:46
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9688: [BEAM-8303] 
Ensure FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-536744819
 
 
   The Portable_Python failure is known and is tracked in 
https://issues.apache.org/jira/browse/BEAM-8324.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 320852)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=320583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320583
 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 30/Sep/19 15:38
Start Date: 30/Sep/19 15:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-536620562
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 320583)
Time Spent: 0.5h  (was: 20m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=320443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320443
 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 30/Sep/19 10:35
Start Date: 30/Sep/19 10:35
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688#issuecomment-536504441
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 320443)
Time Spent: 20m  (was: 10m)


[jira] [Work logged] (BEAM-8303) Filesystems not properly registered using FileIO.write()

2019-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8303?focusedWorklogId=320405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320405
 ]

ASF GitHub Bot logged work on BEAM-8303:


Author: ASF GitHub Bot
Created on: 30/Sep/19 09:11
Start Date: 30/Sep/19 09:11
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9688: [BEAM-8303] Ensure 
FileSystems registration code runs in non UDF Flink operators
URL: https://github.com/apache/beam/pull/9688
 
 
   The FileBasedSink$FileResultCoder depends on the FileSystems code being
   initialized. We had previously assumed that this would only be necessary for
   user-defined code, but as it turns out, coders may also access the file system.
   
   Without this initialization, the coder may fail during decoding with the following:
   
   ```
   Caused by: java.lang.IllegalArgumentException: No filesystem found for scheme s3
       at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:463)
       at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
       at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1149)
       at org.apache.beam.sdk.io.FileBasedSink$FileResultCoder.decode(FileBasedSink.java:1105)
       at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
       at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:83)
       at org.apache.beam.sdk.transforms.join.UnionCoder.decode(UnionCoder.java:32)
       at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:592)
       at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:583)
       at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:529)
       at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:93)
       at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
       at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
       at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
       at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
       at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
       at org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:107)
       at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
       at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
       at java.lang.Thread.run(Thread.java:745)
   
   ```
   
   To mitigate such failures, we should always make sure the FileSystems code is
   initialized in the current Task, since the class loaders of the individual Tasks
   are isolated from each other.
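   
   For illustration only, here is a minimal sketch of the general mechanism: a Flink
   RichFlatMapFunction that seeds the Beam FileSystems registry in open(), before any
   record is decoded in that task. The wrapper class name and the use of
   SerializablePipelineOptions to ship the options into the task are assumptions made
   for this example; per the PR title, the actual change belongs in the Flink runner's
   non-UDF operators rather than in user code.
   
   ```java
   // Illustrative sketch only -- not the actual patch. It shows how the
   // FileSystems registry can be seeded inside a Flink task so that coders
   // such as FileResultCoder can resolve schemes like "s3" during decoding.
   import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
   import org.apache.beam.sdk.io.FileSystems;
   import org.apache.beam.sdk.options.PipelineOptions;
   import org.apache.flink.api.common.functions.RichFlatMapFunction;
   import org.apache.flink.configuration.Configuration;
   import org.apache.flink.util.Collector;
   
   public class FileSystemsAwareFlatMap<T> extends RichFlatMapFunction<T, T> {
   
     // Pipeline options serialized with the operator so they reach each task.
     private final SerializablePipelineOptions serializedOptions;
   
     public FileSystemsAwareFlatMap(PipelineOptions options) {
       this.serializedOptions = new SerializablePipelineOptions(options);
     }
   
     @Override
     public void open(Configuration parameters) {
       // Runs once per task, before any elements are processed, so the
       // FileSystems registry is populated in this task's class loader.
       FileSystems.setDefaultPipelineOptions(serializedOptions.get());
     }
   
     @Override
     public void flatMap(T element, Collector<T> out) {
       out.collect(element);
     }
   }
   ```
   
   The key point is the timing: FileSystems.setDefaultPipelineOptions must run in the
   task before the first record is deserialized, otherwise the coder hits the empty
   registry and fails with "No filesystem found for scheme s3".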
   
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build