[ https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902012#comment-16902012 ]
Marcelo Pio de Castro edited comment on BEAM-6923 at 8/7/19 11:57 AM: ---------------------------------------------------------------------- Sorry for the late response. [~iemejia], the library was guava-20, but at the time I was using a mismatched version of beam and spark (spark was in a earlier version than the dependency of Apache beam). I'll retest this with the newer version of beam and get back to you. As for the thread issue. I was uploading files that could have 300+ GB file size using the AvroIO sink. At the time I had a very limited cluster of spark workers, with about 4GB of RAM. The problem is at the async http upload that the gcs lib uses. I think that an option should exist to choose to use a sync method instead on limited cluster memory scenarios. (If such option exist, I couldn't find it) I solved my problem doing the upload by myself in a ParDo instead of using the AvroIO sink, by uploading in a sync way. Even an async method can be done with a more limited ram, but that is a problem with the Google library. was (Author: marcelo.castro): Sorry for the late response. [~iemejia], the library was guava-20, but at the time I was using a mismatched version of beam and spark (spark was in a earlier version than the dependency of Apache beam). I'll retest this with the newer version of beam and get back to you. As for the thread issue. I was uploading files that could 300+ GB file size using the AvroIO sink. At the time I had a very limited cluster of spark workers, with about 4GB of RAM. The problem is at the async http upload that the gcs lib uses. I think that an option should exist to choose to use a sync method instead on limited cluster memory scenarios. (If such option exist, I couldn't find it) I solved my problem doing the upload by myself in a ParDo instead of using the AvroIO sink, by uploading in a sync way. Even an async method can be done with a more limited ram, but that is a problem with the Google library. > OOM errors in jobServer when using GCS artifactDir > -------------------------------------------------- > > Key: BEAM-6923 > URL: https://issues.apache.org/jira/browse/BEAM-6923 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness > Reporter: Lukasz Gajowy > Priority: Major > Attachments: Instance counts.png, Paths to GC root.png, > Telemetries.png, heapdump size-sorted.png > > > When starting jobServer with artifactDir pointing to a GCS bucket: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code} > and running a Java portable pipeline with the following, portability related > pipeline options: > {code:java} > --runner=PortableRunner --jobEndpoint=localhost:8099 > --defaultEnvironmentType=DOCKER > --defaultEnvironmentConfig=gcr.io/<my-freshly-built-sdk-harness-image>/java:latest'{code} > > I'm facing a series of OOM errors, like this: > {code:java} > Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: > Java heap space > at > com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606) > at > com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > at > com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) > at > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549) > at > com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > > This does not happen when I'm using a local filesystem for the artifact > staging location. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)