[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

Marcelo Pio de Castro (JIRA) Wed, 07 Aug 2019 04:58:20 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902012#comment-16902012
 ]


Marcelo Pio de Castro edited comment on BEAM-6923 at 8/7/19 11:57 AM:
----------------------------------------------------------------------

Sorry for the late response.

[~iemejia], the library was guava-20, but at the time I was using a mismatched 
version of beam and spark (spark was in a earlier version than the dependency 
of Apache beam). I'll retest this with the newer version of beam and get back 
to you.

 

As for the thread issue. I was uploading files that could have 300+ GB file 
size using the AvroIO sink. At the time I had a very limited cluster of spark 
workers, with about 4GB of RAM. The problem is at the async http upload that 
the gcs lib uses.

I think that an option should exist to choose to use a sync method instead on 
limited cluster memory scenarios. (If such option exist, I couldn't find it)

 

I solved my problem doing the upload by myself in a ParDo instead of using the 
AvroIO sink, by uploading in a sync way. Even an async method can be done with 
a more limited ram, but that is a problem with the Google library.

 


was (Author: marcelo.castro):
Sorry for the late response.

[~iemejia], the library was guava-20, but at the time I was using a mismatched 
version of beam and spark (spark was in a earlier version than the dependency 
of Apache beam). I'll retest this with the newer version of beam and get back 
to you.

 

As for the thread issue. I was uploading files that could 300+ GB file size 
using the AvroIO sink. At the time I had a very limited cluster of spark 
workers, with about 4GB of RAM. The problem is at the async http upload that 
the gcs lib uses.

I think that an option should exist to choose to use a sync method instead on 
limited cluster memory scenarios. (If such option exist, I couldn't find it)

 

I solved my problem doing the upload by myself in a ParDo instead of using the 
AvroIO sink, by uploading in a sync way. Even an async method can be done with 
a more limited ram, but that is a problem with the Google library.

 

> OOM errors in jobServer when using GCS artifactDir
> --------------------------------------------------
>
>                 Key: BEAM-6923
>                 URL: https://issues.apache.org/jira/browse/BEAM-6923
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-harness
>            Reporter: Lukasz Gajowy
>            Priority: Major
>         Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, heapdump size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io/<my-freshly-built-sdk-harness-image>/java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

Reply via email to