[ https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195157#comment-15195157 ]

Jakub Liska commented on SPARK-12546:
-------------------------------------

Hey, I migrated to 1.6.0. Is it possible that it somehow relates to this really 
weird problem? There are plenty of resources available and the sample data is 
really tiny:
{code}
import org.apache.spark.sql.Row
import org.apache.spark.storage.StorageLevel

// Split each TSV line into fields and wrap them in a Row
val coreRdd = sc.textFile("s3n://gwiq-views-p/external/core/tsv/*.tsv")
  .map(_.split("\t"))
  .map(fields => Row(fields: _*))

val coreDataFrame = sqlContext.createDataFrame(coreRdd, schema)
coreDataFrame.registerTempTable("core")
coreDataFrame.persist(StorageLevel.DISK_ONLY)
{code}
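
The snippet above references a schema value that is not included in the comment. A 
hypothetical definition, assuming all TSV columns are plain strings, might look like 
this (the real column names and types would of course match the actual data):

{code}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical schema for illustration only; the original is not shown.
val schema = StructType(Seq(
  StructField("col_a", StringType, nullable = true),
  StructField("col_b", StringType, nullable = true),
  StructField("col_c", StringType, nullable = true)
))
{code}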

{code}
SELECT COUNT(*) FROM core
{code}
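
(In Zeppelin that query would run in a %sql paragraph; the equivalent programmatic 
call, assuming the sqlContext from the snippet above, is:)

{code}
// Runs the query against the registered temp table and prints the single-row result.
sqlContext.sql("SELECT COUNT(*) FROM core").show()
{code}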

{code}
------ Create new SparkContext spark://master:7077 -------
Exception in thread "pool-1-thread-5" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at com.google.gson.internal.bind.ObjectTypeAdapter.read(ObjectTypeAdapter.java:66)
        at com.google.gson.internal.bind.ObjectTypeAdapter.read(ObjectTypeAdapter.java:69)
        at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
        at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:188)
        at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:146)
        at com.google.gson.Gson.fromJson(Gson.java:791)
        at com.google.gson.Gson.fromJson(Gson.java:757)
        at com.google.gson.Gson.fromJson(Gson.java:706)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.convert(RemoteInterpreterServer.java:417)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:384)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1376)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1361)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

> Writing to partitioned parquet table can fail with OOM
> ------------------------------------------------------
>
>                 Key: SPARK-12546
>                 URL: https://issues.apache.org/jira/browse/SPARK-12546
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Nong Li
>            Assignee: Michael Armbrust
>            Priority: Blocker
>              Labels: releasenotes
>             Fix For: 1.6.1, 2.0.0
>
>
> It is possible for jobs to fail with OOM when writing to a partitioned 
> Parquet table. While this was probably always possible, it is more likely in 
> 1.6 due to the memory manager changes. The unified memory manager enables 
> Spark to use more of the process memory (in particular, for execution), which 
> gets us into this state more often. This issue can happen with libraries that 
> consume a lot of memory, such as Parquet. Prior to 1.6, these libraries would 
> more likely use memory that Spark was not using (i.e. from the storage pool). 
> In 1.6, that storage memory can now be used for execution.
> There are a couple of configs that can help with this issue:
>   - parquet.memory.pool.ratio: a Parquet config controlling how much of the 
> heap the Parquet writers should use. It defaults to 0.95; consider a much 
> lower value (e.g. 0.1).
>   - spark.memory.fraction: a Spark config controlling how much of the memory 
> should be allocated to Spark. Consider setting this to 0.6.
> These settings may cause jobs to spill more but require less memory. 
> More aggressive tuning will control this trade-off.
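
For reference, a minimal sketch of how the two configs quoted above might be set when 
building the context. The values 0.6 and 0.1 are just the illustrative starting points 
from the description, and passing parquet.memory.pool.ratio through the spark.hadoop.* 
prefix is an assumption about how that Hadoop-side setting reaches the Parquet writers:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("partitioned-parquet-write")
  // Give Spark a smaller share of the heap so heavy writers (Parquet) have headroom.
  .set("spark.memory.fraction", "0.6")
  // Parquet writer heap share; assumed to be forwarded to the Hadoop
  // configuration via the spark.hadoop.* prefix.
  .set("spark.hadoop.parquet.memory.pool.ratio", "0.1")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
{code}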


