[jira] [Commented] (SPARK-12546) Writing to partitioned parquet table can fail with OOM

2016-03-15 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195967#comment-15195967
 ] 

Michael Armbrust commented on SPARK-12546:
--

There is no partitioning in that example, so it is not the same problem. Also, the 
stack trace is in Zeppelin code, not Spark.

> Writing to partitioned parquet table can fail with OOM
> --
>
> Key: SPARK-12546
> URL: https://issues.apache.org/jira/browse/SPARK-12546
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Nong Li
>Assignee: Michael Armbrust
>Priority: Blocker
>  Labels: releasenotes
> Fix For: 1.6.1, 2.0.0
>
>
> It is possible to have jobs fail with OOM when writing to a partitioned 
> parquet table. While this was probably always possible, it is more likely in 
> 1.6 due to the memory manager changes. The unified memory manager enables 
> Spark to use more of the process memory (in particular, for execution) which 
> gets us in this state more often. This issue can happen for libraries that 
> consume a lot of memory, such as parquet. Prior to 1.6, these libraries would 
> more likely use memory that spark was not using (i.e. from the storage pool). 
> In 1.6, this storage memory can now be used for execution.
> There are a couple of configs that can help with this issue.
>   - parquet.memory.pool.ratio: This is a parquet config for how much of the 
> heap the parquet writers should use. It defaults to 0.95. Consider a much 
> lower value (e.g. 0.1).
>   - spark.memory.fraction: This is a spark config to control how much of the 
> memory should be allocated to spark. Consider setting this to 0.6.
> This should cause jobs to potentially spill more but require less memory. 
> More aggressive tuning will control this trade-off.
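
A minimal sketch of where those two settings might go in a Spark 1.6 application. The paths and the "date" partition column are placeholders, and parquet.memory.pool.ratio is a Parquet/Hadoop property, so it is passed here through the spark.hadoop. prefix rather than as a Spark setting:

{code}
// A minimal sketch (not an official recommendation): lower the Parquet writer
// pool ratio and Spark's memory fraction before writing a partitioned table.
// The paths and the "date" partition column are made-up placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("partitioned-parquet-write")
  .set("spark.memory.fraction", "0.6")                  // Spark 1.6 default is 0.75
  .set("spark.hadoop.parquet.memory.pool.ratio", "0.1") // Parquet default is 0.95

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

sqlContext.read.parquet("/tmp/events")
  .write
  .partitionBy("date")
  .parquet("/tmp/events_partitioned")
{code}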





[jira] [Commented] (SPARK-12546) Writing to partitioned parquet table can fail with OOM

2016-03-15 Thread Jakub Liska (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195157#comment-15195157
 ] 

Jakub Liska commented on SPARK-12546:
-

Hey, I migrated to 1.6.0; is it possible that it somehow relates to this really 
weird problem? There are plenty of resources and the sample data is really 
tiny: 
{code}
val coreRdd = sc.textFile("s3n://gwiq-views-p/external/core/tsv/*.tsv")
  .map(_.split("\t"))
  .map(fields => Row(fields: _*))
val coreDataFrame = sqlContext.createDataFrame(coreRdd, schema)
coreDataFrame.registerTempTable("core")
coreDataFrame.persist(StorageLevel.DISK_ONLY)
{code}

{code}
SELECT COUNT(*) FROM core
{code}

{code}
-- Create new SparkContext spark://master:7077 ---
Exception in thread "pool-1-thread-5" java.lang.OutOfMemoryError: GC overhead limit exceeded
  at com.google.gson.internal.bind.ObjectTypeAdapter.read(ObjectTypeAdapter.java:66)
  at com.google.gson.internal.bind.ObjectTypeAdapter.read(ObjectTypeAdapter.java:69)
  at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
  at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:188)
  at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:146)
  at com.google.gson.Gson.fromJson(Gson.java:791)
  at com.google.gson.Gson.fromJson(Gson.java:757)
  at com.google.gson.Gson.fromJson(Gson.java:706)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.convert(RemoteInterpreterServer.java:417)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:384)
  at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1376)
  at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1361)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
{code}

> Writing to partitioned parquet table can fail with OOM
> --
>
> Key: SPARK-12546
> URL: https://issues.apache.org/jira/browse/SPARK-12546
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Nong Li
>Assignee: Michael Armbrust
>Priority: Blocker
>  Labels: releasenotes
> Fix For: 1.6.1, 2.0.0
>
>
> It is possible to have jobs fail with OOM when writing to a partitioned 
> parquet table. While this was probably always possible, it is more likely in 
> 1.6 due to the memory manager changes. The unified memory manager enables 
> Spark to use more of the process memory (in particular, for execution) which 
> gets us in this state more often. This issue can happen for libraries that 
> consume a lot of memory, such as parquet. Prior to 1.6, these libraries would 
> more likely use memory that spark was not using (i.e. from the storage pool). 
> In 1.6, this storage memory can now be used for execution.
> There are a couple of configs that can help with this issue.
>   - parquet.memory.pool.ratio: This is a parquet config for how much of the 
> heap the parquet writers should use. It defaults to 0.95. Consider a much 
> lower value (e.g. 0.1).
>   - spark.memory.fraction: This is a spark config to control how much of the 
> memory should be allocated to spark. Consider setting this to 0.6.
> This should cause jobs to potentially spill more but require less memory. 
> More aggressive tuning will control this trade-off.





[jira] [Commented] (SPARK-12546) Writing to partitioned parquet table can fail with OOM

2016-02-22 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157605#comment-15157605
 ] 

Apache Spark commented on SPARK-12546:
--

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/11308

> Writing to partitioned parquet table can fail with OOM
> --
>
> Key: SPARK-12546
> URL: https://issues.apache.org/jira/browse/SPARK-12546
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Nong Li
>Assignee: Michael Armbrust
>Priority: Blocker
>
> It is possible to have jobs fail with OOM when writing to a partitioned 
> parquet table. While this was probably always possible, it is more likely in 
> 1.6 due to the memory manager changes. The unified memory manager enables 
> Spark to use more of the process memory (in particular, for execution) which 
> gets us in this state more often. This issue can happen for libraries that 
> consume a lot of memory, such as parquet. Prior to 1.6, these libraries would 
> more likely use memory that spark was not using (i.e. from the storage pool). 
> In 1.6, this storage memory can now be used for execution.
> There are a couple of configs that can help with this issue.
>   - parquet.memory.pool.ratio: This is a parquet config for how much of the 
> heap the parquet writers should use. It defaults to 0.95. Consider a much 
> lower value (e.g. 0.1).
>   - spark.memory.fraction: This is a spark config to control how much of the 
> memory should be allocated to spark. Consider setting this to 0.6.
> This should cause jobs to potentially spill more but require less memory. 
> More aggressive tuning will control this trade-off.





[jira] [Commented] (SPARK-12546) Writing to partitioned parquet table can fail with OOM

2016-01-19 Thread Nong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107258#comment-15107258
 ] 

Nong Li commented on SPARK-12546:
-

A better workaround might be to limit the maximum number of concurrent output 
files to 1. This can be done by setting 
"spark.sql.sources.maxConcurrentWrites=1"

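For example, a minimal sketch assuming an existing SQLContext (sqlContext) and DataFrame (df); the "date" partition column and output path are placeholders:

{code}
// Limit the number of concurrently open output writers during a partitioned write.
// sqlContext and df are assumed to already exist; "date" and the path are placeholders.
sqlContext.setConf("spark.sql.sources.maxConcurrentWrites", "1")

df.write
  .partitionBy("date")
  .parquet("/tmp/events_partitioned")
{code}
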
> Writing to partitioned parquet table can fail with OOM
> --
>
> Key: SPARK-12546
> URL: https://issues.apache.org/jira/browse/SPARK-12546
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Nong Li
>
> It is possible to have jobs fail with OOM when writing to a partitioned 
> parquet table. While this was probably always possible, it is more likely in 
> 1.6 due to the memory manager changes. The unified memory manager enables 
> Spark to use more of the process memory (in particular, for execution) which 
> gets us in this state more often. This issue can happen for libraries that 
> consume a lot of memory, such as parquet. Prior to 1.6, these libraries would 
> more likely use memory that spark was not using (i.e. from the storage pool). 
> In 1.6, this storage memory can now be used for execution.
> There are a couple of configs that can help with this issue.
>   - parquet.memory.pool.ratio: This is a parquet config for how much of the 
> heap the parquet writers should use. It defaults to 0.95. Consider a much 
> lower value (e.g. 0.1).
>   - spark.memory.fraction: This is a spark config to control how much of the 
> memory should be allocated to spark. Consider setting this to 0.6.
> This should cause jobs to potentially spill more but require less memory. 
> More aggressive tuning will control this trade-off.


