Spark SQL question: is cached SchemaRDD storage controlled by "spark.storage.memoryFraction"?

2014-09-26 Thread Haopu Wang
Hi, I'm querying a big table using Spark SQL. I see very long GC time in
some stages. I wonder if I can improve it by tuning the storage
parameter.

The question is: the SchemaRDD has been cached with the "cacheTable()"
function. Is the cached SchemaRDD part of the memory storage controlled
by the "spark.storage.memoryFraction" parameter?

Thanks!

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark SQL question: is cached SchemaRDD storage controlled by "spark.storage.memoryFraction"?

2014-09-26 Thread Cheng Lian
Yes, it is. The in-memory storage used with SchemaRDD also uses
RDD.cache() under the hood.
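
To get a feel for how much room that leaves for cached tables, here is a
rough back-of-the-envelope sketch. It assumes the Spark 1.x memory model, in
which the block storage pool on each executor is approximately the JVM heap
multiplied by spark.storage.memoryFraction (default 0.6) and by
spark.storage.safetyFraction (default 0.9). The function name and the 8 GiB
heap are made up for illustration; check the defaults against your Spark
version.

```python
# Rough estimate of the per-executor storage pool under the Spark 1.x
# memory model. Assumption (not stated in this thread):
#   pool ~= heap * spark.storage.memoryFraction * spark.storage.safetyFraction
# with defaults of 0.6 and 0.9.

GIB = 1024 ** 3


def storage_pool_bytes(heap_bytes, memory_fraction=0.6, safety_fraction=0.9):
    """Approximate bytes available for cached blocks on one executor."""
    for name, value in (("memory_fraction", memory_fraction),
                        ("safety_fraction", safety_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError("%s must be between 0 and 1" % name)
    return int(heap_bytes * memory_fraction * safety_fraction)


if __name__ == "__main__":
    heap = 8 * GIB  # hypothetical executor heap size
    for fraction in (0.6, 0.4, 0.2):
        pool = storage_pool_bytes(heap, memory_fraction=fraction)
        print("memoryFraction=%.1f -> about %.2f GiB for cached data"
              % (fraction, pool / float(GIB)))
```

Lowering the fraction leaves more of the heap for task execution, which may
ease GC pressure, but less of the cached table will fit in memory, so it is a
trade-off worth measuring rather than assuming.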


On 9/26/14 4:04 PM, Haopu Wang wrote:


Hi, I'm querying a big table using Spark SQL. I see very long GC time in
some stages. I wonder if I can improve it by tuning the storage
parameter.

The question is: the SchemaRDD has been cached with the "cacheTable()"
function. Is the cached SchemaRDD part of the memory storage controlled
by the "spark.storage.memoryFraction" parameter?

Thanks!



Fwd: Spark SQL question: is cached SchemaRDD storage controlled by "spark.storage.memoryFraction"?

2014-09-26 Thread Liquan Pei
-- Forwarded message --
From: Liquan Pei 
Date: Fri, Sep 26, 2014 at 1:33 AM
Subject: Re: Spark SQL question: is cached SchemaRDD storage controlled by
"spark.storage.memoryFraction"?
To: Haopu Wang 


Hi Haopu,

Internally, cacheTable on a SchemaRDD is implemented as a cache() on a
MapPartitionsRDD. Since the memory reserved for caching RDDs is controlled
by spark.storage.memoryFraction, the in-memory storage of a cached
SchemaRDD is controlled by the same parameter.
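
In case it is useful, below is a minimal sketch of setting the parameter
before caching a table. It assumes the PySpark API as of Spark 1.1
(SQLContext, inferSchema, registerTempTable, cacheTable); these names changed
in later releases. The helper names, the "demo" table, and the 0.4 value are
only illustrative, not a recommendation.

```python
def storage_settings(storage_fraction):
    """Build (key, value) pairs suitable for SparkConf.setAll()."""
    if not 0.0 < storage_fraction < 1.0:
        raise ValueError("storage_fraction must be strictly between 0 and 1")
    return [("spark.storage.memoryFraction", str(storage_fraction))]


def cache_demo_table(storage_fraction=0.4):
    """Cache a tiny table with a custom storage fraction (hypothetical helper).

    Needs a Spark 1.1-era PySpark installation; imports are done lazily so
    that storage_settings() can be used without Spark installed.
    """
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext, Row

    # The fraction has to be set before the SparkContext is created.
    conf = SparkConf().setAppName("cache-table-sketch")
    conf.setAll(storage_settings(storage_fraction))
    sc = SparkContext(conf=conf)
    try:
        sql_context = SQLContext(sc)
        rows = sc.parallelize([Row(id=1, name="a"), Row(id=2, name="b")])
        sql_context.inferSchema(rows).registerTempTable("demo")
        # The cached data lives in the pool sized by the fraction set above.
        sql_context.cacheTable("demo")
        return sql_context.sql("SELECT COUNT(*) FROM demo").collect()
    finally:
        sc.stop()
```

The same key can also be put in conf/spark-defaults.conf instead of being set
in code.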

Hope this helps!
Liquan

On Fri, Sep 26, 2014 at 1:04 AM, Haopu Wang  wrote:

> Hi, I'm querying a big table using Spark SQL. I see very long GC time in
> some stages. I wonder if I can improve it by tuning the storage
> parameter.
>
> The question is: the SchemaRDD has been cached with the "cacheTable()"
> function. Is the cached SchemaRDD part of the memory storage controlled
> by the "spark.storage.memoryFraction" parameter?
>
> Thanks!


-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst


