subject:"defaultMinPartitions in textFile"

RE: defaultMinPartitions in textFile

2014-07-21 Thread Wang, Jensen

Yes, Great.  I thought it was math.max instead of math.min on that line. Thank 
you!

From: Ye Xianjin [mailto:advance...@gmail.com]
Sent: Tuesday, July 22, 2014 11:37 AM
To: user@spark.apache.org
Subject: Re: defaultMinPartitions in textFile

well, I think you miss this line of code in SparkContext.scala
line 1242-1243(master):
 /** Default min number of partitions for Hadoop RDDs when not given by user */
  def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
so the defaultMinPartitions will be 2 unless the defaultParallelism is less 
than 2...

--
Ye Xianjin
Sent with Sparrow<http://www.sparrowmailapp.com/?sig>

On Tuesday, July 22, 2014 at 10:18 AM, Wang, Jensen wrote:

Hi,

I started to use spark on yarn recently and found a problem while 
tuning my program.

When SparkContext is initialized as sc and ready to read text file from hdfs, 
the textFile(path, defaultMinPartitions) method is called.

I traced down the second parameter in the spark source code and finally found 
this:

   conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))  
in  CoarseGrainedSchedulerBackend.scala

I do not specify the property “spark.default.parallelism” anywhere so the 
getInt will return value from the larger one between totalCoreCount and 2.

When I submit the application using spark-submit and specify the parameter: 
--num-executors  2   --executor-cores 6, I suppose the totalCoreCount will be

2*6 = 12, so defaultMinPartitions will be 12.

But when I print the value of defaultMinPartitions in my program, I still get 2 
in return,  How does this happen, or where do I make a mistake?

Re: defaultMinPartitions in textFile

2014-07-21 Thread Ye Xianjin

well, I think you miss this line of code in SparkContext.scala
line 1242-1243(master):
 /** Default min number of partitions for Hadoop RDDs when not given by user */
  def defaultMinPartitions: Int = math.min(defaultParallelism, 2)

so the defaultMinPartitions will be 2 unless the defaultParallelism is less 
than 2...


--  
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, July 22, 2014 at 10:18 AM, Wang, Jensen wrote:

> Hi,  
> I started to use spark on yarn recently and found a problem while 
> tuning my program.
>   
> When SparkContext is initialized as sc and ready to read text file from hdfs, 
> the textFile(path, defaultMinPartitions) method is called.
> I traced down the second parameter in the spark source code and finally found 
> this:
>conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 
> 2))  in  CoarseGrainedSchedulerBackend.scala
>   
> I do not specify the property “spark.default.parallelism” anywhere so the 
> getInt will return value from the larger one between totalCoreCount and 2.
>   
> When I submit the application using spark-submit and specify the parameter: 
> --num-executors  2   --executor-cores 6, I suppose the totalCoreCount will be 
>  
> 2*6 = 12, so defaultMinPartitions will be 12.
>   
> But when I print the value of defaultMinPartitions in my program, I still get 
> 2 in return,  How does this happen, or where do I make a mistake?
>  
>  
>

defaultMinPartitions in textFile

2014-07-21 Thread Wang, Jensen

Hi,
I started to use spark on yarn recently and found a problem while 
tuning my program.

When SparkContext is initialized as sc and ready to read text file from hdfs, 
the textFile(path, defaultMinPartitions) method is called.
I traced down the second parameter in the spark source code and finally found 
this:
   conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))  
in  CoarseGrainedSchedulerBackend.scala

I do not specify the property "spark.default.parallelism" anywhere so the 
getInt will return value from the larger one between totalCoreCount and 2.

When I submit the application using spark-submit and specify the parameter: 
--num-executors  2   --executor-cores 6, I suppose the totalCoreCount will be
2*6 = 12, so defaultMinPartitions will be 12.

But when I print the value of defaultMinPartitions in my program, I still get 2 
in return,  How does this happen, or where do I make a mistake?

RE: defaultMinPartitions in textFile

Re: defaultMinPartitions in textFile

defaultMinPartitions in textFile

3 matches

Site Navigation

Mail list logo

Footer information