Re: RDD Location

2016-12-30 Thread Fei Hu
I would really appreciate more details on why the runJob function cannot be
called inside getPreferredLocations().

The NewHadoopRDD and HadoopRDD classes get the location information from
the inputSplit. But there may be an issue with NewHadoopRDD: it generates
all of the inputSplits on the master node, which means only a single node
can be used to generate and filter the inputSplits, even if the number of
inputSplits is huge. Will that be a performance bottleneck?

Thanks,
Fei
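
For reference, the lookup that the Hadoop-backed RDDs perform is cheap once the splits exist, because the hosts come straight from the split. Below is a minimal sketch, assuming Hadoop's mapreduce API is on the classpath. SplitHosts, preferredHosts, the path, and the host names are hypothetical and only for illustration.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.InputSplit
import org.apache.hadoop.mapreduce.lib.input.FileSplit

// Hypothetical helper, not part of Spark or Hadoop.
object SplitHosts {
  // Hosts the split reports for its data, dropping the "localhost"
  // placeholder in the same way the Hadoop-backed RDDs filter it.
  def preferredHosts(split: InputSplit): Seq[String] =
    split.getLocations.filter(_ != "localhost").toSeq

  // Made-up split whose replicas are reported on host-a and localhost.
  val example: InputSplit =
    new FileSplit(new Path("/data/chunk-0"), 0L, 128L, Array("host-a", "localhost"))
}
```

This only covers the per-partition lookup. It says nothing about the cost of generating and filtering a very large number of splits on one node, which is the open question above.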







Re: RDD Location

2016-12-30 Thread Sun Rui
You can’t call runJob inside getPreferredLocations().
You can take a look at the source code of HadoopRDD to help you implement
getPreferredLocations() appropriately.



Re: RDD Location

2016-12-30 Thread Fei Hu
That is a good idea.

I tried adding the following code to the getPreferredLocations() function:

val results: Array[Array[DataChunkPartition]] = context.runJob(
  partitionsRDD,
  (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray,
  dd,
  allowLocal = true)

But the job seems to hang when this function is executed. If I move the
code elsewhere, such as into the main() function, it runs fine.

What is the reason for it?

Thanks,
Fei
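
Since the same runJob call works from main(), one option is to run it on the driver first and hand the result to the RDD, so that getPreferredLocations() never starts a job. The hang is presumably because getPreferredLocations() is invoked by the scheduler itself, which would then be waiting on a job it still has to schedule; treat that as an assumption rather than a confirmed diagnosis. The sketch below uses a hypothetical DataChunk record that knows its hosts (a stand-in for DataChunkPartition) and SparkContext.makeRDD, which accepts a location preference per element. It calls the two-argument runJob overload rather than passing the partition list and allowLocal flag from the original call.

```scala
import org.apache.spark.{SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the thread's DataChunkPartition metadata.
case class DataChunk(id: Int, hosts: Seq[String])

object PrecomputedLocations {
  // Same function shape as in the thread: gather each partition into an array.
  val collectChunks: (TaskContext, Iterator[DataChunk]) => Array[DataChunk] =
    (_, it) => it.toArray

  // Call this from driver code (for example main()), not from inside an RDD.
  def build(sc: SparkContext, chunkMeta: RDD[DataChunk]): RDD[DataChunk] = {
    // The job runs here, before the located RDD exists.
    val perPartition: Array[Array[DataChunk]] = sc.runJob(chunkMeta, collectChunks)

    // Pair every chunk with the hosts it should prefer.
    val located: Seq[(DataChunk, Seq[String])] =
      perPartition.flatten.toSeq.map(c => (c, c.hosts))

    // makeRDD creates one partition per element and records its preferred hosts.
    sc.makeRDD[DataChunk](located)
  }
}
```

The resulting RDD answers preferredLocations() from the precomputed pairs, so no job is launched while the scheduler is asking for locations.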



Re: RDD Location

2016-12-29 Thread Sun Rui
Maybe you can create your own subclass of RDD and override
getPreferredLocations() to implement the logic for changing the locations
dynamically.
> On Dec 30, 2016, at 12:06, Fei Hu wrote:
>
> Dear all,
>
> Is there any way to change the host location for a certain partition of an RDD?
>
> "protected def getPreferredLocations(split: Partition)" can be used to
> initialize the location, but how can I change it after initialization?
>
> Thanks,
> Fei
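
A minimal sketch of this suggestion, assuming a wrapper RDD is acceptable: because an RDD reports its locations through getPreferredLocations(), one way to "change" them is to derive a new RDD that passes the parent's data through unchanged and reports different hosts. RelocatedPartition, RelocatedRDD, and the host map are hypothetical names for illustration, not Spark APIs.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition wrapper: keeps the parent partition together with
// the hosts we would like the scheduler to prefer for it.
class RelocatedPartition(
    val index: Int,
    val parent: Partition,
    val hosts: Seq[String]) extends Partition

// Hypothetical RDD that only replaces the preferred locations of its parent.
class RelocatedRDD[T: ClassTag](
    prev: RDD[T],
    hostsByPartition: Map[Int, Seq[String]]) extends RDD[T](prev) {

  // One wrapper per parent partition; partitions without an entry get no preference.
  override protected def getPartitions: Array[Partition] =
    prev.partitions.map { p =>
      new RelocatedPartition(p.index, p, hostsByPartition.getOrElse(p.index, Nil)): Partition
    }

  // Data is read from the parent partition unchanged.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    prev.iterator(split.asInstanceOf[RelocatedPartition].parent, context)

  // Pure lookup: no Spark job is started from here.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    split.asInstanceOf[RelocatedPartition].hosts
}
```

To move partitions again, wrap the same parent in another RelocatedRDD with a different map. The hosts are only a scheduling hint, so a task may still run elsewhere.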


