Re: indexing an RDD [Python]

2015-04-29 Thread Sven Krasser
Hey Roberto,

You will likely want to use cogroup() then, but it all hinges on how your
data looks, i.e. whether you have the index in the key. Here's an example:
http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#cogroup
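
A minimal sketch of the cogroup() route, assuming the rows do not carry an
index yet, so zipWithIndex() is used to put each row's position in the key.
The names select_by_cogroup and demo are hypothetical helpers for
illustration, not part of Spark.

```python
def select_by_cogroup(rdd, indices):
    """Return a new RDD holding the rows of `rdd` at the given positions."""
    # zipWithIndex() yields (row, position); swap so the position is the key.
    indexed = rdd.zipWithIndex().map(lambda pair: (pair[1], pair[0]))
    # Second RDD built from the index list; the value is only a placeholder.
    wanted = rdd.context.parallelize(indices).map(lambda i: (i, None))
    # cogroup() yields (position, (rows_at_position, placeholders)).
    grouped = indexed.cogroup(wanted)
    # Keep positions present in `wanted`, restore order, then drop the key.
    return (grouped
            .filter(lambda kv: len(kv[1][1]) > 0)
            .sortByKey()
            .flatMap(lambda kv: kv[1][0]))


def demo():
    # Assumes a local PySpark installation; nothing runs until demo() is called.
    from pyspark import SparkContext
    sc = SparkContext("local[2]", "select-by-cogroup")
    try:
        rows = sc.parallelize(["r%d" % i for i in range(10)])
        print(select_by_cogroup(rows, [0, 4, 5, 6, 8]).collect())
    finally:
        sc.stop()
```

cogroup() shuffles both RDDs, so this mainly pays off when the index list is
itself large; the sortByKey() step is only there to restore row order.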

Clone: RDDs are immutable, so any change you make produces a new RDD and
leaves the original untouched.
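
To illustrate the point about clones, here is a small sketch. It assumes
each row is a plain (label, features) tuple standing in for a LabeledPoint;
scaled_copy and demo are hypothetical names used only for this example.

```python
def scaled_copy(rdd, factor):
    """Return a new RDD whose rows have their features multiplied by `factor`."""
    # map() returns a new RDD; building a fresh list per row (instead of
    # mutating the row in place) keeps the transformation free of side effects.
    return rdd.map(lambda row: (row[0], [factor * x for x in row[1]]))


def demo():
    # Assumes a local PySpark installation; nothing runs until demo() is called.
    from pyspark import SparkContext
    sc = SparkContext("local[2]", "rdd-immutability")
    try:
        original = sc.parallelize([(1.0, [1.0, 2.0]), (0.0, [3.0, 4.0])])
        modified = scaled_copy(original, 10.0)
        print(modified.collect())  # scaled rows
        print(original.collect())  # still the original values
    finally:
        sc.stop()
```

No explicit copy step is needed: the selected subset and any modified
version of it are separate RDDs from the one you started with.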

Best,
-Sven



-- 
www.skrasser.com <http://www.skrasser.com/?utm_source=sig>


RE: indexing an RDD [Python]

2015-04-24 Thread Pagliari, Roberto
Hi,
I may need to read many values. The list [0,4,5,6,8] gives the positions of the
rows I’d like to extract from the RDD (of LabeledPoints). Could you possibly
provide a quick example?

Also, I’m not quite sure how this works, but the resulting RDD should be a
clone, as I may need to modify the values and preserve the original ones.

Thank you,




Re: indexing an RDD [Python]

2015-04-24 Thread Sven Krasser
The solution depends largely on your use case. I assume the index is in the
key. In that case, you can make a second RDD out of the list of indices and
then use cogroup() on both.

If the list of indices is small, just using filter() will work well.

If you need to read back a few select values to the driver, take a look at
lookup().
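
A rough sketch of the filter() and lookup() options, again assuming the
position has to be attached with zipWithIndex() first. The helper names
select_by_filter, fetch_to_driver and demo are made up for this example.

```python
def select_by_filter(rdd, indices):
    """Return a new RDD with only the rows at the given positions."""
    # A small set travels to the executors inside the closure.
    wanted = set(indices)
    return (rdd.zipWithIndex()                       # (row, position)
               .filter(lambda pair: pair[1] in wanted)
               .map(lambda pair: pair[0]))           # drop the position again


def fetch_to_driver(rdd, index):
    """Return, as a plain list on the driver, the rows stored at `index`."""
    # lookup() works on key-value RDDs, so key each row by its position.
    by_position = rdd.zipWithIndex().map(lambda pair: (pair[1], pair[0]))
    return by_position.lookup(index)


def demo():
    # Assumes a local PySpark installation; nothing runs until demo() is called.
    from pyspark import SparkContext
    sc = SparkContext("local[2]", "select-by-filter")
    try:
        rows = sc.parallelize(["r%d" % i for i in range(10)])
        print(select_by_filter(rows, [0, 4, 5, 6, 8]).collect())
        print(fetch_to_driver(rows, 5))
    finally:
        sc.stop()
```

filter() avoids a shuffle and keeps the rows in their original order, which
is why it suits a short index list; lookup() returns a Python list rather
than an RDD, so it is only meant for a handful of values.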



-- 
www.skrasser.com 


indexing an RDD [Python]

2015-04-24 Thread Pagliari, Roberto
I have an RDD of LabeledPoints.
Is it possible to select a subset of it based on a list of indices?
For example, with idx=[0,4,5,6,8], I'd like to be able to create a new RDD with
elements 0, 4, 5, 6 and 8.


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org