Re: Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
It did the job.
Thanks. :)

On 19 August 2014 at 10:20, Sean Owen wrote:

> In that case, why not collectAsMap() and keep the whole result as a
> simple Map in memory? Then lookups are trivial. RDDs aren't
> distributed maps.
> 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Performance problem on collect

2014-08-19 Thread Sean Owen
In that case, why not collectAsMap() and keep the whole result as a
simple Map in memory? Then lookups are trivial. RDDs aren't
distributed maps.
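A minimal sketch of the collectAsMap() approach, assuming a local SparkContext; the RDD contents and names below are illustrative stand-ins for the 60-entry result, not taken from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectAsMapSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[*]"))

    // Hypothetical stand-in for the final String -> Array[String] RDD.
    val myRdd = sc.parallelize(Seq(
      "alpha" -> Array("1", "2"),
      "beta"  -> Array("3")
    ))

    // One Spark job pulls everything to the driver; after that,
    // every lookup is an ordinary in-memory Map access.
    val asMap: scala.collection.Map[String, Array[String]] =
      myRdd.collectAsMap()

    println(asMap("alpha").mkString(","))  // no Spark job involved here

    sc.stop()
  }
}
```

With only 60 entries, paying the scan cost once in collectAsMap() and serving all subsequent lookups from the driver-side Map avoids launching a job per lookup.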

On Tue, Aug 19, 2014 at 9:17 AM, Emmanuel Castanier wrote:
> Thanks for your answer.
> In my case that's a shame, since we have only 60 entries in the final RDD; I
> thought it would be fast to get the one we need.




Re: Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
Thanks for your answer.
In my case that's a shame, since we have only 60 entries in the final RDD; I
thought it would be fast to get the one we need.


On 19 August 2014 at 09:58, Sean Owen wrote:

> You can use lookup() to accomplish this too; it may be a
> bit faster.
> 
> It will never be as efficient as a database lookup, though, since it is
> implemented by scanning through all of the data; there is no
> index.





Re: Performance problem on collect

2014-08-19 Thread Sean Owen
You can use lookup() to accomplish this too; it may be a
bit faster.

It will never be as efficient as a database lookup, though, since it is
implemented by scanning through all of the data; there is no
index.
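As a sketch, lookup() on a pair RDD looks like the following (a hypothetical local setup with made-up data; note that lookup(key) returns a Seq of all values for that key, and still scans every partition unless the RDD has a known partitioner, in which case only the matching partition is scanned):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD functions in Spark 1.x

object LookupSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[*]"))

    // Illustrative stand-in for the final String -> Array[String] RDD.
    val myRdd = sc.parallelize(Seq(
      "alpha" -> Array("1", "2"),
      "beta"  -> Array("3")
    ))

    // lookup(key) returns Seq[V]: all values associated with the key.
    val hits: Seq[Array[String]] = myRdd.lookup("alpha")
    hits.foreach(v => println(v.mkString(",")))

    sc.stop()
  }
}
```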

On Tue, Aug 19, 2014 at 8:43 AM, Emmanuel Castanier wrote:




Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
Hi all,

I'm a total newbie with Spark, so this may be a dumb question.
I tried Spark to compute some values, and on that front everything works
perfectly (and it's fast :) ).

At the end of the process I have an RDD of key/value pairs (String keys,
Array[String] values), from which I want to get a single entry, like this:

myRdd.filter(t => t._1.equals(param))

If I call collect to get that one tuple, it takes about 12 seconds to
execute; I imagine that's because Spark is meant to be used differently...
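For reference, a self-contained sketch of the pattern above (a hypothetical local setup; `param` and the RDD contents are made up for illustration). The filter itself is lazy; it is collect() that launches a Spark job scanning all partitions, which is where the latency comes from:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FilterCollectSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[*]"))

    // Illustrative stand-in for the final String -> Array[String] RDD.
    val myRdd = sc.parallelize(Seq(
      "alpha" -> Array("1", "2"),
      "beta"  -> Array("3")
    ))
    val param = "alpha"

    // filter is a lazy transformation; collect() is the action that
    // triggers a job over every partition and ships results to the driver.
    val matched: Array[(String, Array[String])] =
      myRdd.filter(t => t._1 == param).collect()

    matched.foreach { case (k, v) => println(s"$k -> ${v.mkString(",")}") }

    sc.stop()
  }
}
```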

Best regards,

Emmanuel