Aha, OK, thanks.

If I create different RDDs from a parent RDD and force evaluation
thread-by-thread, then it should presumably be fine, correct? Or do I need
to checkpoint the child RDDs as a precaution, in case they need to be
removed from memory and recomputed?
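
To make it concrete, here's roughly the pattern I have in mind -- just a
minimal sketch, where the data, split ratios, checkpoint directory and
train() parameters are all placeholders rather than my actual job:

import threading
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="per-thread-child-rdds")
sc.setCheckpointDir("/tmp/spark-checkpoints")  # only needed if checkpointing

# parent RDD of LabeledPoint, cached once on the driver
parent = sc.parallelize(
    [LabeledPoint(i % 2, [float(i)]) for i in range(1000)]
).cache()

# build all the child RDDs up front
children = parent.randomSplit([0.25, 0.25, 0.25, 0.25], seed=42)

# force evaluation one child at a time, on the driver thread, so that no
# two threads ever trigger the first computation of the same RDD
for child in children:
    child.cache()
    # child.checkpoint()  # optional: truncate the lineage back to the parent
    child.count()         # materializes the child (and checkpoint, if set)

results = []

def fit(rdd):
    # each thread now only touches its own, already-materialized child RDD
    results.append(LogisticRegressionWithSGD.train(rdd, iterations=10))

threads = [threading.Thread(target=fit, args=(c,)) for c in children]
for t in threads:
    t.start()
for t in threads:
    t.join()

The checkpoint() call is commented out -- as far as I understand it would
only matter if a cached child gets evicted and has to be recomputed, which
is exactly the case I'm unsure about.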

On Sat, Feb 28, 2015 at 4:28 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> RDD is not thread-safe. You should not use it in multiple threads.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-02-27 23:14 GMT+08:00 rok <rokros...@gmail.com>:
>
>> I'm sometimes seeing a "java.util.NoSuchElementException: key not found"
>> exception pop up when I run operations on an RDD from multiple threads in
>> a Python application. It ends up shutting down the SparkContext, so I'm
>> assuming this is a bug. From what I understand, I should be able to run
>> operations on the same RDD from multiple threads -- or is this not
>> recommended?
>>
>> I can't reproduce it every time, and I've tried eliminating caching
>> wherever possible to see if that would have an effect, but it doesn't seem
>> to. Each thread first splits the base RDD and then runs
>> LogisticRegressionWithSGD on its subset.
>>
>> Is there a workaround to this exception?
>>
>
