Any chance your input RDD is being read from HDFS and you are running into
this issue (noted in the docs on SparkContext#hadoopFile)?

* '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object
* for each record, directly caching the returned RDD or directly passing it to an
* aggregation or shuffle operation will create many references to the same object.
* If you plan to directly cache, sort, or aggregate Hadoop writable objects, you
* should first copy them using a `map` function.
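
In other words, copy the records into plain objects before caching or calling
cartesian. A minimal sketch of what that looks like (the HDFS path, app name, and
TextInputFormat/LongWritable/Text types are placeholders; substitute your own
InputFormat and key/value types):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    val sc = new SparkContext(new SparkConf().setAppName("copy-writables-example"))

    // hadoopFile hands back the *same* Writable instances for every record,
    // so copy the values out before caching or running cartesian:
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///path/to/input")
    val copied = raw.map { case (k, v) => (k.get, v.toString) } // plain Long/String copies

    copied.cache()                        // now safe: each record owns its own data
    val pairs = copied.cartesian(copied)  // no shared mutable Writables involved

If your RDD of user-defined objects is built on top of such a Hadoop input and the
objects wrap the Writables without copying, caching would change which (shared)
instances you see, which could explain behavior that differs between cached and
uncached runs.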



On Thu, Feb 26, 2015 at 10:38 AM, mrk91 <marcogaid...@gmail.com> wrote:

> Hello,
>
> I have an issue with the cartesian method. When I use it with the Java
> types everything is OK, but when I use it with an RDD made of objects defined
> by me, it has very strange behavior which depends on whether the RDD is
> cached or not (you can see here
> <http://stackoverflow.com/questions/28727823/creating-a-matrix-of-neighbors-with-spark-cartesian-issue>
> what happens).
>
> Is this due to a bug in its implementation or are there any requirements
> for the objects to be passed to it?
> Thanks.
> Best regards.
> Marco
