Re: Duplicate entries in output of mllib column similarities

Reza Zadeh Thu, 07 May 2015 18:59:13 -0700

This shouldn't be happening. Do you have an example that reproduces it?
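
A small check like the one below could accompany such an example. It is only a sketch, and it assumes the similarity entries have already been collected to the driver as plain (i, j, value) tuples. The function name find_duplicate_entries and the sample values are hypothetical, not taken from the original report.

```python
from collections import defaultdict


def find_duplicate_entries(entries):
    """Return {(i, j): [values, ...]} for index pairs that occur more than once.

    `entries` is assumed to be an iterable of (i, j, value) tuples, for
    example the collected entries of a similarity matrix.
    """
    values_by_index = defaultdict(list)
    for i, j, value in entries:
        values_by_index[(i, j)].append(value)
    # Keep only the index pairs that were seen more than once.
    return {idx: vals for idx, vals in values_by_index.items() if len(vals) > 1}


# Hypothetical sample data: the pair (0, 1) appears twice with different scores.
sample = [(0, 1, 0.42), (0, 2, 0.35), (0, 1, 0.47), (1, 2, 0.31)]
duplicates = find_duplicate_entries(sample)
print(duplicates)
```

Listing the duplicated pairs together with all of their scores makes it easy to see whether the repeated entries agree or differ.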

On Thu, May 7, 2015 at 4:17 PM, rbolkey <rbol...@gmail.com> wrote:


> Hi,
>
> I have a question regarding one of the oddities we encountered while
> running mllib's column similarities operation. When we examine the
> output, we find duplicate matrix entries (the same i,j). Sometimes the
> entries have the same value/similarity score, but they're frequently
> different too.
>
> Is this a known issue? An artifact of the probabilistic nature of the
> output? Which output score should we trust (lower vs higher one when
> different)? We're using a threshold of 0.3, and running Spark 1.3.1 on a 10
> node cluster.
>
> Thanks
> Rick
