[ 
https://issues.apache.org/jira/browse/SPARK-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603302#comment-14603302
 ] 

Alok Singh edited comment on SPARK-8647 at 6/26/15 5:51 PM:
------------------------------------------------------------

Hi Xiangrui,

1. Same instances
=================
In that case, why not use a Scala object to get a singleton?
Is it because MatrixUDT is used from PySpark, which might work better with a class
type than with an object type? Also, from Java, wouldn't the object's class name
have an extra $ at the end?
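For comparison, here is a minimal sketch of the Scala object alternative. It uses a hypothetical, Spark-free stand-in named SingletonUDT, not the real MatrixUDT. Because there is exactly one instance, the reference-based equals/hashCode inherited from AnyRef are already consistent. The catch is the trailing $ on the compiled module class name, which any caller that looks the class up by name (for example, via reflection) would have to account for.

```scala
// Hypothetical stand-in for a UDT implemented as a Scala singleton object.
// Not Spark's API: it only illustrates the JVM-level consequences.
object SingletonUDT {
  def typeName: String = "matrix"
}

// Every reference to SingletonUDT points at the same instance, so the default
// reference-based equals/hashCode from AnyRef are already consistent.
val a = SingletonUDT
val b = SingletonUDT
println(a eq b)                     // true: one shared instance
println(a.hashCode == b.hashCode)   // true

// The compiled module class name ends with '$', which Java callers and
// reflection-based lookups by class name would have to account for.
val runtimeName = SingletonUDT.getClass.getName
println(runtimeName.endsWith("$"))  // true
```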

But if the goal is for all instances to behave as the same instance, wouldn't it
be nicer to define hashCode as

  override def hashCode(): Int = "org.apache.spark.mllib.linalg.MatrixUDT".hashCode()

What are your thoughts?
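The proposal above can be sketched with a hypothetical, Spark-free class, MatrixUDTSketch. Only the class-name string comes from the proposal. The hashCode contract requires equal objects to have equal hash codes, and since equals treats every instance as equal, any constant satisfies it. Deriving the constant from the class name just replaces the magic number 1994 with something self-explanatory.

```scala
import scala.collection.mutable

// Hypothetical, Spark-free stand-in for a class-based UDT such as MatrixUDT.
// All instances are interchangeable, so equals accepts any instance of the class.
class MatrixUDTSketch {
  override def equals(o: Any): Boolean = o match {
    case _: MatrixUDTSketch => true
    case _                  => false
  }

  // Proposed variant: a constant derived from the fully qualified name instead
  // of a magic number. Still constant, so it stays consistent with equals.
  override def hashCode(): Int =
    "org.apache.spark.mllib.linalg.MatrixUDT".hashCode()
}

val first = new MatrixUDTSketch
val second = new MatrixUDTSketch

// Distinct objects, but equal and with the same hash code ...
println(first eq second)                    // false
println(first == second)                    // true
println(first.hashCode == second.hashCode)  // true

// ... so a hash-based collection treats them as a single key.
val seen = mutable.HashSet(first, second)
println(seen.size)                          // 1
```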

2. Performance
==============
I think that in the MatrixUDT case this will not be a problem, as there won't be
many classes similar to MatrixUDT whose constant hashCode is also 1994.
I was referring to
http://java-performance.info/hashcode-method-performance-tuning/
However, if we use the solution from the "Same instances" section above, we may
avoid this issue altogether.
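The collision concern can be sketched with two hypothetical, unrelated classes (UDTA and UDTB) that both hard-code 1994. This is a toy illustration, not a benchmark. It only shows that a hash-based map stays correct, because equals still tells the keys apart, while every such key shares a single hash value.

```scala
import scala.collection.mutable

// Two hypothetical, unrelated types that both hard-code the same hash code.
class UDTA {
  override def equals(o: Any): Boolean = o.isInstanceOf[UDTA]
  override def hashCode(): Int = 1994
}

class UDTB {
  override def equals(o: Any): Boolean = o.isInstanceOf[UDTB]
  override def hashCode(): Int = 1994
}

val udtA = new UDTA
val udtB = new UDTB

// Equal hash codes are allowed for unequal objects: the map stays correct,
// because equals still distinguishes the keys ...
val byType = mutable.HashMap[Any, String](udtA -> "A", udtB -> "B")
println(udtA.equals(udtB))  // false
println(byType.size)        // 2
println(byType(udtB))       // B

// ... but both keys share one hash value, so colliding entries have to be told
// apart with equals. With only a handful of such classes the cost is negligible.
println(udtA.hashCode == udtB.hashCode)  // true
```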


Summary
=======
For practical purposes this won't be a performance issue, but I think it would be
nicer, from an aesthetic perspective, to use the approach from the "Same
instances" section if we can't use a Scala object.


Please advise: should I just change the code docs to explain the reason, or
implement 1. above?

Thanks,
Alok



> Potential issues with the constant hashCode 
> --------------------------------------------
>
>                 Key: SPARK-8647
>                 URL: https://issues.apache.org/jira/browse/SPARK-8647
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.4.0
>            Reporter: Alok Singh
>            Priority: Minor
>              Labels: performance
>
> Hi,
> This may be a potential bug, a performance issue, or just a matter of code docs.
> The issue concerns the MatrixUDT class, in case we decide to put instances of
> MatrixUDT into a hash-based collection.
> The hashCode function returns a constant, and even though the equals method is
> consistent with hashCode, I don't see the reason why hashCode() = 1994 (i.e. a
> constant) has been used.
> I was expecting it to be similar to the other matrix classes or the vector
> class.
> If there is a reason for this code, we should document it properly in the code
> so that others reading it can understand it.
> Regards,
> Alok
> Details
> =======
> a) In reference to the file
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
> lines 188-197, i.e.
>
>   override def equals(o: Any): Boolean = {
>     o match {
>       case v: MatrixUDT => true
>       case _ => false
>     }
>   }
>
>   override def hashCode(): Int = 1994
>
> b) The commit is
> https://github.com/apache/spark/commit/11e025956be3818c00effef0d650734f8feeb436
> on March 20.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
