RE: Pattern Matching / Equals on Case Classes in Spark Not Working

2015-01-13 Thread Rosner, Frank (Allianz SE)
Yep it is in the REPL. I will try your solution and also to submit the whole 
thing as a job jar. If this is true, this should be fixed, right? I will check 
whether there is a ticket already. Somebody pointed me to 
https://issues.apache.org/jira/browse/SPARK-2620 but I need to investigate.

Thanks!
Frank

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Montag, 12. Januar 2015 19:54
To: Rosner, Frank (Allianz SE)
Cc: user@spark.apache.org
Subject: Re: Pattern Matching / Equals on Case Classes in Spark Not Working

Is this in the Spark shell? Case classes don't work correctly in the Spark 
shell unfortunately (though they do work in the Scala shell) because we change 
the way lines of code compile to allow shipping functions across the network. 
The best way to get case classes in there is to compile them into a JAR and 
then add that to your spark-shell's classpath with --jars.

Matei

On Jan 12, 2015, at 10:04 AM, Rosner, Frank (Allianz SE) 
frank.ros...@allianz.commailto:frank.ros...@allianz.com wrote:

Dear Spark Users,

I googled the web for several hours now but I don't find a solution for my 
problem. So maybe someone from this list can help.

I have an RDD of case classes, generated from CSV files with Spark. When I used 
the distinct operator, there were still duplicates. So I investigated and found 
out that the equals returns false although the two objects were equal (so were 
their individual fields as well as toStrings).

After googling it I found that the case class equals might break in case the 
two objects are created by different class loaders. So I implemented my own 
equals method using mattern matching (code example below). It still didn't 
work. Some debugging revealed that the problem lies in the pattern matching. 
Depending on the objects I compare (and maybe the split / classloader they are 
generated in?) the patternmatching works /doesn't:

case class Customer(id: String, age: Option[Int], entryDate: 
Option[java.util.Date]) {
  def equals(that: Any): Boolean = that match {
case Customer(id, age, entryDate) = {
  println(Pattern matching worked!)
  this.id == id  this.age == age  this.entryDate == entryDate
}
case _ = false
  }
}

//val x: Array[Customer]
// ... some spark code to filter original data and collect x

scala x(0)
Customer(a, Some(5), Some(Fri Sep 23 00:00:00 CEST 1994))
scala x(1)
Customer(a, None, None)
scala x(2)
Customer(a, None, None)
scala x(3)
Customer(a, None, None)

scala x(0) == x(0) // should be true and works
Pattern matching works!
res0: Boolean = true
scala x(0) == x(1) // should be false and works
Pattern matching works!
res1: Boolean = false
scala x(1) == x(2) // should be true, does not work
res2: Boolean = false
scala x(2) == x(3) // should be true, does not work
Pattern matching works!
res3: Boolean = true
scala x(0) == x(3) // should be false, does not work
res4: Boolean = false

Why is the pattern matching not working? It seems that there are two kinds of 
Customers: 0,1 and 2,3 which don't match somehow. Is this related to some 
classloaders? Is there a way around this other than using instanceof and 
defining a custom equals operation for every case class I write?

Thanks for the help!
Frank



Pattern Matching / Equals on Case Classes in Spark Not Working

2015-01-12 Thread Rosner, Frank (Allianz SE)
Dear Spark Users,

I googled the web for several hours now but I don't find a solution for my 
problem. So maybe someone from this list can help.

I have an RDD of case classes, generated from CSV files with Spark. When I used 
the distinct operator, there were still duplicates. So I investigated and found 
out that the equals returns false although the two objects were equal (so were 
their individual fields as well as toStrings).

After googling it I found that the case class equals might break in case the 
two objects are created by different class loaders. So I implemented my own 
equals method using mattern matching (code example below). It still didn't 
work. Some debugging revealed that the problem lies in the pattern matching. 
Depending on the objects I compare (and maybe the split / classloader they are 
generated in?) the patternmatching works /doesn't:

case class Customer(id: String, age: Option[Int], entryDate: 
Option[java.util.Date]) {
  def equals(that: Any): Boolean = that match {
case Customer(id, age, entryDate) = {
  println(Pattern matching worked!)
  this.id == id  this.age == age  this.entryDate == entryDate
}
case _ = false
  }
}

//val x: Array[Customer]
// ... some spark code to filter original data and collect x

scala x(0)
Customer(a, Some(5), Some(Fri Sep 23 00:00:00 CEST 1994))
scala x(1)
Customer(a, None, None)
scala x(2)
Customer(a, None, None)
scala x(3)
Customer(a, None, None)

scala x(0) == x(0) // should be true and works
Pattern matching works!
res0: Boolean = true
scala x(0) == x(1) // should be false and works
Pattern matching works!
res1: Boolean = false
scala x(1) == x(2) // should be true, does not work
res2: Boolean = false
scala x(2) == x(3) // should be true, does not work
Pattern matching works!
res3: Boolean = true
scala x(0) == x(3) // should be false, does not work
res4: Boolean = false

Why is the pattern matching not working? It seems that there are two kinds of 
Customers: 0,1 and 2,3 which don't match somehow. Is this related to some 
classloaders? Is there a way around this other than using instanceof and 
defining a custom equals operation for every case class I write?

Thanks for the help!
Frank


Re: Pattern Matching / Equals on Case Classes in Spark Not Working

2015-01-12 Thread Matei Zaharia
Is this in the Spark shell? Case classes don't work correctly in the Spark 
shell unfortunately (though they do work in the Scala shell) because we change 
the way lines of code compile to allow shipping functions across the network. 
The best way to get case classes in there is to compile them into a JAR and 
then add that to your spark-shell's classpath with --jars.

Matei

 On Jan 12, 2015, at 10:04 AM, Rosner, Frank (Allianz SE) 
 frank.ros...@allianz.com wrote:
 
 Dear Spark Users,
  
 I googled the web for several hours now but I don't find a solution for my 
 problem. So maybe someone from this list can help.
  
 I have an RDD of case classes, generated from CSV files with Spark. When I 
 used the distinct operator, there were still duplicates. So I investigated 
 and found out that the equals returns false although the two objects were 
 equal (so were their individual fields as well as toStrings).
  
 After googling it I found that the case class equals might break in case the 
 two objects are created by different class loaders. So I implemented my own 
 equals method using mattern matching (code example below). It still didn't 
 work. Some debugging revealed that the problem lies in the pattern matching. 
 Depending on the objects I compare (and maybe the split / classloader they 
 are generated in?) the patternmatching works /doesn't:
  
 case class Customer(id: String, age: Option[Int], entryDate: 
 Option[java.util.Date]) {
   def equals(that: Any): Boolean = that match {
 case Customer(id, age, entryDate) = {
   println(Pattern matching worked!)
   this.id == id  this.age == age  this.entryDate == entryDate
 }
 case _ = false
   }
 }
  
 //val x: Array[Customer]
 // ... some spark code to filter original data and collect x
  
 scala x(0)
 Customer(a, Some(5), Some(Fri Sep 23 00:00:00 CEST 1994))
 scala x(1)
 Customer(a, None, None)
 scala x(2)
 Customer(a, None, None)
 scala x(3)
 Customer(a, None, None)
  
 scala x(0) == x(0) // should be true and works
 Pattern matching works!
 res0: Boolean = true
 scala x(0) == x(1) // should be false and works
 Pattern matching works!
 res1: Boolean = false
 scala x(1) == x(2) // should be true, does not work
 res2: Boolean = false
 scala x(2) == x(3) // should be true, does not work
 Pattern matching works!
 res3: Boolean = true
 scala x(0) == x(3) // should be false, does not work
 res4: Boolean = false
  
 Why is the pattern matching not working? It seems that there are two kinds of 
 Customers: 0,1 and 2,3 which don't match somehow. Is this related to some 
 classloaders? Is there a way around this other than using instanceof and 
 defining a custom equals operation for every case class I write?
  
 Thanks for the help!
 Frank