Hello,

I would like to know the drawbacks and performance impact of using a var
inside a class whose instances are stored in an RDD.


Here is an example:

Animal is a huge class with a large number of val fields (roughly 600 or
more), but we frequently have to update the age (just one variable)
after some computation. What is the best way to do it?

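// Declared as a case class so that Animal(1, "XYZ") and ani.copy(...) compile.
// animalAge is the only mutable field; everything else is a val.
case class Animal(var animalAge: Int, animalName: String) {
  // val ...... (remaining fields)
}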


val animalRdd = sc.parallelize(List(Animal(1, "XYZ"), Animal(2, "ABC")))
...
...
animalRdd.map { ani =>
  // Mutate the age in place and return the same object
  if (ani.yearChange()) ani.animalAge += 1
  ani
}


Is it advisable to use a var in this case? Or should I use
ani.copy(animalAge = 2), which allocates a new Animal object for every
update? Please advise which is the best way to handle such cases.
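
To make the second option concrete, here is a minimal sketch of the
copy-based variant I have in mind. It rests on several assumptions: Animal
is cut down to two fields, yearChange() is a placeholder rule rather than
the real computation, updateAge and CopyExample are names made up for this
example, and a local List stands in for the RDD so the snippet runs without
a Spark cluster.

```scala
// Sketch only: a two-field stand-in for the real Animal class.
// yearChange() is a placeholder rule, not the real computation.
case class Animal(animalAge: Int, animalName: String) {
  def yearChange(): Boolean = animalAge < 2
}

object CopyExample {
  // Immutable update: returns a new Animal when the age changes,
  // and the same instance otherwise.
  def updateAge(ani: Animal): Animal =
    if (ani.yearChange()) ani.copy(animalAge = ani.animalAge + 1) else ani

  def main(args: Array[String]): Unit = {
    // A local List stands in for the RDD here; with Spark the same
    // function would be passed as animalRdd.map(updateAge).
    val animals = List(Animal(1, "XYZ"), Animal(2, "ABC"))
    val updated = animals.map(updateAge)
    println(updated)
  }
}
```

In this variant no Animal is ever modified after construction; an animal
whose age does not change is passed through as the same instance.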



Regards
Hemalatha
