It's unrelated to the proposal, but Enum#ordinal() should be much faster,
assuming it's not serialized to JVMs with different versions of the enum :)
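A minimal sketch of both the speed and the hazard, using a made-up Level enum (nothing from Spark):

```java
// Hypothetical enum, standing in for any enum that gets persisted or serialized.
enum Level { DEBUG, INFO, WARN }

public class OrdinalDemo {
    public static void main(String[] args) {
        // ordinal() is just the declaration index: DEBUG=0, INFO=1, WARN=2.
        // That makes it very cheap to hash and compare, but if a later
        // version of the enum inserts, say, TRACE before DEBUG, every
        // persisted ordinal silently shifts by one.
        System.out.println(Level.INFO.ordinal());
    }
}
```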

On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <kevin.mar...@oracle.com>
wrote:

> In some applications, I have rather heavy use of Java enums which are
> needed for related Java APIs that the application uses.  And unfortunately,
> they are also used as keys.  As such, using the native hashcodes makes any
> function over keys unstable and unpredictable, so we now use Enum.name() as
> the key instead.  Oh well.  But it works and seems to work well.
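A minimal sketch of that Enum.name() workaround (Mode is a hypothetical enum, not one from Kevin's application):

```java
import java.util.HashMap;
import java.util.Map;

enum Mode { MEMORY_ONLY, DISK_ONLY }

public class NameKeyDemo {
    public static void main(String[] args) {
        // Key the map by name() instead of the enum itself: String.hashCode()
        // is fully specified and stable across JVMs, unlike the identity
        // hashCode of the enum constant.
        Map<String, Integer> counts = new HashMap<>();
        counts.put(Mode.MEMORY_ONLY.name(), 1);

        // name() round-trips through valueOf(), so the enum is recoverable.
        Mode m = Mode.valueOf("MEMORY_ONLY");
        System.out.println(counts.get(m.name()));
    }
}
```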
>
> Kevin
>
>
> On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
>
>>    I have a strong dislike for Java enums due to the fact that they
>> are not stable across JVMs - if they undergo serde, you end up with
>> unpredictable results at times [1].
>> That is one of the reasons why we prevent enums from being keys: it is
>> quite possible that users depend on them internally and shoot
>> themselves in the foot.
>>
>> It would be better to keep away from them in general and use something
>> more stable.
>>
>> Regards,
>> Mridul
>>
>> [1] Having had to debug this issue for 2 weeks - I really really hate it.
>>
>>
>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <iras...@cloudera.com>
>> wrote:
>>
>>> I have a very strong dislike for #1 (scala enumerations).  I'm ok with
>>> #4 (with Xiangrui's final suggestion, especially making it sealed &
>>> available in Java), but I really think #2, java enums, are the best
>>> option.
>>>
>>> Java enums actually have some very real advantages over the other
>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There
>>> has been endless debate in the Scala community about the problems with
>>> the approaches in Scala.  Very smart, level-headed Scala gurus have
>>> complained about their shortcomings (Rex Kerr's name is coming to mind,
>>> though I'm not positive about that); there have been numerous
>>> well-thought-out proposals to give Scala a better enum.  But the
>>> powers-that-be in Scala always reject them.  IIRC the explanation for
>>> rejecting them is basically that (a) enums aren't important enough to
>>> introduce a new special feature, Scala's got bigger things to work on,
>>> and (b) if you really need a good enum, just use Java's enum.
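As a quick illustration of those freebies, sketched with a hypothetical IoMode enum rather than Spark's actual one:

```java
import java.util.EnumMap;
import java.util.EnumSet;

// Hypothetical enum to show what java.lang.Enum provides for free.
enum IoMode { NIO, EPOLL }

public class EnumGoodies {
    public static void main(String[] args) {
        // values() and valueOf() are generated by the compiler.
        IoMode[] all = IoMode.values();
        IoMode e = IoMode.valueOf("EPOLL");

        // EnumSet is bit-vector backed and EnumMap is array backed,
        // so both are compact and fast for enum keys.
        EnumSet<IoMode> supported = EnumSet.allOf(IoMode.class);
        EnumMap<IoMode, String> desc = new EnumMap<>(IoMode.class);
        desc.put(IoMode.NIO, "java.nio based transport");

        System.out.println(all.length + " " + e + " "
                + supported.contains(IoMode.EPOLL) + " " + desc.size());
    }
}
```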
>>>
>>> I doubt it really matters that much for Spark internals, which is why I
>>> think #4 is fine.  But I figured I'd give my spiel, because every
>>> developer loves language wars :)
>>>
>>> Imran
>>>
>>>
>>>
>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>>  `case object` inside an `object` doesn't show up in Java. This is the
>>>> minimal code I found to make everything show up correctly in both
>>>> Scala and Java:
>>>>
>>>> sealed abstract class StorageLevel // cannot be a trait
>>>>
>>>> object StorageLevel {
>>>>    private[this] case object _MemoryOnly extends StorageLevel
>>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
>>>>
>>>>    private[this] case object _DiskOnly extends StorageLevel
>>>>    final val DiskOnly: StorageLevel = _DiskOnly
>>>> }
>>>>
>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwend...@gmail.com>
>>>> wrote:
>>>>
>>>>> I like #4 as well and agree with Aaron's suggestion.
>>>>>
>>>>> - Patrick
>>>>>
>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilike...@gmail.com>
>>>>>
>>>> wrote:
>>>>
>>>>>> I'm cool with #4 as well, but make sure we dictate that the values
>>>>>> should be defined within an object with the same name as the
>>>>>> enumeration (like we do for StorageLevel).  Otherwise we may pollute
>>>>>> a higher namespace.
>>>>>>
>>>>>> e.g. we SHOULD do:
>>>>>>
>>>>>> trait StorageLevel
>>>>>> object StorageLevel {
>>>>>>    case object MemoryOnly extends StorageLevel
>>>>>>    case object DiskOnly extends StorageLevel
>>>>>> }
>>>>>>
>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <mich...@databricks.com>
>>>>>> wrote:
>>>>>>
>>>>>>  #4 with a preference for CamelCaseEnums
>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <
>>>>>>> jos...@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>  another vote for #4
>>>>>>>> People are already used to adding "()" in Java.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <java...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> #4 but with MemoryOnly (more scala-like)
>>>>>>>>>
>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
>>>>>>>>>
>>>>>>>>> Constants, Values, Variable and Methods
>>>>>>>>>
>>>>>>>>> Constant names should be in upper camel case. That is, if the
>>>>>>>>> member is final, immutable and it belongs to a package object or
>>>>>>>>> an object, it may be considered a constant (similar to Java's
>>>>>>>>> static final members):
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     object Container {
>>>>>>>>>       val MyConstant = ...
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <men...@gmail.com>:
>>>>>>>>>
>>>>>>>>>  Hi all,
>>>>>>>>>>
>>>>>>>>>> There are many places where we use enum-like types in Spark, but
>>>>>>>>>> in different ways. Every approach has both pros and cons. I
>>>>>>>>>> wonder whether there should be an "official" approach for
>>>>>>>>>> enum-like types in Spark.
>>>>>>>>>>
>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
>>>>>>>>>>
>>>>>>>>>> * All types show up as Enumeration.Value in Java.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
>>>>
>>>>>>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
>>>>
>>>>>>>>>> 3. Static fields in Java (e.g., TripletFields)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Doesn't need "()" in Java code.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
>>>>
>>>>>>>>>> 4. Objects in Scala (e.g., StorageLevel)
>>>>>>>>>>
>>>>>>>>>> * Needs "()" in Java code.
>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
>>>>
>>>>>>>>>> It would be great if we had an "official" approach for this as
>>>>>>>>>> well as a naming convention for enum-like values ("MEMORY_ONLY"
>>>>>>>>>> or "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
>>>>>>>>>> thoughts?
>>>>>>>
>>>>>>>> Best,
>>>>>>>>>> Xiangrui
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>
>
>
