Re: enum-like types in Spark
Hi Stephen,

I'm not sure which link you are referring to for the example code -- but yes, the recommendation is that you create the enum in Java, e.g. see https://github.com/apache/spark/blob/v1.4.0/core/src/main/java/org/apache/spark/status/api/v1/StageStatus.java

Then nothing special is required to use it in Scala. This method both uses the overall type of the enum in the return value, and uses specific values in the body: https://github.com/apache/spark/blob/v1.4.0/core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala#L114 (I did delete the branches for the code that is *not* recommended anymore.)

Imran

On Wed, Jul 1, 2015 at 5:53 PM, Stephen Boesch <java...@gmail.com> wrote:

I am reviving an old thread here. The link for the example code for the Java-enum-based solution is now dead: would someone please post an updated link showing the proper interop? Specifically, it is my understanding that Java enums may not be created within Scala. So does the proposed solution require dropping out into Java to create the enums?

2015-04-09 17:16 GMT-07:00 Xiangrui Meng <men...@gmail.com>:

Using Java enums sounds good. We can list the values in the JavaDoc and hope Scala will be able to correctly generate docs for Java enums in the future. -Xiangrui

On Thu, Apr 9, 2015 at 10:59 AM, Imran Rashid <iras...@cloudera.com> wrote:

Any update here? This is relevant for a currently open PR of mine -- I've got a bunch of new public constants defined w/ format #4, but I'd gladly switch to Java enums. (Even if we are just going to postpone this decision, I'm still inclined to switch to Java enums...)

Just to be clear about the existing problem with enums in scaladoc: right now, scaladoc knows about the enum class and generates a page for it, but it does not display the enum constants. It is at least labeled as a Java enum, though, so a savvy user could switch to the javadocs to see the constants.

On Mon, Mar 23, 2015 at 4:50 PM, Imran Rashid <iras...@cloudera.com> wrote:

Well, perhaps I overstated things a little; I wouldn't call it the official solution, just a recommendation in the never-ending debate (and the recommendation from folks with their hands on Scala itself). Even if we do get this fixed in scaladoc eventually -- as it's not in the current versions, where does that leave this proposal? Personally I'd *still* prefer Java enums, even if it doesn't get into scaladoc.

Btw, even with sealed traits, the scaladoc still isn't great -- you don't see the values from the class, you only see them listed from the companion object. (Though that is somewhat standard for scaladoc, so maybe I'm reaching a little.)

On Mon, Mar 23, 2015 at 4:11 PM, Patrick Wendell <pwend...@gmail.com> wrote:

If the official solution from the Scala community is to use Java enums, then it seems strange they aren't generated in scaladoc? Maybe we can just fix that w/ Typesafe's help and then we can use them.

On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen <so...@cloudera.com> wrote:

Yeah, the fully realized #4, which gets back the ability to use it in switch statements (? in Scala but not Java?), does end up being kind of huge. I confess I'm swayed a bit back to Java enums, seeing what it involves. The hashCode() issue can be 'solved' with the hash of the String representation.

On Mon, Mar 23, 2015 at 8:33 PM, Imran Rashid <iras...@cloudera.com> wrote:

I've just switched some of my code over to the new format, and I just want to make sure everyone realizes what we are getting into. I went from 10 lines as Java enums: https://github.com/squito/spark/blob/fef66058612ebf225e58dd5f5fea6bae1afd5b31/core/src/main/java/org/apache/spark/status/api/StageStatus.java#L20 to 30 lines with the new format: https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/status/api/v1/api.scala#L250

It's not just that it's verbose: each name has to be repeated 4 times, with potential typos in some locations that won't be caught by the compiler. Also, you have to manually maintain the values as you update the set of enums; the compiler won't do it for you. The only downside I've heard for Java enums is Enum.hashCode(). OTOH, the downsides for this version are: maintainability/verbosity, no values(), more cumbersome to use from Java, no EnumMap/EnumSet.

I did put together a little util to at least get back the equivalent of Enum.valueOf() with this format: https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/util/SparkEnum.scala

I'm not trying to prevent us from moving forward on this; it's fine if this is still what everyone wants, but I feel pretty strongly Java enums make more sense.

thanks,
Imran
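The interop Imran describes can be sketched against a stdlib Java enum. In this hedged example, java.util.concurrent.TimeUnit stands in for a project enum like StageStatus (which must be defined in a .java file); the object and method names are illustrative, not Spark's actual API:

```scala
// Using a Java enum from Scala. java.util.concurrent.TimeUnit is a stand-in
// for an enum like StageStatus that would live in a Java source file.
import java.util.concurrent.TimeUnit
import java.util.EnumSet

object JavaEnumInterop {
  // values() and valueOf() are generated by the Java compiler for free.
  val all: Array[TimeUnit] = TimeUnit.values()
  val parsed: TimeUnit = TimeUnit.valueOf("SECONDS")

  // EnumSet (and EnumMap) work unchanged from Scala.
  val coarse: EnumSet[TimeUnit] = EnumSet.of(TimeUnit.HOURS, TimeUnit.DAYS)

  // A method can use the enum as its return type and match on specific
  // values in the body, as in the AllStagesResource example above.
  def describe(u: TimeUnit): String = u match {
    case TimeUnit.SECONDS => "seconds"
    case TimeUnit.DAYS    => "days"
    case other            => other.name.toLowerCase
  }
}
```

Nothing special is needed on the Scala side; the only constraint is that the enum definition itself lives in Java.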
Re: enum-like types in Spark
I am reviving an old thread here. The link for the example code for the Java-enum-based solution is now dead: would someone please post an updated link showing the proper interop? Specifically, it is my understanding that Java enums may not be created within Scala. So does the proposed solution require dropping out into Java to create the enums?
Re: enum-like types in Spark
Any update here? This is relevant for a currently open PR of mine -- I've got a bunch of new public constants defined w/ format #4, but I'd gladly switch to Java enums. (Even if we are just going to postpone this decision, I'm still inclined to switch to Java enums...)

Just to be clear about the existing problem with enums in scaladoc: right now, scaladoc knows about the enum class and generates a page for it, but it does not display the enum constants. It is at least labeled as a Java enum, though, so a savvy user could switch to the javadocs to see the constants.
Re: enum-like types in Spark
Yeah, the fully realized #4, which gets back the ability to use it in switch statements (? in Scala but not Java?), does end up being kind of huge. I confess I'm swayed a bit back to Java enums, seeing what it involves. The hashCode() issue can be 'solved' with the hash of the String representation.
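Sean's aside about deriving hashCode() from the String representation could look like the following sketch for the case-object pattern (the type and value names here are made up for illustration, not Spark code):

```scala
// Sketch: pin hashCode to the constant's name so it is the same in every
// JVM instance, unlike java.lang.Enum's identity-based hashCode. The
// overrides are final so the case objects don't regenerate their own.
sealed abstract class Level(val name: String) {
  final override def toString: String = name
  final override def hashCode: Int = name.hashCode
}

object Level {
  case object MemoryOnly extends Level("MemoryOnly")
  case object DiskOnly   extends Level("DiskOnly")
}
```

Equality stays reference-based (each constant is a singleton), so the hashCode/equals contract still holds.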
Re: enum-like types in Spark
If scaladoc can show the Java enum types, I do think the best way is then just Java enum types.
Re: enum-like types in Spark
Well, perhaps I overstated things a little; I wouldn't call it the official solution, just a recommendation in the never-ending debate (and the recommendation from folks with their hands on Scala itself). Even if we do get this fixed in scaladoc eventually -- as it's not in the current versions, where does that leave this proposal? Personally I'd *still* prefer Java enums, even if it doesn't get into scaladoc.

Btw, even with sealed traits, the scaladoc still isn't great -- you don't see the values from the class, you only see them listed from the companion object. (Though that is somewhat standard for scaladoc, so maybe I'm reaching a little.)
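For reference, the kind of util Imran mentions can be approximated as below. This is a hypothetical sketch, not the linked SparkEnum file: it recovers valueOf(), but the values list still has to be maintained by hand, which is exactly the maintenance cost being weighed against Java enums.

```scala
// Hypothetical sketch of recovering values()/valueOf() for the case-object
// pattern. The values list is maintained manually -- the compiler will not
// catch a forgotten entry, unlike a Java enum's generated values().
sealed trait Status
object Status {
  case object Active   extends Status
  case object Complete extends Status
  case object Failed   extends Status

  // The list the compiler will NOT keep in sync for you:
  val values: Seq[Status] = Seq(Active, Complete, Failed)

  def valueOf(name: String): Status =
    values.find(_.toString == name).getOrElse(
      throw new IllegalArgumentException(s"no Status named $name"))
}
```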
Re: enum-like types in Spark
The only issue I knew of with Java enums was that they do not appear in the Scala documentation.
Re: enum-like types in Spark
If the official solution from the Scala community is to use Java enums, then it seems strange they aren't generated in scaladoc? Maybe we can just fix that w/ Typesafe's help and then we can use them.
Re: enum-like types in Spark
2015-03-04 17:11 GMT-08:00 Xiangrui Meng <men...@gmail.com>:

Hi all,

There are many places where we use enum-like types in Spark, but in different ways. Every approach has both pros and cons. I wonder whether there should be an official approach for enum-like types in Spark.

1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc.)
* All types show up as Enumeration.Value in Java: http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html

2. Java's Enum (e.g., SaveMode, IOMode)
* Implementation must be in a Java file.
* Values don't show up in the ScalaDoc: http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode

3. Static fields in Java (e.g., TripletFields)
* Implementation must be in a Java file.
* Doesn't need () in Java code.
* Values don't show up in the ScalaDoc: http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields

4. Objects in Scala (e.g., StorageLevel)
* Needs () in Java code.
* Values show up in both ScalaDoc and JavaDoc: http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$ and http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html

It would be great if we have an official approach for this as well as a naming convention for enum-like values (MEMORY_ONLY or MemoryOnly). Personally, I like 4) with MEMORY_ONLY. Any thoughts?

Best,
Xiangrui
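Option 1 from Xiangrui's list, as a runnable sketch (the object name is illustrative): withName and values come for free, but from Java every value has the weak static type Enumeration.Value, which is the interop drawback he notes.

```scala
// Option 1: Scala's Enumeration. withName/values are free, but from Java
// every value is just an Enumeration.Value, and values of two different
// enumerations are not distinguished by the type system there.
object Mode extends Enumeration {
  type Mode = Value
  val Fair, Fifo = Value
}
```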
Re: enum-like types in Spark
Hey Xiangrui,

Do you want to write up a straw-man proposal based on this line of discussion?

- Patrick

On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <kevin.mar...@oracle.com> wrote:

In some applications, I have rather heavy use of Java enums which are needed for related Java APIs that the application uses. And unfortunately, they are also used as keys. As such, using the native hashcodes makes any function over keys unstable and unpredictable, so we now use Enum.name() as the key instead. Oh well. But it works and seems to work well.

Kevin

On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:

I have a strong dislike for Java enums due to the fact that they are not stable across JVMs -- if one undergoes serde, you end up with unpredictable results at times [1]. This is one of the reasons why we prevent enums from being keys: though it is highly possible users might depend on it internally and shoot themselves in the foot. It would be better to keep away from them in general and use something more stable.

Regards,
Mridul

[1] Having had to debug this issue for 2 weeks -- I really, really hate it.

On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <iras...@cloudera.com> wrote:

I have a very strong dislike for #1 (Scala Enumerations). I'm OK with #4 (with Xiangrui's final suggestion, especially making it sealed & available in Java), but I really think #2, Java enums, are the best option. Java enums actually have some very real advantages over the other approaches -- you get values(), valueOf(), EnumSet, and EnumMap.

There has been endless debate in the Scala community about the problems with the approaches in Scala. Very smart, level-headed Scala gurus have complained about their shortcomings (Rex Kerr's name is coming to mind, though I'm not positive about that); there have been numerous well-thought-out proposals to give Scala a better enum. But the powers that be in Scala always reject them. IIRC the explanation for rejecting is basically that (a) enums aren't important enough to introduce some new special feature, Scala's got bigger things to work on, and (b) if you really need a good enum, just use Java's.

I doubt it really matters that much for Spark internals, which is why I think #4 is fine. But I figured I'd give my spiel, because every developer loves language wars :)

Imran

On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <men...@gmail.com> wrote:

A `case object` inside an `object` doesn't show up in Java. This is the minimal code I found to make everything show up correctly in both Scala and Java:

sealed abstract class StorageLevel // cannot be a trait

object StorageLevel {
  private[this] case object _MemoryOnly extends StorageLevel
  final val MemoryOnly: StorageLevel = _MemoryOnly

  private[this] case object _DiskOnly extends StorageLevel
  final val DiskOnly: StorageLevel = _DiskOnly
}

On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwend...@gmail.com> wrote:

I like #4 as well and agree with Aaron's suggestion.

- Patrick

On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilike...@gmail.com> wrote:

I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for StorageLevel). Otherwise we may pollute a higher namespace. E.g., we SHOULD do:

trait StorageLevel

object StorageLevel {
  case object MemoryOnly extends StorageLevel
  case object DiskOnly extends StorageLevel
}

On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <mich...@databricks.com> wrote:

#4 with a preference for CamelCaseEnums

On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <jos...@databricks.com> wrote:

Another vote for #4. People are already used to adding () in Java.

On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <java...@gmail.com> wrote:

#4, but with MemoryOnly (more Scala-like): http://docs.scala-lang.org/style/naming-conventions.html

"Constants, Values, Variable and Methods: Constant names should be in upper camel case. That is, if the member is final, immutable and it belongs to a package object or an object, it may be considered a constant (similar to Java's static final members):

object Container {
  val MyConstant = ...
}"
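Xiangrui's minimal pattern from the message above, written out so it compiles as shown: the private case objects keep the concrete subtypes out of the public API, while the final vals (typed as the parent) give Java callers parenthesis-free constants.

```scala
// Option 4, fully realized per Xiangrui's message: a sealed abstract class
// (not a trait, for Java interop), private case objects, and public final
// vals typed as the parent so only StorageLevel appears in the API.
sealed abstract class StorageLevel // cannot be a trait
object StorageLevel {
  private[this] case object _MemoryOnly extends StorageLevel
  final val MemoryOnly: StorageLevel = _MemoryOnly

  private[this] case object _DiskOnly extends StorageLevel
  final val DiskOnly: StorageLevel = _DiskOnly
}
```

The cost, as Imran tallies elsewhere in the thread, is that each name appears several times and there is no compiler-maintained values().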
Re: enum-like types in Spark
It's unrelated to the proposal, but Enum#ordinal() should be much faster, assuming it's not serialized to JVMs with different versions of the enum :)

On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey kevin.mar...@oracle.com wrote:

In some applications, I have rather heavy use of Java enums, which are needed for related Java APIs that the application uses. Unfortunately, they are also used as keys. As such, using the native hash codes makes any function over keys unstable and unpredictable, so we now use Enum.name() as the key instead. Oh well. But it works, and seems to work well. Kevin

On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:

I have a strong dislike for Java enums due to the fact that they are not stable across JVMs: if one undergoes serde, you can end up with unpredictable results at times [1]. This is one of the reasons we prevent enums from being keys, though it is quite possible that users depend on them internally and shoot themselves in the foot. It would be better to keep away from them in general and use something more stable. Regards, Mridul

[1] Having had to debug this issue for 2 weeks - I really, really hate it.

On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid iras...@cloudera.com wrote:

I have a very strong dislike for #1 (Scala Enumerations). I'm OK with #4 (with Xiangrui's final suggestion, especially making it sealed and available in Java), but I really think #2, Java enums, is the best option. Java enums actually have some very real advantages over the other approaches -- you get values(), valueOf(), EnumSet, and EnumMap. There has been endless debate in the Scala community about the problems with the approaches in Scala. Very smart, level-headed Scala gurus have complained about their shortcomings (Rex Kerr's name comes to mind, though I'm not positive about that), and there have been numerous well-thought-out proposals to give Scala a better enum. But the powers that be in Scala always reject them.

IIRC, the explanation for rejecting them is basically that (a) enums aren't important enough to introduce some new special feature -- Scala has bigger things to work on -- and (b) if you really need a good enum, just use Java's. I doubt it really matters that much for Spark internals, which is why I think #4 is fine. But I figured I'd give my spiel, because every developer loves language wars :) Imran

On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng men...@gmail.com wrote:

A `case object` inside an `object` doesn't show up in Java. This is the minimal code I found to make everything show up correctly in both Scala and Java:

    sealed abstract class StorageLevel // cannot be a trait

    object StorageLevel {
      private[this] case object _MemoryOnly extends StorageLevel
      final val MemoryOnly: StorageLevel = _MemoryOnly

      private[this] case object _DiskOnly extends StorageLevel
      final val DiskOnly: StorageLevel = _DiskOnly
    }

On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell pwend...@gmail.com wrote:

I like #4 as well and agree with Aaron's suggestion. - Patrick

On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson ilike...@gmail.com wrote:

I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for StorageLevel). Otherwise we may pollute a higher namespace. E.g., we SHOULD do:

    trait StorageLevel

    object StorageLevel {
      case object MemoryOnly extends StorageLevel
      case object DiskOnly extends StorageLevel
    }

On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust mich...@databricks.com wrote:

#4, with a preference for CamelCaseEnums.

On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley jos...@databricks.com wrote:

Another vote for #4. People are already used to adding () in Java.
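[Editor's note] Kevin's workaround above -- keying by Enum.name() instead of the enum's hash code or ordinal -- can be sketched as follows. This is a minimal illustration with a hypothetical WorkerState enum, not a real Spark type: name() round-trips through valueOf() and survives constants being added or reordered, while ordinal() and the default hashCode() do not.

```java
import java.util.HashMap;
import java.util.Map;

public class StableKeyDemo {
    // Hypothetical enum for illustration only; not a real Spark type.
    enum WorkerState { ALIVE, DEAD, DECOMMISSIONED }

    // Round-trip a state through its name(), the stable representation.
    static WorkerState roundTrip(WorkerState s) {
        String key = s.name();            // e.g. "DEAD"
        return WorkerState.valueOf(key);  // parses back to the same constant
    }

    public static void main(String[] args) {
        // Key the map by name() rather than the enum itself, per Kevin's workaround:
        // the String key is stable across JVMs and enum versions.
        Map<String, Integer> counts = new HashMap<>();
        counts.put(WorkerState.DEAD.name(), 2);

        System.out.println(roundTrip(WorkerState.DEAD) == WorkerState.DEAD); // true
        System.out.println(counts.get("DEAD")); // 2
    }
}
```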
Re: enum-like types in Spark
This has some disadvantages for Java, I think. You can't switch on an object defined like this, but you can with an enum. And although the Scala compiler understands that the set of values is fixed because of `sealed`, and so can warn about missing cases, the JVM won't know this and can't do the same.

On Fri, Mar 6, 2015 at 3:58 AM, Xiangrui Meng men...@gmail.com wrote:

For #4, my previous proposal may confuse IDEs with the additional types generated by the case objects, and their toString values contain the underscore. The following works better:

    sealed abstract class StorageLevel

    object StorageLevel {
      final val MemoryOnly: StorageLevel = {
        case object MemoryOnly extends StorageLevel
        MemoryOnly
      }

      final val DiskOnly: StorageLevel = {
        case object DiskOnly extends StorageLevel
        DiskOnly
      }
    }

MemoryOnly and DiskOnly can be used in pattern matching. If people are okay with this approach, I can add it to the code style guide. Imran, this is not just for internal APIs, which are relatively more flexible; it is good to use the same approach to implement public enum-like types from now on. Best, Xiangrui

On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid iras...@cloudera.com wrote:

I have a very strong dislike for #1 (Scala Enumerations). I'm OK with #4 (with Xiangrui's final suggestion, especially making it sealed and available in Java), but I really think #2, Java enums, is the best option. Java enums actually have some very real advantages over the other approaches -- you get values(), valueOf(), EnumSet, and EnumMap. There has been endless debate in the Scala community about the problems with the approaches in Scala. Very smart, level-headed Scala gurus have complained about their shortcomings (Rex Kerr's name comes to mind, though I'm not positive about that), and there have been numerous well-thought-out proposals to give Scala a better enum. But the powers that be in Scala always reject them.
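[Editor's note] Sean's switch point and Imran's feature list can both be sketched with a small hypothetical Java enum (the StorageLevel names below are illustrative, not Spark's actual class): a Java switch works directly on enum constants, and values(), valueOf(), and EnumSet come for free, none of which the sealed-object encoding above gives Java callers.

```java
import java.util.EnumSet;

public class EnumFeaturesDemo {
    // Hypothetical enum for illustration; not Spark's actual StorageLevel.
    enum StorageLevel { MEMORY_ONLY, DISK_ONLY, MEMORY_AND_DISK }

    // javac accepts a switch over enum constants; with the sealed-object
    // pattern, Java callers would need an if/else chain instead.
    static String describe(StorageLevel level) {
        switch (level) {
            case MEMORY_ONLY:     return "deserialized in memory";
            case DISK_ONLY:       return "serialized on disk";
            case MEMORY_AND_DISK: return "memory, spilling to disk";
            default: throw new IllegalArgumentException("unreachable");
        }
    }

    public static void main(String[] args) {
        // values()/valueOf()/EnumSet come for free with a Java enum.
        System.out.println(StorageLevel.values().length);             // 3
        System.out.println(StorageLevel.valueOf("DISK_ONLY"));        // DISK_ONLY
        System.out.println(EnumSet.allOf(StorageLevel.class).size()); // 3
        System.out.println(describe(StorageLevel.MEMORY_ONLY));       // deserialized in memory
    }
}
```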
Re: enum-like types in Spark
Yes -- only new or internal APIs. I doubt we'd break any exposed APIs for the purpose of cleanup. Patrick

On Mar 5, 2015 12:16 AM, Mridul Muralidharan mri...@gmail.com wrote:

While I don't have any strong opinions about how we handle enums either way in Spark, I assume the discussion is targeted at (new) APIs being designed in Spark. Rewiring what we have already exposed would be an incompatible API change (StorageLevel, for example, is in 1.0). Regards, Mridul

On Wed, Mar 4, 2015 at 11:45 PM, Aaron Davidson ilike...@gmail.com wrote:

That's kinda annoying, but it's just a little extra boilerplate. Can you call it as StorageLevel.DiskOnly() from Java? Would it also work if they were case classes with empty constructors, without the field?

On Wed, Mar 4, 2015 at 11:35 PM, Xiangrui Meng men...@gmail.com wrote:

A `case object` inside an `object` doesn't show up in Java.
Re: enum-like types in Spark
#4, with a preference for CamelCaseEnums.

On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley jos...@databricks.com wrote:

Another vote for #4. People are already used to adding () in Java.
enum-like types in Spark
Hi all,

There are many places where we use enum-like types in Spark, but in different ways. Every approach has both pros and cons. I wonder whether there should be an “official” approach for enum-like types in Spark.

1. Scala’s Enumeration (e.g., SchedulingMode, WorkerState, etc.)
   * All types show up as Enumeration.Value in Java: http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html

2. Java’s Enum (e.g., SaveMode, IOMode)
   * Implementation must be in a Java file.
   * Values don’t show up in the ScalaDoc: http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode

3. Static fields in Java (e.g., TripletFields)
   * Implementation must be in a Java file.
   * Doesn’t need “()” in Java code.
   * Values don’t show up in the ScalaDoc: http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields

4. Objects in Scala (e.g., StorageLevel)
   * Needs “()” in Java code.
   * Values show up in both ScalaDoc and JavaDoc: http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$ and http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html

It would be great if we had an “official” approach for this, as well as a naming convention for enum-like values (“MEMORY_ONLY” or “MemoryOnly”). Personally, I like 4) with “MEMORY_ONLY”. Any thoughts?

Best, Xiangrui

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
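[Editor's note] Approach #3 above can be sketched as a Java class with a private constructor and public static final instances. This is only an illustration in the spirit of TripletFields; the class and field names below are hypothetical, not Spark's actual API:

```java
public final class Visibility {
    // Approach #3: an enum-like type built from public static final fields.
    private final boolean useSrc;
    private final boolean useDst;

    private Visibility(boolean useSrc, boolean useDst) {
        this.useSrc = useSrc;
        this.useDst = useDst;
    }

    // The private constructor guarantees these are the only instances,
    // so reference equality (==) is safe -- but unlike a real enum there
    // is no values(), valueOf(), or exhaustiveness checking.
    public static final Visibility NONE = new Visibility(false, false);
    public static final Visibility SRC_ONLY = new Visibility(true, false);
    public static final Visibility ALL = new Visibility(true, true);

    public boolean useSrc() { return useSrc; }
    public boolean useDst() { return useDst; }

    public static void main(String[] args) {
        // No "()" needed from Java, matching the pro listed for approach #3.
        System.out.println(Visibility.ALL.useSrc());  // true
        System.out.println(Visibility.NONE.useDst()); // false
    }
}
```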
Re: enum-like types in Spark
Another vote for #4. People are already used to adding "()" in Java.

On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch java...@gmail.com wrote:

#4 but with MemoryOnly (more scala-like)
Re: enum-like types in Spark
I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for StorageLevel). Otherwise we may pollute a higher namespace. e.g. we SHOULD do:

  trait StorageLevel
  object StorageLevel {
    case object MemoryOnly extends StorageLevel
    case object DiskOnly extends StorageLevel
  }

On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust mich...@databricks.com wrote:

#4 with a preference for CamelCaseEnums
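Aaron's snippet is Scala, but the same namespacing concern applies on the Java side. For comparison, a sketch of how approach #3 keeps values nested under the type's own name (similar in spirit to TripletFields, though the names and fields below are hypothetical, not Spark's actual API):

```java
// Approach #3 style: values are static final fields inside the type itself,
// so nothing leaks into the enclosing package namespace.
public class StorageLevelSketch {
    private final String name;

    private StorageLevelSketch(String name) {
        this.name = name;
    }

    // Constants scoped under StorageLevelSketch, mirroring
    // "object StorageLevel { ... }" in the Scala version above.
    public static final StorageLevelSketch MEMORY_ONLY = new StorageLevelSketch("MEMORY_ONLY");
    public static final StorageLevelSketch DISK_ONLY = new StorageLevelSketch("DISK_ONLY");

    @Override
    public String toString() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(StorageLevelSketch.MEMORY_ONLY); // MEMORY_ONLY
    }
}
```

Unlike a real enum, this hand-rolled form gets no compiler-checked switch exhaustiveness or valueOf(), which is part of what the later messages in the thread weigh against approach #2.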
Re: enum-like types in Spark
#4 but with MemoryOnly (more scala-like). From the Scala style guide (http://docs.scala-lang.org/style/naming-conventions.html), "Constants, Values, Variable and Methods": Constant names should be in upper camel case. That is, if the member is final, immutable and it belongs to a package object or an object, it may be considered a constant (similar to Java's static final members):

  object Container {
    val MyConstant = ...
  }
Re: enum-like types in Spark
I like #4 as well and agree with Aaron's suggestion.

- Patrick

On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson ilike...@gmail.com wrote:

I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for StorageLevel). Otherwise we may pollute a higher namespace.