[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322611#comment-14322611 ]

Florian Verhein commented on SPARK-5813:
----------------------------------------

I think it's a good idea to stick to vendor recommendations, but since I can't point to any concrete benefits and there is complexity around handling licensing issues, I don't think there's a good argument for tackling this.

Spark-ec2: Switch to OracleJDK
------------------------------

           Key: SPARK-5813
           URL: https://issues.apache.org/jira/browse/SPARK-5813
       Project: Spark
    Issue Type: Improvement
    Components: EC2
      Reporter: Florian Verhein
      Priority: Minor

Currently using OpenJDK; however, it is generally recommended to use the Oracle JDK, especially for Hadoop deployments.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321764#comment-14321764 ]

Florian Verhein commented on SPARK-5813:
----------------------------------------

IANAL, but here are my thoughts: the user ends up downloading it from Oracle and accepting the license terms in that process, so as long as they are (or are made) aware, I don't really see a problem. It's just providing a mechanism for them to do this; i.e. it's not a redistribution issue.

I think a reasonable solution would be to have OpenJDK as the default, with OracleJDK as an option that the user must specifically request (with the option's documentation indicating that this entails acceptance of a license, etc.).

At least, *the above is true in the case where the user builds their own AMI (that's the approach I take, since it best suits my requirements). With provided AMIs I think this is more complex, because I would assume that constitutes redistribution*. I guess that applies to any software that is put on the AMI, actually... so this may be an issue that needs looking at more generally. I don't know how to best approach that case, other than adhering to any redistribution terms and including these as part of an EULA for spark-ec2/AMIs or something?

But with the work [~nchammas] has done, I suppose the easiest way would be to provide the public AMIs with OpenJDK, and add an option to build ones with OracleJDK if the user is inclined to do this themselves. Hmmm... is this worthwhile?
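The opt-in scheme proposed above could look something like the following minimal sketch. The flag names (`--jdk`, `--accept-oracle-license`) and package names are purely illustrative assumptions, not existing spark-ec2 options: OpenJDK is the default, and the Oracle flavor is refused unless the user has explicitly acknowledged the license.

```shell
#!/bin/sh
# Sketch of an opt-in JDK selection scheme (hypothetical flag names):
# OpenJDK by default; OracleJDK only with an explicit license acknowledgment.

select_jdk() {
  # $1: requested flavor ("openjdk" or "oracle")
  # $2: "yes" iff the user passed an explicit --accept-oracle-license flag
  flavor="$1"
  accepted="$2"
  case "$flavor" in
    openjdk)
      echo "install openjdk"
      ;;
    oracle)
      if [ "$accepted" = "yes" ]; then
        echo "install oracle-jdk"
      else
        echo "error: --jdk=oracle requires --accept-oracle-license" >&2
        return 1
      fi
      ;;
    *)
      echo "error: unknown jdk flavor: $flavor" >&2
      return 1
      ;;
  esac
}

select_jdk openjdk no
select_jdk oracle yes
select_jdk oracle no || echo "refused as expected"
```

The point of the gate is that the script never accepts anything itself; it only proceeds once the user has signalled acceptance through a deliberate, documented option.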
[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321942#comment-14321942 ]

Sean Owen commented on SPARK-5813:
----------------------------------

I kind of misstated this. I think this issue is more fundamentally one of distribution. I don't believe others are entitled to redistribute Oracle's JDK/JRE, so I don't think Spark can provide AMIs that contain the Oracle implementation.

Providing tools to help someone build an AMI with the Oracle JDK is different. However, there too I don't think you can hide the license agreement and accept it on the user's behalf, or slip in what you think is an equivalent license agreement process. It's not our call to make.

Dumb question: are AMIs being hosted and redistributed by the Spark project? I wasn't aware of these, if so. Whoever does, yes, needs to think about what software licensing terms mean for redistribution. It's perhaps surprising to most people, and an artifact of history, that these OSS licenses kick in almost solely when you distribute, not use, the software!

Anyway: every installer that I've seen that provides the Oracle JDK is a wrapper around Oracle's downloader and EULA script. You could embed that process in a script, if you dare. My hunch is that it's not worth the trouble, if there's no obvious demand or motivation.
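The "wrapper" approach described above — presenting the license and letting the user accept it themselves, rather than accepting on their behalf — might be sketched like this. The prompt wording and the fallback behaviour are assumptions for illustration; the actual download step is left as a placeholder rather than inventing an Oracle endpoint.

```shell
#!/bin/sh
# Sketch of a EULA wrapper: the script never auto-accepts; it proceeds
# only if the user explicitly types "yes" themselves.

confirm_eula() {
  echo "The Oracle JDK is governed by Oracle's license terms."
  echo "You must read and accept them yourself before downloading."
  printf "Type 'yes' to confirm you accept the license: "
  read answer
  if [ "$answer" = "yes" ]; then
    echo "accepted"
  else
    echo "declined"
    return 1
  fi
}

# Demo with piped input; interactively the user would type the answer.
printf 'yes\n' | confirm_eula                # placeholder: real download would follow
printf 'no\n'  | confirm_eula || echo "falling back to OpenJDK"
```

This keeps the acceptance step in the user's hands, which is the distinction Sean draws: wrapping the process is different from scripting the acceptance away.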
[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322208#comment-14322208 ]

Florian Verhein commented on SPARK-5813:
----------------------------------------

Good point. I think you're right re: scripting away the license acceptance - I understand it's sometimes done by sysadmins/ops to automate their installation processes in-house, but that is a different situation. Thanks for that.

spark_ec2 works by looking up an existing AMI and using it to instantiate EC2 instances. I don't know who currently maintains these.
[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321748#comment-14321748 ]

Florian Verhein commented on SPARK-5813:
----------------------------------------

No specific technical reason, especially with respect to Spark. It's more of an attempt to keep in line with recommendations for Hadoop in production (relevant since Hadoop is included in spark-ec2, and CDH seems to be favoured). For example, CDH supports the Oracle JDK, Hortonworks didn't support OpenJDK before 1.7, and the Oracle JDK still seems to be the favoured choice in production deployments, e.g. http://wiki.apache.org/hadoop/HadoopJavaVersions.

I don't have first-hand data about how they compare performance-wise. I've heard the Oracle JDK being preferred for Hadoop on that front, but I also found this: http://www.slideshare.net/PrincipledTechnologies/big-data-technology-on-red-hat-enterprise-linux-openjdk-vs-oracle-jdk, so perhaps performance is less of a reason these days? Do you know of any performance analysis done with Spark or Tachyon on OpenJDK vs OracleJDK?

In terms of difficulty, it's not hard to script installation of the Oracle JDK. E.g. I've gone down the path of supporting both for the above reasons here (link may break in future): https://github.com/florianverhein/spark-ec2/blob/packer/packer/java-setup.sh

Aside: based on the bugs you mentioned, is there a list somewhere of which JDK versions to avoid WRT Spark?
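A java-setup script of the kind linked above typically also sanity-checks the installed JDK version. Here is a small sketch of that check, parsing a `java -version`-style string; the 1.7 minimum is an illustrative threshold taken from the Hortonworks note above, not an official Spark requirement, and the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: parse a 'java -version' line and reject JDKs older than 1.7.
# The threshold is illustrative, not an official requirement.

java_major_minor() {
  # Extract "1.7" from a line like: java version "1.7.0_75"
  echo "$1" | sed -n 's/.*"\([0-9]*\.[0-9]*\)\..*/\1/p'
}

jdk_ok() {
  v=$(java_major_minor "$1")
  minor=${v#1.}
  [ -n "$minor" ] && [ "$minor" -ge 7 ]
}

java_major_minor 'java version "1.7.0_75"'   # prints 1.7
if jdk_ok 'java version "1.6.0_45"'; then
  echo "1.6 accepted"
else
  echo "1.6 rejected"
fi
```

In a real setup script, the input string would come from `java -version 2>&1`, and a failed check would abort provisioning with a clear message rather than leave a half-configured AMI.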
[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321749#comment-14321749 ]

Sean Owen commented on SPARK-5813:
----------------------------------

I could be wrong about this, but I thought one of the reasons the Oracle JDK was hard to get at was that it requires the user to accept license terms. You can script around it, but is that allowed?
[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321267#comment-14321267 ]

Patrick Wendell commented on SPARK-5813:
----------------------------------------

Hey [~florianverhein]. Just wondering, are there specific features of Oracle's JRE you're interested in? These days, Oracle's JRE and OpenJDK are basically equivalent. In the history of Spark, I don't think I've ever seen us have a bug that was specific to OpenJDK and not also present in the Oracle JDK. Given how much easier it is to install OpenJDK, I'm not sure it's worth the extra packaging annoyance to add Oracle Java. Just curious if you have a specific reason to want Oracle.