[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-16 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322611#comment-14322611
 ] 

Florian Verhein commented on SPARK-5813:


I think it's a good idea to stick to vendor recommendations, but since I can't 
point to any concrete benefits and there is complexity around handling 
licensing issues, I don't think there's a good argument for tackling this.

 Spark-ec2: Switch to OracleJDK
 --

 Key: SPARK-5813
 URL: https://issues.apache.org/jira/browse/SPARK-5813
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Florian Verhein
Priority: Minor

 Currently using OpenJDK, however it is generally recommended to use Oracle 
 JDK, esp for Hadoop deployments, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321764#comment-14321764
 ] 

Florian Verhein commented on SPARK-5813:


IANAL, but here are my thoughts:

The user ends up downloading it from Oracle and accepting the license terms in 
that process, so as long as they are (or are made) aware, I don't really see a 
problem. It's just providing a mechanism for them to do this; i.e. it's not a 
redistribution issue.
I think a reasonable solution would be to have OpenJDK as the default, with 
OracleJDK as an option that the user must specifically request (and the 
option's documentation indicating that this entails acceptance of a license... 
etc.)
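A default-OpenJDK, opt-in-OracleJDK scheme could be sketched roughly as below. 
The flag names are purely illustrative and are not spark-ec2 options; the point 
is only that the Oracle JDK is never selected without a separate, explicit 
license-acceptance flag from the user.

```shell
# Illustrative sketch only: flag names are hypothetical, not spark-ec2 options.
# Chooses a JDK from CLI-style flags; refuses the Oracle JDK unless the caller
# has also passed an explicit license-acceptance flag.
select_jdk() {
  local jdk="openjdk" accepted="no" arg
  for arg in "$@"; do
    case "$arg" in
      --oracle-jdk)            jdk="oracle" ;;
      --accept-oracle-license) accepted="yes" ;;
    esac
  done
  if [ "$jdk" = "oracle" ] && [ "$accepted" != "yes" ]; then
    echo "error: --oracle-jdk requires --accept-oracle-license" >&2
    return 1
  fi
  # The actual install step (yum/apt/Oracle downloader) would branch on this.
  echo "$jdk"
}
```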

At least, the above is true in the case where the user builds their own AMI 
(that's the approach I take, since it best suits my requirements). With 
provided AMIs I think this is more complex, because I would assume that counts 
as redistribution. I guess that applies to any software that is put on the AMI, 
actually... so this may be an issue that needs looking at more generally. 
I don't know how to best approach that case, other than adhering to any 
redistribution terms and including these as part of an EULA for spark-ec2/AMIs 
or something?

But with the work [~nchammas] has done, I suppose the easiest way would be to 
provide the public AMIs with OpenJDK, and add an option to build ones with 
OracleJDK if the user is inclined to do this themselves.
 
Hmmm... is this worthwhile?

 Spark-ec2: Switch to OracleJDK
 --

 Key: SPARK-5813
 URL: https://issues.apache.org/jira/browse/SPARK-5813
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Florian Verhein
Priority: Minor

 Currently using OpenJDK, however it is generally recommended to use Oracle 
 JDK, esp for Hadoop deployments, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321942#comment-14321942
 ] 

Sean Owen commented on SPARK-5813:
--

I kind of misstated this. I think this issue is more fundamentally one of 
distribution. I don't believe others are entitled to redistribute Oracle's 
JDK/JRE. So I don't think Spark can provide AMIs that contain the Oracle 
implementation. 

Providing tools to help someone build an AMI with the Oracle JDK is different. 
However, even there I don't think you can silently accept the license agreement 
on the user's behalf, or slip in what you think is an equivalent 
license-agreement process. It's not our call to make.

Dumb question, are AMIs being hosted and redistributed by the Spark project? I 
wasn't aware of these if so. Whoever does, yes, needs to think about what 
software licensing terms mean for redistribution. It's perhaps surprising to 
most people, and an artifact of history, that these OSS licenses kick in almost 
solely when you distribute, not use, the software!

Anyway: every installer that I've seen that provides the Oracle JDK is a 
wrapper around their downloader and EULA script. You could embed that process 
in a script, if you dare. My hunch is that it's not worth the trouble, if 
there's no obvious demand or motivation.
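For reference, the downloader-wrapper pattern described above usually amounted 
to attaching a header that asserts license acceptance to the download request. 
A hedged sketch (the URL is a placeholder, not a real Oracle link, and sending 
the cookie does not remove the user's obligation to actually read and accept 
the license first):

```shell
# Sketch of the downloader-wrapper pattern: build (not run) the fetch command,
# attaching the license-acceptance header only after the user has opted in.
# The URL passed in is a placeholder, not a real Oracle download link.
build_oracle_fetch_cmd() {
  local url="$1" accepted="$2"
  if [ "$accepted" != "yes" ]; then
    echo "error: license not accepted; refusing to build download command" >&2
    return 1
  fi
  printf 'wget --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" %s\n' "$url"
}
```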




[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322208#comment-14322208
 ] 

Florian Verhein commented on SPARK-5813:


Good point. I think you're right re: scripting away the acceptance step - I 
understand it's sometimes done by sysadmins/ops to automate their in-house 
installation processes, but that is a different situation. Thanks for that. 

spark_ec2 works by looking up an existing AMI and using it to instantiate EC2 
instances. I don't know who currently maintains these. 






[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-14 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321748#comment-14321748
 ] 

Florian Verhein commented on SPARK-5813:


No specific technical reason, especially w.r.t. Spark... It's more of an 
attempt to keep in line with recommendations for Hadoop in production (relevant 
since Hadoop is included in spark-ec2, and CDH seems to be favoured). For 
example, CDH supports OracleJDK, Hortonworks didn't support OpenJDK before 1.7, 
and OracleJDK still seems to be the favoured choice in production deployments; 
see e.g. http://wiki.apache.org/hadoop/HadoopJavaVersions. 

I don't have first-hand data about how they compare performance-wise. I've 
heard OracleJDK being preferred for Hadoop on that front, but I also found 
http://www.slideshare.net/PrincipledTechnologies/big-data-technology-on-red-hat-enterprise-linux-openjdk-vs-oracle-jdk,
 so perhaps performance is less of a reason these days?

Do you know of any performance analysis done with Spark or Tachyon on OpenJDK 
vs. OracleJDK?

In terms of difficulty, it's not hard to script installation of OracleJDK. E.g. 
I've gone down the path of supporting both for the above reasons here (link may 
break in future): 
https://github.com/florianverhein/spark-ec2/blob/packer/packer/java-setup.sh

Aside: based on the bugs you mentioned, is there a list somewhere of which JDK 
versions to avoid w.r.t. Spark?




[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321749#comment-14321749
 ] 

Sean Owen commented on SPARK-5813:
--

I could be wrong about this, but I thought one of the reasons Oracle JDK was 
hard to get at was that it requires the user to accept license terms. You can 
script around it but is that allowed? 




[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-14 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321267#comment-14321267
 ] 

Patrick Wendell commented on SPARK-5813:


Hey [~florianverhein]. Just wondering, are there specific features of Oracle's 
JRE you are interested in? These days, Oracle's JRE and OpenJDK are basically 
equivalent. In the history of Spark, I don't think I've ever seen us have a bug 
that was specific to OpenJDK and not also present in Oracle JDK. Given how much 
easier it is to install OpenJDK, I'm not sure it's worth the extra packaging 
annoyance to add Oracle Java. Just curious if you have a specific reason to 
want Oracle.
