[jira] [Commented] (SPARK-6664) Split Ordered RDD into multiple RDDs by keys (boundaries or intervals)

2015-04-03 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394147#comment-14394147 ] Florian Verhein commented on SPARK-6664: I guess the other thing is - we can union

[jira] [Commented] (SPARK-6664) Split Ordered RDD into multiple RDDs by keys (boundaries or intervals)

2015-04-03 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394141#comment-14394141 ] Florian Verhein commented on SPARK-6664: Thanks [~sowen]. I disagree :-) ...If

[jira] [Commented] (SPARK-6665) Randomly Shuffle an RDD

2015-04-03 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394291#comment-14394291 ] Florian Verhein commented on SPARK-6665: Fair enough. I'll have to implement it

[jira] [Commented] (SPARK-6665) Randomly Shuffle an RDD

2015-04-02 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394089#comment-14394089 ] Florian Verhein commented on SPARK-6665: Thanks for the quick response [~sowen].

[jira] [Updated] (SPARK-6664) Split Ordered RDD into multiple RDDs by keys (boundaries or intervals)

2015-04-01 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-6664: --- Description: I can't find this functionality (if I missed something, apologies!), but it

[jira] [Created] (SPARK-6665) Randomly Shuffle an RDD

2015-04-01 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-6665: -- Summary: Randomly Shuffle an RDD Key: SPARK-6665 URL: https://issues.apache.org/jira/browse/SPARK-6665 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-6664) Split Ordered RDD into multiple RDDs by keys (boundaries or intervals)

2015-04-01 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391950#comment-14391950 ] Florian Verhein commented on SPARK-6664: The closest approach I've found that

[jira] [Created] (SPARK-6664) Split Ordered RDD into multiple RDDs by keys (boundaries or intervals)

2015-04-01 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-6664: -- Summary: Split Ordered RDD into multiple RDDs by keys (boundaries or intervals) Key: SPARK-6664 URL: https://issues.apache.org/jira/browse/SPARK-6664 Project:

[jira] [Updated] (SPARK-6601) Add HDFS NFS gateway module to spark-ec2

2015-03-29 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-6601: --- Description: Add module hdfs-nfs-gateway, which sets up the gateway for (say,

[jira] [Created] (SPARK-6601) Add HDFS NFS gateway module to spark-ec2

2015-03-29 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-6601: -- Summary: Add HDFS NFS gateway module to spark-ec2 Key: SPARK-6601 URL: https://issues.apache.org/jira/browse/SPARK-6601 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-6600) Open ports in spark-ec2.py to allow HDFS NFS gateway

2015-03-29 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-6600: --- Description: Use case: User has set up the hadoop hdfs nfs gateway service on their

[jira] [Updated] (SPARK-6600) Open ports in ec2/spark_ec2.py to allow HDFS NFS gateway

2015-03-29 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-6600: --- Summary: Open ports in ec2/spark_ec2.py to allow HDFS NFS gateway (was: Open ports in

[jira] [Updated] (SPARK-6600) Open ports in ec2/spark_ec2.py to allow HDFS NFS gateway

2015-03-29 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-6600: --- Description: Use case: User has set up the hadoop hdfs nfs gateway service on their

[jira] [Updated] (SPARK-6601) Add HDFS NFS gateway module to spark-ec2

2015-03-29 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-6601: --- Description: Add module hdfs-nfs-gateway, which sets up the gateway for (say,

[jira] [Commented] (SPARK-5879) spark_ec2.py should expose/return master and slave lists (e.g. write to file)

2015-02-19 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328612#comment-14328612 ] Florian Verhein commented on SPARK-5879: cc [~shivaram], any opinions on how to

[jira] [Created] (SPARK-5879) spark_ec2.py should expose/return master and slave lists (e.g. write to file)

2015-02-17 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5879: -- Summary: spark_ec2.py should expose/return master and slave lists (e.g. write to file) Key: SPARK-5879 URL: https://issues.apache.org/jira/browse/SPARK-5879

[jira] [Commented] (SPARK-5851) spark_ec2.py ssh failure retry handling not always appropriate

2015-02-17 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324986#comment-14324986 ] Florian Verhein commented on SPARK-5851: That makes sense. Yeah, I ran into it

[jira] [Created] (SPARK-5851) spark_ec2.py ssh failure retry handling not always appropriate

2015-02-16 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5851: -- Summary: spark_ec2.py ssh failure retry handling not always appropriate Key: SPARK-5851 URL: https://issues.apache.org/jira/browse/SPARK-5851 Project: Spark

[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-16 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322611#comment-14322611 ] Florian Verhein commented on SPARK-5813: I think it's a good idea to stick to

[jira] [Closed] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-16 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein closed SPARK-5813. -- Resolution: Won't Fix Spark-ec2: Switch to OracleJDK --

[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321764#comment-14321764 ] Florian Verhein commented on SPARK-5813: IANAL but here are my thoughts: The user

[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322208#comment-14322208 ] Florian Verhein commented on SPARK-5813: Good point. I think you're right re:

[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-14 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321748#comment-14321748 ] Florian Verhein commented on SPARK-5813: No specific technical reason esp WRT

[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-02-13 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320995#comment-14320995 ] Florian Verhein commented on SPARK-3821: RE: Java, that reminds me... We should

[jira] [Created] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-13 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5813: -- Summary: Spark-ec2: Switch to OracleJDK Key: SPARK-5813 URL: https://issues.apache.org/jira/browse/SPARK-5813 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-5641) Allow spark_ec2.py to copy arbitrary files to cluster

2015-02-12 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-5641: --- Description: *Updated - no longer via deploy.generic, no substitutions* Essentially, give

[jira] [Updated] (SPARK-5641) Allow spark_ec2.py to copy arbitrary files to cluster via deploy.generic

2015-02-09 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-5641: --- Description: Useful if binary files need to be uploaded. E.g. I use this for rpm transfer to

[jira] [Comment Edited] (SPARK-5676) License missing from spark-ec2 repo

2015-02-09 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313102#comment-14313102 ] Florian Verhein edited comment on SPARK-5676 at 2/9/15 11:06 PM:

[jira] [Commented] (SPARK-5676) License missing from spark-ec2 repo

2015-02-09 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313102#comment-14313102 ] Florian Verhein commented on SPARK-5676: [~srowen] Yep, that's the one. True.

[jira] [Commented] (SPARK-5676) License missing from spark-ec2 repo

2015-02-09 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313135#comment-14313135 ] Florian Verhein commented on SPARK-5676: Makes sense. Thanks. License missing

[jira] [Created] (SPARK-5676) License missing from spark-ec2 repo

2015-02-08 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5676: -- Summary: License missing from spark-ec2 repo Key: SPARK-5676 URL: https://issues.apache.org/jira/browse/SPARK-5676 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-3185) SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER

2015-02-05 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308644#comment-14308644 ] Florian Verhein commented on SPARK-3185: [~dvohra] Sure, but the exception is

[jira] [Created] (SPARK-5641) Allow spark_ec2.py to copy arbitrary files to cluster via deploy.generic

2015-02-05 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5641: -- Summary: Allow spark_ec2.py to copy arbitrary files to cluster via deploy.generic Key: SPARK-5641 URL: https://issues.apache.org/jira/browse/SPARK-5641 Project:

[jira] [Commented] (SPARK-5552) Automated data science AMI creation and data science cluster deployment on EC2

2015-02-03 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304412#comment-14304412 ] Florian Verhein commented on SPARK-5552: Thanks [~sowen]. So it wouldn't fit in

[jira] [Created] (SPARK-5552) Automated data science AMIs creation and cluster deployment on EC2

2015-02-02 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5552: -- Summary: Automated data science AMIs creation and cluster deployment on EC2 Key: SPARK-5552 URL: https://issues.apache.org/jira/browse/SPARK-5552 Project: Spark

[jira] [Updated] (SPARK-5552) Automated data science AMI creation and data science cluster deployment on EC2

2015-02-02 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-5552: --- Summary: Automated data science AMI creation and data science cluster deployment on EC2

[jira] [Commented] (SPARK-3185) SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER

2015-01-24 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290923#comment-14290923 ] Florian Verhein commented on SPARK-3185: Sure [~grzegorz-dubicki]. You need to

[jira] [Updated] (SPARK-5331) Spark workers can't find tachyon master as spark-ec2 doesn't set spark.tachyonStore.url

2015-01-20 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein updated SPARK-5331: --- Component/s: EC2 Description: ps -ef | grep Tachyon shows Tachyon running on the master

[jira] [Commented] (SPARK-3185) SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER

2015-01-19 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283493#comment-14283493 ] Florian Verhein commented on SPARK-3185: I built tachyon with the correct hadoop

[jira] [Created] (SPARK-5331) Tachyon workers seem to ignore tachyon.master.hostname and use localhost instead

2015-01-19 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5331: -- Summary: Tachyon workers seem to ignore tachyon.master.hostname and use localhost instead Key: SPARK-5331 URL: https://issues.apache.org/jira/browse/SPARK-5331

[jira] [Commented] (SPARK-3185) SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER

2015-01-13 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276436#comment-14276436 ] Florian Verhein commented on SPARK-3185: I'm also getting this, though with Server

[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-01-13 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276572#comment-14276572 ] Florian Verhein commented on SPARK-3821: Thanks [~nchammas], that makes sense.

[jira] [Created] (SPARK-5241) spark-ec2 spark init scripts do not handle all hadoop (or tachyon?) dependencies correctly

2015-01-13 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5241: -- Summary: spark-ec2 spark init scripts do not handle all hadoop (or tachyon?) dependencies correctly Key: SPARK-5241 URL: https://issues.apache.org/jira/browse/SPARK-5241

[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-01-13 Thread Florian Verhein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276263#comment-14276263 ] Florian Verhein commented on SPARK-3821: This is great stuff! It'll also help