Re: Google Summer of Code - ideas

2015-02-24 Thread Xiangrui Meng
Would you be interested in working on MLlib's Python API during the summer? We want everything we have implemented in Scala to be usable from both Java and Python, but we are not there yet. It would be great if someone were willing to help. -Xiangrui On Sat, Feb 21, 2015 at 11:24 AM, Manoj Kumar

RE: spark slave cannot execute without admin permission on windows

2015-02-24 Thread Judy Nash
Update to the thread. Upon investigation, this is a bug on Windows. Windows does not grant users read permission to jar files by default. I have created a pull request for SPARK-5914 (https://issues.apache.org/jira/browse/SPARK-5914) to grant read permission to jar owner (slave service

Does Spark delete shuffle files of lost executor in running system(on YARN)?

2015-02-24 Thread nitin
Hi All, I noticed that Spark doesn't delete the local shuffle files of a lost executor in a running system (running in yarn-client mode). For a long-running system, this might fill up disk space if executors fail frequently. Can we delete these files when the executor loss is reported to the driver?

Re: [ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread Mike Hynes
I don't see any version flag for /usr/bin/jar, but I think I see the problem now; the openjdk version is 7, but javac -version gives 1.6.0_34; so spark was compiled with java 6 despite the system using jre 1.7. Thanks for the sanity check! Now I just need to find out why javac is downgraded on the
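The mismatch described above (a Java 7 runtime alongside a Java 6 compiler) can be checked with a short script. This is only a minimal sketch: the function names are hypothetical, and it assumes the version banner contains a number in either the legacy "1.x" form or the newer "x.y" form.

```python
import re
import subprocess


def java_major_version(version_output):
    """Return the major Java version (6, 7, 8, ...) found in a version banner."""
    match = re.search(r"(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.6.0_34" means Java 6. Newer scheme: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


def toolchain_mismatch(java_output, javac_output):
    """True when the runtime and the compiler report different major versions."""
    return java_major_version(java_output) != java_major_version(javac_output)


def tool_version_output(command):
    """Run e.g. ['java', '-version'] and return its banner.

    Older JDKs print the banner on stderr, so both streams are merged.
    """
    result = subprocess.run(command + ["-version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return result.stdout.decode("utf-8", "replace")
```

For example, `toolchain_mismatch(tool_version_output(["java"]), tool_version_output(["javac"]))` would flag the setup described in this message, where `java` resolves to version 7 and `javac` to 1.6.0_34.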

Re: [ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread shane knapp
it's not downgraded, it's your /etc/alternatives setup that's causing this. you can update all of those entries by executing the following commands (as root): update-alternatives --install /usr/bin/java java /usr/java/latest/bin/java 1 update-alternatives --install /usr/bin/javah javah

Help vote for Spark talks at the Hadoop Summit

2015-02-24 Thread Reynold Xin
Hi all, The Hadoop Summit uses community choice voting to decide which talks to feature. It would be great if the community could help vote for Spark talks so that Spark has a good showing at this event. You can make three votes on each track. Below I've listed 3 talks that are important to

Re: Streaming partitions to driver for use in .toLocalIterator

2015-02-24 Thread Andrew Ash
I think a cheap way to repartition to a higher partition count without shuffle would be valuable too. Right now you can choose whether to execute a shuffle when going down in partition count, but going up in partition count always requires a shuffle. For the need of having a smaller partitions
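To illustrate what increasing the partition count without a shuffle could mean, here is a plain-Python sketch that splits each partition locally into smaller chunks, so no element ever moves between the original partitions. It is not a Spark API; `split_partitions` and `factor` are hypothetical names, and partitions are modeled as simple lists.

```python
def split_partitions(partitions, factor):
    """Split every partition into exactly `factor` contiguous chunks.

    Data never crosses an original partition boundary, which is the
    property that would make such an operation shuffle-free.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    result = []
    for part in partitions:
        # Ceiling division gives the chunk size; use 1 for empty partitions.
        chunk = -(-len(part) // factor) or 1
        pieces = [part[i:i + chunk] for i in range(0, len(part), chunk)]
        # Pad with empty chunks so each input yields exactly `factor` outputs.
        pieces += [[] for _ in range(factor - len(pieces))]
        result.extend(pieces)
    return result
```

By contrast, in the existing API `coalesce(n, shuffle=False)` can only lower the partition count, and `repartition(n)` always shuffles.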

PySpark SPARK_CLASSPATH doesn't distribute jars to executors

2015-02-24 Thread Michael Nazario
Has anyone experienced a problem with the SPARK_CLASSPATH not distributing jars for PySpark? I have a detailed description of what I tried in the ticket below, and this seems like a problem that is not a configuration problem. The only other case I can think of is that configuration changed

Re: Have Friedman's glmnet algo running in Spark

2015-02-24 Thread Joseph Bradley
Hi Mike, I'm not aware of a standard big dataset, but there are a number available: * The YearPredictionMSD dataset from the LIBSVM datasets is sizeable (in # instances but not # features): www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html * I've used this text dataset from which

Re: Have Friedman's glmnet algo running in Spark

2015-02-24 Thread mike
Joseph, Thanks for your reply. We'll take the steps you suggest - generate some timing comparisons and post them in the GLMNET JIRA with a link from the OWLQN JIRA. We've got the regression version of GLMNET programmed. The regression version only requires a pass through the data each time the

[ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread Mike Hynes
./bin/compute-classpath.sh fails with error: $ jar -tf assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop1.0.4.jar nonexistent/class/path java.util.zip.ZipException: invalid CEN header (bad signature) at java.util.zip.ZipFile.open(Native Method) at

Re: [ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread Sean Owen
So you mean that the script is checking for this error, and takes it as a sign that you compiled with Java 6. Your command seems to confirm that reading the assembly jar does fail on your system, though. What version does the jar command show? Are you sure you don't have JRE 7 but JDK 6 installed?
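One way to sanity-check the assembly jar independently of the `jar` tool is to count its entries with Python's `zipfile` module. As I understand it, a common cause of "invalid CEN header" is an archive with more entries than the classic zip format's 16-bit count can hold, which older Java 6 tooling cannot read; treat that as an assumption rather than a confirmed diagnosis for this thread. The function names are hypothetical.

```python
import zipfile

# Classic (non-Zip64) archives store the entry count in a 16-bit field.
CLASSIC_ZIP_MAX_ENTRIES = 65535


def jar_entry_count(jar_path):
    """Return the number of entries in a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return len(jar.namelist())


def may_break_old_jar_tools(jar_path):
    """True when the jar has more entries than a classic zip can describe."""
    return jar_entry_count(jar_path) > CLASSIC_ZIP_MAX_ENTRIES
```

Running `may_break_old_jar_tools` on the assembly jar under `assembly/target/scala-2.10/` would show whether the entry count alone could explain the failure.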

Re: PySpark SPARK_CLASSPATH doesn't distribute jars to executors

2015-02-24 Thread Denny Lee
Can you try extraClassPath or driver-class-path and see if that helps with the distribution? On Tue, Feb 24, 2015 at 14:54 Michael Nazario mnaza...@palantir.com wrote: Has anyone experienced a problem with the SPARK_CLASSPATH not distributing jars for PySpark? I have a detailed description of
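A minimal sketch of the suggestion above, assembling a `spark-submit` command line that sets both the driver and executor class paths. The helper name, application file, and jar paths are hypothetical. Note that `--jars` is what ships jars to the cluster; `--driver-class-path` and `spark.executor.extraClassPath` only add entries to the class path, so with those alone the paths must already exist on each node.

```python
def build_submit_command(app, jar_paths):
    """Build a spark-submit argument list that exposes `jar_paths` everywhere."""
    # Assumes a Unix-like cluster, where class path entries are ':'-separated.
    classpath = ":".join(jar_paths)
    return [
        "spark-submit",
        "--jars", ",".join(jar_paths),            # distribute the jars
        "--driver-class-path", classpath,         # driver JVM class path
        "--conf", "spark.executor.extraClassPath=" + classpath,
        app,
    ]
```

The resulting list can be passed to `subprocess.run`, or joined with spaces to inspect the command before running it.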