Would you be interested in working on MLlib's Python API during the
summer? We want everything we implement in Scala to be usable in both
Java and Python, but we are not there yet. It would be great if
someone is willing to help. -Xiangrui
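For anyone wondering what this involves, here is a hedged sketch (not
actual MLlib code; FeatureScaler and scaleFeatures are made-up names)
of the pattern: implement the algorithm once in Scala, then add a
Java-friendly overload that Java, and Python via Py4J, can call:

import org.apache.spark.api.java.JavaRDD
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object FeatureScaler {
  // the single Scala implementation
  def scaleFeatures(data: RDD[Vector], factor: Double): RDD[Vector] =
    data.map(v => Vectors.dense(v.toArray.map(_ * factor)))

  // Java/Python-friendly wrapper around the Scala version
  def scaleFeatures(data: JavaRDD[Vector], factor: Double): JavaRDD[Vector] =
    JavaRDD.fromRDD(scaleFeatures(data.rdd, factor))
}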
On Sat, Feb 21, 2015 at 11:24 AM, Manoj Kumar
Update to the thread.
Upon investigation, this is a bug on Windows. Windows does not grant
the user read permission to jar files by default.
Have created a pull request for SPARK-5914
(https://issues.apache.org/jira/browse/SPARK-5914) to grant read
permission to the jar owner (slave service
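A minimal sketch of that kind of fix, assuming the jar has already
been copied to the worker's directory (the helper name is hypothetical
and not the actual change in the pull request):

import java.io.File

// Explicitly grant read permission on a fetched jar so a service
// running under a different account (e.g. the slave service user)
// can open it; the second argument `false` applies the permission
// to all users, not just the file owner.
def makeJarReadable(jar: File): Unit = {
  if (!jar.setReadable(true, false)) {
    throw new java.io.IOException(s"Could not set read permission on $jar")
  }
}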
Hi All,
I noticed that Spark doesn't delete the local shuffle files of a lost
executor in a running system (running in yarn-client mode). For a
long-running system, this might fill up disk space in case of frequent
executor failures. Can we delete these files when the executor loss is
reported to the driver?
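One possible hook, sketched below on the assumption that the
SparkListener executor-removed callback is available in your build
(ShuffleCleanupListener is a made-up name, and Spark exposes no public
API for deleting another executor's local files, so the actual cleanup
would have to happen node-side):

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}

// Logs executor loss on the driver; a node-side agent could use the
// executor id to locate and delete its orphaned shuffle directories.
class ShuffleCleanupListener extends SparkListener {
  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit = {
    println(s"Executor ${removed.executorId} removed (${removed.reason}); " +
      "its local shuffle files are now orphaned")
  }
}

// registration on the driver: sc.addSparkListener(new ShuffleCleanupListener)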
I don't see any version flag for /usr/bin/jar, but I think I see the
problem now; the OpenJDK version is 7, but javac -version gives
1.6.0_34, so Spark was compiled with Java 6 despite the system using
JRE 1.7.
Thanks for the sanity check! Now I just need to find out why javac is
downgraded on the
it's not downgraded, it's your /etc/alternatives setup that's causing this.
you can update all of those entries by executing the following commands (as
root):
update-alternatives --install /usr/bin/java java
/usr/java/latest/bin/java 1
update-alternatives --install /usr/bin/javah javah
/usr/java/latest/bin/javah 1
Hi all,
The Hadoop Summit uses community choice voting to decide which talks to
feature. It would be great if the community could help vote for Spark talks
so that Spark has a good showing at this event. You can make three votes on
each track. Below I've listed 3 talks that are important to
I think a cheap way to repartition to a higher partition count without
a shuffle would be valuable too. Right now you can choose whether to
execute a shuffle when going down in partition count, but going up in
partition count always requires a shuffle. For the need of having
smaller partitions
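To make the current behavior concrete, a minimal runnable sketch using
the standard RDD API (the local master and sizes are arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

object RepartitionDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("repartition-demo").setMaster("local[4]"))
    val rdd = sc.parallelize(1 to 1000, 10)
    // going down in partition count: coalesce can avoid a shuffle by
    // merging co-located partitions
    val fewer = rdd.coalesce(5)
    // going up always shuffles today; repartition(n) is just
    // coalesce(n, shuffle = true)
    val more = rdd.repartition(20)
    println(s"fewer=${fewer.partitions.length} more=${more.partitions.length}")
    sc.stop()
  }
}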
Has anyone experienced a problem with the SPARK_CLASSPATH not distributing jars
for PySpark? I have a detailed description of what I tried in the ticket below,
and this does not appear to be a configuration problem. The only
other case I can think of is that configuration changed
Hi Mike,
I'm not aware of a standard big dataset, but there are a number available:
* The YearPredictionMSD dataset from the LIBSVM datasets is sizeable (in #
instances but not # features):
www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html
* I've used this text dataset from which
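If you use the LIBSVM-format datasets, loading is one call with
MLlib's utilities; a small sketch (the local path is a placeholder for
wherever you downloaded the file):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.util.MLUtils

object LoadYearPredictionMSD {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("load-msd").setMaster("local[*]"))
    // loadLibSVMFile parses LIBSVM text into an RDD[LabeledPoint]
    val data = MLUtils.loadLibSVMFile(sc, "/path/to/YearPredictionMSD")
    println(s"loaded ${data.count()} instances")
    sc.stop()
  }
}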
Joseph,
Thanks for your reply. We'll take the steps you suggest - generate some timing
comparisons and post them in the GLMNET JIRA with a link from the OWLQN JIRA.
We've got the regression version of GLMNET programmed. The regression version
only requires a pass through the data each time the
./bin/compute-classpath.sh fails with the following error:
$ jar -tf
assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop1.0.4.jar
nonexistent/class/path
java.util.zip.ZipException: invalid CEN header (bad signature)
at java.util.zip.ZipFile.open(Native Method)
at
So you mean that the script is checking for this error and takes it
as a sign that you compiled with Java 6.
Your command seems to confirm that reading the assembly jar does fail
on your system, though. An "invalid CEN header" usually means the jar
is in zip64 format, which the Java 6 zip implementation cannot read.
What version does the jar command show? Are you sure you don't have
JRE 7 but JDK 6 installed?
Can you try extraClassPath or --driver-class-path and see if that
helps with the distribution? (A sketch of both settings follows.)
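For reference, a hedged sketch of setting both through SparkConf (the
jar path is a placeholder, not from this thread; note that
extraClassPath only prepends a path that must already exist on each
node, it does not ship the jar):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("classpath-test")
  // driver-side equivalent of spark-submit's --driver-class-path
  .set("spark.driver.extraClassPath", "/path/to/extra.jar")
  // prepended to every executor's classpath; the jar must already
  // be present at this path on each node
  .set("spark.executor.extraClassPath", "/path/to/extra.jar")
val sc = new SparkContext(conf)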
On Tue, Feb 24, 2015 at 14:54 Michael Nazario mnaza...@palantir.com wrote:
Has anyone experienced a problem with the SPARK_CLASSPATH not distributing
jars for PySpark? I have a detailed description of