No, this is just standard Maven informational license info in
META-INF. It is not going to affect runtime behavior or how classes
are loaded.
On Mon, Jun 23, 2014 at 6:30 AM, anoldbrain anoldbr...@gmail.com wrote:
I checked the META-INF/DEPENDENCIES file in the spark-assembly jar from
official
I used Java Decompiler to check the included
org.apache.commons.codec.binary.Base64 .class file (in the spark-assembly jar),
and for both encodeBase64 and decodeBase64 there is only a (byte[]) version,
with no encodeBase64(String)/decodeBase64(String).
I have encountered the reported issue. This conflicts
We are using Spark 1.0.0 deployed on a Spark Standalone cluster and I'm getting
the following exception. With the previous version I saw this error occur
along with OutOfMemory errors, which I'm not seeing with Spark 1.0.
Any suggestions?
Job aborted due to stage failure: Task 3748.0:20 failed 4
Note that regarding a long load time, data format means a whole lot in
terms of query performance. If you load all your data into compressed,
columnar Parquet files on local hardware, Spark SQL would also perform far,
far better than it would reading from gzipped S3 files. You must also be
careful
I'm getting the same behavior and it's crucial I get it fixed for an
evaluation of Spark + Mesos within my company.
I'm adding a +1 to the request to get this fix into 1.0.1 if possible!
thanks,
Federico
2014-06-20 20:51 GMT+02:00 Sébastien Rainville sebastienrainvi...@gmail.com
:
Hi,
Hi,
I've implemented a class with evaluation measures for multiclass
classification (as well as unit tests): per-class and averaged
Precision, Recall, and F1-measure. As far as I know, Spark has only a binary
classification evaluator, given that Spark's Bayesian classifier
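For reference, here is a minimal sketch of what such metrics look like computed by hand; this is not the proposed API, just an illustration, and it assumes an existing SparkContext sc with toy data:
import org.apache.spark.SparkContext._
val predictionsAndLabels = sc.parallelize(Seq(
  (0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 2.0), (2.0, 1.0)))
// Count each (prediction, label) pair and bring the small map back to the driver.
val counts = predictionsAndLabels.map { case (p, l) => ((p, l), 1L) }
  .reduceByKey(_ + _)
  .collectAsMap()
val classes = counts.keys.flatMap { case (p, l) => Seq(p, l) }.toSeq.distinct
val perClass = classes.map { c =>
  val tp = counts.getOrElse((c, c), 0L).toDouble
  val predicted = counts.collect { case ((p, _), n) if p == c => n }.sum.toDouble
  val actual = counts.collect { case ((_, l), n) if l == c => n }.sum.toDouble
  val precision = if (predicted == 0) 0.0 else tp / predicted
  val recall = if (actual == 0) 0.0 else tp / actual
  val f1 = if (precision + recall == 0) 0.0 else 2 * precision * recall / (precision + recall)
  (c, precision, recall, f1)
}
perClass.foreach(println)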
Hi,
The real-world dataset is a bit larger, so I tested on the MovieLens
dataset and found the same results:
alpha  lambda  rank  top1  top5  EPR_in   EPR_out
40     0.001   50    297   559   0.05855
I found a workaround by adding SPARK_CLASSPATH=.../commons-codec-xxx.jar to
spark-env.sh
On Sun, Jun 22, 2014 at 5:53 PM, Debasish Das debasish.da...@gmail.com
wrote:
600s for Spark vs 5s for Redshift...The numbers look much different from
the amplab benchmark...
https://amplab.cs.berkeley.edu/benchmark/
Is it like SSDs or something that's helping Redshift, or the whole data is
On Mon, Jun 23, 2014 at 8:50 AM, Aaron Davidson ilike...@gmail.com wrote:
Note that regarding a long load time, data format means a whole lot in
terms of query performance. If you load all your data into compressed,
columnar Parquet files on local hardware, Spark SQL would also perform far,
Thanks for the pointer... I tried Kryo and ran into a strange error:
org.apache.spark.SparkException: Job aborted due to stage failure: Exception
while deserializing and fetching task:
com.esotericsoftware.kryo.KryoException: Unable to find class:
rg.apache.hadoop.hbase.io.ImmutableBytesWritable
It is
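In case it helps, Kryo in Spark 1.0 is usually configured with a registrator class along the following lines; this is only a sketch with illustrative class and app names, not a fix for the specific truncated class name in the error above:
import com.esotericsoftware.kryo.Kryo
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator
// Register the HBase writable with Kryo; the HBase jar must also be on the
// executor classpath, or deserialization fails with "Unable to find class".
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[ImmutableBytesWritable])
  }
}
val conf = new SparkConf()
  .setAppName("KryoExample")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyKryoRegistrator")  // use the fully-qualified name if it lives in a package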
Hi folks, hoping someone can explain to me what's going on:
I have the following code, largely based on the RecoverableNetworkWordCount
example (
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala
):
I am setting
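(The message is cut off above.) For reference, the checkpoint/recovery pattern that example uses looks roughly like the sketch below; the checkpoint directory, app name, and batch interval are placeholders:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path
// Invoked only when no checkpoint exists; on restart the context is rebuilt
// from the checkpoint data instead.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("RecoverableExample")
  val ssc = new StreamingContext(conf, Seconds(1))
  ssc.checkpoint(checkpointDir)
  // define the DStream transformations here
  ssc
}
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()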
So it does not work for files on HDFS either? That is really a problem.
An object in Scala is similar to a class with only static fields/methods
in Java. So when you set its fields in the driver, the
object does not get serialized and sent to the executors; they have
their own copy of the class and its static fields, which haven't been
initialized.
Use a proper class,
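To illustrate the difference with hypothetical names (assuming an existing SparkContext sc on a cluster, not local mode):
// Fields of a Scala object are per-JVM statics: setting them on the driver does
// not update the copies that each executor JVM holds.
object Settings {
  var threshold = 0
}
Settings.threshold = 5
val rdd = sc.parallelize(1 to 10)
rdd.filter(x => x > Settings.threshold).count()  // executors may still see threshold = 0
// A plain serializable value captured by the closure is shipped with each task:
case class ScoringConfig(threshold: Int)
val config = ScoringConfig(5)
rdd.filter(x => x > config.threshold).count()    // executors see threshold = 5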
Hi All, I am new to Scala and Spark. I have a basic question. I have the
following import statements in my Scala program. I want to pass my function
(printScore) to Spark. It will compare a string
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import
Hi all,
I am new to Spark, so this is probably a basic question. I want to explore
the possibilities of this framework, concretely using it in conjunction with
3rd-party libs, like MongoDB, for example.
I have been following the instructions from
http://codeforhire.com/2014/02/18/using-spark-with-mongodb/ in
Thank you so much! I was trying for a singleton and opted against a class
but clearly this backfired. Clearly time to revisit Scala lessons. Thanks
again
On Mon, Jun 23, 2014 at 1:16 PM, Marcelo Vanzin van...@cloudera.com wrote:
object in Scala is similar to a class with only static fields /
I checked the source code; it looks like it was re-added based on JIRA
SPARK-1588, but I don't know if there's any test case associated with this?
SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
Sandy Ryza sa...@cloudera.com
2014-04-29 12:54:02 -0700
Here is my conversation about the same issue with regression methods:
https://issues.apache.org/jira/browse/SPARK-1859
Assuming this should not happen, I don't want to keep building a
custom version of Spark for every new release, so I'd prefer the
workaround.
Hello,
I noticed there are some discussions about adding K-fold validation to
MLlib and believe it should be in Spark 1.0 now. However, there isn't
any documentation or example of how to use it in training.
While I am reading the code to find out, does anyone use it
I used some standard Java IO libraries to write files directly to the
cluster. It is a little bit trivial, though:
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.deploy.SparkHadoopUtil
val sc = getSparkContext
val hadoopConf = SparkHadoopUtil.get.newConfiguration
val hdfsPath = "hdfs://your/path"
val fs = FileSystem.get(hadoopConf)
val path
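(The snippet is cut off at "val path".) One plausible continuation, assuming the goal is to open an output stream through the Hadoop FileSystem API; the content written below is just an example:
import org.apache.hadoop.fs.Path
val path = new Path(hdfsPath)
// create() returns an FSDataOutputStream that writes straight into HDFS.
val out = fs.create(path)
out.write("example content\n".getBytes("UTF-8"))
out.close()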
I am relatively new to Spark and am getting stuck trying to do the following:
- My input is integer key, value pairs where the key is not unique. I'm
interested in information about all possible distinct key combinations, thus
the Cartesian product.
- My first attempt was to create a separate
Sorry, I got my sample outputs wrong
(1,1) - 400
(1,2) - 500
(2,2) - 600
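A minimal sketch of the Cartesian-product approach, assuming an existing SparkContext sc and toy data; pair counts stand in for whatever aggregate the 400/500/600 figures represent:
import org.apache.spark.SparkContext._
// Toy input: integer (key, value) pairs where keys repeat.
val input = sc.parallelize(Seq((1, 10), (1, 20), (2, 30), (2, 40)))
// Pair every record with every other record, key by the unordered key combination,
// and aggregate. Counting pairs here; any reduce has the same shape. Note that
// cartesian() is quadratic in the input size, so it gets expensive quickly.
val combos = input.cartesian(input)
  .map { case ((k1, _), (k2, _)) => ((k1 min k2, k1 max k2), 1L) }
  .reduceByKey(_ + _)
combos.collect().foreach { case ((k1, k2), n) => println(s"($k1,$k2) - $n") }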
On Jun 23, 2014, at 4:29 PM, Aaron Dossett [via Apache Spark User List]
ml-node+s1001560n8144...@n3.nabble.com
wrote:
I am relatively new to Spark and am getting stuck trying
Hi All,
It's great to see a growing number of companies Powered By Spark
(https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark)!
If you're running Spark on Apache Mesos (http://mesos.apache.org), drop me
a line or post to the u...@mesos.apache.org list and we'll also be happy to
add
We have a use case where we’d like something to execute once on each node and I
thought it would be good to ask here.
Currently we achieve this by setting the parallelism to the number of nodes and
using a mod partitioner:
val balancedRdd = sc.parallelize(
(0 until Settings.parallelism)
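(The snippet above is cut off.) For context, here is a sketch of the mod-partitioner idea; Settings.parallelism is taken from the original message and the rest is illustrative:
import org.apache.spark.Partitioner
import org.apache.spark.SparkContext._
// Send integer key k to partition (k mod numPartitions). With one distinct key per
// partition and parallelism equal to the number of nodes, each node runs one task.
class ModPartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = key.asInstanceOf[Int] % partitions
}
val balancedRdd = sc.parallelize((0 until Settings.parallelism).map(i => (i, i)))
  .partitionBy(new ModPartitioner(Settings.parallelism))
balancedRdd.foreachPartition { _ =>
  // node-local, run-once work goes here
}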
Ah, never mind. The 0.0.0.0 is for the UI, not for the Master, which uses the
output of the hostname command. But yes, long answer short: go to the web
UI and use that URL.
2014-06-23 11:13 GMT-07:00 Andrew Or and...@databricks.com:
Hm, spark://localhost:7077 should work, because the standalone
I am using Spark 1.0.0. I am able to successfully run sbt package.
However, when I run sbt test or sbt test-only class,
I get the following error:
[error] error while loading root, zip file is empty
scala.reflect.internal.MissingRequirementError: object scala.runtime in
compiler mirror not
I have two jars with the following packages
package a.b.c.d.z found in jar1
package a.b.e found in jar2
In the Scala REPL (no Spark) both imports work just fine, but in the Spark
REPL, I found that
import a.b.c.d.z gives me the following error
object c is not a member of package a.b
Has
Hi All,
I am using Spark for text analysis. I have a source file that has a few thousand
sentences and a dataset of tens of millions of statements. I want to compare
each statement from the sourceFile with each statement from the dataset and
generate a score. I am having the following problem. I
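(The message is cut off above.) One common way to structure this kind of pairwise scoring, since the source file is small, is to broadcast it rather than taking a full Cartesian product; the paths and the score() function below are placeholders:
// score() is a stand-in for the real comparison; paths are placeholders.
def score(sentence: String, statement: String): Double =
  (sentence.split(" ").toSet intersect statement.split(" ").toSet).size.toDouble
// The source file is only a few thousand sentences, so collect and broadcast it,
// then score each of the millions of statements against it in one pass.
val sourceSentences = sc.textFile("hdfs:///path/to/sourceFile").collect()
val sourceBc = sc.broadcast(sourceSentences)
val scored = sc.textFile("hdfs:///path/to/dataset").flatMap { statement =>
  sourceBc.value.map(sentence => (sentence, statement, score(sentence, statement)))
}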
Actually I figured it out. The problem was that I was loading the
sbt package-ed jar into the classpath and not the sbt assembly-ed jar.
Once I put the right jar in for package a.b.c.d.z, everything worked.
thanks
shivani
On Mon, Jun 23, 2014 at 4:38 PM, Shivani Rao raoshiv...@gmail.com
The subject should be "org.apache.spark.SparkException: Job aborted due to
stage failure: Task not serializable: java.io.NotSerializableException:" and
not "DAGScheduler: Failed to run foreach".
If I call printScoreCanndedString with a hard-coded string and identical 2nd
parameter, it works fine.
Please note that this:
for (sentence <- sourcerdd) {
...
}
is actually Scala syntactic sugar which is converted into
sourcerdd.foreach { sentence => ... }
What this means is that this will actually run on the cluster, which is
probably not what you want if you're trying to print them.
Try
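(The suggestion above is cut off at "Try".) One common approach, not necessarily what was about to be suggested, is to bring the data back to the driver before printing:
// take() and collect() return local arrays, so println runs on the driver
// rather than inside a closure shipped to the executors.
sourcerdd.take(20).foreach(println)
// or, if the RDD is small enough to fit in driver memory:
sourcerdd.collect().foreach(println)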
I see that a task will either be a ShuffleMapTask or a ResultTask. I
wonder which function generates a ShuffleMapTask and which generates a
ResultTask?