Thank you very much. A lot of very small JSON files was exactly the
performance problem; using coalesce makes my Spark program on a single
node run only about twice as slow (even including Spark startup) as the
single-node Python program, which is acceptable.
Jan
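A minimal sketch of the coalesce approach Jan describes, assuming the small JSON files live under one directory; the path and partition count below are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object CoalesceSmallFiles {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("coalesce-small-json"))

        // Reading a directory of many tiny JSON files yields one partition per
        // file, which drowns the job in per-task overhead.
        val raw = sc.textFile("hdfs:///data/json/*.json")

        // Collapse to a handful of partitions before the real work begins.
        // coalesce(n) avoids a full shuffle, unlike repartition(n).
        val merged = raw.coalesce(8)

        println(s"partitions after coalesce: ${merged.partitions.length}")
        sc.stop()
      }
    }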
In my program, the application's connection to the master always fails for
several iterations. The driver's log is as follows:
WARN AppClient$ClientActor: Connection to
akka.tcp://sparkMaster@master1:7077 failed; waiting for master to
reconnect...
Why does this warning happen and how can I avoid it?
In addition, the driver receives several DisassociatedEvent messages.
Check for any variables you've declared in your class. Even if you're not
calling them from the function, they are passed to the worker nodes as part
of the serialized closure. Consequently, if you have something without a
default serializer (like an imported class) it will also get passed.
To fix this, you can mark such fields @transient, or copy the needed value
into a local variable before the closure so that only the value itself is
serialized; a sketch follows.
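A minimal sketch of the local-copy workaround, with hypothetical names (Helper, multiplier) used purely for illustration:

    import org.apache.spark.rdd.RDD

    class Helper {
      // The class is not Serializable; referencing this field inside a closure
      // would capture `this` and drag the whole instance to the workers.
      val multiplier: Int = 3

      def scale(data: RDD[Int]): RDD[Int] = {
        // Copy the field into a local val so the closure captures only the Int,
        // not the enclosing (non-serializable) class.
        val m = multiplier
        data.map(_ * m)
      }
    }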
I am working with Spark 1.1.0 and I believe Timestamp is a supported data type
for Spark SQL. However, I keep getting this MatchError for java.sql.Timestamp
when I try to use reflection to register a Java Bean with a Timestamp field.
Is anything wrong with my code below?
public
Can you provide the exception stack?
Thanks,
Daoyuan
From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 10:17 PM
To: user@spark.apache.org
Subject: scala.MatchError: class java.sql.Timestamp
I am working with Spark 1.1.0 and I believe Timestamp is a supported data type
scala.MatchError: class java.sql.Timestamp (of class java.lang.Class)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:189)
at
I built the latest Spark project and I'm running into these errors when
attempting to run the streaming examples locally on my Mac. How do I fix
these errors?
java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
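If the failure comes from the snappy-java native loader on OS X, one hedged workaround from that era was to pin a newer snappy-java in the application's own build. The Maven coordinates below are real; the specific version is an assumption to verify against the error:

    // build.sbt -- force a snappy-java release with a working OS X native binding
    libraryDependencies += "org.xerial.snappy" % "snappy-java" % "1.0.5.4"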
Hi, I'm new to Spark and just trying to make sense of the SVMWithSGD example.
I ran my dataset through it and built a model. When I call predict() on the
testing data (after clearThreshold()) I was expecting to get answers in the
range of 0 to 1. But they aren't; all predictions seem to be
The problem is that you called clearThreshold(). The result becomes the SVM
margin, not a 0/1 class prediction. There is no probability output.
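A minimal sketch of the two prediction modes, assuming a trained SVMModel; the names below are illustrative:

    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    def marginsVsLabels(training: RDD[LabeledPoint], test: RDD[Vector]): Unit = {
      val model = SVMWithSGD.train(training, numIterations = 100)

      // Default: threshold at 0.0, so predict() returns 0.0 or 1.0 class labels.
      val labels = model.predict(test)

      // After clearThreshold(), predict() returns the raw SVM margin (any real
      // number). Useful for ranking, e.g. to compute AUC -- not a probability.
      model.clearThreshold()
      val margins = model.predict(test)

      // Restore 0/1 output by setting a threshold again.
      model.setThreshold(0.0)
    }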
There was a very similar question last week. Is there an example out there
suggesting clearThreshold()? I also wonder if it is good to overload the
Thanks.
The example I used is here:
https://spark.apache.org/docs/latest/mllib-linear-methods.html (see
SVMClassifier).
So there's no way to get a probability-based output? What about from
linear regression, or logistic regression?
On 19 October 2014 19:52, Sean Owen so...@cloudera.com wrote:
Ah right. It is important to use clearThreshold() in that example in
order to generate margins, because the AUC metric needs the
classifications to be ranked by some relative strength, rather than
just 0/1. These outputs are not probabilities, and that is not what
SVMs give you in general. There
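On the follow-up question: logistic regression, unlike the SVM, does produce a probability. In the MLlib of that era, clearing the threshold on a LogisticRegressionModel makes predict() return the sigmoid of the margin. A minimal sketch, assuming a trained model:

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    def classProbabilities(training: RDD[LabeledPoint],
                           test: RDD[Vector]): RDD[Double] = {
      val model = LogisticRegressionWithSGD.train(training, numIterations = 100)
      // With the threshold cleared, predict() returns 1 / (1 + exp(-margin)),
      // i.e. an estimate of P(label = 1 | features), instead of a 0/1 label.
      model.clearThreshold()
      model.predict(test) // values in (0, 1)
    }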
Any response for this?
1. How do I know which statements out of the Spark script will be executed on
the worker side in a stage?
e.g. if I have
val x = 1 (or any other code)
in my driver code, will the same statement be executed on the worker side in a
stage?
2. How can I do a map-side
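On question 1, a rough rule illustrated in the sketch below (an assumption-level summary, not from the thread): top-level driver code runs once on the driver; only the function literals passed to RDD operations are shipped to the workers, along with any values they capture.

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverVsWorker {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("driver-vs-worker"))

        val x = 1 // driver only: evaluated once, then captured by value in closures

        val rdd = sc.parallelize(1 to 4)

        // The function literal below is serialized and executed on the workers,
        // once per element; x travels with it as part of the closure.
        val shifted = rdd.map(n => n + x)

        shifted.collect().foreach(println) // collect/println run back on the driver
        sc.stop()
      }
    }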
BTW several people asked about registration and student passes. Registration
will open in a few weeks, and like in previous Spark Summits, I expect there to
be a special pass for students.
Matei
On Oct 18, 2014, at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
After successful
Hi Deb,
Do check out https://github.com/OryxProject/oryx.
It does integrate with Spark. Sean has put quite a bit of neat detail
on the page about the architecture. It has all the things you are thinking
about. :)
Thanks,
Jayant
On Sat, Oct 18, 2014 at 8:49 AM, Debasish Das
I am very new to Spark.
I am working on a project that involves reading stock transactions off a number
of TCP connections and
1. periodically (once every few hours) uploading the transaction records to
HBase
2. maintaining the records that are not yet written to HBase and acting as an
HTTP query server
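One possible shape for the ingest side (a sketch under assumptions, not a recommendation from the thread: the host, port, batch interval, and the HBase write are all placeholders) is a Spark Streaming socket source with a long non-overlapping window, flushed in foreachRDD:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TransactionIngest {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("tx-ingest"), Seconds(10))

        // One line of text per transaction from a TCP feed (host/port are placeholders).
        val transactions = ssc.socketTextStream("feed-host", 9999)

        // Accumulate a few hours of records per write; window = slide gives
        // non-overlapping batches.
        val everyFewHours = transactions.window(Seconds(3 * 3600), Seconds(3 * 3600))

        everyFewHours.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // Open an HBase connection here and write this partition's records.
            records.foreach(_ => ()) // placeholder for the HBase put
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }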
Thanks for the info.
On 19 October 2014 20:46, Sean Owen so...@cloudera.com wrote:
I'm building a model on a standalone cluster with just a single worker,
limited to 3 cores and 4GB of RAM. The node starts up and prints the
message:
Starting Spark worker 192.168.1.185:60203 with 3 cores, 4.0 GB RAM
During model training (SVMWithSGD) the CPU usage on the worker is very low. It
Hello,
I have a cluster with 1 master and 2 slaves running 1.1.0. I am having
problems getting both slaves working at the same time. When I launch the
driver on the master, one of the slaves is assigned the receiver task, and
initially both slaves start processing tasks. After a few tens of batches,
Trying to upgrade from Spark 1.0.1 to 1.1.0. Can't imagine the upgrade is the
problem, but anyway...
I get a NoClassDefFoundError for RandomGenerator when running a driver from the
CLI, but only when using a named master, even a standalone master. If I run
using master = local[4] the job
@Sean Owen,
Thank you for the information.
I changed the pom file to include math3, because I needed the math3 library
from my previous use with 1.0.2.
Best regards,
Henry
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Saturday, October 18, 2014 2:19 AM
To: MA33
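For reference, the change Henry describes is declaring commons-math3 explicitly in the application's own build rather than relying on it arriving transitively. A hedged sketch of the sbt equivalent of that pom change (the coordinates are real; the version is an assumption):

    // build.sbt -- declare commons-math3 explicitly so RandomGenerator is on
    // the classpath even when the cluster no longer provides it transitively
    libraryDependencies += "org.apache.commons" % "commons-math3" % "3.3"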
This looks like a bug in JavaSQLContext.getSchema(), which doesn't enumerate
all of the data types supported by Catalyst.
From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 11:44 PM
To: Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp
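Until getSchema() is fixed, one hedged workaround is to skip Java Bean reflection and build the schema by hand with the Scala API, where TimestampType is handled. A minimal sketch, assuming the 1.1 applySchema API; the field names are illustrative:

    import java.sql.Timestamp
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{Row, SQLContext, StringType, StructField,
      StructType, TimestampType}

    // Build the schema explicitly instead of relying on bean reflection.
    def eventsWithTimestamps(sc: SparkContext) = {
      val sqlContext = new SQLContext(sc)
      val schema = StructType(Seq(
        StructField("name", StringType, nullable = true),
        StructField("eventTime", TimestampType, nullable = true)))

      val rows = sc.parallelize(Seq(
        Row("a", new Timestamp(System.currentTimeMillis()))))

      sqlContext.applySchema(rows, schema) // SchemaRDD with a proper timestamp column
    }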
Hi all,
I have a Spark 0.9 cluster with 16 nodes.
I wrote a Spark application to read data from an HBase table, which has 86
regions spread over 20 RegionServers.
I submitted the Spark app in Spark standalone mode and found that there
were 86 executors running on just 3 nodes, and it
Hi,
I usually use a file on HDFS to make a PairRDD and analyze it using
combineByKey, reduceByKey, etc.
But sometimes it hangs when I set the spark.default.parallelism configuration,
even though the file is small.
If I remove this configuration, everything works fine.
Can anyone tell me why this occurs?
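For context, a sketch of the configuration in question (values illustrative): spark.default.parallelism sets the partition count that shuffle operations like reduceByKey use when none is given explicitly, so an oversized value on a small file creates many mostly-empty tasks.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD ops in pre-1.3 Spark

    object ParallelismDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("parallelism-demo")
          .set("spark.default.parallelism", "16") // default shuffle partition count

        val sc = new SparkContext(conf)
        val pairs = sc.textFile("hdfs:///data/small.txt")
          .map(line => (line.split("\t")(0), 1))

        pairs.reduceByKey(_ + _).count()    // shuffles into 16 partitions
        pairs.reduceByKey(_ + _, 4).count() // explicit count overrides the default
        sc.stop()
      }
    }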
I have created an issue for this
https://issues.apache.org/jira/browse/SPARK-4003
From: Cheng, Hao
Sent: Monday, October 20, 2014 9:20 AM
To: Ge, Yao (Y.); Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp
This looks like a bug in JavaSQLContext.getSchema(),
Write to HDFS and then get one file locally by using hdfs dfs -getmerge...
On Friday, October 17, 2014, Sean Owen so...@cloudera.com wrote:
You can save to a local file. What are you trying and what doesn't work?
You can output one file by repartitioning to 1 partition, but this is
probably
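A minimal sketch of both options mentioned above; the paths are placeholders:

    import org.apache.spark.rdd.RDD

    def saveAsOneFile(rdd: RDD[String]): Unit = {
      // Option 1: collapse to one partition before saving. All data flows
      // through a single task, so this is only sensible for small outputs.
      rdd.repartition(1).saveAsTextFile("hdfs:///out/single")

      // Option 2: save with normal parallelism, then merge the part files
      // locally with: hdfs dfs -getmerge /out/parts merged.txt
      rdd.saveAsTextFile("hdfs:///out/parts")
    }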