Hi Kane,
In the sequence file, the class is org.apache.hadoop.io.Text. You need to
convert Text to String. There are two approaches:
1. Use implicit conversions to convert Text to String automatically. I
recommend this one. E.g.,
val t2 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
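For reference, the implicit-conversion idea can be sketched without a Hadoop dependency. `MyText` below is an illustrative stand-in for `org.apache.hadoop.io.Text`, not the real class:

```scala
// Sketch of approach 1: an implicit conversion that turns a Text-like
// value into a String automatically. MyText is a stand-in for
// org.apache.hadoop.io.Text so the example has no Hadoop dependency.
import scala.language.implicitConversions

case class MyText(bytes: Array[Byte]) {
  override def toString: String = new String(bytes, "UTF-8")
}

implicit def textToString(t: MyText): String = t.toString

// The compiler inserts textToString(...) automatically, so a MyText
// value can be used wherever a String is expected.
val s: String = MyText("hello".getBytes("UTF-8"))
println(s)
```

When you call `sc.sequenceFile[String, String](...)`, Spark's own implicit `WritableConverter`s play this role, converting each `Text` key and value to `String` for you.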
Hi,
Does GraphX currently support Giraph/Pregel's aggregator feature? I
was thinking of implementing a PageRank version that can correctly
handle dangling vertices (i.e., vertices with no outlinks). Therefore I
would have to globally sum up the rank associated with them in every
iteration,
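The global sum described above can be sketched outside GraphX. The following plain-Scala function (all names are illustrative; this is not the GraphX API) performs one PageRank iteration, summing the rank held by dangling vertices and redistributing it uniformly:

```scala
// Plain-Scala sketch (not GraphX code) of one PageRank iteration that
// handles dangling vertices: their rank is summed globally and
// redistributed uniformly across all vertices.
def pageRankStep(links: Map[Int, Seq[Int]],
                 ranks: Map[Int, Double],
                 d: Double = 0.85): Map[Int, Double] = {
  val n = ranks.size
  // Global sum of the rank held by dangling vertices (no outlinks) --
  // this is the aggregator-style step the question asks about.
  val danglingMass = ranks.collect {
    case (v, r) if links.getOrElse(v, Seq.empty).isEmpty => r
  }.sum
  // Each non-dangling vertex spreads its rank evenly over its outlinks.
  val contribs = links.toSeq
    .flatMap { case (v, outs) => outs.map(u => u -> ranks(v) / outs.size) }
    .groupBy(_._1)
    .map { case (u, cs) => u -> cs.map(_._2).sum }
  // Standard damped update, with the dangling mass shared by everyone.
  ranks.keys.map { v =>
    v -> ((1 - d) / n + d * (contribs.getOrElse(v, 0.0) + danglingMass / n))
  }.toMap
}

// Tiny example graph: vertex 3 is dangling.
val links = Map(1 -> Seq(2), 2 -> Seq(1, 3), 3 -> Seq.empty[Int])
val ranks = Map(1 -> 1.0 / 3, 2 -> 1.0 / 3, 3 -> 1.0 / 3)
val next = pageRankStep(links, ranks)
```

Because the dangling mass is redistributed rather than lost, the total rank stays at 1.0 after each iteration, which is exactly what a naive per-edge implementation gets wrong.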
Yes TD,
I can use tcpdump to see if the data are being accepted by the receiver
and, if not, whether the packets are at least arriving at the IP level.
Thanks
On 3/8/14, 4:19, Tathagata Das wrote:
I am not sure how to debug this without any more information about the
source. Can you monitor on the receiver side
YARN also has this scheduling option.
The problem is that all of our applications have the same flow, where the first
stage is the heaviest and the rest are very small.
The problem is that when several requests (applications) start running at the same time,
the first stage of each is scheduled in parallel, and
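One way to express such a cap in YARN is the Fair Scheduler's per-queue `maxRunningApps` limit. The fragment below is an illustrative `fair-scheduler.xml` sketch (the queue name and the value are assumptions, not recommendations), which keeps too many applications, and hence their heavy first stages, from running concurrently:

```xml
<?xml version="1.0"?>
<!-- Illustrative fair-scheduler.xml sketch; queue name "spark" and the
     limit of 2 are assumptions for this example. -->
<allocations>
  <queue name="spark">
    <!-- At most 2 applications from this queue run concurrently, so
         their heavy first stages are not all scheduled together. -->
    <maxRunningApps>2</maxRunningApps>
  </queue>
</allocations>
```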
Hi,
Is the Spark docker script now mature enough to substitute for the spark-ec2
script? Is anyone here using the docker script in production?
Hi Dana,
It’s hard to tell exactly what is consuming time, but I’d suggest starting by
profiling the single application first. Three things to look at there:
1) How many stages and how many tasks per stage is Spark launching (in the
application web UI at http://driver:4040)? If you have
Hi,
I've installed Spark 0.8.1 on IDH 3.0.2 running on YARN.
My cluster has 3 servers: 1 is both NN and DN, the other 2 are DN only.
I managed to launch spark-shell and run the MLlib k-means.
The problem is that it is using only one node (the NN) and not running on the
other 2 DNs.
Please advise.
My spark-env.sh
Whoa, wait, the docker scripts are only used for testing purposes right
now. They have not been designed with the intention of replacing the
spark-ec2 scripts. For instance, there isn't an ssh server running, so you
can't stop and restart the cluster (e.g., with sbin/stop-all.sh). Also, we
currently mount
Hi
I have some System.out.println calls in my Java code that work fine in a local
environment. But when I run the same code in standalone mode on an EC2
cluster, I do not see them in the worker stdout (on the worker node under spark
location/work) or on the driver console. Could you help me
Hi, I am running cdh5b2. I have installed the hadoop2 version of Shark
0.9.0 for cdh5. I want to know if there is a compatible version of Shark that
will run with this combination.
There was an issue related to this fixed recently:
https://github.com/apache/spark/pull/103
On Sun, Mar 9, 2014 at 8:40 PM, Koert Kuipers ko...@tresata.com wrote:
I edit the last line of sbt/sbt, after which I run:
sbt/sbt test
On Sun, Mar 9, 2014 at 10:24 PM, Sean Owen so...@cloudera.com wrote:
Hey Sen,
Is your code in the driver program or inside one of the tasks?
If it's in the tasks, the place you would expect this output to be is the
stdout file under spark/appid/work/[stdout/stderr]. Are you seeing
at least stderr logs in that folder? If not, then the tasks might not
be running on the