I don't think you control which host the receiver runs on, right? That's so
Spark can handle the failure of that node and reassign the receiver.
On Sep 27, 2014 2:43 AM, centerqi hu cente...@gmail.com wrote:
The receiver is not running on the machine I expect.
2014-09-26 14:09 GMT+08:00 Sean
Hi Alexey,
You're looking in the right place in the first log from the driver.
Specifically the locality is on the TaskSetManager INFO log level and looks
like this:
14/09/26 16:57:31 INFO TaskSetManager: Starting task 9.0 in stage 1.0
(TID 10, 10.54.255.191, ANY, 1341 bytes)
The ANY there
hello,
I'm examining Spark RDDs and trying to understand how the RDD flow
works.
Can anyone please tell me how an RDD decides to (and where I can find
the relevant code):
1. re-split into a new RDD?
2. move to a new PC?
3. perform PC selection?
4. perform a union of multiple RDDs?
5.
Based on your first example, it looks like what you want is actually run-length
encoding (which Parquet does support:
https://github.com/Parquet/parquet-format/blob/master/Encodings.md).
Repetition and definition levels are used to reconstruct nested or repeated
(array) data that has been shredded
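As an aside, the idea behind run-length encoding is easy to sketch outside Parquet. A minimal pure-Python illustration of the concept (this is not Parquet's actual hybrid bit-packed/RLE binary format, just the underlying idea):

```python
# Run-length encoding sketch: collapse runs of repeated values into
# (value, count) pairs. Long runs of identical values compress well.
def rle_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

data = [7, 7, 7, 7, 0, 0, 3]
encoded = rle_encode(data)
print(encoded)  # [(7, 4), (0, 2), (3, 1)]
assert rle_decode(encoded) == data
```

Parquet chooses between RLE runs and bit-packed literals per group; the sketch above only shows the run-collapsing half of that trade-off.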
Guys,
- Need help in terms of the interesting features coming up in MLlib 1.2.
- I have a 2-part, ~3 hr hands-on tutorial at the Big Data Tech Con
- The Hitchhiker's Guide to Machine Learning with Python Apache
Spark[2]
- At minimum, it would be good to take the last 30 min
Hi,
I was able to download the dataset this way (and just reconfirmed it by
doing so again):
# Before starting Spark
export AWS_ACCESS_KEY_ID=*key_id*
export AWS_SECRET_ACCESS_KEY=*access_key*
# Start Spark
./spark-shell
// In the Spark shell
val dataset =
Hi
I am having a heck of a time trying to get Python to work correctly on my
cluster created using the spark-ec2 script.
The following link was really helpful:
https://issues.apache.org/jira/browse/SPARK-922
I am still running into problems with matplotlib (it works fine on my Mac).
I can not
Can you first confirm that the regular PySpark shell works on your cluster?
Without upgrading to 2.7. That is, you log on to your master using spark-ec2
login and run bin/pyspark successfully without any special flags.
And as far as I can tell, you should be able to use IPython at 2.6, so I’d
hi all,
I have a job that works OK in yarn-client mode, but when I try
yarn-cluster mode it returns the following:
WARN YarnClusterScheduler: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
memory
the cluster has
Hi,
you can create a SparkContext in your Python or Scala environment and use
that to run your Hive queries, pretty much the same way as you'd do it in
the shell.
thanks,
--
View this message in context:
hi,
Yes, I have been using Spark SQL extensively that way.
I have just tried, and saveAsTable() works OK on 1.1.0.
Alternatively, you can write the data from the SchemaRDD to HDFS using
saveAsTextFile, and create an external table on top of it.
thanks,
I actually got this same exact issue compiling an unrelated project (not using
Spark). Maybe it's a protobuf issue?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Build-spark-with-Intellij-IDEA-13-tp9904p15284.html
Sent from the Apache Spark User List
Hi all!
I'm running PageRank on GraphX, and I find that some tasks on one machine
can take 5~6 times longer than on others, while the others are perfectly
balanced (around 1 second to finish).
And since the time for a stage (iteration) is determined by the slowest
task, the performance is undesirable.
I
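The straggler effect described in that message can be made concrete with a toy calculation (the task times below are made up for illustration, not measurements from this job):

```python
# A stage finishes only when its slowest task does: with 7 balanced
# tasks at ~1s and one 6s straggler, the stage takes 6s even though
# the average task time is far lower.
task_times = [1.0] * 7 + [6.0]

stage_time = max(task_times)               # gated by the straggler
avg_time = sum(task_times) / len(task_times)

print(stage_time)  # 6.0
print(avg_time)    # 1.625
```

This is why a single slow machine dominates per-iteration time in an iterative job like PageRank, even when every other task is well balanced.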
Hi, everyone
I have come across a problem with increasing the concurrency. In a
program, after the shuffle write, each node should fetch 16 pairs of matrices
to do matrix multiplication, such as:
import breeze.linalg.{DenseMatrix => BDM}
pairs.map(t => {
val b1 =
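Since the question is how to use multiple threads inside a single map task, here is a minimal sketch of the idea outside Spark, in plain Python with a thread pool (the 16 matrix pairs and their sizes are hypothetical; in real Spark code the analogous place would be inside mapPartitions, taking care that threads-per-task times task slots does not oversubscribe the executor's cores):

```python
# Sketch: one "task" must multiply 16 pairs of matrices; a thread pool
# runs the multiplications concurrently. (Pure-Python matmul under the
# GIL won't actually run in parallel; native BLAS/Breeze calls would.)
from concurrent.futures import ThreadPoolExecutor

def matmul(a, b):
    # Naive dense matrix multiply over nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# 16 hypothetical matrix pairs: 2x2 identity times a constant matrix.
ident = [[1, 0], [0, 1]]
m = [[2, 3], [4, 5]]
pairs = [(ident, m)] * 16

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: matmul(*p), pairs))

assert all(r == m for r in results)
```

The thread-pool-per-task pattern only helps if each task has spare cores to use, which is where the spark-submit core options mentioned in the reply below come in.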
Among the options of spark-submit, there are two that may be helpful for
your problem: --total-executor-cores NUM (standalone and Mesos only) and
--executor-cores NUM (YARN only).
qinwei
From: myasuka
Date: 2014-09-28 11:44
To: user
Subject: How to use multi thread in RDD map function