Hi,
This might be a naive question, but I hope somebody can help me with it.
I have a text file in which every 4 lines represent one record. Since the
SparkContext.textFile() API treats each line as a record, it does not fit
my case. I know that SparkContext.hadoopFile or
Version: Spark 1.1.0
42 workers, 40 GB memory per worker
Running the GraphX connected components job takes five hours
On Oct 25, 2014, at 1:27, Sameer Farooqui same...@databricks.com wrote:
That does seem a bit odd. How many Executors are running under this Driver?
Does the spark-submit process start out using
Hi,
I have a simple accumulator that needs to be passed to a foo() function
inside a map job:
val myCounter = sc.accumulator(0)
val myRDD = sc.textFile(inputpath) // :spark.RDD[String]
myRDD.flatMap(line => foo(line))
def foo(line: String) = {
myCounter += 1 // line throwing
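(For readers of the archive: one common fix is to pass the accumulator into foo explicitly, so the closure captures only the accumulator rather than the whole enclosing object. A minimal self-contained sketch; the input path and foo's return type are assumed:)

    import org.apache.spark.{Accumulator, SparkConf, SparkContext}

    object CounterExample {
      def foo(line: String, counter: Accumulator[Int]): Seq[String] = {
        counter += 1   // updated inside tasks; readable only on the driver
        Seq(line)
      }

      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("counter-example"))
        val myCounter = sc.accumulator(0)
        val myRDD = sc.textFile("inputpath") // path assumed
        // Pass the accumulator as an argument so only it is serialized with the closure.
        myRDD.flatMap(line => foo(line, myCounter)).count() // count() forces evaluation
        println(myCounter.value)
      }
    }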
Hi all,
I tried to use accumulators without any success so far.
My code is simple:
val sc = new SparkContext(conf)
val accum = sc.accumulator(0)
val partialStats = sc.textFile(f.getAbsolutePath())
.map(line => { val key = line.split("\t").head; (key, line) })
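(A note for the archive: this is usually the lazy-evaluation trap rather than a bug. Updates made inside map() only happen once an action runs, and the value must be read on the driver via .value. A sketch, with the tab-separated input and the file handle f assumed from the snippet above:)

    val accum = sc.accumulator(0)
    val partialStats = sc.textFile(f.getAbsolutePath())
      .map(line => { accum += 1; val key = line.split("\t").head; (key, line) })
    partialStats.count()   // before an action runs, accum.value is still 0
    println(accum.value)   // read on the driver after the action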
There is for sure a bug in the Accumulators code.
More specifically, the following code works well as expected:
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("EL LBP SPARK")
val sc = new SparkContext(conf)
val accum = sc.accumulator(0)
Hi Andrew,
We were running a master build from after SPARK-3613. We'll give it another
shot against the current master now that Josh has fixed a couple of issues in shuffle.
Thanks.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn:
works fine (Spark 1.1.0, in the REPL).
On Sat, Oct 25, 2014 at 1:41 PM, octavian.ganea octavian.ga...@inf.ethz.ch
wrote:
There is for sure a bug in the Accumulators code.
More specifically, the following code works well as expected:
def main(args: Array[String]) {
val conf = new
Hi,
I have a Spark cluster with 5 machines that have 32 GB of memory each and 2
machines with 24 GB each.
I believe spark.executor.memory assigns the same executor memory to all
executors.
How can I use 32 GB of memory on the first 5 machines and 24 GB on the
other 2?
Thanks
..Manas
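(For the archive, a sketch assuming the standalone cluster manager: SPARK_WORKER_MEMORY in conf/spark-env.sh caps what each machine's worker can hand out, so it can differ per machine, while spark.executor.memory itself stays uniform per application:)

    # conf/spark-env.sh on each of the five 32 GB machines (a sketch)
    SPARK_WORKER_MEMORY=32g

    # conf/spark-env.sh on each of the two 24 GB machines
    SPARK_WORKER_MEMORY=24g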
If your file is not very large, try
sc.wholeTextFiles(...).values.flatMap(_.split("\n").grouped(4).map(_.mkString("\n")))
-Xiangrui
On Sat, Oct 25, 2014 at 12:57 AM, Parthus peng.wei@gmail.com wrote:
Hi,
This might be a naive question, but I hope somebody can help me with it.
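(Expanding that one-liner for the archive, with an assumed input path: wholeTextFiles reads each file as a single (filename, contents) pair, so the contents can be re-split into lines and regrouped four at a time:)

    val records = sc.wholeTextFiles("hdfs:///path/to/input") // path assumed
      .values                                                // keep contents, drop file names
      .flatMap(_.split("\n").grouped(4).map(_.mkString("\n")))
    records.take(2).foreach(println)                         // each element is one 4-line record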
Ashwin,
What is your motivation for needing to share RDDs between jobs? Optimizing
for reusing data across jobs?
If so, you may want to look into Tachyon. My understanding is that Tachyon
acts like a caching layer and you can designate when data will be reused in
multiple jobs so it knows to keep
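(For the archive, a sketch of the two usual hooks in Spark 1.x, with paths and the Tachyon master address assumed: the OFF_HEAP storage level stored cached blocks in Tachyon, and writing to a tachyon:// URI let other jobs read the data:)

    import org.apache.spark.storage.StorageLevel

    val shared = sc.textFile("hdfs:///data/events")       // input path assumed
    shared.persist(StorageLevel.OFF_HEAP)                 // blocks kept in Tachyon in Spark 1.x
    shared.saveAsTextFile("tachyon://master:19998/shared/events") // readable by other jobs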
Hello all,
We are considering Spark for our organization. It is obviously a superb
platform for processing massive amounts of data... how about retrieving it?
We are currently storing our data in a relational database in a star
schema. Retrieving our data requires doing many complicated joins
1. Which data store do you want to keep your data in? HDFS, HBase,
Cassandra, S3 or something else?
2. Have you looked at SparkSQL (https://spark.apache.org/sql/)?
One option is to process the data in Spark and then store it in the
relational database of your choice.
On Sat, Oct 25, 2014 at
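(A sketch of the Spark SQL route on Spark 1.1, for the archive. The table layout, column names, and HDFS paths are all assumed; the point is that fact and dimension tables can be registered as temp tables and joined in SQL:)

    import org.apache.spark.sql.SQLContext

    case class Sale(productId: Int, amount: Double)
    case class ProductRow(id: Int, name: String)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD -> SchemaRDD in Spark 1.x

    val sales = sc.textFile("hdfs:///warehouse/sales")    // paths assumed
      .map(_.split("\t")).map(a => Sale(a(0).toInt, a(1).toDouble))
    val products = sc.textFile("hdfs:///warehouse/products")
      .map(_.split("\t")).map(a => ProductRow(a(0).toInt, a(1)))

    sales.registerTempTable("sales")
    products.registerTempTable("products")

    // A star-schema join expressed in SQL rather than hand-written RDD joins.
    sqlContext.sql(
      """SELECT p.name, SUM(s.amount) AS total
        |FROM sales s JOIN products p ON s.productId = p.id
        |GROUP BY p.name""".stripMargin
    ).collect().foreach(println)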