Hey,
The question is tricky. Here is a possible approach: use the year as the key of a
per-client hashmap, then merge those maps:
import org.apache.spark.SparkContext
import scalaz._
import Scalaz._
val sc = new SparkContext("local[*]", "sandbox")
// Create an RDD of (client, year, value) records (remaining sample rows are illustrative)
val rdd = sc.parallelize(Seq(
  ("A", 2015, 4),
  ("A", 2014, 2),
  ("B", 2015, 3)))
// Build (client, Map(year -> value)) pairs, then merge the maps per client with
// Scalaz's |+|, which sums the values of colliding year keys
val merged = rdd.map { case (c, y, v) => (c, Map(y -> v)) }.reduceByKey(_ |+| _)
Hi again,
I found this: https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
Maybe it will enable you to read NFS data from Spark, at least. Has anyone in the
community used it?
BR,
Fanilo
From: Andrianasolo Fanilo
Sent: Monday, October 26, 2015 3:24 PM
To: 'Kayode Odeyemi'; user
Subject: RE
Hi,
I believe binaryFiles uses a custom Hadoop InputFormat, so it can only read from
filesystems Hadoop supports.
You can find the full list of supported protocols by searching for “Hadoop filesystems
hdfs hftp” on Google (the link I found is a little long and references the
Hadoop Definitive Guide,
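To illustrate (a minimal sketch; the HDFS path and namenode address are hypothetical), binaryFiles accepts any URI whose scheme maps to a Hadoop filesystem implementation:

```scala
import org.apache.spark.SparkContext

val sc = new SparkContext("local[*]", "binaryFilesDemo")
// Any Hadoop-supported scheme should work here: hdfs://, file://, hftp://, s3n://, ...
val files = sc.binaryFiles("hdfs://namenode:8020/data/blobs")
// Each element is (path, PortableDataStream); materialize sizes as a quick sanity check
val sizes = files.map { case (path, stream) => (path, stream.toArray.length) }
```

Schemes outside that list (plain NFS mounts, for instance) are where a connector like the one above would come in.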
Hi Sampo,
There is a sliding method you could try inside the
org.apache.spark.mllib.rdd.RDDFunctions package, though it’s DeveloperApi stuff
(https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.rdd.RDDFunctions)
import org.apache.spark.{SparkConf, SparkContext}
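A minimal sketch of how sliding can be used (the sample data and window size of 3 are arbitrary):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.rdd.RDDFunctions._ // brings sliding into scope

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("slidingDemo"))
val rdd = sc.parallelize(1 to 5, numSlices = 2)
// sliding(3) yields every window of 3 consecutive elements:
// Array(1,2,3), Array(2,3,4), Array(3,4,5)
val windows = rdd.sliding(3).collect()
```

Being a DeveloperApi, its signature may change between releases, so pin your Spark version if you rely on it.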
Hello Spark fellows :),
I think I need some help understanding how .cache and task input work within a
job.
I have a 7 GB input matrix in HDFS that I load using .textFile(). I also have
a config file which contains an array of 12 logistic regression model
parameters, loaded as an
= PredictionReader.getFeatures(…).cache
where getFeatures() loads the file and then parses it.
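A minimal sketch of the pattern being described (all names, paths, and parameter values are hypothetical stand-ins): parse the matrix once, cache it, then reuse it across the 12 parameter sets so each pass reads from memory rather than re-reading 7 GB from HDFS:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

val sc = new SparkContext("local[*]", "cacheDemo")
// Parse once, cache so subsequent actions read from memory instead of HDFS
val features: RDD[Array[Double]] = sc.textFile("hdfs:///path/to/matrix")
  .map(_.split(' ').map(_.toDouble))
  .cache()
val paramSets = Seq.fill(12)(0.1) // stand-in for the 12 model configurations
paramSets.foreach { regParam =>
  // each pass reuses the cached RDD; only the first one pays the HDFS read
  val n = features.count()
}
```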
From: Sandy Ryza [mailto:sandy.r...@cloudera.com]
Sent: Wednesday, January 28, 2015 5:12 PM
To: Andrianasolo Fanilo
Cc: user@spark.apache.org
Subject: Re: RDD caching, memory network input
Hi Fanilo,
How many
Hello Spark fellows :)
I'm a new user of Spark and Scala and have been using both for 6 months without
too many problems.
Here I'm looking for best practices for using non-serializable classes inside
closures. I'm using Spark 0.9.0-incubating with Hadoop 2.2.
Suppose I am using OpenCSV
data within an executor sadly...
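One common pattern for this situation (a minimal sketch, assuming OpenCSV's CSVParser, which is not Serializable; the input path is hypothetical) is to construct the object inside mapPartitions, so it is built on the executor rather than shipped through the closure:

```scala
import org.apache.spark.SparkContext
import au.com.bytecode.opencsv.CSVParser

val sc = new SparkContext("local[*]", "csvDemo")
val parsed = sc.textFile("hdfs:///path/to/data.csv").mapPartitions { lines =>
  // One parser per partition, created on the executor:
  // it never crosses the driver/executor boundary, so it never gets serialized
  val parser = new CSVParser(',')
  lines.map(line => parser.parseLine(line))
}
```

This also amortizes the construction cost over a whole partition instead of paying it per record.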
Thanks for the input
Fanilo
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, September 4, 2014 3:36 PM
To: Andrianasolo Fanilo
Cc: user@spark.apache.org
Subject: Re: Object serialisation inside closures
In your original version