Hi,
This might be a naive question, but I hope somebody can help me with it.
I have a text file in which every 4 lines represent one record. Since the
SparkContext.textFile() API treats each line as a record, it does not fit
my case. I know that SparkContext.hadoopFile or
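The grouping I am after can be sketched in plain Scala (in Spark itself, the same idea could be applied with lines.zipWithIndex, keying each line by index / 4 and grouping, assuming records never straddle the grouping boundary; the function name is illustrative):

```scala
// Fold every 4 consecutive lines into one record.
// Plain-Scala sketch of the per-partition logic only.
def groupIntoRecords(lines: Iterator[String]): Iterator[Seq[String]] =
  lines.grouped(4).map(_.toSeq)
```
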
Hi there,
I have several large files (500GB per file) to transform into Parquet format
and write to HDFS. The problems I encountered can be described as follows:
1) At first, I tried to load all the records in a file and then used
sc.parallelize(data) to generate an RDD and finally used
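The alternative I am considering can be sketched in plain Scala: keep at most chunkSize records in memory instead of materializing the whole 500 GB before sc.parallelize (in Spark itself, reading the file directly with sc.textFile or spark.read.text and writing with DataFrameWriter.parquet streams partition by partition and avoids the driver entirely; names below are illustrative):

```scala
// Process a huge file in fixed-size chunks so that at most `chunkSize`
// records are resident at once, instead of loading the whole file first.
def processInChunks(lines: Iterator[String], chunkSize: Int)(writeChunk: Seq[String] => Unit): Unit =
  lines.grouped(chunkSize).foreach(batch => writeChunk(batch.toSeq))
```
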
Hi there,
I was wondering if somebody could tell me how to create an object with a
given ClassTag so as to make the function below work. The only thing to do
is to write one line that creates an object of class T. I tried new T, but
it does not work. Would it be possible to give me one Scala line to
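One sketch that seems to do this (assuming T has a public no-argument constructor; otherwise it throws at runtime) goes through the ClassTag's runtimeClass:

```scala
import scala.reflect.ClassTag

// Build a T from its ClassTag via reflection.
// Assumes T has a public no-argument constructor.
def newInstance[T](implicit ct: ClassTag[T]): T =
  ct.runtimeClass.getDeclaredConstructor().newInstance().asInstanceOf[T]
```
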
Hi there,
I was wondering if anybody could help me find an efficient way to write a
MapReduce program like this:
1) Each map function needs to access some huge files, around 6 GB in
total
2) These files are READ-ONLY. They are essentially a huge look-up
table, which will not change
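From what I have read, read-only shared data in Spark is usually handled with sc.broadcast, or, for data this large, by loading it once per executor, e.g. through a lazy singleton used inside mapPartitions. A plain-Scala sketch of the load-once idea (the tiny map is an illustrative stand-in for the 6 GB files):

```scala
object LookupTable {
  var loadCount = 0 // instrumentation for the sketch only

  // Stands in for reading the huge read-only files; loaded lazily once
  // per JVM (in Spark: once per executor), not once per record.
  lazy val table: Map[String, Int] = { loadCount += 1; Map("k1" -> 1, "k2" -> 2) }
}

// Mirrors rdd.mapPartitions { iter => ... }: every record in the
// partition reuses the same already-loaded table.
def lookupAll(records: Iterator[String]): Iterator[Int] =
  records.map(r => LookupTable.table.getOrElse(r, -1))
```
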