Using sc.textFile will also read the file from HDFS line by line through an iterator, so the whole file doesn't need to fit into memory; even with a small amount of memory it can still work.
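As a rough sketch of that point (assuming a running cluster with an existing SparkContext `sc`; the HDFS path and filter predicate below are illustrative, not from this thread), each task streams its partition's lines through an iterator rather than materializing the whole file:

```scala
// Sketch only: assumes a running Spark cluster and an existing
// SparkContext `sc`. The path is hypothetical.
val lines = sc.textFile("hdfs:///data/big_file.gz")

// Transformations are lazy; count() triggers the job, and each task
// reads its partition line by line through an iterator.
val errorCount = lines.filter(_.contains("ERROR")).count()
println(s"error lines: $errorCount")
```

One caveat worth noting for a plain .gz file: gzip is not a splittable codec, so Spark reads the whole file in a single partition (one task). Repartitioning after load, or using a splittable format, is what actually spreads the data across the cluster.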
2015-06-12 13:19 GMT+08:00 SLiZn Liu sliznmail...@gmail.com:
Hi Spark Users,
I'm trying to load a literally big file (50 GB when compressed as a gzip file, stored in HDFS) by receiving a DStream using `ssc.textFileStream`, since this file cannot fit into my memory. However, it looks like no RDD will be received until I copy this big file to a prior-specified
Hmm, you have a good point. So should I load the file with `sc.textFile()` and specify a high number of partitions, so that the file is split into partitions in memory across the cluster?
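If going that route, the `minPartitions` hint to `sc.textFile` is the usual knob (a sketch, assuming an existing SparkContext `sc`; the path and the partition count of 200 are illustrative assumptions):

```scala
// Sketch: assumes an existing SparkContext `sc`; path and partition
// count are illustrative, not from the original thread.
val rdd = sc.textFile("hdfs:///data/big_file.txt", minPartitions = 200)

// For a non-splittable input such as a single .gz file, minPartitions
// has no effect on the read itself; repartition after loading to
// spread the data across executors instead.
val spread = rdd.repartition(200)
```

Note that `minPartitions` is only a hint for splittable inputs; Spark may create more partitions than requested, and partitions are processed lazily, so the data never needs to fit in memory all at once.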
On Thu, Jun 11, 2015 at 9:27 PM ayan guha guha.a...@gmail.com wrote:
Why do you need to use stream in this