Hi Zeming Yu, Steve,
Just to add: we are also going down the partitioning route, but you
should know that if you are in AWS land, you will most likely be using EMR
at any given time.
At the moment EMR does not do recursive search on wildcards, see this
everything works best if your sources are a few tens to hundreds of MB or
more
Are you referring to the size of the zip file or of the individual unzipped
files? Are there any issues with storing a 60 MB zipped file containing heaps
of text files inside?
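
To make the two sizes in that question concrete, here is a minimal Python
sketch using only the standard-library zipfile module. The helper name
summarize_zip and the sample data are hypothetical and not from this thread.
It says nothing about how Spark or EMR treats zip files; it only reports the
size of the archive contents as stored versus the size of the files once
unzipped.

```python
import io
import zipfile


def summarize_zip(source):
    """Return (compressed_total, uncompressed_total, per_file_sizes).

    `source` is a path or a binary file-like object.
    Hypothetical helper, for illustration only.
    """
    with zipfile.ZipFile(source) as archive:
        # Skip directory entries; only real files carry data.
        members = [info for info in archive.infolist() if not info.is_dir()]
        compressed_total = sum(info.compress_size for info in members)
        uncompressed_total = sum(info.file_size for info in members)
        per_file_sizes = {info.filename: info.file_size for info in members}
    return compressed_total, uncompressed_total, per_file_sizes


if __name__ == "__main__":
    # Build a small in-memory archive of text files as stand-in data.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for i in range(3):
            archive.writestr(f"part-{i}.txt", "some repetitive text\n" * 1000)
    buffer.seek(0)

    compressed, uncompressed, sizes = summarize_zip(buffer)
    print(f"compressed total:   {compressed} bytes")
    print(f"uncompressed total: {uncompressed} bytes")
    for name, size in sizes.items():
        print(f"  {name}: {size} bytes")
```

Running something like this over a real archive shows whether the "tens to
hundreds of MB" figure is being met by the zip as a whole or by each file
inside it.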
On 11 Apr. 2017 9:09 pm, "Steve Loughran"
> On 11 Apr 2017, at 11:07, Zeming Yu wrote:
>
> Hi all,
>
> I'm a beginner with Spark, and I'm wondering if someone could provide
> guidance on the following 2 questions I have.
>
> Background: I have a data set growing by 6 TB p.a. I plan to use Spark to
> read in all
Hi all,
I'm a beginner with Spark, and I'm wondering if someone could provide
guidance on the following 2 questions I have.
Background: I have a data set growing by 6 TB p.a. I plan to use Spark to
read in all the data, manipulate it and build a predictive model on it (say
GBM). I plan to store