You need to use wholeTextFiles to read each whole file at once. Otherwise,
it can be split.
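
A minimal sketch of that suggestion, assuming PySpark and an already running SparkContext `sc`. The helper names `parse_json_lines` and `load_records` are hypothetical and not from this thread, and `paths` is assumed to be the same path string passed to `textFile` in the original snippet. `wholeTextFiles` returns (path, content) pairs, so each file's content is split into lines and decoded within a single task:

```python
import json


def parse_json_lines(content):
    """Decode one JSON record per non-empty line of a whole file's text."""
    # Split on "\n" only, so unusual separators inside JSON strings are kept.
    return [json.loads(line) for line in content.split("\n") if line.strip()]


def load_records(sc, paths):
    """Read each file in full, then decode its lines.

    Assumes `sc` is a live SparkContext (hypothetical here; not created
    by this sketch).
    """
    # wholeTextFiles yields (file_path, file_content) pairs, one per file.
    return sc.wholeTextFiles(paths).flatMap(lambda kv: parse_json_lines(kv[1]))
```

Note that `wholeTextFiles` loads each file fully into memory, so this approach suits many small files better than a few very large ones.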
DB Tsai - Sent From My Phone
On Mar 17, 2016 12:45 AM, "Blaž Šnuderl" wrote:
Hi.
We have JSON data stored in S3 (one JSON record per line). When reading the
data from S3 using the following code, we started noticing JSON decode
errors.
sc.textFile(paths).map(json.loads)
After a bit more investigation we noticed an incomplete line. Basically, the
line was:
{"key": "value",
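
One way to locate such lines, sketched under assumptions: a hypothetical helper `try_parse` (not from this thread) wraps `json.loads` so that a decode failure is returned together with the raw line instead of aborting the job. The Spark usage is shown only as comments and assumes an existing SparkContext `sc` and the same `paths` as above.

```python
import json


def try_parse(line):
    """Return (record, None) on success or (None, raw_line) on failure."""
    try:
        return json.loads(line), None
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None, line


# Hypothetical usage with an existing SparkContext `sc` (not run here):
# parsed = sc.textFile(paths).map(try_parse)
# bad_lines = parsed.filter(lambda r: r[1] is not None).map(lambda r: r[1])
# print(bad_lines.take(10))
```

Collecting a handful of the failing lines this way may help show whether they are truncated at a consistent point.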