Found the problem: the file contains Control-M (carriage return) characters, which textFile counts as additional line breaks. Please ignore the post.
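
For anyone who runs into the same mismatch, here is a minimal sketch of why the two counts can differ. It assumes the default behaviour of the Hadoop line reader behind textFile, which ends a line at CR, LF, or CRLF, whereas `wc -l` counts only LF. The function names and the sample bytes are made up for illustration and are not part of Spark.

```python
import re


def count_lf_lines(data: bytes) -> int:
    """Count lines the way `wc -l` does: one per LF byte."""
    return data.count(b"\n")


def count_cr_lf_lines(data: bytes) -> int:
    """Count records when CR, LF and CRLF each end a line.

    This mimics the assumed default of the line reader used by textFile.
    """
    if not data:
        return 0
    parts = re.split(rb"\r\n|\r|\n", data)
    # A trailing terminator leaves an empty final piece, which is not a record.
    if parts[-1] == b"":
        parts.pop()
    return len(parts)


def strip_stray_cr(data: bytes) -> bytes:
    """Normalise CRLF to LF, then drop any remaining bare CR bytes."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"")


# Hypothetical sample: one bare CR, one LF, one CRLF, one LF.
sample = b"a\rb\nc\r\nd\n"

print(count_lf_lines(sample))
print(count_cr_lf_lines(sample))
print(count_cr_lf_lines(strip_stray_cr(sample)))
```

Cleaning the file before loading it (as strip_stray_cr does here) should make both counts agree. Whether bare CRs should be dropped or turned into real line breaks depends on the data.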

On Wed, Nov 25, 2015 at 6:06 PM, George Sigletos <sigle...@textkernel.nl>
wrote:

> Hello,
>
> I have a text file consisting of 483150 lines (wc -l "my_file.txt").
>
> However, when I read it using textFile:
>
> %pyspark
> rdd = sc.textFile("my_file.txt")
> print rdd.count()
>
> it returns 554420 lines. Any idea why this is happening? Is it using a
> different newline delimiter, and how can this be changed?
>
> Thank you,
> George
>
>
>
>
>
