Hello,

I have a text file consisting of 483150 lines (wc -l "my_file.txt").

However, when I read it using textFile:

%pyspark
rdd = sc.textFile("my_file.txt")
print(rdd.count())

it returns 554420 lines. Any idea why this is happening? Is it using a
different newline delimiter, and if so, how can this be changed?
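
One way to test the delimiter theory, as a rough sketch: `wc -l` counts only LF characters, while Hadoop's default line reader (which textFile relies on) also treats a bare CR or a CRLF pair as a line ending. The helper names below are made up for illustration, and the sample bytes are invented rather than taken from my_file.txt.

```python
import re


def count_lines_lf(data: bytes) -> int:
    """Count LF characters only, which is what `wc -l` reports."""
    return data.count(b"\n")


def count_lines_any_terminator(data: bytes) -> int:
    """Count records when CR, LF, and CRLF each end a line.

    This mirrors the assumed default behaviour of Hadoop's line reader;
    it is a diagnostic sketch, not Spark's actual implementation.
    """
    # Try CRLF first so the pair counts as a single terminator.
    terminators = len(re.findall(rb"\r\n|\r|\n", data))
    # A final fragment with no terminator still forms one record.
    has_trailing = len(data) > 0 and not data.endswith((b"\r", b"\n"))
    return terminators + (1 if has_trailing else 0)


# Invented sample: the second line ends in a bare carriage return.
sample = b"a\nb\rc\r\nd\n"
print(count_lines_lf(sample), count_lines_any_terminator(sample))
```

Running both functions on `open("my_file.txt", "rb").read()` would show whether stray carriage returns account for the gap. If they do, Hadoop's `textinputformat.record.delimiter` setting is probably the place to look for changing the delimiter.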

Thank you,
George
