zipWithIndex will preserve whatever order is already present in your val lines.
I am not sure whether val lines = sc.textFile("hdfs://mytextFile") maintains
the order of the file; if it does, the zipWithIndex step will maintain it for sure.
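
For what it's worth, here is a small plain-Scala sketch of how I understand zipWithIndex to assign indices, going by the Spark API docs: ordering is first by partition index, then by position within each partition, and the first element gets index 0 (not 1). The name zipWithIndexModel and the sample splits are made up for illustration; this is a model of the documented behaviour, not Spark's actual code.

```scala
// Plain-Scala model of zipWithIndex index assignment (no Spark needed).
// `partitions` stands in for an RDD's partitions, in partition-index order.
def zipWithIndexModel[A](partitions: Seq[Seq[A]]): Seq[(A, Long)] = {
  // Start offset of a partition = number of elements in all earlier partitions.
  val startOffsets = partitions.map(_.size.toLong).scanLeft(0L)(_ + _)
  partitions.zip(startOffsets).flatMap { case (part, start) =>
    // Within a partition, elements keep their existing order.
    part.zipWithIndex.map { case (elem, i) => (elem, start + i) }
  }
}

// Hypothetical splits, mirroring the layout in the question below.
val splits = Seq(Seq("line 1", "line 2"), Seq("line 3"), Seq("line 4", "line 5"))
val indexed = zipWithIndexModel(splits)
```

So the indices follow the file order only as long as the partitions themselves come out in file order, which is the part about sc.textFile that I am unsure of.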



On 24 April 2015 at 18:35, Spico Florin <spicoflo...@gmail.com> wrote:

> Hello!
>   I know that HadoopRDD partitions are built based on the number of splits
> in HDFS. I'm wondering if these partitions preserve the initial order of
> data in file.
> As an example, if I have an HDFS file (myTextFile) that has these splits:
>
> split 0 -> line 1, ..., line k
> split 1 -> line k+1, ..., line k+n
> split 2 -> line k+n+1, ..., line k+n+m
>
> and the code
> val lines=sc.textFile("hdfs://mytextFile")
> lines.zipWithIndex()
>
> will the order of lines be preserved?
> (line 1, zipIndex 1), ..., (line k, zipIndex k), and so on.
>
> I found this question on stackoverflow (
> http://stackoverflow.com/questions/26046410/how-can-i-obtain-an-element-position-in-sparks-rdd)
> whose answer intrigued me:
> "Essentially, RDD's zipWithIndex() method seems to do this, but it won't
> preserve the original ordering of the data the RDD was created from"
>
> Can you please confirm whether this is the correct answer?
>
> Thanks.
>  Florin
>
>
>
>
