[ https://issues.apache.org/jira/browse/SPARK-17648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517344#comment-15517344 ]
Imran Rashid commented on SPARK-17648:
--------------------------------------

In this particular case, the index is used because there are three index-aligned data structures. You could switch to iteration by zipping together three iterators, but that just makes this code unnecessarily ugly, in my opinion.

I was actually considering going the other direction -- converting *all* uses of {{Seq}} to {{IndexedSeq}}, and perhaps even trying to put in a scalastyle rule to ban {{Seq}}. I can't think of any cases in Spark where you actually want {{Seq}} instead.

> TaskSchedulerImpl.resourceOffers should take an IndexedSeq, not a Seq
> ---------------------------------------------------------------------
>
>                 Key: SPARK-17648
>                 URL: https://issues.apache.org/jira/browse/SPARK-17648
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>            Priority: Minor
>
> {{TaskSchedulerImpl.resourceOffers}} takes in a {{Seq[WorkerOffer]}}.
> However, later on it indexes into this by position. If you don't pass in an
> {{IndexedSeq}}, this turns an O(n) operation into an O(n^2) operation.
> In practice, this isn't an issue, since, just by chance, in the important
> places where this is called the data structures already happen to be
> {{IndexedSeq}}s. But we ought to tighten up the types to make this clearer.
> I ran into this while doing some performance tests on the scheduler:
> performance was terrible when I passed in a {{Seq}}, and even a few hundred
> offers were scheduled very slowly.
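
To make the cost argument concrete, here is a minimal sketch of a position-based walk over index-aligned structures. It is not the actual {{TaskSchedulerImpl}} code: {{Offer}}, {{OfferSketch}}, {{tasksPerOffer}}, and {{cpusPerTask}} are hypothetical names used only for illustration. The point is the parameter type: with {{IndexedSeq}}, each {{offers(i)}} is effectively constant time, whereas on a linear {{Seq}} such as {{List}} each lookup walks from the head, which is where the O(n^2) behaviour would come from.

```scala
// Hypothetical stand-in for Spark's WorkerOffer; field names are illustrative only.
final case class Offer(executorId: String, host: String, cores: Int)

object OfferSketch {
  // Walks three index-aligned structures by position: the offers themselves,
  // the remaining free cores per offer, and the number of tasks launched per offer.
  // Declaring the parameter as IndexedSeq keeps offers(i) cheap, so the loop
  // over all offers stays linear in the number of offers.
  def tasksPerOffer(offers: IndexedSeq[Offer], cpusPerTask: Int): IndexedSeq[(String, Int)] = {
    require(cpusPerTask > 0, "cpusPerTask must be positive")
    val freeCores = offers.map(_.cores).toArray
    val launched = Array.fill(offers.length)(0)
    for (i <- offers.indices) {
      // Keep "launching" tasks on offer i while it still has enough free cores.
      while (freeCores(i) >= cpusPerTask) {
        freeCores(i) -= cpusPerTask
        launched(i) += 1
      }
    }
    // Positional lookup again: pair each offer's executor with its task count.
    offers.indices.map(i => offers(i).executorId -> launched(i))
  }
}
```

A caller that only has a {{List}} would pay for one conversion up front, for example {{OfferSketch.tasksPerOffer(offerList.toIndexedSeq, 2)}}, rather than paying a linear lookup on every index.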