The errors may happen because there is not enough memory in the worker, so it keeps spilling many small files. Could you verify whether the PR [1] fixes your problem?
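To illustrate why a small memory budget could lead to "too many open files", here is a minimal pure-Python sketch of an external merge sort. It is not PySpark's actual implementation: the function name `external_sort` and the parameter `max_in_memory` are hypothetical, and the record-count limit stands in for a real memory limit.

```python
import heapq
import os
import tempfile


def external_sort(records, max_in_memory):
    """Sort strings (without newlines), spilling sorted runs to temp files.

    Hypothetical helper for illustration only. Returns a tuple of
    (sorted_records, number_of_spill_files).
    """
    spill_paths = []
    buffer = []

    def spill():
        # Write the current buffer to disk as one sorted run, then empty it.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "w") as f:
            f.writelines(rec + "\n" for rec in buffer)
        spill_paths.append(path)
        del buffer[:]

    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    # The merge step holds every spill file open at the same time, so the
    # number of open file descriptors grows with the number of spills.
    files = [open(path) for path in spill_paths]
    try:
        runs = [(line.rstrip("\n") for line in f) for f in files]
        merged = list(heapq.merge(*runs))
    finally:
        for f in files:
            f.close()
        for path in spill_paths:
            os.remove(path)
    return merged, len(spill_paths)


if __name__ == "__main__":
    data = ["%05d" % ((i * 7919) % 1000) for i in range(1000)]
    _, small_budget = external_sort(data, max_in_memory=10)
    _, large_budget = external_sort(data, max_in_memory=1000)
    print(small_budget, large_budget)
```

In this sketch the number of files open during the merge equals the number of spills, so the smaller the in-memory budget, the more files are open at once. With a real dataset and a very small budget, that count could exceed the per-process open file limit.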
[1] https://github.com/apache/spark/pull/3252

On Thu, Nov 13, 2014 at 11:28 AM, santon <steven.m.an...@gmail.com> wrote:
> Thanks for the thoughts. I've been testing on Spark 1.1 and haven't seen the
> IndexError yet. I've run into some other errors ("too many open files"), but
> these issues seem to have been discussed already. The dataset, by the way,
> was about 40 Gb and 188 million lines; I'm running a sort on 3 worker nodes
> with a total of about 80 cores.
>
> Thanks again for the tips!
>
> On Fri, Nov 7, 2014 at 6:03 PM, Davies Liu-2 [via Apache Spark User List]
> <[hidden email]> wrote:
>>
>> Could you tell how large is the data set? It will help us to debug this
>> issue.
>>
>> On Thu, Nov 6, 2014 at 10:39 AM, skane <[hidden email]> wrote:
>> > I don't have any insight into this bug, but on Spark version 1.0.0 I ran
>> > into the same bug running the 'sort.py' example. On a smaller data set, it
>> > worked fine. On a larger data set I got this error:
>> >
>> > Traceback (most recent call last):
>> >   File "/home/skane/spark/examples/src/main/python/sort.py", line 30, in <module>
>> >     .sortByKey(lambda x: x)
>> >   File "/usr/lib/spark/python/pyspark/rdd.py", line 480, in sortByKey
>> >     bounds.append(samples[index])
>> > IndexError: list index out of range

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org