The errors may happen because there is not enough memory in the
worker, so it keeps spilling to disk and creates many small files.
Could you verify whether PR [1] fixes your problem?

[1] https://github.com/apache/spark/pull/3252
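For illustration only, here is a minimal sketch of the generic
spill-and-merge (external sort) pattern in plain Python. It is not
PySpark's actual implementation, and `external_sort` and
`max_in_memory` are hypothetical names. It shows why a small memory
budget produces many small spill files, all of which stay open during
the final merge. That is one way a large sort can run into "too many
open files".

```python
import heapq
import pickle
import tempfile


def external_sort(items, max_in_memory):
    """Hypothetical helper: sort `items` holding at most `max_in_memory` in RAM.

    Whenever the in-memory buffer fills up, it is sorted and spilled to its
    own temporary file. Every spill file is kept open and read concurrently
    during the final k-way merge, so open file handles grow with the number
    of spills. Returns (sorted_list, number_of_spill_files).
    """
    spill_files = []
    buffer = []

    def spill():
        # Sort the current chunk and write it to a new temporary file.
        buffer.sort()
        f = tempfile.TemporaryFile()
        for item in buffer:
            pickle.dump(item, f)
        f.seek(0)
        spill_files.append(f)
        buffer.clear()

    for item in items:
        buffer.append(item)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    def read_run(f):
        # Lazily stream pickled items back from one spill file.
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

    try:
        # k-way merge over all sorted runs; every spill file is open here.
        merged = list(heapq.merge(*(read_run(f) for f in spill_files)))
    finally:
        for f in spill_files:
            f.close()
    return merged, len(spill_files)
```

For example, `external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3)`
returns `([1, 2, 3, 5, 7, 8, 9], 3)`. Halving the memory budget roughly
doubles the number of spill files that must be open at once.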

On Thu, Nov 13, 2014 at 11:28 AM, santon <steven.m.an...@gmail.com> wrote:
> Thanks for the thoughts. I've been testing on Spark 1.1 and haven't seen the
> IndexError yet. I've run into some other errors ("too many open files"), but
> these issues seem to have been discussed already. The dataset, by the way,
> was about 40 GB and 188 million lines; I'm running a sort on 3 worker nodes
> with a total of about 80 cores.
>
> Thanks again for the tips!
>
> On Fri, Nov 7, 2014 at 6:03 PM, Davies Liu-2 [via Apache Spark User List]
> <[hidden email]> wrote:
>>
>> Could you tell us how large the data set is? It will help us debug this
>> issue.
>>
>> On Thu, Nov 6, 2014 at 10:39 AM, skane <[hidden email]> wrote:
>>
>> > I don't have any insight into this bug, but on Spark version 1.0.0 I ran
>> > into the same bug running the 'sort.py' example. On a smaller data set it
>> > worked fine; on a larger data set I got this error:
>> >
>> > Traceback (most recent call last):
>> >   File "/home/skane/spark/examples/src/main/python/sort.py", line 30, in <module>
>> >     .sortByKey(lambda x: x)
>> >   File "/usr/lib/spark/python/pyspark/rdd.py", line 480, in sortByKey
>> >     bounds.append(samples[index])
>> > IndexError: list index out of range
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-issue-with-sortByKey-IndexError-list-index-out-of-range-tp16445p18288.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
