Re: PySpark issue with sortByKey: "IndexError: list index out of range"

2014-11-13 Thread santon
Thanks for the thoughts. I've been testing on Spark 1.1 and haven't seen
the IndexError yet. I've run into some other errors ("too many open
files"), but these issues seem to have been discussed already. The dataset,
by the way, was about 40 GB and 188 million lines; I'm running a sort on 3
worker nodes with a total of about 80 cores.

Thanks again for the tips!

On Fri, Nov 7, 2014 at 6:03 PM, Davies Liu-2 [via Apache Spark User List] <
ml-node+s1001560n18393...@n3.nabble.com> wrote:

> Could you tell us how large the data set is? It will help us debug this
> issue.
>
> On Thu, Nov 6, 2014 at 10:39 AM, skane <[hidden email]> wrote:
>
> > I don't have any insight into this bug, but on Spark version 1.0.0 I ran
> > into the same bug running the 'sort.py' example. On a smaller data set, it
> > worked fine. On a larger data set I got this error:
> >
> > Traceback (most recent call last):
> >   File "/home/skane/spark/examples/src/main/python/sort.py", line 30, in
> >     .sortByKey(lambda x: x)
> >   File "/usr/lib/spark/python/pyspark/rdd.py", line 480, in sortByKey
> >     bounds.append(samples[index])
> > IndexError: list index out of range
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-issue-with-sortByKey-IndexError-list-index-out-of-range-tp16445p18288.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-issue-with-sortByKey-IndexError-list-index-out-of-range-tp16445p18871.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: PySpark issue with sortByKey: "IndexError: list index out of range"

2014-11-09 Thread santon
Sorry for the delay. I'll try to add some more details on Monday.

Unfortunately, I don't have a script to reproduce the error. Actually, it
seemed to be more about the data set than the script. The same code on
different data sets led to different results; only larger data sets on the
order of 40 GB seemed to crash with the described error. Also, I believe
our cluster was recently updated to CDH 5.2, which uses Spark 1.1. I'll
check to see if the issue was resolved.
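
For readers following the traceback: the failing line,
`bounds.append(samples[index])`, indexes into a list of sampled keys. Below
is a minimal sketch of that kind of boundary selection, assuming the bounds
are taken at evenly spaced positions in a sorted sample. The function name
`pick_bounds` and the empty-sample guard are hypothetical; this is not the
code in pyspark/rdd.py, and nothing in this thread confirms that an empty or
short sample is what caused the error on the 40 GB data set.

```python
def pick_bounds(samples, num_partitions):
    """Pick num_partitions - 1 boundary keys from a list of sampled keys.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    samples = sorted(samples)
    # Assumed guard: with no sampled keys there is nothing to index into,
    # so return no boundaries instead of raising IndexError.
    if not samples:
        return []
    bounds = []
    for i in range(num_partitions - 1):
        # Evenly spaced position in the sorted sample; always within
        # [0, len(samples) - 1] because (i + 1) < num_partitions.
        index = (len(samples) - 1) * (i + 1) // num_partitions
        bounds.append(samples[index])
    return bounds


if __name__ == "__main__":
    print(pick_bounds([5, 1, 3], 3))
    print(pick_bounds([], 4))
```

Without the guard, an empty sample would make `samples[index]` raise
"IndexError: list index out of range", which is the same exception type as
in the traceback, though that alone does not establish the cause.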





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-issue-with-sortByKey-IndexError-list-index-out-of-range-tp16445p18442.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.