Re: bug for large textfiles on windows

2016-01-28 Thread Christopher Bourez
Dear all, I recompiled Spark on Windows, and it seems to work better. My problem with PySpark remains: https://issues.apache.org/jira/browse/SPARK-12261 I do not know how to debug this; it seems to be linked to Pickle, the garbage collector... I would like to clear the Spark context to see if I can gain

Re: bug for large textfiles on windows

2016-01-25 Thread Josh Rosen
Hi Christopher, What would be super helpful here is a standalone reproduction. Ideally this would be a single Scala file or set of commands that I can run in `spark-shell` in order to reproduce this. Ideally, this code would generate a giant file, then try to read it in a way that demonstrates

bug for large textfiles on windows

2016-01-25 Thread Christopher Bourez
Dear all, I would like to reopen a case for a potential bug (its current status is resolved, but the problem does not appear to be fixed): https://issues.apache.org/jira/browse/SPARK-12261 I believe there is something wrong with the memory management under Windows. It has

Re: bug for large textfiles on windows

2016-01-25 Thread Christopher Bourez
The same problem occurs on my desktop at work. What's great about AWS Workspace is that you can easily reproduce it there. I created the test file with these commands: for i in {0..30}; do VALUE="$RANDOM"; for j in {0..6}; do VALUE="$VALUE;$RANDOM"; done; echo "$VALUE" >> test.csv; done
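
For readers without a bash shell (for example on a plain Windows machine), the loop above can be sketched in Python. This is only an illustration, not the script used in the thread: the function name `write_test_csv` and its `rows`, `fields_per_row`, and `seed` parameters are assumptions made for this example. Unlike the shell loop, which appends to test.csv, this sketch overwrites the file. The defaults mirror the loop as quoted (31 rows of 8 semicolon-separated values in bash's `$RANDOM` range, 0 to 32767); a much larger `rows` value would be needed to reach a file of several megabytes.

```python
import random


def write_test_csv(path, rows=31, fields_per_row=8, seed=None):
    """Write `rows` lines of semicolon-separated random integers.

    Hypothetical helper for illustration only. Each value is drawn from
    0..32767, the range of bash's $RANDOM.
    """
    rng = random.Random(seed)
    # newline="\n" keeps Unix line endings even when run on Windows.
    with open(path, "w", newline="\n") as f:
        for _ in range(rows):
            values = [str(rng.randint(0, 32767)) for _ in range(fields_per_row)]
            f.write(";".join(values) + "\n")


if __name__ == "__main__":
    # Same shape as the quoted loop: 31 rows, 8 fields per row.
    write_test_csv("test.csv")
```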

Re: bug for large textfiles on windows

2016-01-25 Thread Christopher Bourez
Josh, Thanks a lot! You can download a video I created: https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov I created a sample file of 13 MB as explained: https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv Here are the commands I ran: I created an AWS

Re: bug for large textfiles on windows

2016-01-25 Thread Christopher Bourez
Here is a picture of the memory usage. If I pass --conf spark.driver.memory=3g, the displayed memory increases, but the problem remains... for a file that is only 13 MB. Christopher Bourez