DataFrame API and Ordering

2016-02-16 Thread Maciej Szymkiewicz
I am not sure if I've missed something obvious, but as far as I can tell the DataFrame API doesn't provide clearly defined ordering rules, apart from NaN handling. Methods like DataFrame.sort or sql.functions like min / max provide only a general description. Discrepancy between functions.max (min) and

Re: Welcoming two new committers

2016-02-16 Thread Raffael Bottoli Schemmer
Congratulations Herman and Wenchen,
2016-02-16 20:45 GMT-02:00 Igor Costa:
> Congratulations Herman and Wenchen.
>
> On Tue, Feb 9, 2016 at 10:58 AM, Joseph Bradley wrote:
>> Congrats & welcome!
>>
>> On Mon, Feb 8, 2016 at 12:19 PM, Ram

Re: Welcoming two new committers

2016-02-16 Thread Igor Costa
Congratulations Herman and Wenchen.
On Tue, Feb 9, 2016 at 10:58 AM, Joseph Bradley wrote:
> Congrats & welcome!
>
> On Mon, Feb 8, 2016 at 12:19 PM, Ram Sriharsha wrote:
>> great job guys! congrats and welcome!
>>
>> On Mon, Feb 8, 2016 at

Re: SPARK_WORKER_MEMORY in Spark Standalone - conf.getenv vs System.getenv?

2016-02-16 Thread Igor Costa
Actually, answering the first question ("Is there a reason to use conf to read SPARK_WORKER_MEMORY, not System.getenv as for the other env vars?"): you can use the properties file to change the amount. System.getenv would be bad when you have, for example, other things running on the JVM which will

Call wholeTextFiles to read gzip files

2016-02-16 Thread Deepak Gopalakrishnan
Hello, I'm reading S3 files using wholeTextFiles(). My files are in gzip format, but the names of the files do not end with ".gz". I cannot force the names of these files to end with ".gz". Is there a way to specify the InputFormat as Gzip when using wholeTextFiles()? -- Regards, *Deepak
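
One possible workaround for the question above, offered only as a sketch: instead of relying on the file extension to trigger decompression, read the raw bytes and decompress them explicitly. The helper name gunzip_text and the constant GZIP_MAGIC are hypothetical, and the bucket path in the comment is a placeholder. This does not show how to set an InputFormat on wholeTextFiles(); it assumes that reading the files as bytes (for example with PySpark's sc.binaryFiles) is acceptable and that each file fits in memory.

```python
import gzip
import io

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def gunzip_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decode gzip-compressed bytes to text, ignoring the file name entirely.

    Hypothetical helper: if the payload does not start with the gzip magic
    bytes, it is assumed to be plain text and decoded as-is.
    """
    if data[:2] != GZIP_MAGIC:
        return data.decode(encoding)
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
        return fh.read().decode(encoding)


# Assumed usage with PySpark (not executed here; the path is a placeholder):
#   rdd = sc.binaryFiles("s3n://some-bucket/some-prefix/")  # (path, bytes) pairs
#   texts = rdd.mapValues(gunzip_text)                      # (path, text) pairs
```

The result has the same (path, content) shape that wholeTextFiles() produces, so downstream code should need little change. Whether this performs well enough for large S3 listings is not something this sketch addresses.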