multiple group by action

2018-08-24 Thread 崔苗
Hi, we have some user data with columns (userId, company, client, country, region, city), and now we want to count userId by multiple columns, such as: select count(distinct userId) group by company; select count(distinct userId) group by company, client; select count(distinct userId) group by
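
A minimal sketch of one way to compute several of these groupings in a single job, assuming the data is available as a table named users (the table and column names come from the question; everything else is illustrative, not the poster's code):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.countDistinct

    val spark = SparkSession.builder().appName("multi-group-by").getOrCreate()
    val users = spark.table("users")  // assumed table holding the user data

    // cube() produces one row per combination of the listed columns
    // (company only, client only, company+client, and the grand total),
    // so the distinct-user counts for all groupings come out of one pass.
    users
      .cube("company", "client")
      .agg(countDistinct("userId").alias("distinct_users"))
      .show()

The same idea can be written in SQL with GROUP BY GROUPING SETS ((company), (company, client)).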

Re: Persisting driver logs in yarn client mode (SPARK-25118)

2018-08-24 Thread Marcelo Vanzin
I think this would be useful, but I also share Saisai's and Marco's concern about the extra step when shutting down the application. If that could be minimized, this would be a much more interesting feature. E.g., you could upload logs incrementally to HDFS, asynchronously, while the app is
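
A rough sketch of the kind of incremental, asynchronous upload being suggested, using the Hadoop FileSystem API from a background thread (the paths, interval, and thread handling here are illustrative assumptions, not part of SPARK-25118):

    import java.io.{File, FileInputStream}
    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.IOUtils

    val localLog = new File("/tmp/driver.log")            // assumed local driver log
    val hdfsTarget = new Path("/spark-logs/driver.log")   // assumed HDFS destination
    val fs = FileSystem.get(new Configuration())

    // Periodically push the latest snapshot of the log to HDFS so there is
    // little or nothing left to upload when the application shuts down.
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = {
        val out = fs.create(hdfsTarget, true)             // overwrite previous snapshot
        val in = new FileInputStream(localLog)
        IOUtils.copyBytes(in, out, 4096, true)            // closes both streams
      }
    }, 30, 30, TimeUnit.SECONDS)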

Re: python tests: any reason for a huge tests.py?

2018-08-24 Thread Reynold Xin
We should break it. On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid wrote:
> Hi,
>
> another question from looking more at Python recently. Is there any
> reason we've got a ton of tests in one humongous tests.py file, rather than
> breaking it out into smaller files?
>
> Having one huge file

python tests: any reason for a huge tests.py?

2018-08-24 Thread Imran Rashid
Hi, another question from looking more at Python recently. Is there any reason we've got a ton of tests in one humongous tests.py file, rather than breaking it out into smaller files? Having one huge file doesn't seem great for code organization, and it also makes the test parallelization in

Re: Spark data quality bug when reading parquet files from hive metastore

2018-08-24 Thread Driesprong, Fokko
Hi Andrew, This blog gives an idea of how the schema is resolved: https://blog.godatadriven.com/multiformat-spark-partition There is some optimisation going on when reading Parquet using Spark. Hope this helps. Cheers, Fokko. On Wed, 22 Aug 2018 at 23:59, t4 wrote: >
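
The optimisation referred to is most likely Spark's conversion of Hive metastore Parquet tables to its native Parquet reader. A small hedged illustration of the knobs involved, assuming a SparkSession available as spark (as in spark-shell); the table path is hypothetical, and disabling the conversion is only a way to narrow the problem down, not a recommended fix:

    // Fall back to the Hive SerDe instead of Spark's built-in Parquet reader,
    // to check whether the native reader's schema resolution is the culprit.
    spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

    // Schema merging across Parquet part files is off by default; it can be
    // requested per read when partitions were written with different schemas.
    val df = spark.read.option("mergeSchema", "true").parquet("/path/to/table")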

Off Heap Memory

2018-08-24 Thread Jack Kolokasis
Hello, I recently started studying Spark's memory management system. My question is about the offHeapExecutionMemoryPool and offHeapStorageMemoryPool. 1. How does Spark use the offHeapExecutionMemoryPool? 2. How is the offHeap memory used (I understand the allocation side), but it is
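
For context, a minimal configuration sketch (these are standard Spark settings; the application name and size value are arbitrary examples): the off-heap execution and storage pools are only backed by real memory once off-heap use is explicitly enabled and sized.

    import org.apache.spark.sql.SparkSession

    // With these settings the unified memory manager carves the off-heap
    // execution and storage pools out of spark.memory.offHeap.size bytes
    // allocated outside the JVM heap; with the default (disabled), the
    // off-heap pools are not used at all.
    val spark = SparkSession.builder()
      .appName("offheap-demo")
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "2g")
      .getOrCreate()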