date:20191104

A question about skew join hint

2019-11-04 Thread zhangliyun

Hi all: I saw the skew join hint optimization in https://docs.azuredatabricks.net/delta/join-performance/skew-join.html. It is a great feature that helps users avoid the problems caused by skewed data. My questions: 1. In which version will we have this? I have not found the feature in the ma

[DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-04 Thread Bryan Cutler

Currently, when a PySpark Row is created with keyword arguments, the fields are sorted alphabetically. This has created a lot of confusion with users because it is not obvious (although it is stated in the pydocs) that they will be sorted alphabetically. Then later when applying a schema and the fi
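
The snippet above is cut off, so the following is only a minimal sketch of the behaviour it describes, written in plain Python rather than against PySpark itself. `make_row_sorted` is a hypothetical stand-in for a Row built from keyword arguments, and the positional schema mismatch at the end is an assumption about where the truncated sentence was heading.

```python
def make_row_sorted(**kwargs):
    """Hypothetical stand-in: build a row whose fields are sorted by name."""
    names = sorted(kwargs)
    values = tuple(kwargs[n] for n in names)
    return names, values


# The caller writes the fields as (name, age)...
names, values = make_row_sorted(name="Alice", age=30)

# ...but they come back in alphabetical order.
print(names)   # ['age', 'name']
print(values)  # (30, 'Alice')

# Assumption: a schema is later applied by position, in the order the
# caller originally wrote the fields. Each name now gets the wrong value.
schema_order = ["name", "age"]
paired = dict(zip(schema_order, values))
print(paired)  # {'name': 30, 'age': 'Alice'}
```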

Re: Avro file question

2019-11-04 Thread ayan guha

Assuming you always read the data together, one large file is good and is a basic HDFS use case. On Tue, 5 Nov 2019 at 4:28 am, Yaniv Harpaz wrote: > It depends on your usage (when and how u read). > the smaller files you were thinking about are also larger than the HDFS > block size? > I would not go for

Re: Avro file question

2019-11-04 Thread Yaniv Harpaz

It depends on your usage (when and how you read). Are the smaller files you were thinking about also larger than the HDFS block size? I would not go for something smaller than a block. Usually (if relevant to the way you read the data) the partitioning helps determine that. Yaniv Harpaz [ yaniv.har

Avro file question

2019-11-04 Thread Sam

Hi, How do we choose between a single large Avro file (size much larger than the HDFS block size) and multiple smaller Avro files (close to the HDFS block size)? Since Avro is splittable, is there even a need to split a very large Avro file into smaller files? I’m assuming that a single large avro file can
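
To make the comparison in the question concrete, here is a rough arithmetic sketch of how many HDFS blocks each layout would occupy. The 128 MiB block size and both file sizes are assumptions chosen for illustration, not values from the thread, and `block_count` is a hypothetical helper.

```python
import math

MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed HDFS block size, not taken from the thread


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks needed to store one file of the given size."""
    return math.ceil(file_size_bytes / block_size)


# Layout A (hypothetical): one 10 GiB file.
single_large = block_count(10 * 1024 * MIB)

# Layout B (hypothetical): 80 files of 128 MiB each, one block per file.
many_small = sum(block_count(128 * MIB) for _ in range(80))

print(single_large)  # 80
print(many_small)    # 80
```

Under these assumed numbers both layouts cover the same number of blocks, so the choice would rest on how the data is read rather than on block count alone.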

5 matches
