Re: A simple comparison for three SQL engines

2022-04-09 Thread Wes Peng
May I forward this report to the Spark list as well? Thanks. Wes Peng wrote: Hello, This weekend I ran a test against a big dataset. Spark, Drill, MySQL, and PostgreSQL were involved. This is the final report: https://blog.cloudcache.net/handles-the-file-larger-than-memory/ The simple

Re: Spark Write BinaryType Column as continues file to S3

2022-04-09 Thread Bjørn Jørgensen
Hi Philipp. I found this paper: SIDELOADING – INGESTION OF LARGE POINT CLOUDS INTO THE APACHE SPARK BIG DATA ENGINE. GeoTrellis does use PDAL in geotrellis-pointcloud

binaryFile write

2022-04-09 Thread Philipp Kraus
Hello, I’m using Spark 3.1.1 and cannot yet update to a newer version. I have a DataFrame with a single column of DataTypes.BinaryType, where each row contains a byte array of generated binary data. I am now trying to write this data to a single file with mydataframe.coalesce( 1 )

Re: Aggregate over a column: the proper way to do

2022-04-09 Thread sam smith
Yes. Returns the number of rows in the Dataset as *long*. But in my case the aggregation returns a table of two columns. On Fri, Apr 8, 2022 at 2:12 PM, Sean Owen wrote: > Dataset.count() returns one value directly? > > On Thu, Apr 7, 2022 at 11:25 PM sam smith > wrote: > >> My bad, yes of