Re: [EXTERNAL] Partial data with ADLS Gen2

2022-07-24 Thread Tufan Rakshit
Just use Delta. Best, Tufan

Re: [EXTERNAL] Partial data with ADLS Gen2

2022-07-24 Thread Shay Elbaz
This is a known issue. Apache Iceberg, Hudi and Delta Lake are among the possible solutions. Alternatively, instead of writing the output directly to the "official" location, write it to a staging directory. Once the job is done, rename the staging directory to the official location.
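
The staging-then-rename idea above could be sketched roughly as follows. This is a minimal illustration on the local filesystem, not a Spark or ADLS implementation: `publish_via_staging`, `demo_job` and the `.staging-` prefix are hypothetical names, and it assumes the official directory does not exist yet. On ADLS Gen2 the rename would go through the Hadoop FileSystem API instead of `os.rename`, and its atomicity depends on the storage account configuration.

```python
import os
import shutil
import tempfile
import uuid


def publish_via_staging(write_output, official_dir):
    """Run write_output(staging_dir), then rename staging_dir to official_dir.

    Assumes official_dir does not exist yet (hypothetical helper).
    """
    parent = os.path.dirname(os.path.abspath(official_dir))
    # Keep staging next to the target so the rename stays on one filesystem.
    staging_dir = os.path.join(parent, ".staging-" + uuid.uuid4().hex)
    os.makedirs(staging_dir)
    try:
        write_output(staging_dir)             # the job writes only to staging
        os.rename(staging_dir, official_dir)  # publish in a single step
    except Exception:
        # On any failure, discard the partial output; readers never see it.
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise


def demo_job(out_dir):
    # Stand-in for the real job: writes one output file.
    with open(os.path.join(out_dir, "part-00000"), "w") as f:
        f.write("row\n")


base = tempfile.mkdtemp()
target = os.path.join(base, "official")
publish_via_staging(demo_job, target)
print(sorted(os.listdir(target)))
```

Readers of the official location see either nothing or the complete output, because a failed job only ever leaves data in the staging directory, which is removed.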

Partial data with ADLS Gen2

2022-07-24 Thread kineret M
I have a Spark batch application writing to ADLS Gen2 (hierarchy). When designing the application I was sure that Spark would perform a global commit once the job is committed, but what it really does is commit on each task, meaning *once a task completes writing, its output is moved from temp to target storage*.

Re: external table with parquet files: problem querying in sparksql since data is stored as integer while hive schema expects a timestamp

2022-07-24 Thread Gourav Sengupta
Hi, please try to query the table directly by loading the Hive metastore (we can do that quite easily in AWS EMR, but we can do things quite easily with everything in AWS), rather than querying the S3 location directly. Regards, Gourav