spark master/worker logs after job completes

2022-03-08 Thread Bulldog20630405
Coming from a YARN background, where log files can still be found after a job finishes: with Spark master/workers, how do we configure things so that logs are available after the job finishes? We have set up our Spark history server, and spark-defaults includes: spark.eventLog.enabled true spark.eventLog.dir
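A point that may help frame this question. With master/workers (standalone mode), the history server replays the event logs written under spark.eventLog.dir, and it reads them from spark.history.fs.logDirectory, so the two settings are expected to point at the same location. The raw executor stdout/stderr files are a separate matter. They stay on each worker in its work directory, which is SPARK_HOME/work by default or SPARK_WORKER_DIR if set, unless spark.worker.cleanup.enabled removes old application directories. The sketch below is a hypothetical helper, not part of Spark. It assumes the layout <work_dir>/<app-id>/<executor-id>/{stdout,stderr} and lists those files on a single worker host.

```python
import os
from typing import List, Tuple


def default_work_dir() -> str:
    """Resolve the worker directory: SPARK_WORKER_DIR if set, else $SPARK_HOME/work.

    Falling back to the current directory when SPARK_HOME is unset is an
    assumption of this sketch, not Spark behaviour.
    """
    explicit = os.environ.get("SPARK_WORKER_DIR")
    if explicit:
        return explicit
    return os.path.join(os.environ.get("SPARK_HOME", "."), "work")


def find_executor_logs(work_dir: str) -> List[Tuple[str, str, str]]:
    """Return (app_id, executor_id, log_path) for every executor stderr/stdout file.

    Assumes the standalone worker layout <work_dir>/<app-id>/<executor-id>/<stream>.
    Entries that are not directories (or missing log files) are skipped.
    """
    results: List[Tuple[str, str, str]] = []
    if not os.path.isdir(work_dir):
        return results
    for app_id in sorted(os.listdir(work_dir)):
        app_path = os.path.join(work_dir, app_id)
        if not os.path.isdir(app_path):
            continue
        for executor_id in sorted(os.listdir(app_path)):
            executor_path = os.path.join(app_path, executor_id)
            if not os.path.isdir(executor_path):
                continue
            for stream in ("stderr", "stdout"):
                log_path = os.path.join(executor_path, stream)
                if os.path.isfile(log_path):
                    results.append((app_id, executor_id, log_path))
    return results


if __name__ == "__main__":
    for app_id, executor_id, path in find_executor_logs(default_work_dir()):
        print(app_id, executor_id, path)
```

Because these files live on local worker disks, a script like this has to run on each worker host, or the work directories have to be shipped to shared storage by some other means.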

Re: Decompress Gzip files from EventHub with Structured Streaming

2022-03-08 Thread ayan guha
Hi, IMHO this is not the best use of Spark. I would suggest using a simple Azure Function to unzip. Is there any specific reason to use gzip over Event Hub? If you can wait 10-20 seconds to process, you can use Event Hubs Capture to write the data to storage and then process it. It all depends on compute

Decompress Gzip files from EventHub with Structured Streaming

2022-03-08 Thread Data Guy
Hi everyone, Context: I have events coming into Databricks from an Azure Event Hub in gzip-compressed format. Currently, I extract the files with a UDF and send the unzipped data into the silver layer of my Delta Lake with .write. Note that even though data comes in continuously I do not
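The per-event decompression step described here might look roughly like the function below. It is a minimal sketch, assuming each event body is a complete gzip stream whose content is UTF-8 text. The name decompress_event_body is hypothetical, and this is not the poster's actual UDF. It uses only the standard library, so the same logic could also serve as the body of the Azure Function suggested in the reply.

```python
import gzip
from typing import Optional, Union


def decompress_event_body(
    body: Optional[Union[bytes, bytearray]], encoding: str = "utf-8"
) -> Optional[str]:
    """Decompress one gzip-compressed event payload and decode it to text.

    A null body returns None, so the function can be wrapped as a Spark UDF,
    where null column values arrive as None and binary values as bytearray.
    """
    if body is None:
        return None
    # bytes() normalises bytearray input before decompression.
    return gzip.decompress(bytes(body)).decode(encoding)
```

In PySpark this could be registered with pyspark.sql.functions.udf(decompress_event_body, StringType()) and applied to the binary body column, assuming the Event Hubs connector exposes the payload that way. A Python UDF still decompresses row by row, so it does not change the compute trade-off raised in the reply.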

Spark kafka structured streaming - how to prevent dataloss

2022-03-08 Thread Gnanasoundari Soundarajan
Hi, Spark uses checkpoints to keep track of offsets in Kafka. If there is any data loss, can we edit the checkpoint file to reduce it? Please suggest best practices for reducing data loss in exceptional scenarios. Regards, Gnana
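Before anyone edits a checkpoint by hand, it can be safer to only read it. The sketch below is a hypothetical, read-only helper. It assumes the offset-log layout used by recent Spark versions: numbered files under <checkpoint>/offsets and <checkpoint>/commits, where each offsets file holds a version line, a metadata JSON line, and then one JSON line per source (or "-" for a source with no offset). That layout is internal to Spark and may change between versions. The helper returns the offsets of the newest batch that also has a commit marker.

```python
import json
import os
from typing import Any, List, Optional, Set


def _batch_ids(directory: str) -> Set[int]:
    """Return numeric batch ids in a checkpoint sub-directory.

    Only purely numeric file names count, so checksum files such as '.3.crc'
    and temporary files are ignored.
    """
    if not os.path.isdir(directory):
        return set()
    return {int(name) for name in os.listdir(directory) if name.isdigit()}


def last_committed_offsets(checkpoint_dir: str) -> Optional[List[Any]]:
    """Return per-source offsets of the newest committed batch, or None.

    Assumed file layout (internal to Spark, may change): line 1 is a version
    tag like 'v1', line 2 is batch metadata JSON, and each remaining line is
    one source's offsets as JSON, or '-' when that source has no offset.
    """
    offsets_dir = os.path.join(checkpoint_dir, "offsets")
    commits_dir = os.path.join(checkpoint_dir, "commits")
    # A batch with an offsets entry but no commit entry was not completed.
    committed = _batch_ids(offsets_dir) & _batch_ids(commits_dir)
    if not committed:
        return None
    latest = max(committed)
    with open(os.path.join(offsets_dir, str(latest)), encoding="utf-8") as fh:
        lines = [line.strip() for line in fh if line.strip()]
    # Skip the version tag and the metadata line; parse one entry per source.
    return [None if line == "-" else json.loads(line) for line in lines[2:]]
```

For a Kafka source, each parsed entry has the shape {"topic": {"partition": offset}}, which matches the JSON accepted by the Kafka source's startingOffsets option. That option only applies when a query starts without an existing checkpoint, so one possible alternative to hand-editing is to start a new query with a fresh checkpoint location and explicit starting offsets.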