Hi everyone, *<first time writing to this mailing list>*
Context: I have events coming into Databricks from an Azure Event Hub in Gzip-compressed form. Currently, I extract the files with a UDF and write the unzipped data into the silver layer of my Delta Lake with .write. Note that even though the data arrives continuously, I do not use .writeStream as of now.

I have a few design-related questions that I hope someone with experience could help me with:

1. Is there a better way to extract Gzip files than a UDF?
2. Is Spark Structured Streaming or batch processing with Databricks Jobs the better fit? (The pipeline runs once every 3 hours, but the data arrives continuously from Event Hub.)
3. Should I use Auto Loader, or simply stream data into Databricks directly from Event Hubs?

I am especially curious about the trade-offs and the best way forward. I don't have massive amounts of data.

Thank you very much in advance!

Best wishes,
Maurizio Vancho Argall
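
P.S. In case it helps, here is roughly what my current approach looks like, simplified: the paths, the "body" column, and the Avro capture format are placeholders rather than my exact setup, and "spark" is the session provided by the Databricks notebook.

    import gzip

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Placeholder source: raw capture files, with "body" holding the
    # Gzip-compressed payload as binary.
    raw_df = spark.read.format("avro").load("/mnt/eventhub-capture/raw/")

    @F.udf(returnType=StringType())
    def gunzip(payload):
        # Decompress one message body into a UTF-8 string
        return gzip.decompress(bytes(payload)).decode("utf-8") if payload else None

    silver_df = raw_df.withColumn("json_str", gunzip(F.col("body")))

    # Plain batch write into the silver Delta table (no writeStream yet)
    silver_df.write.format("delta").mode("append").save("/mnt/delta/silver/events")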