Stream to Stream joins

2020-08-24 Thread Hamish Whittal
Hi folks, I've got a stream coming from Kafka. It has the following schema: userdata: { id: INT, acctid: INT, uid: STRING, logintm: datetime }. I'm trying to count the number of logins by acctid. I can do the count fine, but the resulting table only has the acctid and the count. Now I wish to get all

RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-24 Thread Luca Canali
Hi Abhishek, Just a few ideas/comments on the topic: When benchmarking/testing, I find it useful to collect a more complete view of resource usage and Spark metrics, beyond just measuring query elapsed time. Something like this: https://github.com/cerndb/spark-dashboard I'd rather not use