Thank you so much for the help, Farhan.
Could you please advise on the design approach for this problem? What is the best way to structure this code so it produces better results?
I also have a clarification on the code: I want to take a daily record count of the ingestion source vs. the Databricks Delta Lake table vs.
Hi Anbutech,
If I am not mistaken, you are trying to read multiple
dataframes from around 150 different paths (in your case, the Kafka
topics) to count their records. You have all these paths stored in a
CSV with columns year, month, day and hour.
Here is what I came up with; I have
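The answer above is cut off, but the approach it describes can be sketched roughly as follows. The bucket name, topic column, and path layout below are assumptions for illustration; only the year/month/day/hour columns come from the thread. The path-building part is plain Python, and the record counting with Spark is shown in comments since it needs a running cluster:

```python
import csv
import io

# Hypothetical CSV contents: one row per partition to check.
# The column names year, month, day, hour come from the thread;
# the "topic" column and the bucket/path layout are assumptions.
csv_text = """topic,year,month,day,hour
events_a,2020,07,01,00
events_b,2020,07,01,01
"""

def build_paths(csv_file):
    """Turn each CSV row into an S3 prefix to read JSON from."""
    reader = csv.DictReader(csv_file)
    return [
        "s3://my-bucket/{topic}/{year}/{month}/{day}/{hour}/".format(**row)
        for row in reader
    ]

paths = build_paths(io.StringIO(csv_text))
print(paths[0])  # s3://my-bucket/events_a/2020/07/01/00/

# With Spark available, counting the records under each path
# might look like this (sketch, not tested here):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# counts = {p: spark.read.json(p).count() for p in paths}
```

Reading ~150 paths sequentially like this can be slow; passing the whole list to a single `spark.read.json(paths)` call and grouping by an identifying column is another option, depending on whether per-path counts are needed.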
Hi,
I have a question about the design of a monitoring PySpark script for a large
volume of source JSON data coming from more than 100 Kafka topics.
These topics are stored under separate buckets in AWS S3. Each of the
Kafka topics holds multiple terabytes of JSON data with respect to the