Re: Spark streaming receivers

2020-08-09 Thread Dark Crusader
Hi Russell, This is super helpful, thank you so much. Can you elaborate on the differences between Structured Streaming and DStreams? How would the number of receivers required, etc., change? On Sat, 8 Aug 2020, 10:28 pm Russell Spitzer wrote: > Note, none of this applies to Direct streaming
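A minimal sketch of the receiver-less model under discussion, for readers following along: the Structured Streaming Kafka source (like the direct DStream approach Russell mentions) reads Kafka partitions on the executors themselves, so read parallelism follows the topic's partition count and there is no receiver count to size. The broker address and topic name below are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("receiverless-demo").getOrCreate()

    # No receivers: each executor pulls its assigned Kafka partitions directly.
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
          .option("subscribe", "events")                        # hypothetical topic
          .load())

    query = (df.selectExpr("CAST(value AS STRING)")
             .writeStream
             .format("console")
             .start())
    query.awaitTermination()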

Re: [Spark-Kafka-Streaming] Verifying the approach for multiple queries

2020-08-09 Thread tianlangstudio
Hello, Sir! What about processing and grouping the data first, then writing the grouped data to Kafka topics A and B? Then read topic A or B from another Spark application and process it further, in the ETL sense. TianlangStudio Some of the biggest lies: I will start tomorrow / Others are better
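A sketch of the two-stage routing TianlangStudio describes, assuming the JSON records carry a "type" field (hypothetical, as are the topic names): Spark's Kafka sink routes each row to the topic named in its "topic" column when no fixed topic option is set, so one query can feed topics A and B, and a second application can then consume either topic.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("route-by-type").getOrCreate()

    src = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
           .option("subscribe", "raw-events")                    # hypothetical source topic
           .load())

    # Hypothetical rule: records whose JSON "type" is "a" go to topicA, the rest to topicB.
    routed = (src.selectExpr("CAST(value AS STRING) AS value")
              .withColumn("topic",
                          F.when(F.get_json_object("value", "$.type") == "a", "topicA")
                           .otherwise("topicB")))

    query = (routed.selectExpr("topic", "value")
             .writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("checkpointLocation", "/tmp/chk-route")     # hypothetical path
             .start())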

[Spark-Kafka-Streaming] Verifying the approach for multiple queries

2020-08-09 Thread Amit Joshi
Hi, I have a scenario where a Kafka topic is being written with different types of JSON records. I have to regroup the records based on the type, then fetch the schema, parse them, and write them as parquet. I have tried Structured Streaming, but the dynamic schema is a constraint. So I have used
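One common workaround when a single stream mixes schemas, sketched below under assumptions (a JSON "type" field, two hypothetical schemas and output paths): split each micro-batch by type inside foreachBatch, parse each slice with its own schema via from_json, and write each slice to its own parquet path.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("per-type-parquet").getOrCreate()

    # Hypothetical schemas keyed by the record's "type" field.
    schemas = {
        "order": StructType([StructField("id", LongType()),
                             StructField("item", StringType())]),
        "click": StructType([StructField("id", LongType()),
                             StructField("url", StringType())]),
    }

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
           .option("subscribe", "mixed-events")                  # hypothetical topic
           .load()
           .selectExpr("CAST(value AS STRING) AS json")
           .withColumn("type", F.get_json_object("json", "$.type")))

    def write_by_type(batch_df, batch_id):
        # Parse and persist each record type with its own schema.
        for t, schema in schemas.items():
            (batch_df.filter(F.col("type") == t)
                     .select(F.from_json("json", schema).alias("rec"))
                     .select("rec.*")
                     .write.mode("append")
                     .parquet(f"/tmp/out/{t}"))  # hypothetical output path

    query = (raw.writeStream
             .foreachBatch(write_by_type)
             .option("checkpointLocation", "/tmp/chk-types")     # hypothetical path
             .start())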

regexp_extract regex for extracting the columns from string

2020-08-09 Thread anbutech
Hi All, I have the following info in the data column: <1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new packt=20 orgin=null address=null dest=fgjglgl Here I want to create a separate column for each of the above key=value pairs after the integer <1000>, separated by spaces. Is there
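A sketch of one regexp_extract answer, assuming the column is named "data": since each pair is space-separated, the pattern key=(\S+) captures everything up to the next space, one extract per desired key.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kv-extract").getOrCreate()

    df = spark.createDataFrame(
        [("<1000> date=2020-08-01 time=20:50:04 name=processing id=123",)],
        ["data"])  # "data" is an assumed column name

    keys = ["date", "time", "name", "id"]  # extend with session, packt, orgin, address, dest
    out = df.select(
        "data",
        # regexp_extract returns "" when a key is absent from the row.
        *[F.regexp_extract("data", rf"{k}=(\S+)", 1).alias(k) for k in keys])
    out.show(truncate=False)

An alternative worth comparing is SQL's str_to_map after stripping the "<1000> " prefix, which yields all pairs in a single pass.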

Re: Spark batch job chaining

2020-08-09 Thread Jun Zhu
Hi, I am using Airflow in such a scenario.
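A minimal sketch of the Airflow chaining Jun describes, assuming Airflow's apache-spark provider is installed; the application paths and connection id are hypothetical. Each SparkSubmitOperator launches one Spark batch job, and the >> dependency makes the second job wait for the first to succeed.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG("spark_job_chain",
             start_date=datetime(2020, 8, 1),
             schedule_interval="@daily",
             catchup=False) as dag:
        extract = SparkSubmitOperator(
            task_id="extract",
            application="/jobs/extract.py",    # hypothetical Spark job
            conn_id="spark_default")
        transform = SparkSubmitOperator(
            task_id="transform",
            application="/jobs/transform.py",  # hypothetical Spark job
            conn_id="spark_default")
        extract >> transform  # transform runs only after extract succeeds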