Re: Spark streaming pending microbatches queue max length

2022-07-13 Thread Anil Dasari
Retry. From: Anil Dasari Date: Tuesday, July 12, 2022 at 3:42 PM To: user@spark.apache.org Subject: Spark streaming pending microbatches queue max length Hello, Spark adds an entry to the pending microbatches queue at each batch interval. Is there a config to set the max size for pending
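To my knowledge there is no documented config that caps the length of the pending-batch queue itself; the usual approach in classic (DStream) Spark Streaming is to rate-limit ingestion so batches finish within the batch interval and the queue does not grow. A hedged sketch of the related, documented settings:

```properties
# spark-defaults.conf (classic DStream Spark Streaming) -- a sketch, not a
# direct queue-length cap; these settings limit how fast data is ingested.
spark.streaming.backpressure.enabled      true    # adapt ingestion rate to processing speed
spark.streaming.receiver.maxRate          10000   # max records/sec per receiver (0 = unlimited)
spark.streaming.kafka.maxRatePerPartition 1000    # cap for direct Kafka streams
```

If batches still pile up with these enabled, the batch interval is likely shorter than the actual processing time and needs to be increased.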

[Spark Structured Continuous Processing] Plans for future left join support.

2022-07-13 Thread Mikołaj Błaszczyk
Hello, As of now Spark Continuous Processing does not support logical relation operations like "dataframe.join()". Are there any plans to support them in future releases? Thanks in advance for your work. Mikołaj

Re: reading each JSON file from dataframe...

2022-07-13 Thread Gourav Sengupta
Hi, I think that this is a pure example of over-engineering. Ayan's advice is the best. Please use the Spark SQL function input_file_name() to join the tables. People do not think in terms of RDDs anymore unless absolutely required. Also if you have different JSON schemas, just use the

How to use pattern matching in Spark

2022-07-13 Thread Sid
Hi Team, I have a dataset like the below one in a .dat file: 13/07/2022abc PWJ PWJABC 513213217ABC GM20 05. 6/20/39 #01000count Now I want to extract the header and tail records, which I was able to do. Now, from the header, I need to extract the date and match it with the current system
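The date-check step described above can be sketched in plain Python (outside Spark), assuming the header record begins with a dd/MM/yyyy date as in the sample (`13/07/2022abc ...`) — the field position and format are assumptions from the example line, not confirmed by the thread:

```python
from datetime import datetime, date

def header_date(header):
    """Parse the leading dd/MM/yyyy date from a fixed-width header record."""
    return datetime.strptime(header[:10], "%d/%m/%Y").date()

def header_is_current(header, today=None):
    """True if the header's date matches the current date (injectable for testing)."""
    today = today or date.today()
    return header_date(header) == today

header = "13/07/2022abc PWJ PWJABC 513213217ABC GM20 05."
print(header_date(header))                           # 2022-07-13
print(header_is_current(header, date(2022, 7, 13)))  # True
```

Passing `today` explicitly keeps the comparison testable; in production you would call `header_is_current(header)` and let it use the system date.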

Re: How reading works?

2022-07-13 Thread Sid
Yeah, I understood that now. Thanks for the explanation, Bjørn. Sid On Wed, Jul 6, 2022 at 1:46 AM Bjørn Jørgensen wrote: > Ehh.. What is "*duplicate column*"? I don't think Spark supports that. > > duplicate column = duplicate rows > > > On Tue, Jul 5, 2022 at 22:13 Bjørn Jørgensen wrote: <