I think maxOffsetsPerTrigger, described in the Spark + Kafka integration docs, would meet your requirement.
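As a minimal configuration sketch (topic name and broker address below are made up for illustration, and this needs the spark-sql-kafka package plus a running Kafka broker to actually execute): `maxOffsetsPerTrigger` caps the total number of Kafka offsets consumed per micro-batch, so each trigger processes at most roughly that many records instead of trying to catch up on the whole backlog in one batch.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-rate-limited").getOrCreate()

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker address
    .option("subscribe", "events")                        # assumed topic name
    .option("startingOffsets", "latest")
    # Limit offsets consumed per trigger (micro-batch); Spark spreads the
    # limit proportionally across the topic's partitions.
    .option("maxOffsetsPerTrigger", 10000)
    .load()
)

query = (
    df.selectExpr("CAST(value AS STRING)")
    .writeStream
    .format("console")
    .start()
)
```

Note that the limit is in offsets (records), not bytes, and it only bounds batch size; if the input rate stays above the limit the stream will fall steadily behind.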
Sent from my iPhone

> On Oct 21, 2020, at 12:36 PM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
>
> Thanks. Do we have an option to limit the number of records, such as
> processing only 10000, or a property we can pass? That way we can control
> the amount of data per batch.
>
> Sent from my iPhone
>
>> On Oct 21, 2020, at 12:11 AM, lec ssmi <shicheng31...@gmail.com> wrote:
>>
>> Structured Streaming's bottom layer also uses a micro-batch mechanism.
>> The first batch seems to be slower than the later ones; I often run into
>> this problem too. It feels related to how the batches are divided. On
>> the other hand, Spark's batch size is usually bigger than Flume's
>> transaction batch size.
>>
>> On Wed, Oct 21, 2020 at 12:19 PM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
>>>
>>> Yes. Changing back to "latest" worked, but I still see the slowness
>>> compared to Flume.
>>>
>>> Sent from my iPhone
>>>
>>>> On Oct 20, 2020, at 10:21 PM, lec ssmi <shicheng31...@gmail.com> wrote:
>>>>
>>>> Does your application start by catching up on old Kafka data?
>>>>
>>>> On Wed, Oct 21, 2020 at 2:19 AM, Lalwani, Jayesh <jlalw...@amazon.com.invalid> wrote:
>>>>>
>>>>> Are you getting any output? Streaming jobs typically run forever and
>>>>> keep processing data as it comes in. If a streaming job is working
>>>>> well, it will typically generate output at a certain cadence.
>>>>>
>>>>> From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>>>>> Date: Tuesday, October 20, 2020 at 1:23 PM
>>>>> To: "user @spark" <user@spark.apache.org>
>>>>> Subject: [EXTERNAL] Spark Structured Streaming - Kafka - slowness with query 0
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have started using Spark Structured Streaming for reading data from
>>>>> Kafka and the job is very slow. The number of output rows keeps
>>>>> increasing in query 0 and the job runs forever. Any suggestions?
>>>>>
>>>>> <image001.png>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Asmath