I think maxOffsetsPerTrigger, from the Spark + Kafka integration docs, would 
meet your requirement.
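
For example, something like this (a minimal sketch in Scala; the bootstrap 
servers, topic name, and the 10000 cap are placeholders) limits how many 
offsets each micro-batch reads from Kafka:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-rate-limit").getOrCreate()

    // maxOffsetsPerTrigger caps the total number of offsets consumed per
    // micro-batch, spread across the topic's partitions.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092") // placeholder
      .option("subscribe", "my-topic")                 // placeholder
      .option("maxOffsetsPerTrigger", "10000")         // cap per batch
      .load()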

Sent from my iPhone

> On Oct 21, 2020, at 12:36, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> 
> wrote:
> 
> Thanks. Do we have an option to limit the number of records, like processing 
> only 10000, or a property we can pass? That way we can control the amount of 
> data in each batch. 
> 
> Sent from my iPhone
> 
>> On Oct 21, 2020, at 12:11 AM, lec ssmi <shicheng31...@gmail.com> wrote:
>> 
>>     Structured Streaming's bottom layer also uses a micro-batch mechanism. 
>> The first batch often seems slower than the later ones; I run into this 
>> problem frequently as well. It feels related to how the batches are divided. 
>>     On the other hand, Spark's batch size is usually bigger than a Flume 
>> transaction batch size. 
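>> 
>> For instance (a rough sketch; df is the Kafka stream, and the 10-second 
>> interval and console sink are just for illustration), the trigger interval 
>> is what divides the stream into micro-batches:
>> 
>>     import org.apache.spark.sql.streaming.Trigger
>> 
>>     // Fire a micro-batch on a fixed cadence instead of as fast as possible.
>>     val query = df.writeStream
>>       .format("console")
>>       .trigger(Trigger.ProcessingTime("10 seconds"))
>>       .start()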
>> 
>> 
>> On Wed, Oct 21, 2020 at 12:19 PM, KhajaAsmath Mohammed 
>> <mdkhajaasm...@gmail.com> wrote:
>>> Yes. Changing back to latest worked, but I still see slowness compared 
>>> to Flume. 
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Oct 20, 2020, at 10:21 PM, lec ssmi <shicheng31...@gmail.com> wrote:
>>>> 
>>>> Do you start your application by catching up on the earliest Kafka data? 
>>>> 
>>>> On Wed, Oct 21, 2020 at 2:19 AM, Lalwani, Jayesh 
>>>> <jlalw...@amazon.com.invalid> wrote:
>>>>> Are you getting any output? Streaming jobs typically run forever and 
>>>>> keep processing data as it arrives. If a streaming job is working well, 
>>>>> it will typically generate output at a regular cadence.
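>>>>> 
>>>>> A quick way to check (a sketch; query is the StreamingQuery handle that 
>>>>> start() returns) is to poll the query's progress:
>>>>> 
>>>>>     // status shows whether a batch is currently running;
>>>>>     // lastProgress is JSON with numInputRows, batch duration, etc.
>>>>>     println(query.status)
>>>>>     println(query.lastProgress) // null until the first batch completes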
>>>>> 
>>>>> From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>>>>> Date: Tuesday, October 20, 2020 at 1:23 PM
>>>>> To: "user @spark" <user@spark.apache.org>
>>>>> Subject: [EXTERNAL] Spark Structured Streaming - Kafka - slowness with 
>>>>> query 0
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have started using Spark Structured Streaming to read data from 
>>>>> Kafka, and the job is very slow. The number of output rows keeps 
>>>>> increasing in query 0, and the job runs forever. Any suggestions, please? 
>>>>> 
>>>>> [attached screenshot: image001.png]
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Asmath
