Re: Spark Kafka Batch Write guarantees

2019-04-01 Thread hemant singh
Thanks Shixiong, read in documentation as well that duplicates might exist
because of task retries.

On Mon, 1 Apr 2019 at 9:43 PM, Shixiong(Ryan) Zhu 
wrote:

> The Kafka source doesn’t support transaction. You may see partial data or
> duplicated data if a Spark task fails.
>
> On Wed, Mar 27, 2019 at 1:15 AM hemant singh  wrote:
>
>> We are using spark batch to write Dataframe to Kafka topic. The spark
>> write function with write.format(source = Kafka).
>> Does spark provide similar guarantee like it provides with saving
>> dataframe to disk; that partial data is not written to Kafka i.e. full
>> dataframe is saved or if job fails no data is written to Kafka topic.
>>
>> Thanks.
>>
> --
>
> Best Regards,
> Ryan
>


Re: Spark Kafka Batch Write guarantees

2019-04-01 Thread Shixiong(Ryan) Zhu
The Kafka source doesn’t support transaction. You may see partial data or
duplicated data if a Spark task fails.

On Wed, Mar 27, 2019 at 1:15 AM hemant singh  wrote:

> We are using spark batch to write Dataframe to Kafka topic. The spark
> write function with write.format(source = Kafka).
> Does spark provide similar guarantee like it provides with saving
> dataframe to disk; that partial data is not written to Kafka i.e. full
> dataframe is saved or if job fails no data is written to Kafka topic.
>
> Thanks.
>
-- 

Best Regards,
Ryan


Spark Kafka Batch Write guarantees

2019-03-27 Thread hemant singh
We are using spark batch to write Dataframe to Kafka topic. The spark write
function with write.format(source = Kafka).
Does spark provide similar guarantee like it provides with saving dataframe
to disk; that partial data is not written to Kafka i.e. full dataframe is
saved or if job fails no data is written to Kafka topic.

Thanks.