Hi Steve,

Thanks for your statement. I tend to use UUIDs myself to avoid
collisions. A built-in UUID function such as Python's uuid.uuid4()
generates random IDs that are highly likely to be unique across systems.
My concerns with monotonically_increasing_id() are about edge cases, so to
speak. If the Spark application runs for a very long time or is restarted,
the monotonically_increasing_id() sequence might restart from the
beginning. This could cause duplicate IDs if other Spark applications are
running concurrently or if data is processed across multiple runs of the
same application.
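
To illustrate the concern, here is a minimal sketch in plain Python. It does
not call Spark. It only mimics the ID layout that the Spark documentation
describes for monotonically_increasing_id() (partition ID in the upper 31
bits, record number within the partition in the lower 33 bits). The helper
simulate_ids() and the partition sizes are my own assumptions for
illustration.

```python
import uuid

# Hypothetical helper: mimics the documented layout of Spark's
# monotonically_increasing_id() (partition ID in the upper 31 bits,
# record number within the partition in the lower 33 bits).
# It is a simulation only and does not use Spark.
def simulate_ids(rows_per_partition):
    ids = []
    for partition_index, row_count in enumerate(rows_per_partition):
        for row_number in range(row_count):
            ids.append((partition_index << 33) + row_number)
    return ids

# Two independent "runs" (or two applications) over data with the
# same partitioning produce exactly the same IDs.
run_1 = simulate_ids([2, 3])
run_2 = simulate_ids([2, 3])
overlap = set(run_1) & set(run_2)

# Within a single run the IDs are all distinct.
unique_within_run = len(set(run_1)) == len(run_1)

# Random UUIDs do not depend on the run or the partition.
uuids_run_1 = [str(uuid.uuid4()) for _ in run_1]
uuids_run_2 = [str(uuid.uuid4()) for _ in run_2]
uuid_overlap = set(uuids_run_1) & set(uuids_run_2)

print(run_1)
print("IDs shared between runs:", len(overlap))
print("UUIDs shared between runs:", len(uuid_overlap))
```

So within one DataFrame the IDs should not collide, but if IDs from separate
runs end up in the same target table, the same values can appear again.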

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom


LinkedIn: <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Wed, 1 May 2024 at 01:22, Stephen Coy <s...@infomedia.com.au> wrote:

> Hi Mich,
>
> I was just reading random questions on the user list when I noticed that
> you said:
>
> On 25 Apr 2024, at 2:12 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> 1) You are using monotonically_increasing_id(), which is not
> collision-resistant in distributed environments like Spark. Multiple hosts
> can generate the same ID. I suggest switching to UUIDs (e.g.,
> uuid.uuid4()) for guaranteed uniqueness.
>
>
> It’s my understanding that the *Spark* `monotonically_increasing_id()`
> function exists for the exact purpose of generating a collision-resistant
> unique id across nodes on different hosts.
> We use it extensively for this purpose and have never encountered an issue.
>
> Are we wrong or are you thinking of a different (not Spark) function?
>
> Cheers,
>
> Steve C
