There shouldn't be any difference here. In fact, I get the results you list
for 'spark' when I run this on Databricks. It's possible the difference is a
bug fix along the way that is in the Spark version you are using locally but
not in the DBR version you are using. But, yeah, it seems to work as you say.

If you're asking about the Spark semantics being 1-indexed vs. 0-indexed,
there are some comments here:
https://github.com/apache/spark/pull/38867#discussion_r1097054656
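
To make the off-by-one concrete, here is a quick Python sketch of how I
read the two behaviors from your examples. It is purely illustrative (the
function name and the flag are mine, not anything in Spark or DBR):

def array_insert_model(arr, pos, item, databricks=False):
    # Toy model of ARRAY_INSERT. Positions are 1-based; pos = 0 is an
    # error in both systems and is ignored here.
    if pos > 0:
        idx = pos - 1  # 0-based insertion point from the front
        if idx > len(arr):
            return arr + [None] * (idx - len(arr)) + [item]
        return arr[:idx] + [item] + arr[idx:]
    # Negative positions: this one line is the whole disagreement.
    # Spark 3.4.0 reading: -1 addresses the last element, so the new item
    # lands *before* it. Databricks reading: -1 means "one past the end",
    # so -1 appends.
    idx = len(arr) + pos + (1 if databricks else 0)
    if idx < 0:
        return [item] + [None] * (-idx) + arr
    return arr[:idx] + [item] + arr[idx:]

array_insert_model(['a', 'b', 'c'], -1, 'z')                   # ['a', 'b', 'z', 'c']
array_insert_model(['a', 'b', 'c'], -1, 'z', databricks=True)  # ['a', 'b', 'c', 'z']
array_insert_model(['a', 'b', 'c'], -5, 'z')                   # ['z', None, None, 'a', 'b', 'c']
array_insert_model(['a', 'b', 'c'], -5, 'z', databricks=True)  # ['z', None, 'a', 'b', 'c']

So the two agree on positive positions and differ only in whether -1 names
the last element or the slot after it.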


On Sun, Aug 13, 2023 at 7:28 AM Ran Tao <chucheng...@gmail.com> wrote:

> Hi, devs.
>
> I found that the ARRAY_INSERT[1] function (from Spark 3.4.0) has
> different semantics from Databricks[2].
>
> e.g.
>
> // spark
> SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
>  ["a","b","z","c"]
>
> // databricks
> SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
>  ["a","b","c","z"]
>
> // spark
> SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
> ["z",null,null,"a","b","c"]
>
> // databricks
> SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
>  ["z",NULL,"a","b","c"]
>
> It looks like inserting at a negative index is handled more reasonably in
> Databricks.
>
> Of course, I read the source code of Spark, and I can understand the logic
> of Spark, but my question is whether Spark was designed like this on purpose.
>
>
> [1] https://spark.apache.org/docs/latest/api/sql/index.html#array_insert
> [2]
> https://docs.databricks.com/en/sql/language-manual/functions/array_insert.html
>
>
> Best Regards,
> Ran Tao
> https://github.com/chucheng92
>
