Re: [DISCUSS] Un-deprecate Trigger.Once

2024-04-21 Thread Jungtaek Lim
While I understand your concern about the confusion that reverting a
deprecation could cause, we have already reverted the deprecation of an API
that had been deprecated for multiple years. See SPARK-32686. There may be
more such cases in Spark, and a quick search suggests that various other
projects have reverted their deprecation decisions as well.

Also, while we say that we shouldn't remove an API even though we deprecate
it, we never state that in the deprecation message. Users still read a
deprecation the way they do in other projects: "refrain from using this and
migrate sooner rather than later". Establishing a project policy and making a
guarantee to users are different things. If we "guarantee" that an API is
deprecated but will never be removed, that leads to another sort of
confusion: once we start guaranteeing that some deprecated APIs will never be
removed, do the other deprecations implicitly mean that those APIs can be
removed in the future?

That said, deprecation would be the right way if Trigger.AvailableNow
covered every workload that Trigger.Once has covered. We have just found
that there are some gaps: people try to migrate on their own with a
tricky/hacky approach and end up complaining to us. I'd say we can truly
deprecate it (and I'd really like to) once we are confident that users don't
need any trick or hack to migrate, and that migration is a piece of cake:
change the trigger and be done. Unfortunately, that turned out not to be the
case.

We need some time to figure out the gaps and address them in
Trigger.AvailableNow. If they cannot be addressed, we will never be able to
deprecate Trigger.Once. Until then, I feel that strongly advising users to
migrate "if possible" in the documentation (or via a warning message at
runtime) is the best bet. To be clear, it is my bad for not truly
understanding users. I actually knew about the gap and thought it was not
something Spark should guarantee, but I never imagined that users rely so
heavily on the behavior (not on the semantics but on the behavior itself).
They consider it a breaking change if the semantics are the same but the
behavior is not.
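
To make the behavior gap concrete, here is a minimal sketch in plain Python.
It is a toy model and not Spark code: the function name `plan_batches` and
its parameters are hypothetical. It only assumes the documented difference
that Trigger.Once processes everything available in one batch and ignores
rate limits such as maxFilesPerTrigger, while Trigger.AvailableNow may split
the same data into several batches that honor the limit.

```python
import logging
from typing import List

logger = logging.getLogger("trigger_model")


def plan_batches(available: int, max_per_batch: int, trigger: str) -> List[int]:
    """Toy model (hypothetical, not a Spark API).

    Returns the sizes of the micro-batches a single run would execute for
    `available` pending records under the given trigger.
    """
    if available < 0 or max_per_batch <= 0:
        raise ValueError("available must be >= 0 and max_per_batch must be > 0")
    if available == 0:
        # Simplification of this model: nothing to process, no batch planned.
        return []
    if trigger == "once":
        # Warn instead of failing, mirroring the "warning log on usage" idea.
        logger.warning("once: prefer available_now whenever feasible")
        # One batch takes everything; the rate limit is ignored.
        return [available]
    if trigger == "available_now":
        # Same data in total, but split into batches that honor the limit.
        full, rest = divmod(available, max_per_batch)
        return [max_per_batch] * full + ([rest] if rest else [])
    raise ValueError(f"unknown trigger: {trigger}")


if __name__ == "__main__":
    print(plan_batches(10, 4, "once"))
    print(plan_batches(10, 4, "available_now"))
```

Both triggers consume the same data in this model, so the semantics match,
but a workload that relies on "exactly one batch per run" sees different
behavior under the second trigger.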


On Sat, Apr 20, 2024 at 1:16 PM Dongjoon Hyun wrote:

> For that case, I believe it's enough for us to revise the deprecation
> message only by making sure that Apache Spark will keep it without removal
> for backward-compatibility purposes only. That's what the users asked,
> isn't that?
>
> > deprecation of Trigger.Once confuses users that the trigger won't be
> > available sooner (though we rarely remove public API).
>
> The feature was deprecated in Apache Spark 3.4.0 and `Undeprecation(?)`
> may cause another confusion in the community, not only for Trigger.Once but
> also for all historic `Deprecated` items.
>
> Dongjoon.
>
>
> On Fri, Apr 19, 2024 at 7:44 PM Jungtaek Lim wrote:
>
>> Hi dev,
>>
>> I'd like to raise a discussion to un-deprecate Trigger.Once in future
>> releases.
>>
>> I've proposed deprecation of Trigger.Once because it's semantically
>> broken and we made a change, but we've realized that there are really users
>> who strictly require the behavior of Trigger.Once (only run a single batch
>> in whatever reason) despite the semantic issue, and workaround with
>> Trigger.AvailableNow is arguably much more hacky or sometimes not even
>> possible.
>>
>> I still think we have to advise using Trigger.AvailableNow whenever
>> feasible, but deprecation  of Trigger.Once confuses users that the trigger
>> won't be available sooner (though we rarely remove public API). So maybe
>> warning log on usage sounds to me as a reasonable alternative.
>>
>> Thoughts?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>


[FYI] SPARK-47046: Apache Spark 4.0.0 Dependency Audit and Cleanup

2024-04-21 Thread Dongjoon Hyun
Hi, All.

As a part of Apache Spark 4.0.0 (SPARK-44111), we have been doing dependency
audits. Today, we want to share the current readiness of Apache Spark 4.0.0
and get your feedback for further completeness.

https://issues.apache.org/jira/browse/SPARK-44111
Prepare Apache Spark 4.0.0

The dependency audit (SPARK-47046) started this February (on 14/Feb/24), and
only one JIRA, about Apache Hive 2.3.10, remains as of now.

https://issues.apache.org/jira/browse/SPARK-47046
Apache Spark 4.0.0 Dependency Audit and Cleanup

https://issues.apache.org/jira/browse/SPARK-47018
Upgrade built-in Hive to 2.3.10 (WIP)


Historically, we have received Common Vulnerabilities and Exposures (CVE)
reports caused by our dependencies, and only some of them actually affect us.
Nevertheless, we take all reports seriously and want to address as many as
possible in Apache Spark 4.0.0 as a new milestone.

Here, we share the full audit list for your awareness.

+----------------+---------------------+-------------------+
| CVE_ID         | GHSA_ID             | SPARK_JIRA_ID     |
+----------------+---------------------+-------------------+
| CVE-2018-10237 | GHSA-mvr2-9pj6-7w5j | SPARK-47025       |
| CVE-2018-10237 | GHSA-mvr2-9pj6-7w5j | SPARK-47058       |
| CVE-2018-1330  | GHSA-95q3--r683     | SPARK-2           |
| CVE-2019-0205  | GHSA-rj7p-rfgp-852x | SPARK-27029       |
| CVE-2019-10172 | GHSA-r6j9-8759-g62w | SPARK-47119       |
| CVE-2019-10202 | GHSA-c27h-mcmw-48hv | SPARK-47119       |
| CVE-2020-13949 | GHSA-g2fg-mr77-6vrm | SPARK-47018 (WIP) |
| CVE-2020-15522 | GHSA-6xx3-rg99-gc3p | SPARK-1           |
| CVE-2020-8908  | GHSA-5mg8-w23w-74h3 | SPARK-39102       |
| CVE-2020-8908  | GHSA-5mg8-w23w-74h3 | SPARK-47025       |
| CVE-2021-22569 | GHSA-wrvw-hg22-4m67 | SPARK-43489       |
| CVE-2021-22569 | GHSA-wrvw-hg22-4m67 | SPARK-47038       |
| CVE-2021-22570 | GHSA-77rm-9x9h-xj3g | SPARK-45991       |
| CVE-2021-42392 | GHSA-h376-j262-vhq6 | SPARK-38287       |
| CVE-2022-1941  | GHSA-8gq9-2x98-w8hf | SPARK-40552       |
| CVE-2022-1941  | GHSA-8gq9-2x98-w8hf | SPARK-41240       |
| CVE-2022-2047  | GHSA-cj7v-27pg-wf7q | SPARK-39725       |
| CVE-2022-21363 | GHSA-g76j-4cxx-23h9 | SPARK-39540       |
| CVE-2022-21724 | GHSA-673j-qm5f-xpv8 | SPARK-38291       |
| CVE-2022-21724 | GHSA-v7wg-cpwc-24m4 | SPARK-38291       |
| CVE-2022-23221 | GHSA-45hx-wfhj-473x | SPARK-38287       |
| CVE-2022-23437 | GHSA-h65f-jvqw-m9fj | SPARK-39183       |
| CVE-2022-25883 | GHSA-c2qf-rxjj-qqgw | SPARK-44279       |
| CVE-2022-3171  | GHSA-h4h5-3hr4-j3g2 | SPARK-40665       |
| CVE-2022-3171  | GHSA-h4h5-3hr4-j3g2 | SPARK-41076       |
| CVE-2022-3171  | GHSA-h4h5-3hr4-j3g2 | SPARK-41247       |
| CVE-2022-3171  | GHSA-h4h5-3hr4-j3g2 | SPARK-43489       |
| CVE-2022-3171  | GHSA-h4h5-3hr4-j3g2 | SPARK-47038       |
| CVE-2022-3509  | GHSA-g5ww-5jh7-63cx | SPARK-43489       |
| CVE-2022-3509  | GHSA-g5ww-5jh7-63cx | SPARK-47038       |
| CVE-2022-3510  | GHSA-4gg5-vx3j-xwc7 | SPARK-43489       |
| CVE-2022-3510  | GHSA-4gg5-vx3j-xwc7 | SPARK-47038       |
| CVE-2022-3517  | GHSA-f8q6-p94x-37v3 | SPARK-41634       |
| CVE-2022-36944 | GHSA-8qv5-68g4-248j | SPARK-40497       |
| CVE-2022-37865 | GHSA-94rr-4jr5-9h2p | SPARK-41030       |
| CVE-2022-37866 | GHSA-wv7w-rj2x-556x | SPARK-41030       |
| CVE-2022-41946 | GHSA-562r-vg33-8x8h | SPARK-41245       |
| CVE-2022-42889 | GHSA-599f-7c49-w659 | SPARK-40801       |
| CVE-2022-45868 | GHSA-22wj-vf5f-wrvj | SPARK-44393       |
| CVE-2022-46337 | GHSA-rcjc-c4pj-xxrp | SPARK-47108       |
| CVE-2022-46751 | GHSA-2jc4-r94c-rp7h | SPARK-44914       |
| CVE-2023-1428  | GHSA-6628-q6j9-w8vg | SPARK-44222       |
| CVE-2023-26119 | GHSA-3xrr-7m6p-p7xh | SPARK-5           |
| CVE-2023-2976  | GHSA-7g45-4rm6-3mm3 | SPARK-47025       |
| CVE-2023-2976  | GHSA-7g45-4rm6-3mm3 | SPARK-47056       |
| CVE-2023-32731 | GHSA-cfgp-2977-2fmm | SPARK-44222       |
| CVE-2023-32732 | GHSA-9hxf-ppjv-w6rq | SPARK-44222       |
| CVE-2023-33201 | GHSA-hr8g-6v94-x4m9 | SPARK-46411       |
| CVE-2023-34453 | GHSA-pqr6-cmr2-h8hf | SPARK-44070       |
| CVE-2023-34454 | GHSA-fjpj-2g6w-x25r | SPARK-44070       |
| CVE-2023-34455 | GHSA-qcwq-55hx-v3vh | SPARK-44070       |
| CVE-2023-42503 | GHSA-cgwf-w82q-5jrr | SPARK-45172       |
| CVE-2023-43642 | GHSA-55g7-9cwv-5qfv | SPARK-45323       |
| CVE-2023-44981 | GHSA-7286-pgfv-vxvh | SPARK-45956       |
| CVE-2023-44981 | GHSA-7286-pgfv-vxvh | SPARK-46305       |
| CVE-2024-21503 | GHSA-fj7x-q9j7-g6q6 | INVALID*          |
| CVE-2024-26308 | GHSA-4265-ccf5-phj5 | SPARK-47109       |
+----------------+---------------------+-------------------+
* `black` is used only in the `dev/lint-python` script


Please report to us via `priv...@spark.apache.org` if you have any concerns
about the above reports or have new ones for Apache Spark 4.0.0.

Dongjoon Hyun