I am sitting on the fence about that. In the linked PR, Xiao wrote the
following:
> We published the error guideline a few years ago, but not all
> contributors adhered to it, resulting in variable quality in error messages.
If a policy exists but is not enforced (if that's indeed the case; I
didn't go through the source to confirm it), it might be useful to
learn why that happens. Normally, I'd expect one of the following:
- The policy is too complex to enforce. In such a case, additional
tooling can be useful.
- The policy is not well known, and the people responsible for
introducing it are not committed to enforcing it.
- The policy, or some of its components, doesn't really reflect community
values and expectations.
If the problem of suspected violations has never been raised on our
standard communication channels (and, as far as I can tell, it has not),
then introducing a new tool to enforce the policy seems a bit premature.
If these were the only considerations, I'd say that improving the
overall consistency of the project outweighs the possible risks, even if
the case for it is poorly supported.
However, there is an elephant in the room. This is another attempt, after
SPARK-44546, to embed generative tools directly within the Spark dev
workflow. On principle, I am not against such tools. In fact, it is
pretty clear that they are already used by Spark committers, and even if
we wanted to, there is little we can do to prevent that. In such cases,
decisions about which tools to use (if any), to what extent, and how to
treat their output are the sole responsibility of contributors.
In contrast, these proposals try to push a proprietary tool, burdened
with serious privacy and ethical issues and likely to introduce unclear
liabilities, as a standard or even required developer tool.
I can't speak for others, but personally, I'm quite uneasy about it. If
we go this way, I strongly believe that it should be preceded by a
serious discussion, if not the development of a formal policy, about
which categories of tools are acceptable within the project, in what
capacity, and to what extent. Ideally, this should come with an official
opinion from the ASF as the copyright owner.
WDYT, all? Shall we start a separate discussion?
Best regards,
Maciej Szymkiewicz
Web: https://zero323.net
PGP: A30CEF0C31A501EC
On 8/3/23 18:33, Haejoon Lee wrote:
Additional information:
Please check https://issues.apache.org/jira/browse/SPARK-37935 if you
want to start contributing to improving error messages.
You can create sub-tasks if you believe there are error messages that
need improvement, in addition to the tasks listed in the umbrella JIRA.
You can also refer to https://github.com/apache/spark/pull/41504 and
https://github.com/apache/spark/pull/41455 as example PRs.
On Thu, Aug 3, 2023 at 1:10 PM Ruifeng Zheng wrote:
+1 from my side. I'm fine with having it as a helper script.
On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon wrote:
I think adding that dev tool script to improve the error
messages is fine.
On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote:
Dear contributors, I hope you are doing well!
I see that some contributors are interested in working
on error message improvements and in contributing persistently,
so I want to share an LLM-based error message improvement
script to help with your contributions.
You can find details about the script at
https://github.com/apache/spark/pull/41711. I believe it
can help with your error message improvement work, so I
encourage you to take a look at the pull request and
leverage the script.
Please let me know if you have any questions or concerns.
Thanks all for your time and contributions!
Best regards,
Haejoon