I'm not a fan of this approach. Spark configuration keys are defined as
string values in Spark and used as strings everywhere. I don't necessarily
see the benefit of conf["keyName"] over conf.get("keyName"), or even
spark.conf.keyName. Trying to wrap this in magic getattr calls is not ideal
either. I believe there are better ways to improve the Pythonic surface of
PySpark.
What I do like is wrapping the return value of conf.get() in another
wrapper object that exposes the doc string. That's very neat.
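
A minimal sketch of that idea (the ConfValue name and doc text here are
hypothetical, just to show the shape, not what the PR implements):

    class ConfValue(str):
        """A str subclass that also carries the config entry's doc."""
        def __new__(cls, value, doc=""):
            obj = super().__new__(cls, value)
            obj.doc = doc
            return obj

    v = ConfValue("false", doc="Enables runtime group filtering.")
    v == "false"  # still compares/behaves like the plain string
    v.doc         # and exposes the documentation
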
On Fri, Dec 27, 2024 at 3:07 PM Mich Talebzadeh <[email protected]>
wrote:
> On the surface it looks like a good idea. In essence, it is about writing
> code that is not just functional but also reflects the spirit and style
> of the Python language <https://peps.python.org/pep-0020/>; code that is
> readable and maintainable.
>
> The core objective (if I am correct) of this PR is to enhance the Python
> user experience when working with Spark configurations by introducing a
> more Pythonic, dictionary-like syntax. This approach will improve code
> readability and maintainability by providing a more intuitive and
> consistent way to set and access Spark configurations, aligning with
> Python's emphasis on clarity and expressiveness (per the link above).
>
> HTH
>
> Mich Talebzadeh,
>
> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> London <https://en.wikipedia.org/wiki/Imperial_College_London>
> London, United Kingdom
>
>
> View my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one thousand expert
> opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
> On Fri, 27 Dec 2024 at 07:23, Holden Karau <[email protected]> wrote:
>
>> I think having automatic getattr/setattr on the spark.conf object seems
>> reasonable to me.
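>>
>> As a rough illustration of that (a hypothetical wrapper sketch, not the
>> PR's actual code):
>>
>>     class AttrConf:
>>         """Expose conf entries as attributes via getattr/setattr."""
>>         def __init__(self, conf):
>>             object.__setattr__(self, "_conf", conf)
>>         def __getattr__(self, name):
>>             return self._conf.get(name)
>>         def __setattr__(self, name, value):
>>             self._conf.set(name, value)
>>
>> The dots in real key names (spark.sql.*) make plain attribute access
>> awkward, though, so the dict-style form may still read better.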
>>
>> On Thu, Dec 26, 2024 at 9:32 PM Reynold Xin <[email protected]>
>> wrote:
>>
>>> I actually think this might be confusing (in general, adding too many
>>> different ways to do the same thing is also un-Pythonic).
>>>
>>> On Thu, Dec 26, 2024 at 4:58 PM Hyukjin Kwon <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I hope you guys are enjoying the holiday season. I just wanted to get
>>>> some quick feedback on this PR:
>>>> https://github.com/apache/spark/pull/49297
>>>>
>>>> This PR allows you to set/unset SQL configurations in a Pythonic way,
>>>> e.g.,
>>>>
>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"] = "false"
>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"]
>>>> 'false'
>>>>
>>>> just as pandas supports a similar style (
>>>> https://pandas.pydata.org/docs/user_guide/options.html).
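>>>>
>>>> For comparison, the pandas options API supports both attribute- and
>>>> function-style access:
>>>>
>>>> >>> import pandas as pd
>>>> >>> pd.options.display.max_rows = 100
>>>> >>> pd.get_option("display.max_rows")
>>>> 100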
>>>>
>>>> Any feedback on this approach would be appreciated.
>>>>
>>>> Thanks!
>>>>
>>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>