Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Erik Krogen
On that note, GitHub recently released (public preview) a new feature called Artifact Attestations which may be relevant/useful here: Introducing Artifact Attestations–now in public beta - The GitHub Blog On Wed,

Re: [DISCUSS] clarify the definition of behavior changes

2024-05-01 Thread Erik Krogen
Thanks for raising this important discussion, Wenchen! Two points I would like to raise, though I'm fully supportive of any improvements in this regard, my points below notwithstanding -- I am not intending to let perfect be the enemy of good here. On a similar note as Santosh's comment, we should

Re: How to set platform-level defaults for array-like configs?

2022-07-27 Thread Erik Krogen
I find there's substantial value in being able to set defaults, and I think we can see that the community finds value in it as well, given the handful of "default"-like configs that exist today as mentioned in Shardul's email. The mismatch of conventions used today (suffix with ".defaultList",

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Erik Krogen
+1 (non-binding) Really looking forward to having this natively supported by Spark, so that we can get rid of our own hacks to tie in a custom view catalog implementation. I appreciate the care John has put into various parts of the design and believe this will provide a robust and flexible

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-26 Thread Erik Krogen
It's great to see this SPIP going live. Once this is complete, it will really help Spark to play nicely with a broader data ecosystem (Hive, Iceberg, Trino, etc.), and it's great to see that besides just bringing the existing bucketed-join support to V2, we are also making the types of

Re: Observer Namenode and Committer Algorithm V1

2021-08-17 Thread Erik Krogen
Hi Adam, Thanks for this great writeup of the issue. We (LinkedIn) also operate Observer NameNodes, and have observed the same issues, but have not yet gotten around to implementing a proper fix. To add a bit of context from our side, there is at least one other place besides the committer v1

Re: [VOTE] SPIP: Add FunctionCatalog

2021-03-09 Thread Erik Krogen
+1 from me (non-binding) On Tue, Mar 9, 2021 at 9:27 AM huaxin gao wrote: > +1 (non-binding) > > On Tue, Mar 9, 2021 at 1:12 AM Kent Yao wrote: > >> +1, looks great! >> >> *Kent Yao * >> @ Data Science Center, Hangzhou Research Institute, NetEase Corp. >> *a spark enthusiast* >> *kyuubi

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Erik Krogen
+1 on Dongjoon's proposal. This is a very nice compromise between the reflective/magic-method approach and the InternalRow approach, providing a lot of flexibility for our users, and allowing for the more complicated reflection-based approach to evolve at its own pace, since you can always fall

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Erik Krogen
I agree that there is a strong need for a FunctionCatalog within Spark to provide support for shareable UDFs, as well as make movement towards more advanced functionality like views which themselves depend on UDFs, so I support this SPIP wholeheartedly. I find both of the proposed UDF APIs to

Re: Usage of JDK Vector API in ML/MLLib

2020-12-15 Thread Erik Krogen
Regarding selective compilation, you can hide sources behind a Maven profile such as `-Pvectorized`. Check out what we do to switch between the `hive-1.2` and `hive-2.3` profiles where different source directories are grabbed at compile-time (the hive-1.2 profile was recently removed so you might

Re: Removing references to slave (and maybe in the future master)

2020-06-19 Thread Erik Krogen
I've created SPARK-32036 and SPARK-32037 for changes related to "blacklist"/"whitelist" terminology, the latter of which focuses on the blacklisting feature. I invite all of you to participate

Re: Removing references to slave (and maybe in the future master)

2020-06-18 Thread Erik Krogen
Thanks a lot for proposing this, Holden. I'd be curious to know how others feel about also tackling the word blacklist -- while I think most would agree it is not as egregious as master/slave, it seems to be an appropriate time to use the momentum to really make a best effort at removing any