+1 on the first two, don't feel strongly about (3).

Thanks,
Nishith

On Tue, Nov 12, 2019 at 5:03 AM leesf <[email protected]> wrote:

> [1] +1. `views` indeed confused me a lot.
> [2] +1. `snapshot` is more reasonable.
> [3] I don't feel strongly about renaming it. The current name
> `COPY_ON_WRITE` is reasonable considering the cost of renaming and the
> behavior: a new version of the parquet file is created and appears to
> be copied from the old version of the parquet file.
>
> Best,
> Leesf
>
> Balaji Varadarajan <[email protected]> wrote on Tue, Nov 12, 2019 at 3:55 PM:
>
> > Agree with all 3 changes. The naming now looks more consistent than
> > earlier. +1 on them.
> >
> > Depending on whether we are renaming Input formats for (1) and (2) -
> > this could require some migration steps for
> >
> > Balaji.V
> >
> >
> > On Mon, Nov 11, 2019 at 7:38 PM vino yang <[email protected]> wrote:
> >
> > > Hi Vinoth,
> > >
> > > Thanks for bringing these proposals.
> > >
> > > +1 on all three. Especially, big +1 on the third renaming proposal.
> > >
> > > When I was a newbie, the "COPY_ON_WRITE" term confused me a lot. The
> > > word "copy" easily misleads users, and makes them compare it with
> > > the `CopyOnWriteArrayList` data structure provided by the JDK and
> > > think of file systems.
> > >
> > > Best,
> > > Vino
> > >
> > >
> > > Bhavani Sudha <[email protected]> wrote on Tue, Nov 12, 2019 at 9:05 AM:
> > >
> > > > +1 on all three rename proposals. I think this would make the
> > > > concepts super easy to follow for new users.
> > > >
> > > > If changing [3] seems to be a stretch, we should definitely do [1]
> > > > & [2] at the least IMO. I will be glad to help out on the renames
> > > > to whatever extent possible should the Hudi community incline to
> > > > pursue this.
> > > >
> > > > Thanks,
> > > > Sudha
> > > >
> > > >
> > > > On Mon, Nov 11, 2019 at 3:46 PM Vinoth Chandar <[email protected]> wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I wanted to raise an important topic with the community: whether
> > > > > we should rename some of the terminology in our code/docs to be
> > > > > more user-friendly and understandable.
> > > > >
> > > > > Let me also provide some context for each, since I am probably
> > > > > guilty of introducing most of them in the first place :).
> > > > >
> > > > > *1. Rename "views" to "query" :* Instead of saying incremental
> > > > > view or read-optimized view, talk about them as "incremental
> > > > > query" and "read-optimized query". The term "view" is very
> > > > > technical, and what I was trying to convey was that we
> > > > > ingest/store the data once and expose views on top. But new users
> > > > > (at least half a dozen of them, to me) tend to confuse this with
> > > > > the views/materialized views found in databases. Almost always,
> > > > > we talk about views in terms of the expected behavior of a query
> > > > > on the view. I am proposing to just call these different query
> > > > > types, since it's a more universally accepted terminology and IMO
> > > > > clearer.
> > > > >
> > > > > *2. Rename "Read-Optimized/Realtime" views to Snapshot views +
> > > > > Have Read-Optimized view only for MOR storage :* This one is
> > > > > probably the trickiest. Hudi was always designed with MOR in
> > > > > mind, even as we were working on COW storage, and consequently we
> > > > > named the pure parquet backed view Read-Optimized, hoping to name
> > > > > the parquet + avro based view Write-Optimized. However, we opted
> > > > > to name it Realtime to emphasize the data freshness aspect.
> > > > > In retrospect, the views should not have been named after their
> > > > > performance characteristics but rather after the classes of
> > > > > queries done on them and the guarantees for those (point #1
> > > > > above). Moreover, once we have parquet embedded into the log
> > > > > format, the tradeoffs may not be the same anyway.
> > > > >
> > > > > So combining with the renaming proposed in #1, we would end up
> > > > > with the following:
> > > > >
> > > > > Copy-On-Write:
> > > > > [Old] Read-Optimized View => [New] Snapshot Query
> > > > > [Old] Incremental View => [New] Incremental Query
> > > > >
> > > > > Merge-On-Read:
> > > > > [Old] Realtime View => [New] Snapshot Query
> > > > > [Old] Incremental View => [New] Incremental Query
> > > > > [Old] Read-Optimized View => [New] Read-Optimized Query (since it
> > > > > is always read optimized compared to the Snapshot query, at the
> > > > > cost of staler data)
> > > > >
> > > > > Both changes #1 & #2 could be simpler changes to just code
> > > > > references, docs and configs. We can support both strings for
> > > > > some time and deprecate eventually, since queries are stateless.
> > > > >
> > > > > *3. Rename COPY_ON_WRITE to MERGE_ON_WRITE :* The name originated
> > > > > since the design was very similar to
> > > > > https://en.wikipedia.org/wiki/Copy-on-write filesystems &
> > > > > snapshotting, and we once hoped to push some of this logic into
> > > > > the storage itself, all in vain. But the name stuck, even though
> > > > > once we had MERGE_ON_READ the focus was often on merge costs
> > > > > etc., which the name COPY_ON_WRITE does not convey directly.
> > > > > I don't feel very strongly about this, and there is also a cost
> > > > > to changing this since it's persisted inside hoodie.properties,
> > > > > and we will support both strings internally in code for backwards
> > > > > compatibility anyway.
> > > > >
> > > > > Naming something is very hard (yes, try :)). I believe these
> > > > > changes will make the project simpler to understand for everyone
> > > > > out there. We also have tons of new people here, so I am also
> > > > > happy to let go if it's already clear :)
> > > > >
> > > > > Please use the bullet number when you share your feedback so we
> > > > > know what the discussion is about.
> > > > >
> > > > > Thanks
> > > > > Vinoth
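
For illustration only, here is a minimal sketch of the backwards-compatibility idea from the original proposal: accept both the old view names and the new query names for a while, and warn on the old ones. The function name `resolve_query_type`, the lowercase string spellings and the warning text are all hypothetical and are not Hudi's actual config API; only the old-to-new mapping is taken from the thread.

```python
import warnings

# Old view names mapped to the proposed query names, per storage type.
# The mapping follows the proposal; the exact string spellings are assumptions.
LEGACY_TO_NEW = {
    "copy_on_write": {
        "read_optimized_view": "snapshot_query",
        "incremental_view": "incremental_query",
    },
    "merge_on_read": {
        "realtime_view": "snapshot_query",
        "incremental_view": "incremental_query",
        "read_optimized_view": "read_optimized_query",
    },
}


def resolve_query_type(storage_type: str, name: str) -> str:
    """Return the new query name; legacy view names are accepted with a warning."""
    storage_key = storage_type.strip().lower()
    key = name.strip().lower()
    if storage_key not in LEGACY_TO_NEW:
        raise ValueError(f"unknown storage type: {storage_type!r}")
    legacy = LEGACY_TO_NEW[storage_key]
    if key in legacy:
        # Old name: translate it and tell the caller it is deprecated.
        new_name = legacy[key]
        warnings.warn(
            f"{name!r} is deprecated; use {new_name!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    if key in set(legacy.values()):
        # Already one of the new names valid for this storage type.
        return key
    raise ValueError(f"unknown query type {name!r} for {storage_type!r}")
```

Under these assumptions, `resolve_query_type("merge_on_read", "realtime_view")` returns `"snapshot_query"` and emits a `DeprecationWarning`, while asking for `"read_optimized_query"` on `"copy_on_write"` raises `ValueError`, which mirrors the proposal to keep the Read-Optimized query for MOR storage only.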
