Will review the POC in cwiki. +1 Based on this feedback, I will proceed with the changes. Thanks all!
On Tue, Nov 12, 2019 at 10:47 PM Semantic Beeng <n...@semanticbeeng.com> wrote:

> @vc, I think of it as elaborating the #ubiquitouslanguage in DDD.
> See private email with references to a small POC in wiki and decide
> how to proceed.
>
> On November 12, 2019 at 10:04 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> Thanks everyone for the feedback. Looks like we are in general
> agreement. I am inclined to just do 1 & 2 and leave COPY_ON_WRITE as is,
> based on the great points Ethan and Shiyan raised. Makes sense.
> I will still wait 1-2 days before closing this thread.
>
> @semanticbeeing That's a great idea. Is it more like a technical
> glossary of sorts? Let's maybe start a different DISCUSS thread on that
> specific topic, so everyone can chime in and give that proposal more
> attention?
>
> On Tue, Nov 12, 2019 at 2:44 PM Y. Ethan Guo <guoyi...@uber.com.invalid> wrote:
>
> +1 on [1] and [2].
>
> For [3], I have doubts similar to Shiyan's.
>
> For the naming, I can understand the original intent of the analogy for
> COW, which is to make another "copy" of the columnar/parquet file upon
> modification/update of the records in the file. From the system design
> point of view, it's easy to understand. I'm okay with the renaming to
> "MERGE_ON_WRITE" since it's probably straightforward for users at
> first glance.
>
> In terms of the concept, COW and MOR are listed as storage/table types.
> From my understanding, they represent different tradeoffs between read
> and write performance for Hudi tables, and within MOR there are further
> tradeoffs, e.g., lazy merge on read, or periodic compaction and cleaning
> pipelined along with ingestion. It looks like these could be controlled
> through configs, e.g., "disable_merge_on_write", "compaction_frequency",
> etc., instead of fixing the storage type, to control the tradeoff that a
> user would like to make.
> The requirement may change, so a user could switch between COW and MOR
> by tuning the configs. We don't have to make such changes now, but I'm
> wondering if this is something worth considering in future releases.
>
> - Ethan
>
> On Tue, Nov 12, 2019 at 8:43 AM nishith agarwal <n3.nas...@gmail.com> wrote:
>
> +1 on the first two, don't feel strongly about (3).
>
> Thanks,
> Nishith
>
> On Tue, Nov 12, 2019 at 5:03 AM leesf <leesf0...@gmail.com> wrote:
>
> [1] +1. `views` indeed confused me a lot.
> [2] +1. `snapshot` is more reasonable.
> [3] I don't feel strongly about renaming it. The current name
> `COPY_ON_WRITE` is reasonable considering the cost of renaming and the
> actual behavior: a new version of the parquet file is created and
> appears to be copied from the old version.
>
> Best,
> Leesf
>
> On Tue, Nov 12, 2019 at 3:55 PM Balaji Varadarajan <vbal...@apache.org> wrote:
>
> Agree with all 3 changes. The naming now looks more consistent than
> earlier. +1 on them.
>
> Depending on whether we are renaming Input formats for (1) and (2),
> this could require some migration steps for
>
> Balaji.V
>
> On Mon, Nov 11, 2019 at 7:38 PM vino yang <yanghua1...@gmail.com> wrote:
>
> Hi Vinoth,
>
> Thanks for bringing these proposals.
>
> +1 on all three. Especially, big +1 on the third renaming proposal.
>
> When I was a newbie, the "COPY_ON_WRITE" term confused me a lot. The
> word "copy" easily misleads users and makes them compare it with the
> `CopyOnWriteArrayList` data structure provided by the JDK and think of
> file systems.
>
> Best,
> Vino
>
> On Tue, Nov 12, 2019 at 9:05 AM Bhavani Sudha <bhavanisud...@gmail.com> wrote:
>
> +1 on all three rename proposals. I think this would make the concepts
> super easy to follow for new users.
>
> If changing [3] seems to be a stretch, we should definitely do [1] &
> [2] at the least IMO.
> I will be glad to help out on the renames to whatever extent possible,
> should the Hudi community be inclined to pursue this.
>
> Thanks,
> Sudha
>
> On Mon, Nov 11, 2019 at 3:46 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> Hello all,
>
> I wanted to raise an important topic with the community: whether we
> should rename some of our terminology in code/docs to be more
> user-friendly and understandable.
>
> Let me also provide some context for each, since I am probably guilty
> of introducing most of them in the first place :).
>
> *1. Rename "views" to "query":* Instead of saying incremental view or
> read-optimized view, talk about them as "incremental query" and
> "read-optimized query". The term "view" is very technical, and what I
> was trying to convey was that we ingest/store the data once and expose
> views on top. But new users (at least half a dozen of them, to me) tend
> to confuse this with the views/materialized views found in databases.
> We almost always talk about views in terms of the expected behavior of
> a query on the view. I am proposing to just call these different query
> types, since it's a more universally accepted terminology and IMO
> clearer.
>
> *2. Rename "Read-Optimized/Realtime" views to Snapshot views + have
> Read-Optimized view only for MOR storage:* This one is probably the
> trickiest. Hudi was always designed with MOR in mind, even as we were
> working on COW storage, and consequently we named the pure
> parquet-backed view Read-Optimized, hoping to name the parquet + avro
> based view Write-Optimized. However, we opted to name it Realtime to
> emphasize the data freshness aspect.
> In retrospect, the views should not have been named after their
> performance characteristics but rather after the classes of queries run
> on them and the guarantees for those (point #1 above). Moreover, once
> we have parquet embedded into the log format, the tradeoffs may not be
> the same anyway.
>
> So, combined with the renaming proposed in #1, we would end up with the
> following:
>
> Copy-On-Write:
> [Old] Read-Optimized View => [New] Snapshot Query
> [Old] Incremental View => [New] Incremental Query
>
> Merge-On-Read:
> [Old] Realtime View => [New] Snapshot Query
> [Old] Incremental View => [New] Incremental Query
> [Old] Read-Optimized View => [New] Read-Optimized Query (since it is
> always read-optimized compared to the Snapshot query, at the cost of
> staler data)
>
> Both changes #1 & #2 could be simple changes to just code references,
> docs and configs. We can support both strings for some time and
> deprecate the old ones eventually, since queries are stateless.
>
> *3. Rename COPY_ON_WRITE to MERGE_ON_WRITE:* The name originated
> because the design was very similar to
> https://en.wikipedia.org/wiki/Copy-on-write filesystems & snapshotting,
> and we once hoped to push some of this logic into the storage itself,
> all in vain. But the name stuck, even though once we had MERGE_ON_READ
> the focus was often on merge costs etc., which the name COPY_ON_WRITE
> does not convey directly.
> I don't feel very strongly about this, and there is also a cost to
> changing it since it's persisted inside hoodie.properties and we will
> have to support both strings internally in code for backwards
> compatibility anyway.
>
> Naming something is very hard (yes, try :)). I believe these changes
> will make the project simpler to understand for everyone out there. We
> also have tons of new people here, so I am also happy to let go, if
> it's already clear :)
>
> Please use the bullet number when you share your feedback so we know
> what the discussion is about.
>
> Thanks
> Vinoth
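
The "support both strings for some time and deprecate eventually" idea from proposals #1 and #2 can be pictured as an alias table that maps each old view name to its new query name, per table type. The sketch below is a minimal illustration in Python under stated assumptions, not Hudi's actual implementation: the function `resolve_query_type`, both dictionaries, and the lowercase string values are hypothetical names chosen for this example. Only the old-to-new mapping itself comes from the thread.

```python
import warnings

# Hypothetical new-style query names allowed for each table type.
# Read-optimized queries exist only for MERGE_ON_READ, as proposed in #2.
VALID_QUERY_TYPES = {
    "COPY_ON_WRITE": {"snapshot_query", "incremental_query"},
    "MERGE_ON_READ": {
        "snapshot_query",
        "incremental_query",
        "read_optimized_query",
    },
}

# Hypothetical alias table: (table type, old view name) -> new query name.
# The pairs mirror the [Old] => [New] mapping listed in the thread.
LEGACY_VIEW_ALIASES = {
    ("COPY_ON_WRITE", "read_optimized_view"): "snapshot_query",
    ("COPY_ON_WRITE", "incremental_view"): "incremental_query",
    ("MERGE_ON_READ", "realtime_view"): "snapshot_query",
    ("MERGE_ON_READ", "incremental_view"): "incremental_query",
    ("MERGE_ON_READ", "read_optimized_view"): "read_optimized_query",
}


def resolve_query_type(table_type: str, name: str) -> str:
    """Return the new query name, accepting deprecated view names too."""
    if table_type not in VALID_QUERY_TYPES:
        raise ValueError(f"unknown table type: {table_type!r}")

    key = name.strip().lower()

    # New-style names pass through unchanged.
    if key in VALID_QUERY_TYPES[table_type]:
        return key

    # Old-style names are still accepted, but flagged as deprecated.
    new_name = LEGACY_VIEW_ALIASES.get((table_type, key))
    if new_name is None:
        raise ValueError(
            f"{name!r} is not a valid query type for {table_type}"
        )
    warnings.warn(
        f"{name!r} is deprecated; use {new_name!r} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```

With this sketch, `resolve_query_type("MERGE_ON_READ", "realtime_view")` returns `"snapshot_query"` and emits a deprecation warning, while `resolve_query_type("COPY_ON_WRITE", "realtime_view")` raises `ValueError` because a Realtime view never existed for Copy-On-Write. Because queries are stateless, an alias layer like this can be dropped in a later release without migrating stored data. The table type name persisted in hoodie.properties (proposal #3) is a different case, since the old string would remain on disk.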