You can look at RecordBatchMemoryManager.java and follow the code of one of the operators (like flatten) to see how this was done.
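To make the arithmetic concrete, here is a rough standalone sketch of the "do the work yourself" sizing described in the quoted message below: estimate average column widths, then pick a row count that respects both an overall batch budget and the 16 MB per-vector limit. This is not code from RecordBatchMemoryManager or any other Drill class. The class name, method name, and the example budget and row cap are all hypothetical, and the sketch ignores offset vectors and any rounding the real allocator may apply.

```java
// Hypothetical illustration only; none of these names exist in Drill.
class VectorSizeEstimate {

    // Per-vector cap mentioned in the thread (16 MB).
    static final long MAX_VECTOR_BYTES = 16L * 1024 * 1024;

    /**
     * Picks a row count for a batch such that:
     *  (a) the sum of all column data fits within batchBudgetBytes, and
     *  (b) the widest column's vector stays at or under MAX_VECTOR_BYTES, and
     *  (c) the count never exceeds maxRows (an assumed caller-supplied cap).
     * avgColumnWidths holds the estimated average bytes per value per column.
     */
    static int rowsPerBatch(long batchBudgetBytes, int[] avgColumnWidths, int maxRows) {
        long rowWidth = 0;
        int widest = 1;
        for (int w : avgColumnWidths) {
            rowWidth += w;
            widest = Math.max(widest, w);
        }
        rowWidth = Math.max(rowWidth, 1);            // avoid division by zero

        long byBatch = batchBudgetBytes / rowWidth;  // limit from total batch size
        long byVector = MAX_VECTOR_BYTES / widest;   // limit from the widest vector
        long rows = Math.min(Math.min(byBatch, byVector), maxRows);
        return (int) Math.max(rows, 1);              // always allow at least one row
    }

    public static void main(String[] args) {
        int[] widths = {4, 8, 50};                   // e.g. INT, BIGINT, ~50-byte VARCHAR
        int rows = rowsPerBatch(8L * 1024 * 1024, widths, 65536);
        System.out.println("rows per batch: " + rows);
        for (int w : widths) {
            // Bytes to pre-allocate for this column instead of growing by doubling.
            System.out.println("width " + w + " -> allocate " + ((long) rows * w) + " bytes");
        }
    }
}
```

The point of computing the row count up front is that each vector can then be allocated once at its expected size, rather than starting small and doubling as setSafe(...) runs out of room.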
Thanks,
Padma

On Wed, Apr 24, 2019 at 12:00 PM Paul Rogers <par0...@yahoo.com.invalid> wrote:

> Hi Igor,
>
> Thanks for the recap. You asked about vector allocation. Here is where I
> think things stand. Others can fill in details that I may miss.
>
> We have several ways to size value vectors; but no single standard. As you
> note, the most common way is simply to accept the cost of letting the
> vector double in size multiple times.
>
> One way to pre-allocate vectors is to use the "sizer" along with its
> associated allocation helper. This was always meant to be a quick & dirty
> temporary solution, but has turned out, I believe, to be the primary vector
> size management solution in most operators.
>
> Another is the new row set framework: vector size (in terms of number of
> items and estimated item size) is expressed in metadata, then is used to
> allocate each new batch to the desired size.
>
> You can also just do the work yourself: pick a number, and, when
> allocating a vector, tell it to use that size. You then take on the task of
> estimating average width, picking a good target number of rows for your
> batch, working out the number of items in arrays, etc. (This is, in fact,
> what the other two methods mentioned above actually do.)
>
> The key problem with the ad-hoc techniques is that they can't limit
> maximum vector size to 16 MB (to avoid Netty fragmentation) nor limit
> overall batch size to some reasonable number. The ad-hoc techniques can
> also lead to internal fragmentation (excessive unused space within each
> vector.) Solving these problems is what the row set framework was designed
> to do.
>
> Thanks,
> - Paul
>
>
> On Wednesday, April 24, 2019, 10:48:44 AM PDT, Igor Guzenko <ihor.huzenko....@gmail.com> wrote:
>
> Hello Everyone,
>
> Sorry for the late reply. Here are the presentations:
>
> Map<K,V> vector -
> https://docs.google.com/presentation/d/1FG4swOrkFIRL7qjiP7PSOPy8a1vnxs5Z9PM3ZfRPRYo/edit#slide=id.p
>
> Hive complex types -
> https://docs.google.com/presentation/d/1nc0ID5aju-qj-7hjquFpH-TwGjeReWTYogsExuOe8ZA/edit?usp=sharing
>
> Discussion results for Map<K,V> new vector:
> - Need to eliminate the possibility of key duplication;
> - Need to check Hive behavior when ORDER BY is performed on a Map
>   complex type column;
> - Need to describe the design and all use cases for the vector in a
>   design document.
>
> Discussion results for Hive complex types:
> - Aman Sinha made a few great suggestions. The first is that creation of
>   Hive writers may be done once per table scan. The second is that this
>   would be a good moment to calculate the size of the vectors and
>   allocate them early. Need to provide a few examples describing how the
>   allocation will work for complex types.
> - Need to describe the suggested approach in a design document and
>   continue the discussion there.
>
> A question from my side: have we already implemented predictive
> allocation of value vectors somewhere? Any example would be useful,
> because right now I can see that our existing vector writers usually use
> the mutator's setSafe(...) methods, inside which the size of the buffer
> may be increased when necessary.
>
> The future design document will be located at
> https://docs.google.com/document/d/1yEcaJi9dyksfMs4w5_GsZCQH_Pffe-HLeLVNNKsV7CA/edit?usp=sharing
>
> Please feel free to leave your comments and suggestions in the
> document and presentations.
>
> Thanks,
> Igor Guzenko
>
>
> On Wed, Apr 17, 2019 at 3:04 AM Jyothsna Reddy <jyothsna....@gmail.com> wrote:
> >
> > Hi All,
> > The hangout will start at 9:30 AM PST instead of 10 AM PST on 04-18-2019.
> >
> > Thank you,
> > Jyothsna
> >
> >
> > On Tue, Apr 16, 2019 at 2:00 PM Jyothsna Reddy <jyothsna....@gmail.com> wrote:
> >
> > > Hi Charles,
> > > Yes, sure!! Probably we can start with your discussion first and Hive
> > > complex types later, since there will be some discussion around the
> > > latter topic.
> > >
> > > Thank you,
> > > Jyothsna
> > >
> > >
> > > On Tue, Apr 16, 2019 at 1:40 PM Charles Givre <cgi...@gmail.com> wrote:
> > >
> > >> Hi Jyothsna,
> > >> Could I get a few minutes on the next Hangout to promote the Drill day at
> > >> ApacheCon?
> > >> Thanks
> > >>
> > >> > On Apr 16, 2019, at 16:38, Jyothsna Reddy <jyothsna....@gmail.com> wrote:
> > >> >
> > >> > Hi Everyone,
> > >> >
> > >> > Here are some key points of today's hangout discussion:
> > >> >
> > >> > Sorabh mentioned that there are some regressions in TPCDS queries and
> > >> > it's a blocker for the 1.16 release.
> > >> >
> > >> > Bohdan presented their proposal for Hive Complex types support. Here are
> > >> > some of the important points:
> > >> >
> > >> >   - Structure of MapVector: keys are of primitive type, while values can
> > >> >     be of either primitive or complex type.
> > >> >   - MapReader and MapWriter are used to read from and write to the
> > >> >     MapVector.
> > >> >   - MapWriter tracks the current row/length and is used to calculate the
> > >> >     write position and offset.
> > >> >
> > >> > Following are some of the questions from the audience:
> > >> >
> > >> >   - Will the types be implicitly cast, since Calcite supports keys of
> > >> >     type int and string?
> > >> >   - Future improvements include sorting the keys for better lookup. Is
> > >> >     it per row or across all the rows?
> > >> >
> > >> > Since there is more to discuss, there will be a hangout session on
> > >> > 04-18-2019 at 10 AM PST (link
> > >> > http://meet.google.com/yki-iqdf-tai).
> > >> >
> > >> > Thank you,
> > >> > Jyothsna
> > >> >
> > >> >
> > >> > On Mon, Apr 15, 2019 at 11:48 AM Bohdan Kazydub <bohdan.kazy...@gmail.com> wrote:
> > >> >
> > >> >> Hello,
> > >> >> Igor and I would like to discuss Hive Complex types support.
> > >> >>
> > >> >> Thanks,
> > >> >> Bohdan
> > >> >>
> > >> >> On Mon, Apr 15, 2019 at 8:47 PM Charles Givre <cgi...@gmail.com> wrote:
> > >> >>
> > >> >>> I’d like to promote the Drill track for ApacheCon.
> > >> >>>
> > >> >>> Sent from my iPhone
> > >> >>>
> > >> >>>> On Apr 15, 2019, at 13:09, Jyothsna Reddy <jyothsna....@gmail.com> wrote:
> > >> >>>>
> > >> >>>> Hello Everyone,
> > >> >>>> Does anyone have any topics for tomorrow's hangout?
> > >> >>>>
> > >> >>>> We will start the hangout at 10 AM PST (link
> > >> >>>> http://meet.google.com/yki-iqdf-tai).
> > >> >>>>
> > >> >>>> Thank you,
> > >> >>>> Jyothsna