Excellent. Apologies for being absent. I am undergoing a job transition and it has been very busy. I suggest that we start a weekly tag-up as well.

Lewis

On Sun, Jun 2, 2019 at 1:14 PM Sheriffo Ceesay <sneceesa...@gmail.com> wrote:

> The code so far is available at the GitHub link below.
>
> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark
>
> **Sheriffo Ceesay**
>
> On Sun, Jun 2, 2019 at 8:34 PM Sheriffo Ceesay <sneceesa...@gmail.com> wrote:
>
>> Hi Renato,
>>
>> Thanks for the detailed reply. I agree with your recommendations on the
>> way forward. I will go ahead and implement the rest of the functionality
>> using reflection, and we can follow your recommendations in the next
>> iterations.
>>
>> As for the backend, I am using both HBase and MongoDB and all seems well
>> at the moment.
>>
>> I will let you all know when I push my code to GitHub.
>>
>> Thank you.
>>
>> **Sheriffo Ceesay**
>>
>> On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo
>> <renatoj.marroq...@gmail.com> wrote:
>>
>>> Hi Sheriffo,
>>>
>>> Some opinions about your questions, but others are more than welcome
>>> to suggest other things as well.
>>>
>>> Q1: Are we going to consider arbitrary field length, e.g. if we set
>>> the fieldcount to 100 then we have to create the respective Avro and
>>> mapping files? Currently, I don't think this process is automated and
>>> may be tedious for large field counts.
>>>
>>> I think for the first code iteration, we should use whatever
>>> fieldcount you have generated for. Ideally, we should be able to
>>> invoke the Gora bean generator and generate as many fields as required
>>> by the benchmark configuration.
>>>
>>> Q2: Second: The second problem has to do with the first one, if we
>>> allow arbitrary field counts, then there has to be a mechanism to call
>>> each of the set or get methods during CRUD operations. So to avoid
>>> this I used Java Reflection. See the sample code below.
>>>
>>> We have some options to deal with having an arbitrary number of fields.
>>>
>>> 1) Use reflection as you have, which might be OK for the first code
>>> iteration, but if we want decent performance compared with using the
>>> datastores natively (no Gora), we should move away from it.
>>>
>>> 2) Do Gora class generation (and also generate the method used to
>>> insert data through Gora) in a step before the benchmark starts.
>>> Something like this:
>>>
>>> # passing config parameters to generate Gora Beans with number of
>>> # required fields
>>> # this should output the generated class and the method that does the
>>> # insertion
>>> $ gora_compiler.sh --benchmark --fields_required 4
>>>
>>> The output path containing the result of this should then be included
>>> (or passed) as a runtime dependency to the benchmark class.
>>>
>>> 3) Because Gora uses Avro, we can use complex data types, e.g.,
>>> arrays, maps. So we could represent the number of fields as the number
>>> of elements inside an array. I would think that this option gives us
>>> the best performance.
>>>
>>> I think we should continue with option (1) until we have the entire
>>> pipeline working, and we understand how every piece fits together with
>>> each other (YCSB, Gora, Gora compiler, benchmark setup steps). Then we
>>> should do (2), which is the most general and the one that reflects how
>>> people usually use Gora, and then we test with (3). I think all of
>>> these steps are totally doable in our time frame as we build upon
>>> previous steps.
>>>
>>> The other thing that we should decide is which backend to use, as there
>>> are backends that are more mature than others. I'd say to use the
>>> HBase backend as it is the most stable one and the one with more
>>> features, and if we feel brave we can try other backends (and fix them
>>> if necessary!)
>>>
>>> Best,
>>>
>>> Renato M.
>>>
>>> On Sun, Jun 2, 2019 at 19:10, Sheriffo Ceesay
>>> (<sneceesa...@gmail.com>) wrote:
>>> >
>>> > Dear Mentors,
>>> >
>>> > My week one report is available at
>>> > https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>> >
>>> > I have also included a detailed question, and I will need your
>>> > guidance on that.
>>> >
>>> > Please let me know what your thoughts are.
>>> >
>>> > Thank you.
>>> >
>>> > **Sheriffo Ceesay**

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
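
For reference, option (1) in Renato's reply (calling the set methods by reflection so that the benchmark does not need to know the field count at compile time) might look roughly like the sketch below. This is not the code from the GORA-532 branch. `SampleRecord`, the `setField<N>` naming scheme, and the `String` field type are all assumptions made for illustration; a real Gora-generated bean may use different names and types.

```java
// Illustrative sketch only; names and types below are hypothetical.
final class ReflectiveFieldSetter {

    /** Hypothetical stand-in for a generated bean with two fields. */
    public static class SampleRecord {
        private String field0;
        private String field1;

        public void setField0(String v) { field0 = v; }
        public void setField1(String v) { field1 = v; }
        public String getField0() { return field0; }
        public String getField1() { return field1; }
    }

    /** Looks up setField<index>(String) by name and invokes it on the bean. */
    public static void setField(Object bean, int index, String value) {
        try {
            java.lang.reflect.Method setter =
                bean.getClass().getMethod("setField" + index, String.class);
            setter.invoke(bean, value);
        } catch (ReflectiveOperationException e) {
            // Covers a missing setter, an inaccessible setter, or a setter that threw.
            throw new IllegalStateException("No usable setter for field " + index, e);
        }
    }

    /** Fills fields 0..fieldCount-1 with the same value, one reflective call each. */
    public static void populate(Object bean, int fieldCount, String value) {
        for (int i = 0; i < fieldCount; i++) {
            setField(bean, i, value);
        }
    }
}
```

Each call performs a method lookup by name plus a reflective invocation, which is the per-operation overhead that options (2) and (3) are meant to avoid.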