@Alfonso, thank you very much for the suggestions! You are totally right on all of your points! Sheriffo, please make use of them ;)
Also, what is strange (although the code can be optimized as Alfonso pointed out) is that it works for the MongoDB backend. So I would also suspect the configuration of the Gora-HBase client. Have you taken a look at [A], for example, or at the other assumed Gora-HBase configuration in [B]? Maybe you can specify some Xmx / Xms settings there.

Best,

Renato M.

[A] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
[B] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml

On Mon, Jun 10, 2019 at 23:39, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>
> Hi again, Sheriffo.
>
> More improvements to [1], on top of the last email:
>
> - fields.toArray() doesn't need a full-size array like in [6]. You should just call fields.toArray(new String[0]), or better, create a zero-length array once and reuse it. That call only needs the type.
> - I guess the class at [2] will always be the same, so you don't need to set it on every insert call.
> - The string concatenation at [3] is overkill for the JVM over 1M calls * N fields, and the same goes for [4]. Precalculate the names in a list or array and reuse them for the 1M*N calls.
> - Another optimization for [3]: given that PersistentBase [5] extends SpecificRecordBase, you can access the fields by index with SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
>
> [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> [2] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> [3] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> [4] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> [5] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> [6] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
>
> Let's see if those optimizations relieve the JVM memory management of much of the stress.
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, Jun 10, 2019 at 21:18, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>
> > Hi, Sheriffo.
> >
> > You can try reusing the Persistent instances [1] to insert the data. I don't know all the backends, but they should be reusable, at least in MongoDB and HBase.
> >
> > [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
> >
> > Regards,
> >
> > Alfonso Nishikawa
> >
> > On Mon, Jun 10, 2019 at 21:14, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
> >
> >> Hi, Sheriffo.
> >>
> >> I really don't know how to solve it, but are you setting any Xmx / Xms configuration values?
> >>
> >> Regards,
> >>
> >> Alfonso Nishikawa
> >>
> >> On Sat, Jun 8, 2019 at 16:02, Sheriffo Ceesay (<sneceesa...@gmail.com>) wrote:
> >>
> >>> Hi All,
> >>>
> >>> The week 2 progress update is available at
> >>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >>>
> >>> I have one question that I would like my mentors to advise on. I am still working on it, but thought it would be good to report it because it is HBase specific.
> >>>
> >>> The problem is an OutOfMemoryError when inserting 1M+ records into HBase. It happens when I try to run the actual benchmark by first loading HBase with 1 million plus records. It works perfectly for MongoDB but not for HBase.
> >>>
> >>> So I am assuming this problem is specific to HBase. The stack trace is given below.
> >>>
> >>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>>     at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
> >>>     at java.lang.StringCoding.encode(StringCoding.java:344)
> >>>     at java.lang.String.getBytes(String.java:918)
> >>>     at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
> >>>     at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
> >>>     at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
> >>>     at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
> >>>     at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
> >>>     at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
> >>>     at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
> >>>     at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
> >>>     at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
> >>>     at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> >>>
> >>> The insert implementation of the module, available at https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in GoraBenchmarkClient.java, is very straightforward. I have had a brief look at the put() implementation in HBaseStore.java but could not find an issue with it.
> >>>
> >>> Once I solve this problem, I will run more workloads to verify that the module is stable for the basic implementation. Then I will go ahead and work on the suggestions made by Renato last week.
> >>>
> >>> Please let me know what your thoughts are.
> >>>
> >>> Thank you.
> >>>
> >>> **Sheriffo Ceesay**
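
For illustration, the suggestions quoted in this thread (reuse a zero-length array for toArray, precalculate the field names, access fields by index, reuse one record instance) could be sketched roughly as below. This is only a minimal, self-contained sketch and not the GoraBenchmarkClient code: the class InsertSketch, the FakeRecord stand-in, the "field" prefix and the field count are all assumptions made up for the example. In the real client the record would presumably be the Avro-generated class, which inherits get(int) and put(int, Object) from SpecificRecordBase.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class InsertSketch {
    // Reused zero-length array: toArray only needs it to learn the element type.
    private static final String[] EMPTY_STRINGS = new String[0];

    /** Builds prefix0 .. prefix(n-1) once, so nothing is concatenated per insert. */
    static String[] precomputeNames(String prefix, int n) {
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            names[i] = prefix + i;
        }
        return names;
    }

    /** Converts the field set without allocating a pre-sized array at the call site. */
    static String[] toFieldArray(Set<String> fields) {
        return fields.toArray(EMPTY_STRINGS);
    }

    /** Hypothetical stand-in for an Avro-generated record with index-based access. */
    static final class FakeRecord {
        private final Object[] values;

        FakeRecord(int fieldCount) {
            values = new Object[fieldCount];
        }

        void put(int index, Object value) {
            values[index] = value;
        }

        Object get(int index) {
            return values[index];
        }
    }

    public static void main(String[] args) {
        // Assumed field prefix and count, computed once outside the insert loop.
        String[] names = precomputeNames("field", 3);

        // One record instance, overwritten on every iteration instead of re-created.
        FakeRecord record = new FakeRecord(names.length);
        for (int row = 0; row < 5; row++) {
            for (int i = 0; i < names.length; i++) {
                record.put(i, row); // index-based write, no reflection or name lookup
            }
            // A real client would hand the record to the data store at this point.
        }

        Set<String> fields = new LinkedHashSet<>(Arrays.asList(names));
        System.out.println(Arrays.toString(toFieldArray(fields)));
    }
}
```

Whether a reused Persistent instance needs its state cleared between inserts depends on the backend and is not covered by this sketch.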