Hi again, Sheriffo. Here are some more improvements to [1], following up on my last email:
- fields.toArray() doesn't need a fully sized array like the one at [6]. Just call fields.toArray(new String[0]); better still, create the zero-length array once and reuse it. That call only needs the array for its type.
- I guess the class at [2] will always be the same, so you don't need to set it on every insert call.
- The string concatenation at [3] (and likewise at [4]) is overkill for the JVM across 1M calls * N fields. Precalculate the names in a list or array and reuse them for the 1M*N calls.
- Another optimization for [3]: given that PersistentBase [5] extends SpecificRecordBase, you can access the fields by index with SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).

[1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
[2] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
[3] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
[4] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
[5] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
[6] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163

Let's see whether these optimizations take some of the stress off the JVM's memory management.

Regards,

Alfonso Nishikawa

On Mon, 10 Jun 2019 at 21:18, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:

> Hi, Sheriffo.
>
> You can try reusing the Persistent instances [1] to insert the data. I
> don't know all the backends, but they should be reusable, at least in
> MongoDB and HBase.
>
> [1] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, 10 Jun 2019 at 21:14, Alfonso Nishikawa (<
> alfonso.nishik...@gmail.com>) wrote:
>
>> Hi, Sheriffo.
>>
>> I really don't know how to solve it, but are you setting any Xmx / Xms
>> configuration values?
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>>
>> On Sat, 8 Jun 2019 at 16:02, Sheriffo Ceesay (<sneceesa...@gmail.com>)
>> wrote:
>>
>>> Hi All,
>>>
>>> The week 2 progress update is available at
>>>
>>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>>
>>> I have one question I would like my mentors to advise on. I am still
>>> working on it, but I thought it would be good to report it because it is
>>> HBase-specific.
>>>
>>> The problem is an OutOfMemoryError when inserting 1M+ records into
>>> HBase. It happens when I try to run the actual benchmark by first loading
>>> HBase with 1 million plus records. It works perfectly for MongoDB but not
>>> for HBase.
>>>
>>> So I am assuming this problem is specific to HBase. The stack trace is
>>> given below.
>>>
>>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>     at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>>>     at java.lang.StringCoding.encode(StringCoding.java:344)
>>>     at java.lang.String.getBytes(String.java:918)
>>>     at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>>>     at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>>>     at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>>>     at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>>>     at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>>>     at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>>>     at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>>>     at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>>>     at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>>>     at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>>>
>>> The insert implementation of the module, available at
>>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
>>> GoraBenchmarkClient.java, is very straightforward. I have had a brief look
>>> at the HBaseStore.java put() implementation but could not find an issue
>>> there.
>>>
>>> If I solve this problem, I will run more workloads to verify that the
>>> module is stable for the basic implementation. Then I will go ahead and
>>> work on the suggestions Renato made last week.
>>>
>>> Please let me know your thoughts.
>>>
>>> Thank you.
>>>
>>> **Sheriffo Ceesay**
>>>
>>
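A minimal Java sketch of the insert-path suggestions at the top of this thread: a shared zero-length array for toArray, field names precomputed once, index-based put, and one record instance reused across inserts. Everything here is illustrative and not taken from GoraBenchmarkClient: Record is a hypothetical stand-in for Avro's SpecificRecordBase (only its get(int)/put(int, Object) shape is mimicked), the "field" + i naming is assumed, values are plain Strings (YCSB actually passes ByteIterator values), and field i is assumed to live at index i of the record.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the insert-path optimizations suggested in this thread.
 * ASSUMPTION: Record is a hypothetical stand-in for Avro's SpecificRecordBase
 * (index-based get(int)/put(int, Object)); it is not the real Gora/Avro class.
 */
public class InsertPathSketch {

  /** Hypothetical stand-in with SpecificRecordBase-style index access. */
  public static final class Record {
    private final Object[] values;

    public Record(int fieldCount) {
      values = new Object[fieldCount];
    }

    public Object get(int index) {
      return values[index];
    }

    public void put(int index, Object value) {
      values[index] = value;
    }

    /** Resets all fields so a reused instance carries no stale values. */
    public void clear() {
      Arrays.fill(values, null);
    }
  }

  // Shared zero-length array: toArray(T[]) uses it only for its runtime type.
  private static final String[] EMPTY = new String[0];

  // Field names built once, instead of prefix + i on every insert call.
  private final String[] fieldNames;

  // One record reused across inserts. ASSUMPTION: the backend has finished
  // reading the values once put() returns, so overwriting them is safe.
  private final Record reusable;

  public InsertPathSketch(String prefix, int fieldCount) {
    fieldNames = new String[fieldCount];
    for (int i = 0; i < fieldCount; i++) {
      fieldNames[i] = prefix + i; // concatenation runs fieldCount times in total
    }
    reusable = new Record(fieldCount);
  }

  /** Fills the reused record by index; ASSUMPTION: field i sits at index i. */
  public Record fill(Map<String, String> values) {
    reusable.clear();
    for (int i = 0; i < fieldNames.length; i++) {
      reusable.put(i, values.get(fieldNames[i]));
    }
    return reusable;
  }

  /** Converts the requested field set to an array without pre-sizing it. */
  public static String[] toFieldArray(Set<String> fields) {
    return fields.toArray(EMPTY);
  }
}
```

With this shape, each insert does one map lookup and one array store per field and allocates no new record or name strings; the N concatenations happen once in the constructor rather than 1M*N times. Whether reusing the record is actually safe depends on the backend copying the values during put(), which is exactly the caveat raised above for backends other than MongoDB and HBase.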