@Alfonso,
Thank you very much for the suggestions! You are totally right about
all of your points. Sheriffo, please benefit from them ;)
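
Two of the suggestions quoted below (precalculating the field names and
passing a zero-length array to toArray) can be sketched roughly as follows.
This is only an illustration under assumptions: the class name InsertSketch,
the "field" prefix and the field count of 10 are hypothetical and are not
taken from GoraBenchmarkClient.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertSketch {
    // Hypothetical values; the real benchmark would take them from its own settings.
    static final int FIELD_COUNT = 10;
    static final String FIELD_PREFIX = "field";

    // Built once and reused by every insert call, instead of concatenating
    // FIELD_PREFIX + i again on each of the 1M * N iterations.
    static final String[] FIELD_NAMES = buildFieldNames(FIELD_PREFIX, FIELD_COUNT);

    // Shared zero-length array: toArray only uses it to learn the element type.
    static final String[] EMPTY_STRINGS = new String[0];

    static String[] buildFieldNames(String prefix, int count) {
        String[] names = new String[count];
        for (int i = 0; i < count; i++) {
            names[i] = prefix + i;
        }
        return names;
    }

    static String[] toStringArray(List<String> fields) {
        // When the argument is too small, List.toArray allocates a new array
        // of the right size and of the same runtime type.
        return fields.toArray(EMPTY_STRINGS);
    }

    public static void main(String[] args) {
        List<String> fields = new ArrayList<>();
        fields.add(FIELD_NAMES[0]);
        fields.add(FIELD_NAMES[1]);
        String[] asArray = toStringArray(fields);
        System.out.println(asArray.length + " " + asArray[1]);
    }
}
```

The point of the sketch is only that the per-call allocations (a new name
string per field, a pre-sized array per call) move out of the hot insert path.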

What is also strange (although the code can be optimized, as Alfonso
pointed out) is that it works for the MongoDB backend. So I would also
suspect the configuration of the Gora-HBase client. Have you taken
a look at [A], for example, or at the other configuration that Gora-HBase
assumes [B]? Maybe you can specify some Xmx / Xms config there.
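
To check which heap limit the benchmark JVM actually ends up with, a small
probe like the one below may help. It is a minimal sketch that uses only
java.lang.Runtime; the class name HeapCheck is hypothetical, and it assumes
that -Xmx / -Xms are passed as options on the java command line.

```java
final class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    // Upper bound on the heap that the JVM will attempt to use (what -Xmx controls).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MiB): " + maxHeapMiB());
        // Heap currently reserved by the JVM, and the unused part of it.
        System.out.println("committed heap (MiB): " + rt.totalMemory() / MIB);
        System.out.println("free in committed heap (MiB): " + rt.freeMemory() / MIB);
    }
}
```

After compiling with javac, running it once as "java HeapCheck" and once as
"java -Xmx2g HeapCheck" should show whether the option takes effect. Whether
the benchmark launcher forwards such options to its JVM is something to verify
separately.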


Best,

Renato M.

[A] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
[B] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml

On Mon, 10 Jun 2019 at 23:39, Alfonso Nishikawa
(<alfonso.nishik...@gmail.com>) wrote:
>
> Hi again, Sheriffo.
>
> More improvements to [1] over the last email:
>
> - fields.toArray() doesn't need a full-size array like in [6]. You can
> just call fields.toArray(new String[0]), or better, create the empty array
> once and reuse it. That call only needs the type.
> - I guess the class at [2] will always be the same, so you don't need to
> set it on every insert call.
> - The string concatenation at [3], and likewise at [4], is overkill for the
> JVM across 1M calls * N fields. Precalculate the names in a list or array
> and reuse them for the 1M*N calls.
> - Another optimization for [3]: given that PersistentBase [5] extends
> SpecificRecordBase, you can access the fields by index with
> SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
>
> [1] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> [2] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> [3] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> [4] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> [5] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> [6] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
>
> Let's see whether those optimizations relieve the JVM memory management of
> much of the stress.
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, 10 Jun 2019 at 21:18, Alfonso Nishikawa (<
> alfonso.nishik...@gmail.com>) wrote:
>
> > Hi, Sheriffo.
> >
> > You can try reusing the Persistent instances [1] to insert the data. I
> > don't know all the backends, but they should be reusable, at least in
> > MongoDB and HBase.
> >
> > [1] -
> > https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
> >
> > Regards,
> >
> > Alfonso Nishikawa
> >
> > On Mon, 10 Jun 2019 at 21:14, Alfonso Nishikawa (<
> > alfonso.nishik...@gmail.com>) wrote:
> >
> >> Hi, Sheriffo.
> >>
> >> I really don't know how to solve it, but are you setting any Xmx / Xms
> >> configuration values?
> >>
> >> Regards,
> >>
> >> Alfonso Nishikawa
> >>
> >>
> >> On Sat, 8 Jun 2019 at 16:02, Sheriffo Ceesay (<sneceesa...@gmail.com>)
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> Week 2 progress update is available at
> >>>
> >>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >>>
> >>> I have one question that I would like my mentors to advise on. I am still
> >>> working on it, but thought it would be good to report it because it is
> >>> HBase-specific.
> >>>
> >>> The problem is an OutOfMemoryError when inserting 1M+ records into
> >>> HBase. This happens when I try to run the actual benchmark by
> >>> first loading HBase with 1 million plus records. It works perfectly for
> >>> MongoDB but not for HBase.
> >>>
> >>> So I am assuming this problem is specific to HBase. The stack trace is
> >>> given below.
> >>>
> >>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>>         at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
> >>>         at java.lang.StringCoding.encode(StringCoding.java:344)
> >>>         at java.lang.String.getBytes(String.java:918)
> >>>         at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
> >>>         at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
> >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
> >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
> >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
> >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
> >>>         at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
> >>>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
> >>>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
> >>>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> >>>
> >>> The insert implementation of the module, available at
> >>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
> >>> GoraBenchmarkClient.java, is very straightforward. I have had a brief
> >>> look at the put() implementation in HBaseStore.java but could not find an
> >>> issue with it.
> >>>
> >>> If I solve this problem, I will run more workloads to verify that
> >>> the module is stable for the basic implementation. Then I will go ahead
> >>> and work on the suggestions made by Renato last week.
> >>>
> >>> Please let me know what your thoughts are.
> >>>
> >>>
> >>> Thank you.
> >>>
> >>>
> >>>
> >>> **Sheriffo Ceesay**
> >>>
> >>
