Hi!

My hypothesis is that the difference between MongoDB and HBase is that
HBase puts more stress on serialization with Avro. It could also matter
that, if the HBase test is run after the MongoDB ones, the GC starts from
an already "bad" situation.
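
One cheap way to check the second part of that hypothesis would be to log the
heap state right before the HBase load phase starts. A minimal sketch using
only java.lang.Runtime; the class and method names are made up for
illustration and are not part of gora-benchmark:

```java
class HeapCheckSketch {
    // Heap currently in use by this JVM, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Upper bound the heap may grow to (what -Xmx controls).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max heap:  " + maxHeapBytes() / mib + " MiB");
        System.out.println("used heap: " + usedHeapBytes() / mib + " MiB");
    }
}
```

If the used heap is already close to the maximum before the first HBase
insert, running the HBase test in a fresh JVM would be a fairer comparison.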

From [A], linked by @Renato: if the error were a plain OutOfMemoryError, I
would have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
even 1, but with a GC error I am not so sure. In any case, @Sheriffo:
you can try this if it still doesn't work after the optimizations :)
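
To make that concrete, here is a small sketch of what trying the lower values
looks like. Only the key gora.hbasestore.scanner.caching comes from this
thread; the starting value of 1000 and the class and method names are
assumptions for illustration. In practice you would simply edit the value in
gora.properties:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ScannerCachingSketch {
    // The only name taken from the thread; everything else is illustrative.
    static final String CACHING_KEY = "gora.hbasestore.scanner.caching";

    // Parses gora.properties-style text and overrides the scanner caching value.
    static Properties withScannerCaching(String propertiesText, int caching) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        props.setProperty(CACHING_KEY, Integer.toString(caching));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder content; the real file has more keys and maybe another value.
        String original = CACHING_KEY + "=1000\n";
        for (int caching : new int[] {100, 10, 1}) {
            Properties props = withScannerCaching(original, caching);
            System.out.println(CACHING_KEY + "=" + props.getProperty(CACHING_KEY));
        }
    }
}
```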

@Renato: Thx for the links!

Regards,

Alfonso Nishikawa



On Mon, 10 Jun 2019 at 22:02, Renato Marroquín Mogrovejo (<
renatoj.marroq...@gmail.com>) wrote:

> @Alfonso,
> Thank you very much for the suggestions! You are totally right on
> all of your points! Sheriffo, please make use of them ;)
>
> What is also strange (although the code can be optimized, as Alfonso
> pointed out) is that it works for the MongoDB backend. So I would also
> suspect the configuration of the Gora-HBase client. Have you taken
> a look at [A], for example? Or at the other configuration that Gora-HBase
> assumes [B]? Maybe you can specify some Xmx / Xms config there.
>
>
> Best,
>
> Renato M.
>
> [A]
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
> [B]
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
>
> On Mon, 10 Jun 2019 at 23:39, Alfonso Nishikawa
> (<alfonso.nishik...@gmail.com>) wrote:
> >
> > Hi again, Sheriffo.
> >
> > More improvements to [1] over the last email:
> >
> > - fields.toArray() doesn't need a full-size array like in [6]. You can
> > just call fields.toArray(new String[0]), or better, create one empty
> > array and reuse it. That call only needs the array for its type.
> > - I guess the class at [2] will always be the same, so you don't need to
> > set it on every insert call.
> > - The string concatenation at [3] (and likewise at [4]) is heavy for the
> > JVM across 1M calls * N fields. Precompute the names in a list or array
> > and reuse them for the 1M*N calls.
> > - Another optimization for [3]: given that PersistentBase [5] extends
> > SpecificRecordBase, you can access the fields by index with
> > SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
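
Put together, the points above could look roughly like the sketch below. It is
deliberately free of Gora and Avro dependencies so that it runs on its own:
FakeRecord is a hypothetical stand-in for the generated Persistent bean (only
its index-based put/get mirrors SpecificRecordBase), and the field count and
the "field" + i naming are assumptions, not taken from GoraBenchmarkClient:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InsertSketch {
    // Hypothetical stand-in for a Persistent bean with index-based access.
    static final class FakeRecord {
        private final Object[] values;
        FakeRecord(int fieldCount) { values = new Object[fieldCount]; }
        void put(int index, Object value) { values[index] = value; }
        Object get(int index) { return values[index]; }
    }

    static final int FIELD_COUNT = 3;                 // assumed N
    static final String[] EMPTY = new String[0];      // reused; toArray only needs the type
    static final String[] FIELD_NAMES = buildNames(FIELD_COUNT); // built once, not per insert

    // Builds the field names a single time instead of concatenating on every call.
    static String[] buildNames(int n) {
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            names[i] = "field" + i;
        }
        return names;
    }

    // Fills every field by index; no name lookup or concatenation in the hot loop.
    static void fill(FakeRecord record, Object value) {
        for (int i = 0; i < FIELD_COUNT; i++) {
            record.put(i, value);
        }
    }

    public static void main(String[] args) {
        FakeRecord record = new FakeRecord(FIELD_COUNT); // created once and reused
        for (int row = 0; row < 5; row++) {
            fill(record, Integer.valueOf(row));
            // A real client would call dataStore.put(key, record) here.
        }
        List<String> fields = new ArrayList<>(Arrays.asList(FIELD_NAMES));
        String[] asArray = fields.toArray(EMPTY);        // no pre-sized array needed
        System.out.println(Arrays.toString(asArray));
        System.out.println(record.get(0));
    }
}
```

Whether a reused instance needs its state cleared between puts depends on the
backend, so treat the reuse as something to verify rather than a given.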
> >
> > [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> > [2] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> > [3] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> > [4] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> > [5] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> > [6] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
> >
> > Let's see whether these optimizations relieve the JVM memory management
> > of much of the stress.
> >
> > Regards,
> >
> > Alfonso Nishikawa
> >
> > On Mon, 10 Jun 2019 at 21:18, Alfonso Nishikawa (<
> > alfonso.nishik...@gmail.com>) wrote:
> >
> > > Hi, Sheriffo.
> > >
> > > You can try reusing the Persistent instances [1] to insert the data. I
> > > don't know all the backends, but they should be reusable, at least in
> > > MongoDB and HBase.
> > >
> > > [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
> > >
> > > Regards,
> > >
> > > Alfonso Nishikawa
> > >
> > > On Mon, 10 Jun 2019 at 21:14, Alfonso Nishikawa (<
> > > alfonso.nishik...@gmail.com>) wrote:
> > >
> > >> Hi, Sheriffo.
> > >>
> > >> I really don't know how to solve it, but are you setting any Xmx / Xms
> > >> configuration values?
> > >>
> > >> Regards,
> > >>
> > >> Alfonso Nishikawa
> > >>
> > >>
> > >> On Sat, 8 Jun 2019 at 16:02, Sheriffo Ceesay (<sneceesa...@gmail.com>)
> > >> wrote:
> > >>
> > >>> Hi All,
> > >>>
> > >>> The Week 2 progress update is available at
> > >>>
> > >>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> > >>>
> > >>> I have one question that I would like my mentors to advise on. I am
> > >>> still working on it, but thought it would be good to report it because
> > >>> it is HBase-specific.
> > >>>
> > >>> The problem is an OutOfMemoryError when inserting 1M+ records into
> > >>> HBase. It happens when I try to run the actual benchmark by first
> > >>> loading HBase with 1 million plus records. It works perfectly for
> > >>> MongoDB but not for HBase.
> > >>>
> > >>> So I am assuming this problem is specific to HBase. The stack trace is
> > >>> given below.
> > >>>
> > >>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >>>         at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
> > >>>         at java.lang.StringCoding.encode(StringCoding.java:344)
> > >>>         at java.lang.String.getBytes(String.java:918)
> > >>>         at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
> > >>>         at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
> > >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
> > >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
> > >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
> > >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
> > >>>         at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
> > >>>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
> > >>>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
> > >>>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> > >>>
> > >>> The insert implementation of the module, available at
> > >>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
> > >>> GoraBenchmarkClient.java, is very straightforward. I have had a brief
> > >>> look at the put() implementation in HBaseStore.java but could not find
> > >>> an issue with it.
> > >>>
> > >>> Once I solve this problem, I will run more workloads to verify that
> > >>> the module is stable for the basic implementation. Then I will go ahead
> > >>> and work on the suggestions made by Renato last week.
> > >>>
> > >>> Please let me know what your thoughts are.
> > >>>
> > >>>
> > >>> Thank you.
> > >>>
> > >>>
> > >>>
> > >>> **Sheriffo Ceesay**
> > >>>
> > >>
>
