Re: Week 2 Report and A Question

2019-06-12 Thread Sheriffo Ceesay
Hi All,

Further to my previous email, I have now updated the code to reflect the
code optimisations proposed by Alfonso. I have completely removed the
reflection approach. @Alfonso Nishikawa, the reason I did not use
*SpecificRecordBase.get(int index)* and *SpecificRecordBase.put(int index,
Object o)* in my first implementation was due to this comment
"//Used by DatumWriter.  Applications should not call." just before the
methods in the generated class. I have now changed the implementation to
use these two methods.
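
For reference, this is roughly the shape the index-based version takes; a
minimal sketch only (not necessarily the committed code), assuming a reused
*user* instance, the HashMap<String, ByteIterator> signature shown later in
this thread, and field names that match the Avro schema:

public int insert(String table, String key, HashMap<String, ByteIterator> values) {
  try {
    for (Map.Entry<String, ByteIterator> entry : values.entrySet()) {
      // Avro records know each field's schema position; since the generated
      // class extends SpecificRecordBase, put(int, Object) writes by position.
      int pos = user.getSchema().getField(entry.getKey()).pos();
      user.put(pos, entry.getValue().toString());
    }
    dataStore.put(key, user);
  } catch (Exception e) {
    return FAILED;
  }
  return SUCCESS;
}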

@Renato Marroquín Mogrovejo, I forgot to answer your question about how I
run the code; please see below.

First Step: From the gora-benchmark directory execute

*mvn clean install*

Second Step:

java -cp .:bmstuff/core-0.1.4.jar:target/gora-benchmark-0.9-SNAPSHOT.jar:bmstuff/sources-dist-0.9-SNAPSHOT.jar:lib/* \
  com.yahoo.ycsb.Client \
  -load \
  -db org.apache.gora.benchmark.GoraBenchmarkClient \
  -threads 10 -s \
  -P bmstuff/workloads/workloada > bmstuff/out.log

The following switches or command-line options are YCSB-specific.

com.yahoo.ycsb.Client is the YCSB client class that will load our DB implementation.
-load runs the load phase, i.e. it loads the database with records.
-db specifies our benchmark implementation class.
-threads specifies the number of client threads to start.
-s prints periodic status output while the client runs.
-P specifies the workload to execute; the file *bmstuff/workloads/workloada* contains
various key-value pairs, e.g. the record count and the number of records
to read and update.
> bmstuff/out.log sends standard output and logs to out.log.
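
For anyone new to YCSB: the class passed with -db is simply a subclass of
com.yahoo.ycsb.DB that YCSB instantiates for every client thread. Below is a
stripped-down sketch of such a binding, illustrative only; it assumes the
older YCSB core API used here (int status codes and HashMap/Vector
parameters, matching the insert signature that appears later in this thread),
and the exact method signatures may differ slightly between YCSB versions.
The real binding is GoraBenchmarkClient.

import java.util.HashMap;
import java.util.Set;
import java.util.Vector;

import com.yahoo.ycsb.ByteIterator;
import com.yahoo.ycsb.DB;
import com.yahoo.ycsb.DBException;

// Illustrative skeleton of a YCSB binding; not the actual GoraBenchmarkClient.
public class ExampleGoraBinding extends DB {

  private static final int SUCCESS = 0; // illustrative status codes
  private static final int FAILED = 1;

  @Override
  public void init() throws DBException {
    // One-time setup per client thread: create the Gora DataStore, etc.
  }

  @Override
  public int insert(String table, String key, HashMap<String, ByteIterator> values) {
    // Map YCSB's field0..fieldN values onto the persistent bean and put() it.
    return SUCCESS;
  }

  @Override
  public int read(String table, String key, Set<String> fields,
      HashMap<String, ByteIterator> result) {
    return SUCCESS;
  }

  @Override
  public int update(String table, String key, HashMap<String, ByteIterator> values) {
    return SUCCESS;
  }

  @Override
  public int scan(String table, String startkey, int recordcount, Set<String> fields,
      Vector<HashMap<String, ByteIterator>> result) {
    return SUCCESS;
  }

  @Override
  public int delete(String table, String key) {
    return SUCCESS;
  }
}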

YCSB is highly configurable: we can pass many key-value pairs via a property
file or command-line options. See [1] and [2] for more information. In the
end, we can write an executable script such as ycsb.sh to automate the entire
Gora benchmark process, ranging from creating the Avro files to producing
the benchmarking results.

 Please let me know if that makes sense.

Thank you.

[1] https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload
[2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties


**Sheriffo Ceesay**


On Wed, Jun 12, 2019 at 11:50 AM Sheriffo Ceesay 
wrote:

> Hi Renato,
>
> I will follow Alfonso's recommendations about reusing objects as much as I
> can. I will push those changes to the branch by the end of this week.
>
> To answer your questions.
>
> Yes, you are right I am using a clean cold JVM. If necessary, I can also
> have a look at warming the JVM down the line.
>
> Yes, I have tried setting *gora.hbasestore.scanner.caching* to different
> values but there was no significant difference. Also, I may be wrong, but I
> think this setting has to do with the scan operation and not the insert
> operation?
>
> As for flushing, I tried it, but it quickly throws an error, hence I
> commented out that line of code. I think this is because the insert
> operation inserts a single user object per call, so calling
> dataStore.flush() within that method would mean calling flush on every
> object insertion. Is that not the case? If there were a way to track the
> progress of inserts, it could be used to call flush after every N insert
> calls. So I used *gora.hbasestore.hbase.client.autoflush.enabled=true*, which
> automatically calls flush at some point. However, as I mentioned in
> my previous email, enabling autoflush decreases write performance [1].
>
> [1] https://gora.apache.org/current/gora-hbase.html
>
> Thank you.
>
> **Sheriffo Ceesay**
>
>
> On Tue, Jun 11, 2019 at 10:52 PM Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hey Sheriffo,
>>
>> Cool to hear you are making progress! :) and great to see that we have
>> some numbers already! :)
>> Regarding optimization point (1), regardless of whether this was the
>> cause of the issue, Alfonso's suggestions are something we should
>> follow; many short-lived objects in Java might create a
>> performance problem sooner or later. Also, about your comment:
>>
>> "Also, I may be wrong but the way I understand YCSB framework is, it
>> will execute an insert operation for each user object, so I thought it
>> was right to create a user object within the insert method."
>>
>> As you pointed out, YCSB is about inserting the objects, and NOT about
>> creating them, so it doesn't matter if we reuse the objects, as long
>> as the values that we insert are actually correct. We don't want to
>> end up measuring object creation+gc. I think Alfonso's comment was
>> hinting on that direction (please feel free to correct me @Alfonso if
>> I am misunderstanding you) and I think his comments are just on the
>> spot.
>> I have some other questions regarding the numbers you sent around:
>> - are you running YCSB for each data store with warm JVM? or are these
>> numbers each with a clean cold JVM? I suppose the latter, right?
>> - did you try setting gora.hbasestore.scanner.caching to a lower value?
>> - which is the command that you are using to run/start this code?
>> - did you try flushing the commits more regularly in:
>>

Re: Week 2 Report and A Question

2019-06-12 Thread Sheriffo Ceesay
Hi Renato,

I will follow Alfonso's recommendations about reusing objects as much as I
can. I will push those changes to the branch by the end of this week.

To answer your questions.

Yes, you are right I am using a clean cold JVM. If necessary, I can also
have a look at warming the JVM down the line.

Yes, I have tried setting *gora.hbasestore.scanner.caching* to different
values but there was no significant difference. Also, I may be wrong, but I
think this setting has to do with the scan operation and not the insert
operation?

As for flushing, I tried it, but it quickly throws an error, hence I
commented out that line of code. I think this is because the insert
operation inserts a single user object per call, so calling
dataStore.flush() within that method would mean calling flush on every
object insertion. Is that not the case? If there were a way to track the
progress of inserts, it could be used to call flush after every N insert
calls. So I used *gora.hbasestore.hbase.client.autoflush.enabled=true*, which
automatically calls flush at some point. However, as I mentioned in
my previous email, enabling autoflush decreases write performance [1].

[1] https://gora.apache.org/current/gora-hbase.html
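
For what it's worth, tracking inserts and flushing every N calls could look
roughly like this (a sketch only; insertCount and FLUSH_EVERY are made-up
names, and since YCSB gives each client thread its own DB instance, a plain
field suffices):

private long insertCount = 0;
private static final long FLUSH_EVERY = 1000;

public int insert(String table, String key, HashMap<String, ByteIterator> values) {
  try {
    // ... populate the reused user instance as before ...
    dataStore.put(user.getUserId().toString(), user);
    if (++insertCount % FLUSH_EVERY == 0) {
      dataStore.flush(); // push the batched puts to the backend periodically
    }
  } catch (Exception e) {
    return FAILED;
  }
  return SUCCESS;
}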

Thank you.

**Sheriffo Ceesay**


On Tue, Jun 11, 2019 at 10:52 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hey Sheriffo,
>
> Cool to hear you are making progress! :) and great to see that we have
> some numbers already! :)
> Regarding optimization point (1), regardless of whether this was the
> cause of the issue, Alfonso's suggestions are something we should
> follow; many short-lived objects in Java might create a
> performance problem sooner or later. Also, about your comment:
>
> "Also, I may be wrong but the way I understand YCSB framework is, it
> will execute an insert operation for each user object, so I thought it
> was right to create a user object within the insert method."
>
> As you pointed out, YCSB is about inserting the objects, and NOT about
> creating them, so it doesn't matter if we reuse the objects, as long
> as the values that we insert are actually correct. We don't want to
> end up measuring object creation+gc. I think Alfonso's comment was
> hinting on that direction (please feel free to correct me @Alfonso if
> I am misunderstanding you) and I think his comments are just on the
> spot.
> I have some other questions regarding the numbers you sent around:
> - are you running YCSB for each data store with warm JVM? or are these
> numbers each with a clean cold JVM? I suppose the latter, right?
> - did you try setting gora.hbasestore.scanner.caching to a lower value?
> - which is the command that you are using to run/start this code?
> - did you try flushing the commits more regularly in:
>
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L142
> let's say every 1000 elements? or something like that? I mean instead
> of at the end of the 1M elements?
>
> Thanks a lot for the report Sheriffo!
>
>
> Best,
>
> Renato M.
>
> El mar., 11 jun. 2019 a las 16:12, Sheriffo Ceesay
> () escribió:
> >
> > Hello,
> >
> > I have taken a proper look at the recommendations from @Alfonso and
> @Renato and below are the outcomes.
> >
> > Failed Attempts
> > 1. Optimisation, for the insert operation, to avoid the concatenation
> issue, I have just taken the quickest route by calling the methods directly
> without reflection. Below are those calls. Note: I have moved all reusable
> codes to the init method.
> >
> >> public int insert(String table, String key, HashMap<String, ByteIterator> values) {
> >>   try {
> >>     user.setField0(values.get("field0").toString());
> >>     user.setField1(values.get("field1").toString());
> >>     user.setField2(values.get("field2").toString());
> >>     user.setField3(values.get("field3").toString());
> >>     user.setField4(values.get("field4").toString());
> >>     user.setField5(values.get("field5").toString());
> >>     user.setField6(values.get("field6").toString());
> >>     user.setField7(values.get("field7").toString());
> >>     user.setField8(values.get("field8").toString());
> >>     user.setField9(values.get("field9").toString());
> >>     dataStore.put(user.getUserId().toString(), user);
> >>   } catch (Exception e) {
> >>     return FAILED;
> >>   }
> >>   return SUCCESS;
> >> }
> >
> >
> > if the above had worked, I would have changed the code as suggested by
> Alfonso. Also, I may be wrong but the way I understand YCSB framework is,
> it will execute an insert operation for each user object, so I thought it
> was right to create a user object within the insert method.
> >
> >
> > 2. I used different config values for -Xmx (256MB, 512MB, 1GB, 2GB) and
> even disabled GC checking using -XX:-UseGCOverheadLimit but they all failed
> with the same GC error.
> >
> > Successful Attempt -- There may be room for improvement

Re: Week 2 Report and A Question

2019-06-11 Thread Renato Marroquín Mogrovejo
Hey Sheriffo,

Cool to hear you are making progress! :) and great to see that we have
some numbers already! :)
Regarding optimization point (1), regardless of whether this was the
cause of the issue, Alfonso's suggestions are something we should
follow; many short-lived objects in Java might create a
performance problem sooner or later. Also, about your comment:

"Also, I may be wrong but the way I understand YCSB framework is, it
will execute an insert operation for each user object, so I thought it
was right to create a user object within the insert method."

As you pointed out, YCSB is about inserting the objects, and NOT about
creating them, so it doesn't matter if we reuse the objects, as long
as the values that we insert are actually correct. We don't want to
end up measuring object creation + GC. I think Alfonso's comment was
hinting in that direction (please feel free to correct me, @Alfonso, if
I am misunderstanding you) and I think his comments are spot on.
I have some other questions regarding the numbers you sent around:
- are you running YCSB for each data store with a warm JVM, or are these
numbers each with a clean, cold JVM? I suppose the latter, right?
- did you try setting gora.hbasestore.scanner.caching to a lower value?
- which is the command that you are using to run/start this code?
- did you try flushing the commits more regularly in:
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L142
let's say every 1000 elements? or something like that? I mean instead
of at the end of the 1M elements?

Thanks a lot for the report Sheriffo!


Best,

Renato M.

El mar., 11 jun. 2019 a las 16:12, Sheriffo Ceesay
() escribió:
>
> Hello,
>
> I have taken a proper look at the recommendations from @Alfonso and @Renato 
> and below are the outcomes.
>
> Failed Attempts
> 1. Optimisation, for the insert operation, to avoid the concatenation issue, 
> I have just taken the quickest route by calling the methods directly without 
> reflection. Below are those calls. Note: I have moved all reusable codes to 
> the init method.
>
>> public int insert(String table, String key, HashMap<String, ByteIterator> values) {
>>   try {
>>     user.setField0(values.get("field0").toString());
>>     user.setField1(values.get("field1").toString());
>>     user.setField2(values.get("field2").toString());
>>     user.setField3(values.get("field3").toString());
>>     user.setField4(values.get("field4").toString());
>>     user.setField5(values.get("field5").toString());
>>     user.setField6(values.get("field6").toString());
>>     user.setField7(values.get("field7").toString());
>>     user.setField8(values.get("field8").toString());
>>     user.setField9(values.get("field9").toString());
>>     dataStore.put(user.getUserId().toString(), user);
>>   } catch (Exception e) {
>>     return FAILED;
>>   }
>>   return SUCCESS;
>> }
>
>
> if the above had worked, I would have changed the code as suggested by 
> Alfonso. Also, I may be wrong but the way I understand YCSB framework is, it 
> will execute an insert operation for each user object, so I thought it was 
> right to create a user object within the insert method.
>
>
> 2. I used different config values for -Xmx (256MB, 512MB, 1GB, 2GB) and even 
> disabled GC checking using -XX:-UseGCOverheadLimit but they all failed with 
> the same GC error.
>
> Successful Attempt -- There may be room for improvement
> Using the configurations below worked but I think it is not the best for 
> write performance.
>
> First, I read from [1] related to [2] that the following oneliner code should 
> be executed for better HBase performance when using YCSB. It basically avoids 
> overloading a single region server.
>
> hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of 
> regionservers)
> hbase(main):002:0> create 'users', 'info', {SPLITS => (1..n_splits).map {|i| 
> "user#{1000+i*(9999-1000)/n_splits}"}}
>
> Second, as suggested by @Renato Marroquín Mogrovejo , it only works when I set
>
> hbase.client.autoflush.default=true
>
> However, from [3], I found "HBase autoflushing. Enabling autoflush decreases 
> write performance. Available since Gora 0.2. Defaults to disabled.". So I am 
> of the opinion that the problem is not entirely solved.
>
> I have done the following test, inserting 1M records into MongoDB and
> HBase, so I think this may not be bad after all, but more benchmarks may be
> required to validate this. HBase through Gora has almost the same
> performance as benchmarking HBase directly with vanilla YCSB.
>
> Backend         Ave Time Taken (sec)
> MongoDB         ~90
> HBase in Gora   ~160
> HBase YCSB      ~160
>
>
> [1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
> [2] https://issues.apache.org/jira/browse/HBASE-4163
> [3] https://gora.apache.org/current/gora-hbase.html
>
> Comments are welcomed.
>
> Thank you.
> *Sheriffo Ceesay*

Re: Week 2 Report and A Question

2019-06-11 Thread Sheriffo Ceesay
Hello,

I have taken a proper look at the recommendations from @Alfonso and @Renato
and below are the outcomes.

Failed Attempts
1. Optimisation: for the insert operation, to avoid the concatenation
issue, I have just taken the quickest route by calling the methods directly
without reflection. Below are those calls. Note: I have moved all reusable
code to the init method.

> public int insert(String table, String key, HashMap<String, ByteIterator> values) {
>   try {
>     user.setField0(values.get("field0").toString());
>     user.setField1(values.get("field1").toString());
>     user.setField2(values.get("field2").toString());
>     user.setField3(values.get("field3").toString());
>     user.setField4(values.get("field4").toString());
>     user.setField5(values.get("field5").toString());
>     user.setField6(values.get("field6").toString());
>     user.setField7(values.get("field7").toString());
>     user.setField8(values.get("field8").toString());
>     user.setField9(values.get("field9").toString());
>     dataStore.put(user.getUserId().toString(), user);
>   } catch (Exception e) {
>     return FAILED;
>   }
>   return SUCCESS;
> }
>

If the above had worked, I would have changed the code as suggested by
Alfonso. Also, I may be wrong, but the way I understand the YCSB framework,
it will execute an insert operation for each user object, so I thought it
was right to create a user object within the insert method.


2. I used different config values for *-Xmx (256MB, 512MB, 1GB, 2GB)* and
even disabled the GC overhead limit check using *-XX:-UseGCOverheadLimit*, but
they all failed with the same GC error.

Successful Attempt -- There may be room for improvement
Using the configurations below worked but I think it is not the best for
write performance.

First, I read from [1], related to [2], that the following one-liner
should be executed for better HBase performance when using YCSB. It
basically avoids overloading a single region server.

hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of
regionservers)
hbase(main):002:0> create 'users', 'info', {SPLITS =>
(1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
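
As a side note, the same pre-splitting could presumably also be done from
Java through the HBase Admin API instead of the shell; the sketch below is
just for illustration and assumes the HBase 2.x client classes
(ConnectionFactory, Admin, TableDescriptorBuilder), not something I ran:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

// Pre-splits the 'users' table the same way as the shell one-liner above.
public class PresplitUsersTable {
  public static void main(String[] args) throws Exception {
    int nSplits = 200; // roughly 10 * number of region servers
    byte[][] splitKeys = new byte[nSplits][];
    for (int i = 1; i <= nSplits; i++) {
      splitKeys[i - 1] = Bytes.toBytes("user" + (1000 + i * (9999 - 1000) / nSplits));
    }
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      admin.createTable(
          TableDescriptorBuilder.newBuilder(TableName.valueOf("users"))
              .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
              .build(),
          splitKeys);
    }
  }
}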

Second, as suggested by @Renato Marroquín Mogrovejo, it only works when I set

*hbase.client.autoflush.default=true*

However, from [3], I found "HBase autoflushing. Enabling autoflush
decreases write performance. Available since Gora 0.2. Defaults to
disabled.". So I am of the opinion that the problem is not entirely solved.

I have done the following test, inserting 1M records into MongoDB and
HBase, so I think this may not be bad after all, but more benchmarks may be
required to validate this. HBase through Gora has almost the same performance
as benchmarking HBase directly with vanilla YCSB.

*Backend         Ave Time Taken (sec)*
MongoDB          ~90
HBase in Gora    ~160
HBase YCSB       ~160


[1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
[2] https://issues.apache.org/jira/browse/HBASE-4163
[3] https://gora.apache.org/current/gora-hbase.html

Comments are welcomed.

Thank you.

**Sheriffo Ceesay**


On Tue, Jun 11, 2019 at 12:04 AM Sheriffo Ceesay 
wrote:

> Hello Alfonso and Renato,
>
> Thank you for getting in touch and thanks for the detailed replies.
>
> I will have proper look at this tomorrow morning. I did some
> troubleshooting yesterday (mostly playing with Xmx and zookeeper timeout
> settings), that improved the conditions, but it did not entirely solve the
> problem. Preliminary, it seems the problem has to do with configuration or
> how HBaseStore is implemented (this may not be entirely true).
>
> I will keep you all posted whenever I thoroughly have a look at your
> suggestions.
>
> Thanks again.
>
>
> **Sheriffo Ceesay**
>
>
> On Mon, Jun 10, 2019 at 11:14 PM Alfonso Nishikawa <
> alfonso.nishik...@gmail.com> wrote:
>
>> Hi!
>>
>> My hypothesis is that the difference between MongoDB and HBase is that
>> HBase puts more stress on serializing with Avro. It could also be a factor
>> that if the HBase test is performed after the MongoDB ones, the GC starts
>> from a "bad" situation.
>>
>> From [A] linked by @Renato, if the error were OutOfMemoryException I would
>> have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
>> even 1, but with a GC error I am not that sure. In any case, @Sheriffo:
>> you can try this if it still doesn't work with the optimizations :)
>>
>> @Renato: Thx for the links!
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>>
>>
>> El lun., 10 jun. 2019 a las 22:02, Renato Marroquín Mogrovejo (<
>> renatoj.marroq...@gmail.com>) escribió:
>>
>> > @Alfonso,
>> > Thank you very much for the suggestions! you are totally right about
>> > all of your points! Sheriffo, please benefit from them ;)
>> >
>> > Also what is strange is this (although it can be optimized as Alfonso
>> > pointed out) is that it works for the MongoDB backend. So I would also
>> > suspect on the configuration of the Gora-HBase cli

Re: Week 2 Report and A Question

2019-06-10 Thread Sheriffo Ceesay
Hello Alfonso and Renato,

Thank you for getting in touch and thanks for the detailed replies.

I will have a proper look at this tomorrow morning. I did some
troubleshooting yesterday (mostly playing with Xmx and ZooKeeper timeout
settings), which improved the situation but did not entirely solve the
problem. Preliminarily, it seems the problem has to do with configuration or
with how HBaseStore is implemented (this may not be entirely true).

I will keep you all posted once I have had a thorough look at your
suggestions.

Thanks again.


**Sheriffo Ceesay**


On Mon, Jun 10, 2019 at 11:14 PM Alfonso Nishikawa <
alfonso.nishik...@gmail.com> wrote:

> Hi!
>
> My hypothesis is that the difference between MongoDB and HBase is that
> HBase puts more stress on serializing with Avro. It could also be a factor
> that if the HBase test is performed after the MongoDB ones, the GC starts
> from a "bad" situation.
>
> From [A] linked by @Renato, if the error were OutOfMemoryException I would
> have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
> even 1, but with a GC error I am not that sure. In any case, @Sheriffo:
> you can try this if it still doesn't work with the optimizations :)
>
> @Renato: Thx for the links!
>
> Regards,
>
> Alfonso Nishikawa
>
>
>
> El lun., 10 jun. 2019 a las 22:02, Renato Marroquín Mogrovejo (<
> renatoj.marroq...@gmail.com>) escribió:
>
> > @Alfonso,
> > Thank you very much for the suggestions! you are totally right about
> > all of your points! Sheriffo, please benefit from them ;)
> >
> > What is also strange (although the code can be optimized as Alfonso
> > pointed out) is that it works for the MongoDB backend. So I would also
> > suspect the configuration of the Gora-HBase client. Have you taken
> > a look at [A], for example, or at other assumed Gora-HBase configurations
> > [B]? Maybe there you can specify some Xmx / Xms config.
> >
> >
> > Best,
> >
> > Renato M.
> >
> > [A]
> >
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
> > [B]
> >
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
> >
> > El lun., 10 jun. 2019 a las 23:39, Alfonso Nishikawa
> > () escribió:
> > >
> > > Hi again, Sheriffo.
> > >
> > > More improvements to [1] over the last email:
> > >
> > > - fields.toArray() doesn't need a full array like in [6]. You should do
> > > just fields.toArray(new String[0]), and better, create a zero-length
> > > array once and reuse it. That call only needs the type.
> > > - I guess the class at [2] will always be the same, so you don't need to
> > > set it on every insert call.
> > > - The string concatenation is overkill for the JVM on the 1M calls * N
> > > fields at [3], and the same for [4]. Precalculate the names in a list or
> > > array and reuse them for the 1M*N calls.
> > > - Another optimization for [3]: given that PersistentBase [5] extends
> > > SpecificRecordBase, you can access the fields by index with
> > > SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
> > >
> > > [1] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/ma1in/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> > > [2] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> > > [3] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> > > [4] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> > > [5] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> > > [6] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
> > >
> > > Let's see if with that optimizations we free the jvm memory management
> > from
> > > much stress.
> > >
> > > Regards,
> > >
> > > Alfonso Nishikawa
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > El lun., 10 jun. 2019 a las 21:18, Alfonso Nishikawa (<
> > > alfonso.nishik...@gmail.com>) escribió:
> > >
> > > > Hi, Sheriffo.
> > > >
> > > > You can try reusing the Persistent instances [1] to insert the data.
> I
> > > > don't know all the backends, but they should be reusable, at least in
> > > > mongoDB and HBase.
> > > >
> > > > [1] -
> > > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
> > > >
> > > > Regards,
> > > >
> > > > Alfonso Nishikawa
> > > >
> > > > El lun., 10 jun. 2019 a las 21:14, Alfonso Nishikawa (<
> > > > alfonso.nishik...@gmail.com>) escribió:
> > > >
> > > >> Hi, Sheriffo.
> > > >>
> > > >> I really don't k

Re: Week 2 Report and A Question

2019-06-10 Thread Alfonso Nishikawa
Hi!

My hypothesis is that the difference between MongoDB and HBase is that
HBase puts more stress on serializing with Avro. It could also be a factor
that if the HBase test is performed after the MongoDB ones, the GC starts
from a "bad" situation.

From [A] linked by @Renato, if the error were OutOfMemoryException I would
have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
even 1, but with a GC error I am not that sure. In any case, @Sheriffo:
you can try this if it still doesn't work with the optimizations :)

@Renato: Thx for the links!

Regards,

Alfonso Nishikawa



El lun., 10 jun. 2019 a las 22:02, Renato Marroquín Mogrovejo (<
renatoj.marroq...@gmail.com>) escribió:

> @Alfonso,
> Thank you very much for the suggestions! you are totally right about
> all of your points! Sheriffo, please benefit from them ;)
>
> What is also strange (although the code can be optimized as Alfonso
> pointed out) is that it works for the MongoDB backend. So I would also
> suspect the configuration of the Gora-HBase client. Have you taken
> a look at [A], for example, or at other assumed Gora-HBase configurations
> [B]? Maybe there you can specify some Xmx / Xms config.
>
>
> Best,
>
> Renato M.
>
> [A]
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
> [B]
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
>
> El lun., 10 jun. 2019 a las 23:39, Alfonso Nishikawa
> () escribió:
> >
> > Hi again, Sheriffo.
> >
> > More improvements to [1] over the last email:
> >
> > - fields.toArray() doesn't need a full array like in [6]. You should do
> > just fields.toArray(new String[0]), and better, create a zero-length
> > array once and reuse it. That call only needs the type.
> > - I guess the class at [2] will always be the same, so you don't need to
> > set it on every insert call.
> > - The string concatenation is overkill for the JVM on the 1M calls * N
> > fields at [3], and the same for [4]. Precalculate the names in a list or
> > array and reuse them for the 1M*N calls.
> > - Another optimization for [3]: given that PersistentBase [5] extends
> > SpecificRecordBase, you can access the fields by index with
> > SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
> >
> > [1] -
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/ma1in/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> > [2] -
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> > [3] -
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> > [4] -
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> > [5] -
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> > [6] -
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
> >
> > Let's see if with that optimizations we free the jvm memory management
> from
> > much stress.
> >
> > Regards,
> >
> > Alfonso Nishikawa
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > El lun., 10 jun. 2019 a las 21:18, Alfonso Nishikawa (<
> > alfonso.nishik...@gmail.com>) escribió:
> >
> > > Hi, Sheriffo.
> > >
> > > You can try reusing the Persistent instances [1] to insert the data. I
> > > don't know all the backends, but they should be reusable, at least in
> > > mongoDB and HBase.
> > >
> > > [1] -
> > >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
> > >
> > > Regards,
> > >
> > > Alfonso Nishikawa
> > >
> > > El lun., 10 jun. 2019 a las 21:14, Alfonso Nishikawa (<
> > > alfonso.nishik...@gmail.com>) escribió:
> > >
> > >> Hi, Sheriffo.
> > >>
> > >> I really don't know how to solve it, but are you setting any Xmx / Xms
> > >> configuration values?
> > >>
> > >> Regards,
> > >>
> > >> Alfonso NIshikawa
> > >>
> > >>
> > >> El sáb., 8 jun. 2019 a las 16:02, Sheriffo Ceesay (<
> sneceesa...@gmail.com>)
> > >> escribió:
> > >>
> > >>> Hi All,
> > >>>
> > >>> Week 2 progress update is available at
> > >>>
> > >>>
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> > >>>
> > >>> I have one question that I would like my mentors to advise on, I am
> still
> > >>> working it but thought it would be good to report it because it is
> HBase
> > >>> specific.
> > >>>
> > >>> So the problem has to do with an OutOfMemory error when inserting 1M
> +
> > >>> record in HBase.  This happens when I try to run the actual
> benchmark by
> > >>> first loading HBase with 1 million plus records. It works perfectly
> for
> > >>> M

Re: Week 2 Report and A Question

2019-06-10 Thread Renato Marroquín Mogrovejo
@Alfonso,
Thank you very much for the suggestions! you are totally right about
all of your points! Sheriffo, please benefit from them ;)

What is also strange (although the code can be optimized as Alfonso
pointed out) is that it works for the MongoDB backend. So I would also
suspect the configuration of the Gora-HBase client. Have you taken
a look at [A], for example, or at other assumed Gora-HBase configurations
[B]? Maybe there you can specify some Xmx / Xms config.


Best,

Renato M.

[A] 
https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
[B] 
https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml

El lun., 10 jun. 2019 a las 23:39, Alfonso Nishikawa
() escribió:
>
> Hi again, Sheriffo.
>
> More improvements to [1] over the last email:
>
> - fields.toArray() doesn't need a full array like in [6]. You should do
> just fields.toArray(new String[0]), and better, create a zero-length
> array once and reuse it. That call only needs the type.
> - I guess the class at [2] will always be the same, so you don't need to
> set it on every insert call.
> - The string concatenation is overkill for the JVM on the 1M calls * N
> fields at [3], and the same for [4]. Precalculate the names in a list or
> array and reuse them for the 1M*N calls.
> - Another optimization for [3]: given that PersistentBase [5] extends
> SpecificRecordBase, you can access the fields by index with
> SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
>
> [1] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/ma1in/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> [2] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> [3] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> [4] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> [5] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> [6] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
>
> Let's see if with that optimizations we free the jvm memory management from
> much stress.
>
> Regards,
>
> Alfonso Nishikawa
>
>
>
>
>
>
>
>
>
>
> El lun., 10 jun. 2019 a las 21:18, Alfonso Nishikawa (<
> alfonso.nishik...@gmail.com>) escribió:
>
> > Hi, Sheriffo.
> >
> > You can try reusing the Persistent instances [1] to insert the data. I
> > don't know all the backends, but they should be reusable, at least in
> > mongoDB and HBase.
> >
> > [1] -
> > https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
> >
> > Regards,
> >
> > Alfonso Nishikawa
> >
> > El lun., 10 jun. 2019 a las 21:14, Alfonso Nishikawa (<
> > alfonso.nishik...@gmail.com>) escribió:
> >
> >> Hi, Sheriffo.
> >>
> >> I really don't know how to solve it, but are you setting any Xmx / Xms
> >> configuration values?
> >>
> >> Regards,
> >>
> >> Alfonso NIshikawa
> >>
> >>
> >> El sáb., 8 jun. 2019 a las 16:02, Sheriffo Ceesay ()
> >> escribió:
> >>
> >>> Hi All,
> >>>
> >>> Week 2 progress update is available at
> >>>
> >>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >>>
> >>> I have one question that I would like my mentors to advise on, I am still
> >>> working it but thought it would be good to report it because it is HBase
> >>> specific.
> >>>
> >>> So the problem has to do with an OutOfMemory error when inserting 1M +
> >>> record in HBase.  This happens when I try to run the actual benchmark by
> >>> first loading HBase with 1 million plus records. It works perfectly for
> >>> MongoDB but not HBase
> >>>
> >>> So I am assuming this problem is specific to HBase.  The stack trace is
> >>> given below.
> >>>
> >>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead
> >>> limit exceeded
> >>>
> >>>
> >>>
> >>> at
> >>> java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
> >>>
> >>>
> >>>
> >>> at java.lang.StringCoding.encode(StringCoding.java:344)
> >>>
> >>>
> >>>
> >>>
> >>> at java.lang.String.getBytes(String.java:918)
> >>>
> >>>
> >>>
> >>>
> >>> at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
> >>>
> >>>
> >>>
> >>>
> >>> at
> >>>
> >>> org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
> >>>
> >>>
> >>>
> >>> at
> >>>
> >>> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
> >>>
> >>>
> >>>
> >>> at
> >>>
> >>> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseS

Re: Week 2 Report and A Question

2019-06-10 Thread Alfonso Nishikawa
Hi again, Sheriffo.

More improvements to [1] over the last email:

- fields.toArray() doesn't need a full array like in [6]. You should do
just fields.toArray(new String[0]), and better, create a zero-length
array once and reuse it. That call only needs the type.
- I guess the class at [2] will always be the same, so you don't need to
set it on every insert call.
- The string concatenation is overkill for the JVM on the 1M calls * N
fields at [3], and the same for [4]. Precalculate the names in a list or
array and reuse them for the 1M*N calls.
- Another optimization for [3]: given that PersistentBase [5] extends
SpecificRecordBase, you can access the fields by index with
SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).

[1] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/ma1in/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
[2] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
[3] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
[4] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
[5] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
[6] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163

Let's see if with those optimizations we free the JVM memory management from
much of the stress.
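
To make a couple of the points above concrete (the toArray() call and the
precalculated field names), here is a tiny self-contained illustration with
made-up names; it is not the code in the branch:

import java.util.Arrays;
import java.util.List;

// Toy example of "build once, reuse for every call".
public class ReuseSketch {
  // Reused for toArray(): the call only needs the array's type, not its size.
  private static final String[] EMPTY_STRING_ARRAY = new String[0];

  // Field names concatenated once at startup instead of 1M * N times.
  private static final String[] FIELD_NAMES = new String[10];
  static {
    for (int i = 0; i < FIELD_NAMES.length; i++) {
      FIELD_NAMES[i] = "field" + i;
    }
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList(FIELD_NAMES);
    String[] asArray = fields.toArray(EMPTY_STRING_ARRAY);
    System.out.println(asArray.length + " precomputed field names, e.g. " + asArray[0]);
  }
}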

Regards,

Alfonso Nishikawa










El lun., 10 jun. 2019 a las 21:18, Alfonso Nishikawa (<
alfonso.nishik...@gmail.com>) escribió:

> Hi, Sheriffo.
>
> You can try reusing the Persistent instances [1] to insert the data. I
> don't know all the backends, but they should be reusable, at least in
> mongoDB and HBase.
>
> [1] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>
> Regards,
>
> Alfonso Nishikawa
>
> El lun., 10 jun. 2019 a las 21:14, Alfonso Nishikawa (<
> alfonso.nishik...@gmail.com>) escribió:
>
>> Hi, Sheriffo.
>>
>> I really don't know how to solve it, but are you setting any Xmx / Xms
>> configuration values?
>>
>> Regards,
>>
>> Alfonso NIshikawa
>>
>>
>> El sáb., 8 jun. 2019 a las 16:02, Sheriffo Ceesay ()
>> escribió:
>>
>>> Hi All,
>>>
>>> Week 2 progress update is available at
>>>
>>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>>
>>> I have one question that I would like my mentors to advise on, I am still
>>> working it but thought it would be good to report it because it is HBase
>>> specific.
>>>
>>> So the problem has to do with an OutOfMemory error when inserting 1M +
>>> record in HBase.  This happens when I try to run the actual benchmark by
>>> first loading HBase with 1 million plus records. It works perfectly for
>>> MongoDB but not HBase
>>>
>>> So I am assuming this problem is specific to HBase.  The stack trace is
>>> given below.
>>>
>>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead
>>> limit exceeded
>>>
>>>
>>>
>>> at
>>> java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>>>
>>>
>>>
>>> at java.lang.StringCoding.encode(StringCoding.java:344)
>>>
>>>
>>>
>>>
>>> at java.lang.String.getBytes(String.java:918)
>>>
>>>
>>>
>>>
>>> at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>>>
>>>
>>>
>>>
>>> at
>>>
>>> org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>>>
>>>
>>>
>>> at
>>>
>>> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>>>
>>>
>>>
>>> at
>>>
>>> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>>>
>>>
>>>
>>> at
>>> org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>>>
>>>
>>>
>>>
>>> at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>>>
>>>
>>>
>>>
>>> at
>>>
>>> org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>>>
>>>
>>>
>>> at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>>>
>>>
>>>
>>>
>>> at
>>> com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>>>
>>>
>>>
>>> at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>>>
>>> The insert implementation of the module available at
>>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark  in
>>> GoraBenchmarkClient.java is very straight forward. I have had a brief
>>> look
>>> at HBaseStore.java put() implementation but could not find an issue with
>>> that.
>>>
>>> If I solve this problem, then I will do run more workloads to verify that
>>> the module is stable for the

Re: Week 2 Report and A Question

2019-06-10 Thread Alfonso Nishikawa
Hi, Sheriffo.

You can try reusing the Persistent instances [1] to insert the data. I
don't know all the backends, but they should be reusable, at least in
MongoDB and HBase.

[1] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
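
In case a sketch helps, the reuse idea is simply to keep one generated bean
per client thread and overwrite its fields on every call, roughly like this
(illustrative only: User, user and setUserId stand in for the generated
class and its setters):

// Sketch: one reused instance per YCSB client thread, created in init().
private User user;

public void init() throws DBException {
  user = new User();            // generated Gora/Avro bean, reused afterwards
}

public int insert(String table, String key, HashMap<String, ByteIterator> values) {
  try {
    user.setUserId(key);        // overwrite every field on each call
    user.setField0(values.get("field0").toString());
    // ... field1 .. field9 likewise ...
    dataStore.put(key, user);
    return SUCCESS;
  } catch (Exception e) {
    return FAILED;
  }
}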

Regards,

Alfonso Nishikawa

El lun., 10 jun. 2019 a las 21:14, Alfonso Nishikawa (<
alfonso.nishik...@gmail.com>) escribió:

> Hi, Sheriffo.
>
> I really don't know how to solve it, but are you setting any Xmx / Xms
> configuration values?
>
> Regards,
>
> Alfonso NIshikawa
>
>
> El sáb., 8 jun. 2019 a las 16:02, Sheriffo Ceesay ()
> escribió:
>
>> Hi All,
>>
>> Week 2 progress update is available at
>>
>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>
>> I have one question that I would like my mentors to advise on, I am still
>> working it but thought it would be good to report it because it is HBase
>> specific.
>>
>> So the problem has to do with an OutOfMemory error when inserting 1M +
>> record in HBase.  This happens when I try to run the actual benchmark by
>> first loading HBase with 1 million plus records. It works perfectly for
>> MongoDB but not HBase
>>
>> So I am assuming this problem is specific to HBase.  The stack trace is
>> given below.
>>
>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead
>> limit exceeded
>>
>>
>>
>> at
>> java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>>
>>
>>
>> at java.lang.StringCoding.encode(StringCoding.java:344)
>>
>>
>>
>>
>> at java.lang.String.getBytes(String.java:918)
>>
>>
>>
>>
>> at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>>
>>
>>
>>
>> at
>>
>> org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>>
>>
>>
>> at
>>
>> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>>
>>
>>
>> at
>>
>> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>>
>>
>>
>> at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>>
>>
>>
>>
>> at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>>
>>
>>
>>
>> at
>>
>> org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>>
>>
>>
>> at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>>
>>
>>
>>
>> at
>> com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>>
>>
>>
>> at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>>
>> The insert implementation of the module available at
>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark  in
>> GoraBenchmarkClient.java is very straight forward. I have had a brief look
>> at HBaseStore.java put() implementation but could not find an issue with
>> that.
>>
>> If I solve this problem, then I will do run more workloads to verify that
>> the module is stable for the basic implementation. Then I will go ahead
>> and
>> work on suggestions made by Renato last week.
>>
>> Please let me know what your thoughts are.
>>
>>
>> Thank you.
>>
>>
>>
>> **Sheriffo Ceesay**
>>
>


Re: Week 2 Report and A Question

2019-06-10 Thread Alfonso Nishikawa
Hi, Sheriffo.

I really don't know how to solve it, but are you setting any Xmx / Xms
configuration values?

Regards,

Alfonso Nishikawa


El sáb., 8 jun. 2019 a las 16:02, Sheriffo Ceesay ()
escribió:

> Hi All,
>
> Week 2 progress update is available at
>
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>
> I have one question that I would like my mentors to advise on, I am still
> working it but thought it would be good to report it because it is HBase
> specific.
>
> So the problem has to do with an OutOfMemory error when inserting 1M +
> record in HBase.  This happens when I try to run the actual benchmark by
> first loading HBase with 1 million plus records. It works perfectly for
> MongoDB but not HBase
>
> So I am assuming this problem is specific to HBase.  The stack trace is
> given below.
>
> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead
> limit exceeded
>
>
>
> at
> java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>
>
>
> at java.lang.StringCoding.encode(StringCoding.java:344)
>
>
>
>
> at java.lang.String.getBytes(String.java:918)
>
>
>
>
> at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>
>
>
>
> at
>
> org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>
>
>
> at
>
> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>
>
>
> at
>
> org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>
>
>
> at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>
>
>
>
> at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>
>
>
>
> at
>
> org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>
>
>
> at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>
>
>
>
> at
> com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>
>
>
> at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>
> The insert implementation of the module available at
> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark  in
> GoraBenchmarkClient.java is very straight forward. I have had a brief look
> at HBaseStore.java put() implementation but could not find an issue with
> that.
>
> If I solve this problem, then I will do run more workloads to verify that
> the module is stable for the basic implementation. Then I will go ahead and
> work on suggestions made by Renato last week.
>
> Please let me know what your thoughts are.
>
>
> Thank you.
>
>
>
> **Sheriffo Ceesay**
>


Week 2 Report and A Question

2019-06-08 Thread Sheriffo Ceesay
Hi All,

Week 2 progress update is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I have one question that I would like my mentors to advise on. I am still
working on it, but thought it would be good to report it because it is
HBase-specific.

So the problem has to do with an OutOfMemory error when inserting 1M+
records into HBase. This happens when I try to run the actual benchmark by
first loading HBase with 1 million plus records. It works perfectly for
MongoDB but not for HBase.

So I am assuming this problem is specific to HBase.  The stack trace is
given below.

Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
    at java.lang.StringCoding.encode(StringCoding.java:344)
    at java.lang.String.getBytes(String.java:918)
    at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
    at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
    at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
    at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
    at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
    at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
    at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
    at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
    at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
    at com.yahoo.ycsb.ClientThread.run(Client.java:269)

The insert implementation of the module, available at
https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
GoraBenchmarkClient.java, is very straightforward. I have had a brief look
at the HBaseStore.java put() implementation but could not find an issue with
it.

If I solve this problem, then I will run more workloads to verify that
the module is stable for the basic implementation. Then I will go ahead and
work on the suggestions made by Renato last week.

Please let me know what your thoughts are.


Thank you.



**Sheriffo Ceesay**