Good one boss, please teach me how to do shorthand next time ^o^

Some supplementary comments below.

bq. Lucent (lucene?) has a perf curve on home page with markings for when
large features arrived and when releases were cut so can see if
increase/decrease in perf.
Yes, Lucene. It should be this benchmark page
<https://home.apache.org/~mikemccand/lucenebench/>, with curves like this one
<https://home.apache.org/~mikemccand/lucenebench/indexing.html>. I believe
something like this would help users a lot in deciding which version to use
and what to watch out for.

bq. One attendee suggested the async core be done with coroutine. Was
asked, which JDK has coroutine. Answer, the AJDK. What's that? The Alibaba
JDK (they have their own JDK team)
Kotlin was also mentioned; there are more details at this link
<https://kotlinlang.org/docs/tutorials/coroutines-basic-jvm.html>. There are
also some other existing coroutine implementations in Java; please see this
Stack Overflow question
<https://stackoverflow.com/questions/2846664/implementing-coroutines-in-java>.
We still plan to upstream what we have, which is based on SEDA and
queues/callbacks; in parallel, people with interest could look into coroutine
tech (smile)

bq. One attendee asked where does hbase want to go? Is it storage or a db
system? Need to draw a sharp line. Do what we are good at.
I wish more people could respond to this question, maybe in a separate
thread, even on the @user list? Shall we have some discussion on our PMC
list?
*HBase is a mature project but are we too old to innovate and evolve? Where
is our passion and ambition? I want to see the answer. We need to see the
answer.*

Maybe we should extract some umbrella JIRAs to discuss further and move
forward, turning the discussion into plans and action.

Thanks.


Best Regards,
Yu

On 19 August 2018 at 18:57, Stack <st...@duboce.net> wrote:

> Attendees! If anything to add, pile it on, or if in error, please correct.
> S
>
> On Sun, Aug 19, 2018 at 6:48 PM Stack <st...@duboce.net> wrote:
>
> > There were about 30 of us. I didn't take roll. See photos below [1][2].
> > PMCers, committers, contributors, and speakers from the day before. There
> > is no attribution of comments or ideas. Please excuse. No agenda.
> >
> > TESTING
> > What do people do for testing?
> > Allan Yang is finding good stuff when he tests AMv2 compared to me. Why?
> > slowDeterministic does more op types than serverKilling.
> > What do others do for testing?
> > Add more variety to the ITBLLs, more chaos?
> > What for performance testing?
> > YCSB.
> > Batch is important. It's what our users do. Recent addition of batch in
> > YCSB (and in PerformanceEvaluation). Size of batch matters too. And the
> > number of clients.
> > Alibaba described what they do.
> > Advocate that we all try different test types rather than all do the same
> > runs.
> > Need to add new async client into YCSB. Alibaba use it for their testing
> > of new SEDA core (upstreaming soon).
> > Understanding each other's benchmarks can take a while. Common
> > understanding takes some effort, communication.
> > The new hbase-operator-tools will be a good place to put perf and testing
> > tooling.
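> >
> > For concreteness, here is a rough Java sketch of the kind of batched puts
> > these benchmark options exercise (purely illustrative: the table name,
> > column family, and batch size below are made up, and this is not code from
> > any of the tools mentioned):
> >
> >   import java.util.ArrayList;
> >   import java.util.List;
> >   import org.apache.hadoop.conf.Configuration;
> >   import org.apache.hadoop.hbase.HBaseConfiguration;
> >   import org.apache.hadoop.hbase.TableName;
> >   import org.apache.hadoop.hbase.client.Connection;
> >   import org.apache.hadoop.hbase.client.ConnectionFactory;
> >   import org.apache.hadoop.hbase.client.Put;
> >   import org.apache.hadoop.hbase.client.Table;
> >   import org.apache.hadoop.hbase.util.Bytes;
> >
> >   /** Minimal sketch: send BATCH_SIZE puts per client call instead of one at a time. */
> >   public class BatchPutSketch {
> >     private static final int BATCH_SIZE = 100;          // the knob whose size "matters"
> >
> >     public static void main(String[] args) throws Exception {
> >       Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
> >       try (Connection conn = ConnectionFactory.createConnection(conf);
> >            Table table = conn.getTable(TableName.valueOf("usertable"))) { // hypothetical table
> >         List<Put> batch = new ArrayList<>(BATCH_SIZE);
> >         for (int i = 0; i < 10000; i++) {
> >           Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
> >           put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
> >           batch.add(put);
> >           if (batch.size() == BATCH_SIZE) {
> >             table.put(batch);                            // one batched call, not 100 singles
> >             batch.clear();
> >           }
> >         }
> >         if (!batch.isEmpty()) {
> >           table.put(batch);                              // flush the remainder
> >         }
> >       }
> >     }
> >   }
> >
> > Varying BATCH_SIZE and the number of concurrent clients is the dimension
> > the notes above say matters.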
> >
> > GITHUB
> > Can hbase adopt the github dev flow? Support PRs?
> > It's a case of just starting the discussion on the dev list?
> > Do we lose review/commentary information if we go the github route? A brief
> > overview of what is possible w/ the new gitbox repos follows, ultimately
> > answering that no, there should be no loss (github comments show up as jira
> > comments).
> > Most have github but not apache accounts. PRs are easier. Could encourage
> > more contribution, lower the barrier to contrib.
> > Other tools for hbase-operator-tools would be stuff like the alibaba
> > tooling for shutting down servers... moving regions to a new one.
> >
> > PERF ACROSS VERSIONS
> > Lucent (lucene?) has a perf curve on home page with markings for when
> > large features arrived and when releases were cut so can see if
> > increase/decrease in perf.
> > There was a big slowdown going from 0.98 to 1.1.2 hbase.
> > We talked about doing such a perf curve on hbase home page. Would be a big
> > project. Asked if anyone interested?
> > Perhaps a dedicated cluster up on Apache. We could do a whip-around to pay
> > for it.
> >
> > USER FRIENDLY
> > Small/new users have a hard time. Is there a UI for users to see data in
> > cells, or to change schema, or to create/drop tables? Is there anything we
> > can do here?
> > Much back and forth.
> > Xiaomi don't let users have access to the shell. They have a web UI where
> > you click to build a command that is run for you. Afraid that users will
> > mistakenly destroy the database, so shell access is shut down.
> > It turns out that most of the bigger players present have some form of UI
> > built against hbase. Alibaba have something. The DiDi folks have howto wiki
> > pages.
> > Talked about upstreaming.
> > Where to put it? hbase-operator-tools?
> > What about a Docker file to give devs their own hbase easily? Can throw it
> > away when done.
> > One attendee talked of Hue from CDH, how it is good for simple insert and
> > view.
> > Can check the data. For testing and for getting a feel for the system, it
> > helps.
> > Another uses Apache Drill, but it is tough when types are involved.
> > New users need to be able to import data from a CSV.
> > How hard would it be to have a few pages of clicky-clicky wizard to
> > create/drop tables or for a small query...
> > A stripped-down version of Hue to come with HBase... how hard to do this?
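> >
> > On the CSV point, a toy sketch of what a minimal importer looks like with
> > the plain client API (the file layout, table name 'people', and family 'd'
> > are invented for illustration; the shipped ImportTsv / bulk-load tooling is
> > the heavier-duty route):
> >
> >   import java.io.BufferedReader;
> >   import java.nio.charset.StandardCharsets;
> >   import java.nio.file.Files;
> >   import java.nio.file.Paths;
> >   import org.apache.hadoop.conf.Configuration;
> >   import org.apache.hadoop.hbase.HBaseConfiguration;
> >   import org.apache.hadoop.hbase.TableName;
> >   import org.apache.hadoop.hbase.client.BufferedMutator;
> >   import org.apache.hadoop.hbase.client.Connection;
> >   import org.apache.hadoop.hbase.client.ConnectionFactory;
> >   import org.apache.hadoop.hbase.client.Put;
> >   import org.apache.hadoop.hbase.util.Bytes;
> >
> >   /** Toy importer: expects lines of "rowkey,name,age"; table 'people' must already exist. */
> >   public class CsvImportSketch {
> >     public static void main(String[] args) throws Exception {
> >       Configuration conf = HBaseConfiguration.create();
> >       try (Connection conn = ConnectionFactory.createConnection(conf);
> >            BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("people"));
> >            BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
> >         String line;
> >         while ((line = in.readLine()) != null) {
> >           String[] cols = line.split(",", -1);           // naive split; no quoting support
> >           if (cols.length < 3) {
> >             continue;                                    // skip malformed lines in this toy
> >           }
> >           Put put = new Put(Bytes.toBytes(cols[0]));
> >           put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes(cols[1]));
> >           put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("age"), Bytes.toBytes(cols[2]));
> >           mutator.mutate(put);                           // buffered client-side, flushed on close
> >         }
> >       }                                                  // try-with-resources flushes and closes
> >     }
> >   }
> >
> > Something like this wrapped in a couple of wizard pages is roughly the
> > "import from a CSV" experience being asked for.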
> >
> > Next we went over back-burner items mentioned on the previous day, starting
> > with SQL-like access.
> > What about lightweight SQL support?
> > At Huawei... they have a project going for lightweight SQL support in
> > hbase based on Calcite.
> > For big queries, they'd go to sparksql.
> > Did you look at phoenix?
> > Phoenix is complicated, difficult. Calcite migration not done in Phoenix
> > (Sparksql is not calcite-based).
> > Talk to the phoenix project about generating a lightweight artifact. We
> > could help with the build. One nice idea was building with a cut-down
> > grammar, one that removed all the "big stuff" and problematic features.
> > Could return to the user a nice "not supported" if they try to do a
> > 10Bx10B join.
> > An interesting idea about a facade query analyzer handing off to sparksql
> > if the query is big. Would need stats.
> >
> > COPROCESSORS
> > Can we add some identifiers to distinguish whether a request is from a CP
> > or from a client? Can we calculate stats on CP resources used? Limit? Can
> > we update CPs more gracefully? If heavy usage, when to update? Have to
> > disable the table. Short answer was no.
> > Move CPs to another process. A sidecar process is the way to go.
> > The Huawei effort at lightweight would also use CPs (like Phoenix).
> > Bring the types into hbase, the phoenix types for spark to use etc.
> >
> > SECONDARY INDICES
> > Full support is hard, can we do step-by-step...
> > Separate into several steps?
> > Push back that this is a well-covered space. Problems known. Contribs in
> > the tier above are welcome.
> >
> > One attendee asked where does hbase want to go? Is it storage or a db
> > system? If the former, then we should draw the line, and sql, graph, geo
> > are in layers above, not integrated. Need to draw a sharp line. Do what we
> > are good at.
> >
> > END-TO-END-ASYNC
> > Lots of pieces in place now. Last bit is the core. Alibaba is working on
> > this. Put the request into a queue, another thread applies it to memory,
> > another thread writes to HDFS. Another thread gets the result and responds
> > to users.
> > What to do if HDFS is blocked? How do you stop the process from spinning
> > up too many threads and having too many ongoing requests? Queues would be
> > bounded.
> > One attendee suggested the async core be done with coroutine. Was asked,
> > which JDK has coroutine. Answer, the AJDK. What's that? The Alibaba JDK
> > (they have their own JDK team). Laughter all around.
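> >
> > To make the staged design above concrete, a self-contained toy sketch in
> > Java (every name here is invented; this is not the actual Alibaba patch):
> > requests land on a bounded queue, one stage "applies to memory", a second
> > "syncs to the filesystem", and only then is the caller's future completed,
> > so a stalled HDFS shows up as backpressure instead of extra threads:
> >
> >   import java.util.concurrent.ArrayBlockingQueue;
> >   import java.util.concurrent.BlockingQueue;
> >   import java.util.concurrent.CompletableFuture;
> >
> >   public class StagedWriteSketch {
> >
> >     static final class Request {
> >       final String row;
> >       final CompletableFuture<Void> response = new CompletableFuture<>();
> >       Request(String row) { this.row = row; }
> >     }
> >
> >     // Bounded queues are what keep thread count and in-flight requests capped.
> >     private final BlockingQueue<Request> incoming = new ArrayBlockingQueue<>(1024);
> >     private final BlockingQueue<Request> toSync = new ArrayBlockingQueue<>(1024);
> >
> >     /** Handler enqueues and returns at once; the future is the caller's callback. */
> >     CompletableFuture<Void> submit(String row) throws InterruptedException {
> >       Request r = new Request(row);
> >       incoming.put(r);               // blocks when the pipeline is saturated
> >       return r.response;
> >     }
> >
> >     void start() {
> >       daemon("memory-stage", () -> {
> >         while (true) {
> >           Request r = incoming.take();
> >           // ... apply the edit to the in-memory store here ...
> >           toSync.put(r);             // hand off to the next stage
> >         }
> >       });
> >       daemon("sync-stage", () -> {
> >         while (true) {
> >           Request r = toSync.take();
> >           // ... sync to the filesystem here; if this blocks, the queues fill up ...
> >           r.response.complete(null); // respond to the caller
> >         }
> >       });
> >     }
> >
> >     private static void daemon(String name, InterruptibleLoop body) {
> >       Thread t = new Thread(() -> {
> >         try {
> >           body.run();
> >         } catch (InterruptedException e) {
> >           Thread.currentThread().interrupt();
> >         }
> >       }, name);
> >       t.setDaemon(true);
> >       t.start();
> >     }
> >
> >     @FunctionalInterface
> >     interface InterruptibleLoop { void run() throws InterruptedException; }
> >
> >     public static void main(String[] args) throws Exception {
> >       StagedWriteSketch pipeline = new StagedWriteSketch();
> >       pipeline.start();
> >       pipeline.submit("row-1").thenRun(() -> System.out.println("ack for row-1"));
> >       Thread.sleep(100);             // let the daemon stages run in this demo
> >     }
> >   }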
> >
> > JAVA SUPPORT
> > We don't support JDK9... JDK10. We are bound by HDFS and Spark.
> > Is there any perf to be had in new JDKs? Answer, some, and yes. Offheaping
> > will be able to save a copy. Direct I/O. New API for BBs.
> >
> > SPARK
> > Be able to scan hfiles directly. Work to transfer to parquet for spark to
> > query.
> > One attendee is using replication to stream out to parquet, then having
> > spark go against that. Talk of compacting into parquet, then having spark
> > query the parquet files, and for the difference between now and the last
> > compaction, going to the hbase api.
> >
> > 1.
> > https://drive.google.com/file/d/0B4a3E58mCyOfSVJNQklEM0gyQ3VDYV9aMHlqTmdNNWgwQ3Bj/view?usp=sharing
> > 2.
> > https://drive.google.com/file/d/0B4a3E58mCyOfMnF5QWpNTDRkc3M1anRRMEJVSjlBYVhsQm9F/view?usp=sharing
> >
>
