Thanks for the report and for attending, and thanks to everyone who made HBaseCon Asia 2018 and the dev meet-up happen!
On Sun, Aug 19, 2018, 3:50 AM Stack <st...@duboce.net> wrote:

> There were about 30 of us. I didn't take roll. See photos below [1][2]. PMCers, committers, contributors, and speakers from the day before. There is no attribution of comments or ideas. Please excuse. No agenda.
>
> TESTING
> What do people do for testing?
> Allan Yang is finding good stuff when he tests AMv2 compared to me. Why? slowDeterministic does more op types than serverKilling.
> What do others do for testing? Add more variety to the ITBLLs, more chaos?
> What about performance testing? YCSB.
> Batch is important. It's what our users do. Recent addition of batch in YCSB (and in PerformanceEvaluation). Size of batch matters too. And number of clients.
> Alibaba described what they do.
> Advocate that we all try different test types rather than all doing the same runs.
> Need to add the new async client into YCSB. Alibaba use it for their testing of the new SEDA core (upstreaming soon).
> Understanding each other's benchmarks can take a while. Common understanding takes some effort, communication.
> The new hbase-operator-tools will be a good place to put perf and testing tooling.
>
> GITHUB
> Can hbase adopt the github dev flow? Support PRs?
> Is it a case of just starting the discussion on the dev list?
> Do we lose review/commentary information if we go the github route? A brief overview of what is possible with the new gitbox repos followed, ultimately answering no, there should be no loss (github comments show up as jira comments).
> Most have github but not apache accounts. PRs are easier. Could encourage more contribution, lower the barrier to contributing.
> Other tools for hbase-operator-tools would be stuff like the Alibaba tooling for shutting down servers... moving regions to a new one.
>
> PERF ACROSS VERSIONS
> Lucene has a perf curve on its home page with markings for when large features arrived and when releases were cut, so you can see increases/decreases in perf.
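The ITBLL runs mentioned under TESTING verify that every node written into a big linked list is still reachable afterward, so a lost write shows up as a broken link. A toy, in-memory sketch of that idea (class and method names are invented for illustration; this is not the real IntegrationTestBigLinkedList code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the ITBLL-style integrity check: "writes" form a
// closed chain in which each key stores the previous key, so a full walk
// from any node must visit every node. A dropped write breaks the walk.
public class LinkedListCheck {

    // "Write" a closed chain of n nodes: key -> previous key.
    static Map<Long, Long> buildChain(long n) {
        Map<Long, Long> table = new HashMap<>();
        long prev = 0;
        for (long key = 1; key < n; key++) {
            table.put(key, prev);
            prev = key;
        }
        table.put(0L, prev); // close the loop back to the first key
        return table;
    }

    // Walk the chain from `first`; return how many nodes were reachable.
    static long walk(Map<Long, Long> table, long first) {
        long visited = 0;
        long cur = first;
        do {
            Long next = table.get(cur);
            if (next == null) {
                throw new IllegalStateException("dangling reference at " + cur);
            }
            cur = next;
            visited++;
        } while (cur != first);
        return visited;
    }

    public static void main(String[] args) {
        Map<Long, Long> table = buildChain(1000);
        System.out.println("visited=" + walk(table, 0L)); // all 1000 reachable
    }
}
```

The real test does this at scale against a live cluster, typically under chaos (region server killing), which is what makes it good at flushing out assignment bugs.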
> There was a big slowdown going from 0.98 to 1.1.2 hbase.
> We talked about doing such a perf curve on the hbase home page. Would be a big project. Asked if anyone was interested.
> Perhaps a dedicated cluster up on Apache. We could do a whip-around to pay for it.
>
> USER FRIENDLY
> Small/new users have a hard time. Is there a UI for users to see data in cells, change schema, or create/drop tables? Is there anything we can do here?
> Much back and forth.
> Xiaomi don't let users have access to the shell. They have a web UI where you click to build the command, which is then run for you. Afraid that users will mistakenly destroy the database, so they shut down access.
> It turns out that most of the bigger players present have some form of UI built against hbase. Alibaba have something. The DiDi folks have howto wiki pages.
> Talked about upstreaming. Where to put it? hbase-operator-tools?
> What about a Docker file to give devs their own hbase easily? Can throw it away when done.
> One attendee talked of Hue from CDH, how it is good for simple insert and view. Can check the data. For testing and feel-good getting to know the system, it helps.
> Another uses Apache Drill, but it is tough when types are involved.
> New users need to be able to import data from a CSV.
> How hard would it be to have a few pages of clicky-clicky wizard to create/drop tables or run a small query...
> A stripped-down version of Hue to come with HBase... how hard to do this?
>
> Next we went over the backburner items mentioned the previous day, starting with SQL-like access.
> What about lightweight SQL support?
> At Huawei they have a project going for lightweight SQL support in hbase based on Calcite. For big queries, they'd go to Spark SQL.
> Did you look at Phoenix? Phoenix is complicated, difficult. The Calcite migration is not done in Phoenix (Spark SQL is not Calcite-based).
> Talk to the Phoenix project about generating a lightweight artifact. We could help with the build.
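On the CSV-import wish under USER FRIENDLY: HBase already ships ImportTsv for real bulk imports. As a purely illustrative sketch of what the mapping half of a minimal importer does, each CSV row becomes a row key plus family:qualifier -> value cells (header and column names below are invented):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the mapping step of a simple CSV importer: turn
// each CSV row into a row key plus family:qualifier -> value cells. Not a
// real importer; no HBase client involved.
public class CsvToCells {

    // Header like "ROWKEY,info:name,info:age"; the first column is the row key.
    static Map<String, Map<String, String>> parse(String header, List<String> rows) {
        String[] cols = header.split(",");
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        for (String row : rows) {
            String[] vals = row.split(",");
            Map<String, String> cells = new LinkedHashMap<>();
            for (int i = 1; i < cols.length && i < vals.length; i++) {
                cells.put(cols[i], vals[i]); // "family:qualifier" -> value
            }
            out.put(vals[0], cells); // row key -> its cells
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> t =
            parse("ROWKEY,info:name,info:age", List.of("r1,alice,34", "r2,bob,40"));
        System.out.println(t);
    }
}
```

A clicky wizard would only need this mapping plus a column-picker page in front of it; the actual writes would go through the normal client API or a bulk-load.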
> One nice idea was building with a cut-down grammar, one that removed all the "big stuff" and problematic pieces. Could return a nice "not supported" to the user if they try to do a 10Bx10B join.
> An interesting idea about a facade query analyzer handing off to Spark SQL if the query is big. Would need stats.
>
> COPROCESSORS
> Can we add some identifiers to distinguish whether a request is from a CP or from a client? Can we calculate stats on CP resources used? Limit them? Can we update CPs more gracefully? Under heavy usage, when to update? You have to disable the table. Short answer was no.
> Move CPs to another process. A sidecar process is the way to go.
> The Huawei effort at lightweight SQL would also use CPs (like Phoenix).
> Bring the types into hbase, the Phoenix types, for Spark to use etc.
>
> SECONDARY INDICES
> Full support is hard; can we do it step-by-step... Separate it into several steps?
> Push back that this is a well-covered space. Problems known. Contribs in the tier above welcome.
>
> One attendee asked where hbase wants to go. Is it storage or a db system? If the former, then we should draw the line, and SQL, graph, geo are in layers above, not integrated. Need to draw a sharp line. Do what we are good at.
>
> END-TO-END-ASYNC
> Lots of pieces in place now. The last bit is the core. Alibaba are working on this. Put the request into a queue, another thread writes into memory, another thread to HDFS. Another thread gets the result and responds to users.
> What to do if blocked on HDFS? How do you stop the process from spinning up too many threads and having too many ongoing requests? The queues would be bounded.
> One attendee suggested the async core be done with coroutines. Was asked, which JDK has coroutines? Answer, the AJDK. What's that? The Alibaba JDK (they have their own JDK team). Laughter all around.
>
> JAVA SUPPORT
> We don't support JDK9... JDK10. We are bound by HDFS and Spark.
> Is there any perf to be had in the new JDKs? Answer, some, and yes. Offheaping will be able to save a copy.
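The END-TO-END-ASYNC design above (request queue, a memory-write thread, an HDFS-write thread, bounded queues for backpressure) can be sketched with plain JDK pieces. Stage names and queue sizes here are invented; this only illustrates how bounded hand-off queues cap the number of in-flight requests:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

// Sketch of a staged write path: requests flow intake-queue -> memory-write
// thread -> HDFS-write thread. The queues are bounded, so put() blocks when
// a stage falls behind; that blocking is the backpressure that keeps a slow
// or stuck "HDFS" stage from piling up unbounded requests or threads.
public class BoundedPipeline {

    static int run(int requests) throws InterruptedException {
        BlockingQueue<Integer> toMemory = new ArrayBlockingQueue<>(16);
        BlockingQueue<Integer> toHdfs = new ArrayBlockingQueue<>(16);
        CountDownLatch acked = new CountDownLatch(requests);

        Thread memWriter = new Thread(() -> { // stage 1: "apply to memory"
            try {
                while (true) toHdfs.put(toMemory.take());
            } catch (InterruptedException e) { /* pipeline shutdown */ }
        });
        Thread hdfsWriter = new Thread(() -> { // stage 2: "sync to HDFS", then ack
            try {
                while (true) { toHdfs.take(); acked.countDown(); }
            } catch (InterruptedException e) { /* pipeline shutdown */ }
        });
        memWriter.start();
        hdfsWriter.start();

        for (int i = 0; i < requests; i++) {
            toMemory.put(i); // blocks (backpressure) when the intake queue is full
        }
        acked.await(); // every request made it through both stages
        memWriter.interrupt();
        hdfsWriter.interrupt();
        return requests;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("acked=" + run(1000));
    }
}
```

A real core would carry responses back to the client on a further stage and fail fast once a queue stays full, rather than blocking forever; the bounded-queue principle is the same.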
> Direct I/O. A new API for ByteBuffers.
>
> SPARK
> Be able to scan hfiles directly. Work to transfer to Parquet for Spark to query.
> One attendee is using replication for streaming out to Parquet, then having Spark go against that. Talk of compacting into Parquet, then having Spark query the Parquet files and, for the difference between now and the last compaction, going to the hbase API.
>
> 1. https://drive.google.com/file/d/0B4a3E58mCyOfSVJNQklEM0gyQ3VDYV9aMHlqTmdNNWgwQ3Bj/view?usp=sharing
> 2. https://drive.google.com/file/d/0B4a3E58mCyOfMnF5QWpNTDRkc3M1anRRMEJVSjlBYVhsQm9F/view?usp=sharing