Thanks for the report and for attending, and thanks to everyone who made HBaseCon Asia 2018 and the dev meet-up happen!
On Sun, Aug 19, 2018, 3:50 AM Stack <st...@duboce.net> wrote:

> There were about 30 of us. I didn't take roll. See photos below [1][2]. PMCers, committers, contributors, and speakers from the day before. There is no attribution of comments or ideas. Please excuse. No agenda.
>
> TESTING
> What do people do for testing?
> Allan Yang is finding good stuff when he tests AMv2 compared to me. Why? slowDeterministic does more op types than serverKilling.
> What do others do for testing? Add more variety to the ITBLLs, more chaos?
> What about performance testing? YCSB.
> Batch is important. It's what our users do. Recent addition of batch in YCSB (and in PerformanceEvaluation). Size of batch matters too. And number of clients.
> Alibaba described what they do.
> Advocate that we all try different test types rather than all doing the same runs.
> Need to add the new async client into YCSB. Alibaba use it for their testing of the new SEDA core (upstreaming soon).
> Understanding each other's benchmarks can take a while. Common understanding takes some effort, communication.
> The new hbase-operator-tools will be a good place to put perf and testing tooling.
>
> GITHUB
> Can hbase adopt the github dev flow? Support PRs?
> Is it a case of just starting the discussion on the dev list?
> Do we lose review/commentary information if we go the github route? A brief overview of what is possible with the new gitbox repos followed, ultimately answering no, there should be no loss (github comments show up as jira comments).
> Most have github but not apache accounts. PRs are easier. Could encourage more contribution, lower the barrier to contributing.
> Other tools for hbase-operator-tools would be stuff like the Alibaba tooling for shutting down servers... moving regions to a new one.
>
> PERF ACROSS VERSIONS
> Lucene has a perf curve on its home page with markings for when large features arrived and when releases were cut, so you can see increases/decreases in perf.
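The ITBLL runs mentioned under TESTING verify that every node written into a big linked list is still reachable afterward, so a lost write shows up as a broken link. A toy, in-memory sketch of that idea (class and method names are invented for illustration; this is not the real IntegrationTestBigLinkedList code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the ITBLL-style integrity check: "writes" form a
// closed chain in which each key stores the previous key, so a full walk
// from any node must visit every node. A dropped write breaks the walk.
public class LinkedListCheck {

    // "Write" a closed chain of n nodes: key -> previous key.
    static Map<Long, Long> buildChain(long n) {
        Map<Long, Long> table = new HashMap<>();
        long prev = 0;
        for (long key = 1; key < n; key++) {
            table.put(key, prev);
            prev = key;
        }
        table.put(0L, prev); // close the loop back to the first key
        return table;
    }

    // Walk the chain from `first`; return how many nodes were reachable.
    static long walk(Map<Long, Long> table, long first) {
        long visited = 0;
        long cur = first;
        do {
            Long next = table.get(cur);
            if (next == null) {
                throw new IllegalStateException("dangling reference at " + cur);
            }
            cur = next;
            visited++;
        } while (cur != first);
        return visited;
    }

    public static void main(String[] args) {
        Map<Long, Long> table = buildChain(1000);
        System.out.println("visited=" + walk(table, 0L)); // all 1000 reachable
    }
}
```

The real test does this at scale against a live cluster, typically under chaos (region server killing), which is what makes it good at flushing out assignment bugs.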
> There was a big slowdown going from 0.98 to 1.1.2 hbase.
> We talked about doing such a perf curve on the hbase home page. Would be a big project. Asked if anyone was interested.
> Perhaps a dedicated cluster up on Apache. We could do a whip-around to pay for it.
>
> USER FRIENDLY
> Small/new users have a hard time. Is there a UI for users to see data in cells, change schema, or create/drop tables? Is there anything we can do here?
> Much back and forth.
> Xiaomi don't let users have access to the shell. They have a web UI where you click to build the command, which is then run for you. Afraid that users will mistakenly destroy the database, so they shut down access.
> It turns out that most of the bigger players present have some form of UI built against hbase. Alibaba have something. The DiDi folks have howto wiki pages.
> Talked about upstreaming. Where to put it? hbase-operator-tools?
> What about a Docker file to give devs their own hbase easily? Can throw it away when done.
> One attendee talked of Hue from CDH, how it is good for simple insert and view. Can check the data. For testing and feel-good getting to know the system, it helps.
> Another uses Apache Drill, but it is tough when types are involved.
> New users need to be able to import data from a CSV.
> How hard would it be to have a few pages of clicky-clicky wizard to create/drop tables or run a small query...
> A stripped-down version of Hue to come with HBase... how hard to do this?
>
> Next we went over the backburner items mentioned the previous day, starting with SQL-like access.
> What about lightweight SQL support?
> At Huawei they have a project going for lightweight SQL support in hbase based on Calcite. For big queries, they'd go to Spark SQL.
> Did you look at Phoenix? Phoenix is complicated, difficult. The Calcite migration is not done in Phoenix (Spark SQL is not Calcite-based).
> Talk to the Phoenix project about generating a lightweight artifact. We could help with the build.
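On the CSV-import wish under USER FRIENDLY: HBase already ships ImportTsv for real bulk imports. As a purely illustrative sketch of what the mapping half of a minimal importer does, each CSV row becomes a row key plus family:qualifier -> value cells (header and column names below are invented):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the mapping step of a simple CSV importer: turn
// each CSV row into a row key plus family:qualifier -> value cells. Not a
// real importer; no HBase client involved.
public class CsvToCells {

    // Header like "ROWKEY,info:name,info:age"; the first column is the row key.
    static Map<String, Map<String, String>> parse(String header, List<String> rows) {
        String[] cols = header.split(",");
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        for (String row : rows) {
            String[] vals = row.split(",");
            Map<String, String> cells = new LinkedHashMap<>();
            for (int i = 1; i < cols.length && i < vals.length; i++) {
                cells.put(cols[i], vals[i]); // "family:qualifier" -> value
            }
            out.put(vals[0], cells); // row key -> its cells
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> t =
            parse("ROWKEY,info:name,info:age", List.of("r1,alice,34", "r2,bob,40"));
        System.out.println(t);
    }
}
```

A clicky wizard would only need this mapping plus a column-picker page in front of it; the actual writes would go through the normal client API or a bulk-load.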
> One nice idea was building with a cut-down grammar, one that removed all the "big stuff" and problematic pieces. Could return a nice "not supported" to the user if they try to do a 10Bx10B join.
> An interesting idea about a facade query analyzer handing off to Spark SQL if the query is big. Would need stats.
>
> COPROCESSORS
> Can we add some identifiers to distinguish whether a request is from a CP or from a client? Can we calculate stats on CP resources used? Limit them? Can we update CPs more gracefully? Under heavy usage, when to update? You have to disable the table. Short answer was no.
> Move CPs to another process. A sidecar process is the way to go.
> The Huawei effort at lightweight SQL would also use CPs (like Phoenix).
> Bring the types into hbase, the Phoenix types, for Spark to use etc.
>
> SECONDARY INDICES
> Full support is hard; can we do it step-by-step... Separate it into several steps?
> Push back that this is a well-covered space. Problems known. Contribs in the tier above welcome.
>
> One attendee asked where hbase wants to go. Is it storage or a db system? If the former, then we should draw the line, and SQL, graph, geo are in layers above, not integrated. Need to draw a sharp line. Do what we are good at.
>
> END-TO-END-ASYNC
> Lots of pieces in place now. The last bit is the core. Alibaba are working on this. Put the request into a queue, another thread writes into memory, another thread to HDFS. Another thread gets the result and responds to users.
> What to do if blocked on HDFS? How do you stop the process from spinning up too many threads and having too many ongoing requests? The queues would be bounded.
> One attendee suggested the async core be done with coroutines. Was asked, which JDK has coroutines? Answer, the AJDK. What's that? The Alibaba JDK (they have their own JDK team). Laughter all around.
>
> JAVA SUPPORT
> We don't support JDK9... JDK10. We are bound by HDFS and Spark.
> Is there any perf to be had in the new JDKs? Answer, some, and yes. Offheaping will be able to save a copy.
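The END-TO-END-ASYNC design above (request queue, a memory-write thread, an HDFS-write thread, bounded queues for backpressure) can be sketched with plain JDK pieces. Stage names and queue sizes here are invented; this only illustrates how bounded hand-off queues cap the number of in-flight requests:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

// Sketch of a staged write path: requests flow intake-queue -> memory-write
// thread -> HDFS-write thread. The queues are bounded, so put() blocks when
// a stage falls behind; that blocking is the backpressure that keeps a slow
// or stuck "HDFS" stage from piling up unbounded requests or threads.
public class BoundedPipeline {

    static int run(int requests) throws InterruptedException {
        BlockingQueue<Integer> toMemory = new ArrayBlockingQueue<>(16);
        BlockingQueue<Integer> toHdfs = new ArrayBlockingQueue<>(16);
        CountDownLatch acked = new CountDownLatch(requests);

        Thread memWriter = new Thread(() -> { // stage 1: "apply to memory"
            try {
                while (true) toHdfs.put(toMemory.take());
            } catch (InterruptedException e) { /* pipeline shutdown */ }
        });
        Thread hdfsWriter = new Thread(() -> { // stage 2: "sync to HDFS", then ack
            try {
                while (true) { toHdfs.take(); acked.countDown(); }
            } catch (InterruptedException e) { /* pipeline shutdown */ }
        });
        memWriter.start();
        hdfsWriter.start();

        for (int i = 0; i < requests; i++) {
            toMemory.put(i); // blocks (backpressure) when the intake queue is full
        }
        acked.await(); // every request made it through both stages
        memWriter.interrupt();
        hdfsWriter.interrupt();
        return requests;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("acked=" + run(1000));
    }
}
```

A real core would carry responses back to the client on a further stage and fail fast once a queue stays full, rather than blocking forever; the bounded-queue principle is the same.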
> Direct I/O. A new API for ByteBuffers.
>
> SPARK
> Be able to scan hfiles directly. Work to transfer to Parquet for Spark to query.
> One attendee is using replication for streaming out to Parquet, then having Spark go against that. Talk of compacting into Parquet, then having Spark query the Parquet files and, for the difference between now and the last compaction, going to the hbase API.
>
> 1. https://drive.google.com/file/d/0B4a3E58mCyOfSVJNQklEM0gyQ3VDYV9aMHlqTmdNNWgwQ3Bj/view?usp=sharing
> 2. https://drive.google.com/file/d/0B4a3E58mCyOfMnF5QWpNTDRkc3M1anRRMEJVSjlBYVhsQm9F/view?usp=sharing