There were about 30 of us; I didn't take roll (see photos below [1][2]): PMCers, committers, contributors, and speakers from the day before. Comments and ideas are not attributed. Please excuse. There was no agenda.

TESTING
What do people do for testing? Allan Yang is finding good stuff when he tests AMv2, compared to me. Why? slowDeterministic does more op types than serverKilling. What do others do for testing? Add more variety to the ITBLLs, more chaos? What about performance testing? YCSB. Batch is important. It's what our users do. Batch was recently added to YCSB (and to PerformanceEvaluation). Size of batch matters too. And number of clients. Alibaba described what they do. Advocated that we all try different test types rather than all do the same runs. Need to add the new async client to YCSB. Alibaba use it for testing their new SEDA core (upstreaming soon). Understanding each other's benchmarks can take a while. Common understanding takes some effort, communication. The new hbase-operator-tools will be a good place to put perf and testing tooling.

GITHUB
Can hbase adopt the github dev flow? Support PRs? Is it a case of just starting the discussion on the dev list? Do we lose review/commentary information if we go the github route? A brief overview of what is possible w/ the new gitbox repos followed, ultimately answering that no, there should be no loss (github comments show as jira comments). Most have github but not apache accounts. PRs are easier. Could encourage more contribution, lower the barrier to contrib. Other tools for hbase-operator-tools would be stuff like the alibaba tooling for shutting down servers... moving regions to a new one.

PERF ACROSS VERSIONS
Lucent (Lucene?) has a perf curve on its home page with markings for when large features arrived and when releases were cut, so you can see whether perf increased or decreased. There was a big slowdown going from hbase 0.98 to 1.1.2. We talked about doing such a perf curve on the hbase home page. Would be a big project. Asked if anyone was interested. Perhaps a dedicated cluster up on Apache. We could do a whip-around to pay for it.

USER FRIENDLY
Small/new users have a hard time.
Is there a UI for users to see data in cells, to change schema, or to create/drop tables? Is there anything we can do here? Much back and forth. Xiaomi don't let users have access to the shell. They have a web UI where you click to build a command that is run for you. They are afraid that users will mistakenly destroy the database, so they shut down access. It turns out that most of the bigger players present have some form of UI built against hbase. Alibaba have something. The DiDi folks have howto wiki pages. Talked about upstreaming. Where to put it? hbase-operator-tools? What about a Docker file to give devs their own hbase easily? Can throw it away when done. One attendee talked of Hue from CDH, how it is good for simple insert and view. Can check the data. For testing and feel-good getting to know the system, it helps. Another uses Apache Drill, but it is tough with types. New users need to be able to import data from a csv. How hard to have a few pages of clicky, clicky, wizard to create/drop tables or for small queries... A stripped-down version of Hue to come with HBase... how hard to do this?

Next we went over backburner items mentioned on the previous day, starting with SQL-like access. What about lightweight SQL support? At Huawei... they have a project going for lightweight SQL support in hbase based on Calcite. For big queries, they'd go to sparksql. Did you look at Phoenix? Phoenix is complicated, difficult. Calcite migration is not done in Phoenix (sparksql is not Calcite-based). Talk to the Phoenix project about generating a lightweight artifact. We could help with the build. One nice idea was building with a cut-down grammar, one that removed all the "big stuff" and problematics. Could return to the user a nice "not supported" if they try to do a 10Bx10B join. An interesting idea was a facade query analyzer that transfers to sparksql if the query is big. Would need stats.

COPROCESSORS
Can we add some identifiers to distinguish whether a request is from a CP or from a client? Can we calculate stats on CP resources used?
Limit? Can we update CPs more gracefully? Under heavy usage, an update means having to disable the table. Short answer was no. Move CPs to another process. A sidecar process is the way to go. The Huawei effort at lightweight SQL would also use CPs (like Phoenix). Bring the types into hbase, the Phoenix types, for spark to use etc.

SECONDARY INDICES
Full support is hard, can we do it step-by-step... Separate it into several steps? Push back that this is a well-covered space. Problems known. Contribs in the tier above welcome. One attendee asked where does hbase want to go? Is it storage or a db system? If the former, then we should draw the line, and sql, graph, geo are in layers above, not integrated. Need to draw a sharp line. Do what we are good at.

END-TO-END-ASYNC
Lots of pieces in place now. Last bit is core. Alibaba is working on this. Put the request into a Queue, another thread writes into memory, another thread writes to HDFS. Another thread gets the result and responds to users. What to do if HDFS is blocked? How do you stop the process from spinning up too many threads and having too many ongoing requests? Queues would be bounded. One attendee suggested the async core be done with coroutines. Was asked, which JDK has coroutines? Answer, the AJDK. What's that? The Alibaba JDK (they have their own JDK team). Laughter all around.

JAVA SUPPORT
We don't support JDK9... JDK10. We are bound by HDFS and Spark. Is there any perf to be had in new JDKs? Answer, some, and yes. Offheaping will be able to save a copy. Direct I/O. New API for BBs.

SPARK
Be able to scan hfiles directly. Work to transfer to parquet for spark to query. One attendee is using replication for streaming out to parquet, then having spark go against that. Talk of compacting into parquet, then having spark query the parquet files and, for the difference between now and the last compaction, go to the hbase api.

1. https://drive.google.com/file/d/0B4a3E58mCyOfSVJNQklEM0gyQ3VDYV9aMHlqTmdNNWgwQ3Bj/view?usp=sharing
2. https://drive.google.com/file/d/0B4a3E58mCyOfMnF5QWpNTDRkc3M1anRRMEJVSjlBYVhsQm9F/view?usp=sharing
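
A note on the "Queues would be bounded" point from the END-TO-END-ASYNC discussion: the sketch below is only a toy illustration of that idea, not the Alibaba design or anything in hbase. It is in Python for brevity. The class name StagedPipeline, the two stages, and the "in-memory"/"persisted" labels are all made up for the example. Each stage has one thread and a bounded queue; when the inbox is full the request is turned away instead of spinning up more threads or piling up ongoing requests.

```python
import queue
import threading


class StagedPipeline:
    """Toy two-stage pipeline with bounded queues (hypothetical names throughout)."""

    def __init__(self, capacity):
        # Bounded queues: a full queue is the signal to push back on callers.
        self.inbox = queue.Queue(maxsize=capacity)    # client -> stage one
        self.handoff = queue.Queue(maxsize=capacity)  # stage one -> stage two
        self.responses = []
        self._threads = []

    def submit(self, request):
        """Return True if accepted, False if the bounded inbox is full."""
        try:
            self.inbox.put_nowait(request)
            return True
        except queue.Full:
            return False

    def _stage_one(self):
        # Stands in for the "write into memory" step.
        while True:
            item = self.inbox.get()
            if item is None:  # shutdown sentinel, pass it downstream
                self.handoff.put(None)
                return
            # Blocks when stage two is slow, so work cannot pile up unbounded.
            self.handoff.put(("in-memory", item))

    def _stage_two(self):
        # Stands in for the "write to HDFS, then respond" step.
        while True:
            item = self.handoff.get()
            if item is None:
                return
            self.responses.append(("persisted", item[1]))

    def start(self):
        self._threads = [
            threading.Thread(target=self._stage_one),
            threading.Thread(target=self._stage_two),
        ]
        for t in self._threads:
            t.start()

    def shutdown(self):
        """Call after start(): queue the sentinel, then wait for both stages."""
        self.inbox.put(None)  # blocking put, waits for room if the inbox is full
        for t in self._threads:
            t.join()


if __name__ == "__main__":
    pipeline = StagedPipeline(capacity=2)
    # Stages are not running yet, so the third request finds the inbox full.
    print([pipeline.submit(i) for i in range(3)])
    pipeline.start()
    pipeline.shutdown()
    print(pipeline.responses)
```

Rejecting at the front door is just one possible policy; blocking the caller or shedding the oldest request would be others. The notes do not say which one was intended.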