Looks like you folks had a good time there. I wish I could've made it. Good write-up too.
Thanks.
Jerry

On Mon, Aug 7, 2017 at 10:38 AM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> Thanks for the write up Stack. I could not make it to Shenzhen. Nice to
> know the conference and meet up went great.
>
> Regards
> Ram
>
> On Mon, Aug 7, 2017 at 9:36 PM, Stack <st...@duboce.net> wrote:
>
>> At fancy Huawei headquarters, 10:00-12:00AM or so (with nice coffee and
>> fancy little cake squares provided about half way through the session).
>>
>> For list of attendees, see picture at end of this email.
>>
>> Discussion was mostly in Chinese with about 25% in English plus some
>> gracious sideline translation so the below is patchy. Hopefully you get the
>> gist.
>>
>> For client-side scanner going against hfiles directly; is there a means of
>> being able to pass the permissions from hbase to hdfs?
>>
>> Issues w/ the hbase 99th percentile were brought up. "DynamoDB can do
>> 10ms". How to do better?
>>
>> SSD is not enough.
>>
>> GC messes us up.
>>
>> Will the Distributed Log Replay come back to help improve MTTR? We could
>> redo on new ProcedureV2 basis. ZK timeout is the biggest issue. Do as we
>> used to and just rely on the regionserver heartbeating...
>>
>> Read replica helps w/ MTTR.
>>
>> Ratis incubator project to do a quorum based hbase?
>>
>> Digression on licensing issues around fb wangle and folly.
>>
>> Redo of hbase but quorum based would be another project altogether.
>>
>> Decided to go around the table to talk about concerns and what people are
>> working on.
>>
>> Jieshan wondered what could be done to improve OLAP over hbase.
>>
>> Client side scanner was brought up again as means of skipping RS overhead
>> and doing better OLAP.
>>
>> Have HBase compact to parquet files. Query parquet and hbase.
>>
>> At Huawei, they are using 1.0 hbase. Most problems are assignment. They
>> have .5M regions. RIT is a killer. Double assignment issues. And RIT. They
>> run their own services. Suggested they upgrade to get fixes at least.
>> Then 2.0.
>>
>> Will HBase federate like HDFS? Can Master handle load at large scale? It
>> needs to do federation too?
>>
>> Anyone using Bulk loaded replication? (Yes, it just works so no one talks
>> about it...)
>>
>> Request that fixes be backported to all active branches, not just most
>> current.
>>
>> Andrew was good at backporting... not all RMs are.
>>
>> Too many branches. What should we do?
>>
>> Proliferation of branches makes for too much work.
>>
>> Need to clean up bugs in 1.3. Make it stable release now.
>>
>> Let's do more active EOL'ing of branches. 1.1?
>>
>> Hubert asked if we can have clusters where RS are differently capable?
>> i.e. several generations of HW all running in the same cluster.
>>
>> What if fat server goes down?
>>
>> Balancer could take care of it all. RS Capacity. Balancer can take it into
>> account.
>> Regionserver labels like YARN labels. Characteristics.
>>
>> Or run it all in docker when heterogeneous cluster. The K8 talk from day
>> before was mentioned; we should all look at being able to deploy in k8 and
>> docker.
>>
>> Let's put out kubernetes blog... (Doing).
>>
>> Alibaba looking at HBase as native YARN app.
>>
>> i/o is hard even when containers.
>>
>> Use autoscaler of K8 when heavy user.
>>
>> Limit i/o use w/ CP. Throttle.
>>
>> Spark and client-side scanner came up again.
>>
>> Snapshot input format in spark.
>>
>> HBase federation came up again. jd.com talking of 3k to 4k nodes in a
>> cluster. Millions of regions. Region assignment is messing them up.
>>
>> Maybe federation is good idea? Argument that it is too much operational
>> complexity. Can we fix master load w/ splittable meta, etc?
>>
>> Was brought up that even w/ 100s of RS there is scale issue, nvm thousands.
>>
>> Alibaba talked about disaster recovery. Described issue where HDFS has
>> fencing problem during an upgrade. There was no active NN. All RS went down.
>> ZK is another POF. If ZK is not available.
>> Operators were being asked how
>> much longer the cluster was going to be down but they could not answer the
>> question. No indicators from HBase on how much longer it will be down or
>> how many WALs it's processed and how many more to go. Operator unable to
>> tell his org how long it would be before it all came back on line. Should
>> say how many regions are online and how many more to do.
>>
>> Alibaba use SQL to lower cost. HBase API is low-level. Row-key
>> construction is tricky. New users make common mistakes. If you don't do
>> schema right, high performance is difficult.
>>
>> Alibaba are using a subset of Phoenix... simple sql only; throws
>> exceptions if user tries to do joins, etc., anything but basic ops.
>>
>> HareQL is using hive for meta store. Don't have data typing in hbase.
>>
>> HareQL could perhaps contribute some piece... or a module in hbase to
>> sql... From phoenix?
>>
>> Secondary index.
>>
>> Client is complicated in phoenix. Was suggested thin client just does
>> parse... and then offload to server for optimization and execution.
>>
>> Then secondary index. Need transaction engine. Consistency of secondary
>> index.
>>
>> We adjourned.
>>
>> Your dodgy secretary,
>> St.Ack
>> P.S. Please add to this base set of notes if I missed anything.