[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Lawlor updated HBASE-11544:
------------------------------------
Attachment: HBASE-11544-v1.patch

Hey folks, I've been working on this issue and I am attaching a patch of what I have so far. Below are some discussion points on which feedback would be great.

A few issues were encountered while implementing a solution to this problem. The issues, and their current solutions, are outlined below (any feedback on alternative ways to solve them would be appreciated):
* In some cases, the concept of partial results doesn't seem appropriate. In these cases, I ensured that partial results would not be created, since they would only hurt performance or cause confusion. The cases where I felt partial results should be avoided were:
** When the client has defined a filter for the scan that requires the entire row to be read.
** When the client has specified that the scan is a small scan. Small scans are designed to execute in a single RPC request, so the idea of making multiple RPC requests to form the complete Result seems inappropriate.
* When I changed the default value of caching to Integer.MAX_VALUE, I ran into OOMEs on the server, since caching is used to presize the ArrayList that holds results. A simple solution is to not set an initial size on the ArrayList at all. However, this may still run into memory issues if the ArrayList must expand its underlying array many times (e.g. if the table being scanned has many small rows, leading to a large number of Results in the list). I was wondering what everyone thought of this simple solution; if something more sophisticated is required, it may be best to move the caching change into its own JIRA.
* When combining partial results into a single complete Result on the client side, an exception is thrown from within ResultScanner#next() if the partial results are found to belong to different rows.
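The row-consistency check performed while combining partials on the client side can be sketched as follows. This is an illustrative stand-in, not the patch's actual code: `PartialResult` here is a hypothetical simplification of HBase's `Result` (a row key plus its cells).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartialResultCombiner {
  // Hypothetical stand-in for an HBase Result: a row key plus its cell values.
  static class PartialResult {
    final byte[] row;
    final List<String> cells;
    PartialResult(byte[] row, List<String> cells) {
      this.row = row;
      this.cells = cells;
    }
  }

  // Combine partial results into one complete row's cell list, throwing if the
  // partials do not all belong to the same row (the corner case described above).
  static List<String> combine(List<PartialResult> partials) {
    byte[] row = partials.get(0).row;
    List<String> combined = new ArrayList<>();
    for (PartialResult p : partials) {
      if (!Arrays.equals(row, p.row)) {
        throw new IllegalStateException("Partial results belong to different rows");
      }
      combined.addAll(p.cells);
    }
    return combined;
  }

  public static void main(String[] args) {
    byte[] row = "row1".getBytes();
    List<PartialResult> partials = Arrays.asList(
        new PartialResult(row, Arrays.asList("cf:a=1", "cf:b=2")),
        new PartialResult(row, Arrays.asList("cf:c=3")));
    System.out.println(combine(partials)); // [cf:a=1, cf:b=2, cf:c=3]
  }
}
```

Because the RPC layer's sequence numbers already guarantee ordering, the exception branch should be unreachable in practice; it exists as a defensive check.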
This is a corner case that should never occur, since sequence numbers are already used in each RPC request to ensure proper ordering of requests and responses, but I figured it was worth mentioning.

The fine-grained implementation details are in the patch, but it is worth highlighting how this new partial-result workflow can be used to avoid OOMEs on the server:
* Scan#setMaxResultSize now operates at the cell level rather than the row level. This allows a client to retrieve, in fragments/partials, very large rows that would previously cause the server to OOME. By default, the complete Result is only formed on the client side; the server only ever sees partial Results for very large rows.
* A new option (Scan#setAllowPartials) has been added to Scan to let the client see the partial results returned by the server. This setting is useful in cases where the client itself would OOME if forced to reconstruct the complete Result.
* Clients who want to use this partial-result workflow should use non-filtered, non-small scans (see the issues above for the reasoning).

Areas for future improvement:
* As [~lhofhansl] has pointed out, the RPC path is inefficient and could be improved by prefetching results server side. This has been raised in HBASE-12994.
* As called out in the issues above, the initial sizing of the ArrayList on the server side could be improved to avoid resizing the underlying array.
* Streaming is the ideal workflow for RPC requests, but it would require a large rework.

Any feedback on the patch would be greatly appreciated. I expect the QA run to come back with some test failures, which I will address in a subsequent patch. I'm pinging [~lhofhansl] and [~stack] since we were discussing this solution above, but if anyone else has feedback it would be appreciated as well!
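The cell-level size accounting behind Scan#setMaxResultSize can be sketched as follows. This is a simplified simulation, not the server's actual code: cells are modeled as byte arrays and sized by array length (the real server tracks per-Cell heap size), and the 100-byte limit in the example is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialResultChunker {
  // Split one row's cells into partial results, cutting a new chunk whenever
  // the accumulated size would exceed maxResultSize. This is how a very large
  // row can cross the wire as several small partials instead of one huge Result.
  static List<List<byte[]>> chunk(List<byte[]> cells, long maxResultSize) {
    List<List<byte[]>> partials = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    long size = 0;
    for (byte[] cell : cells) {
      if (!current.isEmpty() && size + cell.length > maxResultSize) {
        partials.add(current);
        current = new ArrayList<>();
        size = 0;
      }
      current.add(cell);
      size += cell.length;
    }
    if (!current.isEmpty()) {
      partials.add(current);
    }
    return partials;
  }

  public static void main(String[] args) {
    List<byte[]> cells = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
      cells.add(new byte[40]); // five 40-byte "cells" in one row
    }
    // With a 100-byte limit, the row is returned as three partials: 2 + 2 + 1 cells.
    System.out.println(chunk(cells, 100).size()); // 3
  }
}
```

The client then stitches these partials back into a complete Result (unless it has opted in to seeing the partials directly), so neither side ever has to hold the full oversized row unless it chooses to.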
Thanks

> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return
> batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11544
>                 URL: https://issues.apache.org/jira/browse/HBASE-11544
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jonathan Lawlor
>            Priority: Critical
>              Labels: beginner
>         Attachments: HBASE-11544-v1.patch
>
> Running some tests, I set hbase.client.scanner.caching=1000. Dataset has
> large cells. I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the
> client whatever we've gathered once we pass out a certain size threshold
> rather than keep accumulating till we OOME.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)