[ https://issues.apache.org/jira/browse/PHOENIX-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836991#comment-15836991 ]
Peter Conrad edited comment on PHOENIX-3218 at 1/25/17 1:22 AM:
----------------------------------------------------------------

[~elserj] Thanks again for the thorough and thoughtful review. I'm working on a revision, and I have one question for you. The doc says:

bq. When using `UPSERT` to write a large number of records, turn off autocommit and batch records. Start with a batch size of 1000 and adjust as needed. Here's some pseudocode showing one way to commit records in batches:

You said:

_Recommend putting a caveat here that the use of commit() by Phoenix to control batches of data written to HBase as being "non-standard" in terms of JDBC._

Is this doc the right place to say this? It seems like it would be kind of hidden here. The Grammar page mentions commits kind of off-handedly, as does the Atomic Upsert page. The Transactions page seems to be the one that more or less defines them. But I wonder if the Overview page is the right place to clarify this.

... and some follow-on questions for [~apurtell] or [~jamestaylor]:

The doc says:

bq. When specifying machines for HBase, do not skimp on cores; HBase needs them.

Josh Elser says:

_How can this be made into a more concrete recommendation?_

Do we have any hardware recommendations?

The doc says:

bq. Set the `UPDATE_CACHE_FREQUENCY` [option](http://phoenix.apache.org/language/index.html#options) to 15 minutes or so if your metadata doesn't change very often.

Josh Elser says:

_Don't guess, make a concrete recommendation. If 15 minutes isn't a good recommendation, let's come up with a good number._

Similar question: what's a more reliable way to determine cache update frequency?

The doc says:

bq. If you regularly scan large data sets from spinning disk, you're best off with GZIP (but watch write speed).

Josh Elser says:

_Numbers/reference material to back this up?_
The doc says:

bq. When deleting a large data set, turn on autoCommit before issuing the `DELETE` query so that the client does not need to remember the row keys of all the keys as they are deleted.

Josh Elser says:

_Reasoning behind this one isn't clear to me. Batching DELETEs would have the same benefit of batching UPSERTs, no? (I may just be missing an implementation detail here...)_

*Can you help me answer his questions?*
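For reference, here's one way the batching pattern from that first quote could be sketched. This is my own illustration, not text from the guide: the class and method names are invented, and the JDBC side is only sketched in comments (it needs a live Phoenix connection), so the runnable part is just the commit cadence: with autocommit off, commit() every N rows, plus once more for a trailing partial batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batched-UPSERT pattern under discussion.
// With a real Phoenix JDBC connection, the setup would be
// conn.setAutoCommit(false), the loop body would be
// PreparedStatement.executeUpdate() per row (mutations are buffered
// client-side), and conn.commit() would fire at each commit point
// computed below, flushing that batch to HBase.
public class BatchUpsertSketch {

    // Pure commit cadence: 1-based row indices at which commit() would
    // fire for n rows and the given batch size, including a final
    // commit for any trailing partial batch.
    static List<Integer> commitPoints(int n, int batchSize) {
        List<Integer> points = new ArrayList<>();
        for (int i = batchSize; i <= n; i += batchSize) {
            points.add(i);
        }
        if (n > 0 && n % batchSize != 0) {
            points.add(n); // trailing partial batch
        }
        return points;
    }

    public static void main(String[] args) {
        // 2,500 rows with the suggested starting batch size of 1,000:
        System.out.println(commitPoints(2500, 1000)); // [1000, 2000, 2500]
    }
}
```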
> First draft of Phoenix Tuning Guide
> -----------------------------------
>
>                 Key: PHOENIX-3218
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3218
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Peter Conrad
>         Attachments: Phoenix-Tuning-Guide-20170110.md, Phoenix-Tuning-Guide.md, Phoenix-Tuning-Guide.md
>
> Here's a first draft of a Tuning Guide for Phoenix performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)