[ https://issues.apache.org/jira/browse/PHOENIX-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836991#comment-15836991 ]
Peter Conrad edited comment on PHOENIX-3218 at 1/25/17 1:22 AM:
----------------------------------------------------------------

[~elserj] Thanks again for the thorough and thoughtful review. I'm working on a revision, and I have one question for you. The doc says:

bq. When using `UPSERT` to write a large number of records, turn off autocommit and batch records. Start with a batch size of 1000 and adjust as needed. Here's some pseudocode showing one way to commit records in batches:

You said:

_Recommend putting a caveat here that the use of commit() by Phoenix to control batches of data written to HBase as being "non-standard" in terms of JDBC._

Is this doc the right place to say this? It seems like it would be kind of hidden here. The Grammar page mentions commits kind of off-handedly, as does the Atomic Upsert page. The Transactions page seems to be the one that more or less defines them. But I wonder if the Overview page is the right place to clarify this.

... and some follow-on questions for [~apurtell] or [~jamestaylor]:

The doc says:

bq. When specifying machines for HBase, do not skimp on cores; HBase needs them.

Josh Elser says:

_How can this be made into a more concrete recommendation?_

Do we have any hardware recommendations?

The doc says:

bq. Set the `UPDATE_CACHE_FREQUENCY` [option](http://phoenix.apache.org/language/index.html#options) to 15 minutes or so if your metadata doesn't change very often.

Josh Elser says:

_Don't guess, make a concrete recommendation. If 15 minutes isn't a good recommendation, let's come up with a good number._

Similar question: what's a more reliable way to determine cache update frequency?

The doc says:

bq. If you regularly scan large data sets from spinning disk, you're best off with GZIP (but watch write speed).

Josh Elser says:

_Numbers/reference material to back this up?_
The doc says:

bq. When deleting a large data set, turn on autoCommit before issuing the `DELETE` query so that the client does not need to remember the row keys of all the keys as they are deleted.

Josh Elser says:

_Reasoning behind this one isn't clear to me. Batching DELETEs would have the same benefit of batching UPSERTs, no? (I may just be missing an implementation detail here...)_

*Can you help me answer his questions?*
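For reference, here's one way the batching pattern from that first quote could be sketched. This is my own illustration, not text from the guide: the class and method names are invented, and the JDBC side is only sketched in comments (it needs a live Phoenix connection), so the runnable part is just the commit cadence: with autocommit off, commit() every N rows, plus once more for a trailing partial batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batched-UPSERT pattern under discussion.
// With a real Phoenix JDBC connection, the setup would be
// conn.setAutoCommit(false), the loop body would be
// PreparedStatement.executeUpdate() per row (mutations are buffered
// client-side), and conn.commit() would fire at each commit point
// computed below, flushing that batch to HBase.
public class BatchUpsertSketch {

    // Pure commit cadence: 1-based row indices at which commit() would
    // fire for n rows and the given batch size, including a final
    // commit for any trailing partial batch.
    static List<Integer> commitPoints(int n, int batchSize) {
        List<Integer> points = new ArrayList<>();
        for (int i = batchSize; i <= n; i += batchSize) {
            points.add(i);
        }
        if (n > 0 && n % batchSize != 0) {
            points.add(n); // trailing partial batch
        }
        return points;
    }

    public static void main(String[] args) {
        // 2,500 rows with the suggested starting batch size of 1,000:
        System.out.println(commitPoints(2500, 1000)); // [1000, 2000, 2500]
    }
}
```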
> First draft of Phoenix Tuning Guide
> -----------------------------------
>
>                 Key: PHOENIX-3218
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3218
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Peter Conrad
>         Attachments: Phoenix-Tuning-Guide-20170110.md, Phoenix-Tuning-Guide.md, Phoenix-Tuning-Guide.md
>
> Here's a first draft of a Tuning Guide for Phoenix performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)