[ https://issues.apache.org/jira/browse/PHOENIX-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412589#comment-15412589 ]
James Taylor commented on PHOENIX-2995:
---------------------------------------

- Is this thread safe? What if the count changes between the get and the decrementAndGet()?
{code}
+            int count = this.referenceCount.get();
+            if (count>0) {
+                this.referenceCount.decrementAndGet();
+                return new PMetaDataCache(this);
+            }
+            else {
+                return this;
+            }
{code}
- This can be improved by not doing multiple gets/containsKey calls. Instead, always do a single put and look at the return value. If it is non-null, combine the old value and the new value (making sure the new values appear last).
{code}
+                TableInfo tableInfo = new TableInfo(isDataTable, hTableName, tableRef);
+                if (!physicalTableMutationMap.containsKey(tableInfo)) {
+                    physicalTableMutationMap.put(tableInfo, Lists.<Mutation>newArrayList());
+                }
                 isDataTable = false;
-            }
-            if (tableRef.getTable().getType() != PTableType.INDEX) {
-                numRows -= valuesMap.size();
+                physicalTableMutationMap.get(tableInfo).addAll(mutationList);
{code}
- Perhaps another change you've made circumvents this, but I don't think we can always do a mutations.remove() here, as we may get a ConcurrentModificationException (see previous code). I'm not positive we're using numRows anywhere (I believe we had a check for that before); if it's unused, I suppose we can remove it.
{code}
+            if (tableInfo.isDataTable()) {
+                numRows -= numMutations;
+            }
+            // Remove batches as we process them
+            mutations.remove(origTableRef);
{code}
- Overall, the data structures in MutationState can be greatly improved. We don't need these maps (both at the top level and within the map); we can just use arrays instead. We end up dumping everything into a Map for HBase eventually, so later rows would naturally overwrite earlier rows. I believe I have a separate JIRA for this, but if not I'll file one.

(Illustrative sketches of these points are appended after the quoted issue description below.)

> Write performance severely degrades with large number of views
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2995
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2995
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Mujtaba Chohan
>            Assignee: Thomas D'Silva
>              Labels: Argus
>             Fix For: 4.8.1
>
>         Attachments: PHOENIX-2995-v2.patch, PHOENIX-2995.patch, create_view_and_upsert.png, image.png, image2.png, image3.png, upsert_rate.png
>
>
> Write performance for each 1K batch degrades significantly when there are *10K* views being written to at random with the default {{phoenix.client.maxMetaDataCacheSize}}. With all views created, the upsert rate stays around 25 seconds per 1K batch, i.e. an upsert rate of ~2K rows/min.
> When {{phoenix.client.maxMetaDataCacheSize}} is increased to 100MB+, the view does not need to be re-resolved and the upsert rate returns to the normal ~60K rows/min.
> With *100K* views and {{phoenix.client.maxMetaDataCacheSize}} set to 1GB, I wasn't able to create all 100K views, as the upsert time for each 1K batch keeps steadily increasing.
> The following graph shows the 1K batch upsert rate over time as the number of views varies. Rows are upserted to random views; {{CREATE VIEW IF NOT EXISTS ... APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY=900000}} is executed before the upsert statement.
> !upsert_rate.png!
> The base table is also created with {{APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY = 900000, AUTO_PARTITION_SEQ}}.
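
For the thread-safety question on the reference count, one possible fix is a compareAndSet loop so the check and the decrement happen as a single atomic step. This is only a sketch assuming the counter is a plain AtomicInteger; the class and method names below are hypothetical and not taken from the patch.
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical standalone illustration of closing the race between
// referenceCount.get() and decrementAndGet(): the read and the write
// are combined into one atomic compare-and-set.
public class RefCountSketch {
    private final AtomicInteger referenceCount = new AtomicInteger(1);

    /** Decrements the count only if it is still positive; returns whether it did. */
    public boolean tryDecrement() {
        while (true) {
            int count = referenceCount.get();
            if (count <= 0) {
                return false;            // already at zero, nothing to release
            }
            if (referenceCount.compareAndSet(count, count - 1)) {
                return true;             // no other thread interleaved between read and write
            }
            // another thread changed the count; re-read and retry
        }
    }
}
{code}
On Java 8+, AtomicInteger.getAndUpdate() with the same guard inside the lambda is an equivalent, more compact alternative.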
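
For the single-put suggestion, the idea is to let Map.put() report whether a previous value existed and merge only in that case. The sketch below uses String as a stand-in for the real TableInfo key and Mutation value types so it stays self-contained; the names are hypothetical, not the actual MutationState fields.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "one put, merge on non-null return" instead of
// containsKey() + put() + get().
public class SinglePutSketch {
    private final Map<String, List<String>> physicalTableMutationMap = new HashMap<>();

    public void addMutations(String tableInfo, List<String> mutationList) {
        List<String> merged = new ArrayList<>(mutationList);
        List<String> previous = physicalTableMutationMap.put(tableInfo, merged);
        if (previous != null) {
            // Prepend the earlier mutations so the new values appear last.
            merged.addAll(0, previous);
        }
    }
}
{code}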
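
For the ConcurrentModificationException concern, removing entries through the Iterator that drives the loop is one safe pattern; per the HashMap contract, the iterator's own remove method is the only safe structural removal during iteration. Again a stand-alone sketch with placeholder types, not the actual MutationState code.
{code}
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of removing processed batches while iterating:
// Iterator.remove() instead of Map.remove(), which could throw
// ConcurrentModificationException mid-iteration.
public class SafeRemovalSketch {
    static void drainBatches(Map<String, List<String>> mutations) {
        Iterator<Map.Entry<String, List<String>>> it = mutations.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> entry = it.next();
            send(entry.getValue());   // stand-in for flushing the batch to HBase
            it.remove();              // safe removal during iteration
        }
    }

    private static void send(List<String> batch) {
        // placeholder for the actual flush
    }
}
{code}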
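
For the last point, the reason plain ordered arrays or lists would suffice is that the final dump into a per-row Map makes the last write for a given row key win. A tiny, purely hypothetical demonstration of that behavior:
{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demonstration that preserving insertion order is enough:
// when ordered rows are dumped into a Map keyed by row key, later rows
// naturally overwrite earlier ones.
public class LastWriteWinsSketch {
    public static void main(String[] args) {
        List<String[]> orderedRows = Arrays.asList(
                new String[] {"row1", "v1"},
                new String[] {"row2", "v2"},
                new String[] {"row1", "v3"});     // later update of row1

        Map<String, String> byRowKey = new HashMap<>();
        for (String[] row : orderedRows) {
            byRowKey.put(row[0], row[1]);         // last put for a key wins
        }
        System.out.println(byRowKey.get("row1")); // prints v3
    }
}
{code}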