[ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361862#comment-15361862 ]
Ishan Chattopadhyaya edited comment on SOLR-5944 at 7/5/16 12:48 AM: --------------------------------------------------------------------- New patch fixing all nocommits. Still a few additional tests, which Hoss mentioned, are TODO. Here's a stab at replying to Hoss' comments (Maybe I'll keep updating this comment itself as and when I fix some of the TODO items here): {panel:title=JettySolrRunner} * javadocs, javadocs, javadocs {color:green}[FIXED]{color} {panel} {panel:title=XMLLoader + JavabinLoader} * why is this param checks logic duplicated in these classes? {color:red}[Not sure what you mean here, I just set the prevVersion to the cmd here now]{color} * why not put this in DUP (which already has access to the request params) when it's doing it's "FROMLEADER" logic? {color:red}[Since commitWithin and overwrite was being set here, I thought this is an appropriate place to set the prevVersion to the cmd]{color} {panel} {panel:title=AddUpdateCommand} * these variables (like all variables) should have javadocs explaining what they are and what they mean {color:green}[FIXED]{color} ** people skimming a class shouldn't have to grep the code for a variable name to understand it's purpose * having 2 variables here seems like it might be error prone? what does it mean if {{prevVersion < 0 && isInPlaceUpdate == true}} ? or {{0 < prevVersion && isInPlaceUpdate == false}} ? {color:green}[FIXED: Now just have one variable]{color} ** would it make more sense to use a single {{long prevVersion}} variable and have a {{public boolean isInPlaceUpdate()}} that simply does {{return (0 < prevVersion); }} ? {color:green}[FIXED]{color} {panel} {panel:title=TransactionLog} * javadocs for both the new {{write}} method and the existig {{write}} method {color:green}[FIXED]{color} ** explain what "prevPointer" means and note in the 2 arg method what the effective default "prevPoint" is. * we should really have some "int" constants for refering to the List indexes involved in these records, so instead of code like {{entry.get(3)}} sprinkled in various classes like UpdateLog and PeerSync it can be smething more readable like {{entry.get(PREV_VERSION_IDX)}} {color:red}[TODO]{color} {panel} {panel:title=UpdateLog} * javadocs for both the new {{LogPtr}} constructure and the existing constructor {color:green}[FIXED]{color} ** explain what "prevPointer" means and note in the 2 arg constructure what the effective default "prevPoint" is. {color:green}[FIXED]{color} * {{add(AddUpdateCommand, boolean)}} ** this new code for doing lookups in {{map}}, {{prevMap}} and {{preMap2}} seems weird to me (but admitedly i'm not really an expert on UpdateLog in general and how these maps are used ** what primarily concerns me is what the expected behavior is if the "id" isn't found in any of these maps -- it looks like prevPointer defaults to "-1" regardless of whether this is an inplace update ... is that intentional? ... is it possible there are older records we will miss and need to flag that? [Yes, this was intentional, and I think it doesn't make any difference. If an "id" isn't found in any of these maps, it would mean that the previous update was committed and should be looked up in the index. ] ** ie: do we need to worry about distinguising here between "not an in place update, therefore prePointer=-1" vs "is an in place update, but we can't find the prevPointer" ?? [I think we don't need to worry. Upon receiving a prevPointer=-1 by whoever reads this LogPtr, it should be clear why it was -1: if the command's {{flags|UpdateLog.UPDATE_INPLACE}} is set, then this command is an in-place update whose previous update is in the index and not in the tlog; if that flag is not set, it is not an in-place update at all, and don't bother about the prevPointer value at all (which is -1 as a dummy value).] ** assuming this code is correct, it might be a little easier to read if it were refactored into something like:{code} // nocommit: jdocs private synchronized long getPrevPointerForUpdate(AddUpdateCommand cmd) { // note: sync required to ensure maps aren't changed out form under us if (cmd.isInPlaceUpdate) { BytesRef indexedId = cmd.getIndexedId(); for (Map<BytesRef,TransactionLog> currentMap : Arrays.asList(map, prevMap, prevMap2)) { LogPtr prevEntry = currentMap.get(indexedId); if (null != prevEntry) { return prevEntry.pointer; } } } return -1; // default when not inplace, or if we can't find a previous entry } {code} [FIXED: Refactored into something similar to above] * {{applyPartialUpdates}} ** it seems like this method would be a really good candidate for some direct unit testing? [TODO] *** ie: construct a synthetic UpdateLog, and confirm applyPartialUpdates does the right thing ** the sync block in this method, and how the resulting {{lookupLogs}} list is used subsequently, doesn't seem safe to me -- particularly the way {{getEntryFromTLog}} calls incref/decref on each TransactionLog as it loops over that list... *** what prevents some other thread from decref'ing one of these TransactionLog objects (and possibly auto-closing it) in between the sync block and the incref in getEntryFromTLog? **** (most existing usages of TransactionLog.incref() seem to be in blocks that sync on the UpdateLog -- and the ones that aren't in sync blocks look sketchy to me as well) *** in general i'm wondering if {{lookupLogs}} should be created outside of the while loop, so that there is a consistent set of "logs" for the duration of the method call ... what happens right now if some other thread changes tlog/prevMapLog/prevMapLog2 in between iterations of the while loop? ** shouldn't we make some sanity check assertions about the results of getEntryFromTLog? -- there's an INVALID_STATE if it's not an ADD or a list of 5 elements, but what about actually asserting that it's either an ADD or an UPDATE_INPLACE? ... {color:green}[FIXED]{color} what about asserting the doc's uniqueKey matches? [We could do that, but I think it is not necessary] *** (because unless i'm missing something, it's possible for 2 docs to have the same version, so if there is a glitch in the pointer we can't just trust the version check can we?) [I think we can trust a document to be of the same id if the version matches. It is possible for 2 docs to have same version, but then they would have to be from two different shards altogether. Since all of this processing is happening within a particular replica (which obviously belongs to only one shard), I think we can get away safely without asserting the id and just relying on the version.] ** {{partialUpdateEntry}} seems like a missleading variable name ... can't it be *either* a full document, or partial update (not in place), or an UPDATE_INPLACE partial update? [FIXED: calling it entry now] ** if there is only 1 way to {{break}} out of this while loop, then the method would probably be easier to make sense of if the {{applyOlderUpdate}} and {{return 0}} calls replaced the {{break}} statement {color:green}[FIXED]{color} ** semi-related: {{while (true)}} is generally a red flag: it seems like might be better if it was refactored inot a {{while (0 <= prevPointer)}} loop? * {{getEntryFromTLog}} {color:green}[FIXED]{color} ** I don't really see the point of using {{get\(i\)}} over and over .. why not a simple {{for (TransactionLog log : lookupLogs)}} ? {color:green}[FIXED]{color} ** why is the Exception & Error handling here just a debug log? shouldn't that be a pretty hard fail? [This shouldn't be a hard fail. This is basically a seek operation on a transaction log at the specified position. If the seek results in an exception due to unable to deserialize the tlog entry, we can ignore it, since it just means we were looking up the wrong tlog. I have added a comment to this effect in the catch block.] ** as mentioned above re: {{applyPartialUpdates}}, isn't is possible for 2 diff docs to have the same version? if we get unlucky and those 2 docs, with identical versions, at the same position in diff TransactionLog files then isn't a sanity check of the doc ID in {{applyPartialUpdates}} too late? ... if {{applyPartialUpdates}} tries again it's just going to keep getting the same (wrong) document. It seems like this method actaully needs to know the ID it's looking for, and "skip" any entries thta don't match, checking the next (older) {{TransactionLog}} [I think this is not a likely scenario, since in a given replica, a version should be unique to a document (I think we have bigger problems if this assumption isn't true).] * {{lookupPartialUpdates}} ** what is the purpose/intent of this method? ... it seems to be unused. [FIXED: Removed] * {{doReplay}} ** switch statement ordering... [FIXED: I'll add it to my knowledge!] *** in theory, switch statement cases should be ordered from most likeley to least likely (it's a microoptimization, but in heavily executed loops it might be worth it) *** so i wouldn't inject UPDATE_INPLACE at the begining of the switch -- it should definitely come after ADD, probably best to put it at the end of the list ** why is {{entry.get(2)}} commented out? and why does it say "idBytes" ? ... isn't slot #2 the prevPointer? copy paste confusion from "ADD" ? [FIXED: True, it was copy-paste confusion from ADD. Removed the commented out line.] *** if slot#2 really isn't needed in this code, get rid of the missleading comment about idBytes and replace it with an explicit comment that prevVersion isn't needed for reply. [FIXED: I have removed the spurious commented out lines, refactored that part into a updateCommandFromTlog() method. Does it address your concern here?] {panel} {panel:title=PeerSync} * ditto comments about switch statement ordering from above comments about {{UpdateLog}} {color:green}[FIXED]{color} * a lot of code here looks duplicated straight from {{UpdateLog.doReplay}} ** I realize that's true for the other case values as well, but bad code shouldn't be emulated ** lets refactor this duplicated code into a new {{public static AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} method in {{UpdateLog}} and re-use it here. {color:green}[FIXED]{color} * {{log.info}} looks wrong here ... especially inside of an {{if (debug)}} block ... pretty sure this should be {{log.debug}} like the other case blocks {color:green}[FIXED]{color} {panel} {panel:title=DirectUpdateHandler2} * I don't really understand why we need any schema/DocValue checks here? [This was unnecessary and I've removed it. I have done a somewhat related refactoring to the AddUpdateCommand.getLuceneDocument(boolean isInPlace) to now only generate a Lucene document that has docValues. This was needed because the lucene document that was originally being returned had copy fields targets of id field, default fields, multiple Field per field (due to FieldType.createFields()) etc., which are not needed for in-place updates.] * If {{cmd.isInPlaceUpdate}} is true, then doesn't that mean the update is by definition an in place update that only contains values for DV fields? ... wasn't it the responsibility of the upstream code that set that value to true to ensure that? [True, it was. However, copy field targets of id field, default fields etc. were added in this doNormalAdd() method itself due to cmd.getLuceneDocument(); I have overloaded that method to tackle the needs of in-place updates and filter out such unnecessary fields being added to the lucene doc] ** if not, why can't it be? (ie: why can't we move the schema checks upstream, and out of the sync block here? * if that's not what {{cmd.isInPlaceUpdate}} means, then why isn't there any error handling here in the event that non-DV field/values are found in the luceneDocument? ... doesn't that mean we need to fall back to the original {{writer.updateDocument}} call? * If it is neccessary to do SchemaField validation here for some reason, then shouldn't it be the exact same validation done in {{AtomicUpdateDocumentMerger.isSupportedForInPlaceUpdate}} ? [We should do all schema validation there only, I have removed everything from here, except for some filtering logic at cmd.getLuceneDocument()] {panel} {panel:title=AtomicUpdateDocumentMerger} * {{isSupportedForInPlaceUpdate}} ** shouldn't this method either be static or final? the rules don't change if someone subclasses AtomicUpdateDocumentMerger do they? ** why isn't {{TrieDateField}} also valid? ... could this just be checking for {{instanceof TrieField}} ? *** particularly suspicious since {{doInPlaceUpdateMerge}} does in fact check {{instanceof TrieField}} ** if the intent is truely to only support "numerics" then, why not {{instanceof NumericValueFieldType}} ? ** shouldn't we also check "not-indexed" and "not-stored" here as well? * {{doInPlaceUpdateMerge}} ** why does this method have 3 args? can't all of the neccessary info be deterined from the {{AddUpdateCommand}} ? ** this method seems like another good candidate for some explicit unit testing... *** build up an index & tlog with some explicitly crated non trivial docs/updates, then call this method with a variety of inputs and assert the expected modifications to the {{AddUpdateCommand}} (or assert no modifications if they aren't viable in place update candaites *** then hard commit everything in the tlog and assert that all the same calls return the exact same output/modifications. ** we should probably continue to assume the common case is _not_ to need (in-place) updates ... just regular adds. in which case anything we can do to short circut out faster -- before any checks that require stuff like {{SchemaField}} -- wouldn't reordering the checks in the loop to something like this be equivilent to what we have currently but faster in the common case? ...{code} for (SolrInputField f : sdoc) { final String fname = f.getName(); if (idField.getName().equals(fname)) { continue; } if (! f.getValue() instanceof Map) { // doc contains field value that is not an update, therefore definitely not an inplace update return false; } if (!isSupportedForInPlaceUpdate(schema.getField(fname))) { // doc contains update to a field that does not support in place updates return false; } } {code} *** Even if i've overloked something and that code isn't better, i think in general the "is this sdoc a candidate for in place updating" logic should be refactored into a public static helper method that has some unit tests. ** this method calls {{RealTimeGetComponent.getInputDocumentFromTlog}} but doesn't check for the special {{DELETED}} return value... *** definitely smells like a bug ... there are many assumptions made about {{uncommittedDoc}} as long as it's non-null *** based on this, now i really want to see more testing of in place updates mixed with document deletions: **** some explicit single node testing of "add docX, delete docX, do a 'set':'42' on docX" **** introduce some randomized deleteById calls into the randomized single/multi node tests ** this method calls {{RealTimeGetComponent.getVersionFromTlog}} which has docs that say "If null is returned, it could still be in the latest index." *** i don't see any code accounting for that possibility, cmd.prevVersion is just blindly assigned the null in that case ... which could lead to an NPE since it's declared as {{public long prevVersion}} *** the fact that this hasn't caused an NPE in testing is a red flag that there is a code path not being tested here ... but at a glance i can't really tell what it is? ** In general, I guess I don't really understand why this method is jumping through as many hoops as it is with the RTG code? *** it seems to duplicate a lot of functionality already in {{RealTimeGetComponent.getInputDocument}} ... why not just use that method? *** if the concern is avoiding the {{searcher.doc(docid)}} call to get all stored fields, perhaps {{RealTimeGetComponent.getInputDocument}} could be refactored to make it easier to re-use most of the functionality here? ... not sure what would make the most sense off the top of my head. *** at a minimum it seems like using {{SolrIndexSearcher.decorateDocValueFields(...)}} would make more sense then doing it ourselves as we loop over the fields -- we can even pass in the explicit list of field names we know we care about based on the SolrInputDocument (or even just the field names we know use "inc" if that's all we really need) **** (Or is there something i'm missing about why using {{decorateDocValueFields}} would be a mistake here?) ** in the switch statement, shouldn't we be using the {{doSet}} and {{doInc}} methods to actaully cary out the operations? *** that would simplify the "inc" case a lot ** the {{default}} on the switch statement looks sketchy to me ... i understand that only "inc" and "set" are supported, but why does this method only {{warn}} if it sees something else? shouldn't this be a hard failure? *** for that matter: shouldn't the {{instanceof Map}} check when looping over the fields at the begining of the method short circut out if the Map contains an operation that isn't one of the supported "in place update" operations? *** in fact: if we pre-checked the Maps only contained "set" and "inc", and used something like {{decorateDocValueFields}} (or did the equivilent ourselves in a smaller loop) then couldn't we also simplify this method a lot by just delegating to the existing {{merge(SolrInputDocument,SolrInputDocument)}} method? ** these assumptions seem sketchy, if that's the only reason for these "else" blocks then let's include some {{asert fieldName.equals(...)}} calls to prove it...{code} } else { // for version field, which is a docvalue but there's no set/inc operation ... } } else { // for id field ... {code} *** in particluar i'm curious about the {{VERSION_FIELD}}... **** this method is only called from one place -- {{DistributedUpdateProcessor.getUpdatedDocument}} -- and in the existing code of that method, when a {{SolrInputDocument}} is fetched from {{RealTimeGetComponent}}, the {{VERSION_FIELD}} is explicitly removed from it before using it & returning. **** should this method also be explicitly removing the {{VERSION_FIELD}} field? and/or should the caller ({{getUpdatedDocument}}) be removing it consistently before returning? {panel} {panel:title=RealTimeGetComponent} * {{process}} ** I like this {{SearcherInfo}} refactoring, but a few suggestions: *** it should be promoted into a (private) static (inner) class ... no need for a new class instance every time {{RealTimeGetComponent.process}} is called. *** add a one arg constructor and pass the SolrCore to that. *** javadocs, javadocs, javadocs .... note that it's not thread safe *** let's make {{searcherHolder}} and {{searcher}} private, and replace direct {{searcher}} access with:{code} public SolrIndexSearcher getSearcher() { assert null != searcher : "init not called!"; return searcher; } {code} ** in the switch statement, it seems like there is a lot of code duplicated between the {{ADD}} and {{UPDATE_INPLACE}} cases *** why not consolidate those cases into one block of code using (a modified) {{resolveFullDocument}} which can start with a call to {{toSolrDoc(...)}} and then return immediately if the entry is {{UpdateLog.ADD}} ? * {{resolveFullDocument}} ** see comments above about modifying this method to call {{toSolrDocument}} itself rather then expecting as input, and return early if the {{entry}} is an {{UpdateLog.ADD}} ** let's put an {{assert 0==lastPrevPointer}} in this else block in case someone improves/breaks {{ulog.applyPartialUpdates}} to return {{-2}} in the future...{code} } else { // i.e. lastPrevPointer==0 {code} ** since {{ulog.applyPartialUpdates(...)}} is a No-Op when {{prevPointer == -1}}, can't we remove the redundent calls to {{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} before and after calling {{ulog.applyPartialUpdates(...)}} ... ie:{code} long prevPointer = (long) logEntry.get(2); long prevVersion = (long) logEntry.get(3); // this is a No-Op if prevPointer is already negative, otherwise... // get the last full document from ulog prevPointer = ulog.applyPartialUpdates(idBytes, prevPointer, prevVersion, partialDoc); if (-1 == prevPointer) { ... } else if (0 < prevPointer) { ... } else { assert 0 == prevPointer; ... } {code} *** If there is some reason i'm not seeing why it's important to call {{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} before calling {{ulog.applyPartialUpdates}}, then perhaps we should at least refactor the "if mergedDoc == null, return reopenRealtimeSearcherAndGet" logic into {{mergePartialDocWithFullDocFromIndex}} since that's the only way it's ever used. * {{mergePartialDocWithFullDocFromIndex}} ** since this is a private method, what's the expected usage of {{docidFromIndex}} ? ... it's never used, so can we refactor it away? ** see previous comment about refactoring this method to automatically return {{reopenRealtimeSearcherAndGet(...)}} when it would otherwise return null * {{reopenRealtimeSearcherAndGet}} ** javadocs, javadocs, javadocs ** since this method is only used in conjunction with {{mergePartialDocWithFullDocFromIndex}}, if the code is refactored so that {{mergePartialDocWithFullDocFromIndex}} calls this method directly (see suggestion above), we could make a small micro-optimization by changing the method sig to take in a {{Term}} to (re)use rather then passing in {{idBytes}} and calling {{core.getLatestSchema().getUniqueKeyField()}} twice. ** re: the {{INVALID_STATE}} ... is that really a fatal error, or should this method be accounting for the possibility of a doc that has been completley deleted (or was never in the index) in a diff way? * {{getInputDocumentFromTlog}} ** lets put an explicit comment here noting that we are intentionally falling through to the {{Updatelog.ADD}} case * {{getVersionFromTlog}} ** based on it's usage, should this method be changed to return {{-1}} instead of null? ... not clear to me from the given caller usage ... if so should be be declared to return {{long}} instead of {{Long}} ? {panel} {panel:title=DistributedUpdateProcessor} * Similar question from AddUpdateCommand: do we really need 2 distinct params here, or would it be cleaner / less error-prone to have a single {{distrib.inplace.prevversion}} which indicates we're doing an inplace update if it's a positive # ? * {{versionAdd}} ** "Something weird has happened" ... perhaps {{waitForDependentUpdates}} should return the {{lastFoundVersion}} so we can use it in this error msg? ... "Dependent version not found after waiting: ..." ** "TODO: Should we call waitForDependentUpdates() again?" ... i don't see how that would help? if we get to this point it definitely seems like a "WTF?" fail fast situation. ** the way the existing code in this method has been refactored into an "else" block (see comment: {{// non inplace update, i.e. full document update}}) makes sense, but the way the {{Long lastVersion}} declaration was refactored _out_ of that block to reuse with the "if isInPlaceUpdate" side of things is a bit confusing and doesn't seem to actaully simplify anything... *** It's not used after the if/else block, so there's no reason to declare it before the "if" statement *** by definition it must be null in the "else" case, so {{if (lastVersion == null)}} will also be true in that code path *** it seems simpiler to just let both the "if" and "else" branches declare/define their own "Long lastVersion" and not risk any confusion about why that variable needs to be "shared" * {{waitForDependentUpdates}} ** javadocs, javadocs, javadocs ** {{fetchFromLeader}} is always true? why not eliminate arg? ** I'm not a concurrency expert, but is this control flow with the sync/wait actaully safe? ... my understanding was the conditional check you're waiting on should always be a loop inside the sync block? ** even if there are no spurious returns from wait, logging at "info" level every 100ms is excessive ... logging that often at trace seems excessive. *** why don't we log an info once at the start of method (since our of order updates should be rare) and once at debug anytime lastFoundVersion changes? (diff log message if/when lastFoundVersion is == or > prev) *** the "Waiting attempt" counter "i" doesn't seem like useful info to log given how {{wait(...)}} works ** "...Dropping current update." - that log msg seems missleading, this method doesn't do anything to drop the current update, it just assumes the current update will be droped later ** i don't really see the point of the {{boolean foundDependentUpdate}} variable... why not change the only place where it's set to {{true}} to return immediately? ** {{fetchMissingUpdateFromLeader}} can return null, but that possibility isn't handled here. ** {{if (uc instanceof AddUpdateCommand)}} ... what if it's not? *** currently it's just silently ignored *** is this a viable scenerio that needs accounted for, or an exceptional scenerio that should have error checking? *** looks like maybe it's just a confusing way to do a null check? ** {{((System.nanoTime()-startTime)/1000000 )}} ... that's a missleading thing to include in the Exception *** we didn't wait that long, we waited at most 5 seconds -- who knows how long we spent calling {{fetchMissingUpdateFromLeader}} & executing it. * {{fetchMissingUpdateFromLeader}} ** javadocs, javadocs, javadocs ** should we really be constructing a new HttpSolrClient on the fly like this? ** is this on the fly SolrClient + GenericSolrRequest going to work if/when the security auth/acl features are in use in the solr cluster? ** this seems to assume the node that forwarded us the _current_ request (via {{get(DISTRIB_FROM)}} is still the leader -- but what if there was a leader election? *** if the leader failed, and a new one was elected, isn't that a pretty viable/likeley reason why {{waitForDependentUpdates}} timed-out and needed to call {{fetchMissingUpdateFromLeader}} in the first place? ** {{e.printStackTrace();}} ... huge red flag that serious error handling is missing here. ** This method seems like it expects the possibility that {{missingUpdates}} will contain more then one entry, and if it does contain more then one entry it will convert/copy *all* of them into the {{updates}} list -- but then it will completley ignore all but the first one. *** if we don't expect more then 1, why not assert that? *** if we expect more then one, and they are all important, why don't we return {{List<UpdateCommand>}} ? *** if we expect more then one, but only care about one in particularly -- why loop over all of them? **** how do we know for sure which one in the list is the one we care about? ** ditto comments about {{PeerSync}} & {{UpdateLog}} and creating a {{public static AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} somwhere that can be reused here as well ** switch statement should have some sort of {{default}} case ... even if i'ts just to throw an error because anything but an {{ADD}} or {{UPDATE_INPLACE}} is impossible ** need to future proof this code against the posibility of other stuff down the road {panel} And here are some general overall comments / impressions I got while reviewing the code and then edited up once i was all finished... {panel} * Given that this patch requires some non-trivial changes to the types of records that go in the update log, and requires corisponding changes to {{PeerSync}}, it seems like there should definitely be some very explicit testing of log reply and peer sync ** ie: {{TestReplay}} and {{PeerSyncTest}} should be updated to include a variety of scenerios involving in-place updates * after seeing how complex & hairy some of the new code has to be around the diff handling of "in-place atomic updates", vs existing "atomic updates" (that aren't in place) It seems like we should definitely have more test code that mixes and matches diff types of "updates" ** static, non randomized, examples of explicit tests we should definitely have... *** a doc gets a sequence of atomic updates each containing multiple "in place" inc/set ops on diff fields *** a doc gets a sequence of atomic updates each containing multiple inc/set ops, where a single update may have a mix of "in place" vs "not in place" eligable ops (due to diff fields having diff docvalue/stored settings) ** our randomized (cloud and non-cloud) testing of in-place updates should also include updates to the canidate docs that may ocasionally not be viable "in-place" updates (because they involved updates to additional non-dv fields) ** in all of these tests we should be checking that both the RTG and regualr search results make sense * we also need a lot more testing of various {{deleteById}} and {{deleteByQuery}} commands mixed with in-place atomic updates ** both deleteByQuerys against the DV fields used in the in-place atomic updates as well as DBQs against other fields in the documents ** test the results of (uncommited) RTG as well as searches when these deletes are intermixed with in-place atomic updates ** test the results of peer sync and reply when deletes are mixed with in-place atomic updates. ** test that we correctly get {{409}} error codes when trying to do in-place updates w/optimistic concurrency after a delete (and vice versa: DBQ/dID afte in-place update) * all logging needs heavily audited ** there's a lot of logging happening at the {{info}} and {{debug}} level that should probably be much lower. ** likewise there may be a few existing {{info}} or {{debug}} msgs that might be good candidates for {{warn}} or {{error}} level msgs. * uniqueKey and Exception/log msgs ** there is a lot of log msgs or Exception msgs that cite a version# when reporting a problem, but don't include the uniqueKey of the document involved ** these messages aren't going to be remotely useful to end users w/o also including the (human readable) uniqueKey of the document involved. * it feels like we now have a really large number of methods involved in the merging/combining/converting of documents to apply atomic updates ("in place" or otherwise) ... either for use in RTG, or for use when writing updates to disk, or from reading from the tlog, etc... ** the ones that jump out at me w/o digging very hard...{noformat} RealTimeGetComponent.resolveFullDocument RealTimeGetComponent.toSolrDoc RealTimeGetComponent.mergePartialDocWithFullDocFromIndex UpdateLog.applyPartialUpdates UpdateLog.applyOlderUpdate AtomicUpdateDocumentMerger.merge AtomicUpdateDocumentMerger.doInPlaceUpdateMerge {noformat} ** i can't help but wonder if there is room for consolidation? ** in many cases these "merge" methods actually delegate to other "merge" methods, before/after applying additional logic -- in which case at a minimum using {{@link}} or {{@see}} tags in the javadocs to make this (intentional) relationship/similarity more obvious would be helpful. ** in cases where methods do _not_ delegate to eachother, or have any relationship w/eachother, having {{@link}} mentions of eachother in the javadocs to compare/constrast *why* they are different would be equally helpful. ** and who knows -- perhaps in the process of writing these docs we'll find good oportunities to refactor/consolidate {panel} was (Author: ichattopadhyaya): New patch fixing all nocommits. Still a few additional tests, which Hoss mentioned, are TODO. Here's a stab at replying to Hoss' comments (Maybe I'll keep updating this comment itself as and when I fix some of the TODO items here): {panel:title=JettySolrRunner} * javadocs, javadocs, javadocs [FIXED] {panel} {panel:title=XMLLoader + JavabinLoader} * why is this param checks logic duplicated in these classes? [Not sure what you mean here, I just set the prevVersion to the cmd here now] * why not put this in DUP (which already has access to the request params) when it's doing it's "FROMLEADER" logic? [Since commitWithin and overwrite was being set here, I thought this is an appropriate place to set the prevVersion to the cmd] {panel} {panel:title=AddUpdateCommand} * these variables (like all variables) should have javadocs explaining what they are and what they mean [FIXED] ** people skimming a class shouldn't have to grep the code for a variable name to understand it's purpose * having 2 variables here seems like it might be error prone? what does it mean if {{prevVersion < 0 && isInPlaceUpdate == true}} ? or {{0 < prevVersion && isInPlaceUpdate == false}} ? [FIXED: Now just have one variable] ** would it make more sense to use a single {{long prevVersion}} variable and have a {{public boolean isInPlaceUpdate()}} that simply does {{return (0 < prevVersion); }} ? [FIXED] {panel} {panel:title=TransactionLog} * javadocs for both the new {{write}} method and the existig {{write}} method [FIXED] ** explain what "prevPointer" means and note in the 2 arg method what the effective default "prevPoint" is. * we should really have some "int" constants for refering to the List indexes involved in these records, so instead of code like {{entry.get(3)}} sprinkled in various classes like UpdateLog and PeerSync it can be smething more readable like {{entry.get(PREV_VERSION_IDX)}} [TODO] {panel} {panel:title=UpdateLog} * javadocs for both the new {{LogPtr}} constructure and the existing constructor [FIXED] ** explain what "prevPointer" means and note in the 2 arg constructure what the effective default "prevPoint" is. [FIXED] * {{add(AddUpdateCommand, boolean)}} ** this new code for doing lookups in {{map}}, {{prevMap}} and {{preMap2}} seems weird to me (but admitedly i'm not really an expert on UpdateLog in general and how these maps are used ** what primarily concerns me is what the expected behavior is if the "id" isn't found in any of these maps -- it looks like prevPointer defaults to "-1" regardless of whether this is an inplace update ... is that intentional? ... is it possible there are older records we will miss and need to flag that? [Yes, this was intentional, and I think it doesn't make any difference. If an "id" isn't found in any of these maps, it would mean that the previous update was committed and should be looked up in the index. ] ** ie: do we need to worry about distinguising here between "not an in place update, therefore prePointer=-1" vs "is an in place update, but we can't find the prevPointer" ?? [I think we don't need to worry. Upon receiving a prevPointer=-1 by whoever reads this LogPtr, it should be clear why it was -1: if the command's {{flags|UpdateLog.UPDATE_INPLACE}} is set, then this command is an in-place update whose previous update is in the index and not in the tlog; if that flag is not set, it is not an in-place update at all, and don't bother about the prevPointer value at all (which is -1 as a dummy value).] ** assuming this code is correct, it might be a little easier to read if it were refactored into something like:{code} // nocommit: jdocs private synchronized long getPrevPointerForUpdate(AddUpdateCommand cmd) { // note: sync required to ensure maps aren't changed out form under us if (cmd.isInPlaceUpdate) { BytesRef indexedId = cmd.getIndexedId(); for (Map<BytesRef,TransactionLog> currentMap : Arrays.asList(map, prevMap, prevMap2)) { LogPtr prevEntry = currentMap.get(indexedId); if (null != prevEntry) { return prevEntry.pointer; } } } return -1; // default when not inplace, or if we can't find a previous entry } {code} [FIXED: Refactored into something similar to above] * {{applyPartialUpdates}} ** it seems like this method would be a really good candidate for some direct unit testing? [TODO] *** ie: construct a synthetic UpdateLog, and confirm applyPartialUpdates does the right thing ** the sync block in this method, and how the resulting {{lookupLogs}} list is used subsequently, doesn't seem safe to me -- particularly the way {{getEntryFromTLog}} calls incref/decref on each TransactionLog as it loops over that list... *** what prevents some other thread from decref'ing one of these TransactionLog objects (and possibly auto-closing it) in between the sync block and the incref in getEntryFromTLog? **** (most existing usages of TransactionLog.incref() seem to be in blocks that sync on the UpdateLog -- and the ones that aren't in sync blocks look sketchy to me as well) *** in general i'm wondering if {{lookupLogs}} should be created outside of the while loop, so that there is a consistent set of "logs" for the duration of the method call ... what happens right now if some other thread changes tlog/prevMapLog/prevMapLog2 in between iterations of the while loop? ** shouldn't we make some sanity check assertions about the results of getEntryFromTLog? -- there's an INVALID_STATE if it's not an ADD or a list of 5 elements, but what about actually asserting that it's either an ADD or an UPDATE_INPLACE? ... [FIXED] what about asserting the doc's uniqueKey matches? [We could do that, but I think it is not necessary] *** (because unless i'm missing something, it's possible for 2 docs to have the same version, so if there is a glitch in the pointer we can't just trust the version check can we?) [I think we can trust a document to be of the same id if the version matches. It is possible for 2 docs to have same version, but then they would have to be from two different shards altogether. Since all of this processing is happening within a particular replica (which obviously belongs to only one shard), I think we can get away safely without asserting the id and just relying on the version.] ** {{partialUpdateEntry}} seems like a missleading variable name ... can't it be *either* a full document, or partial update (not in place), or an UPDATE_INPLACE partial update? [FIXED: calling it entry now] ** if there is only 1 way to {{break}} out of this while loop, then the method would probably be easier to make sense of if the {{applyOlderUpdate}} and {{return 0}} calls replaced the {{break}} statement [FIXED] ** semi-related: {{while (true)}} is generally a red flag: it seems like might be better if it was refactored inot a {{while (0 <= prevPointer)}} loop? * {{getEntryFromTLog}} [FIXED] ** I don't really see the point of using {{get\(i\)}} over and over .. why not a simple {{for (TransactionLog log : lookupLogs)}} ? [FIXED] ** why is the Exception & Error handling here just a debug log? shouldn't that be a pretty hard fail? [This shouldn't be a hard fail. This is basically a seek operation on a transaction log at the specified position. If the seek results in an exception due to unable to deserialize the tlog entry, we can ignore it, since it just means we were looking up the wrong tlog. I have added a comment to this effect in the catch block.] ** as mentioned above re: {{applyPartialUpdates}}, isn't is possible for 2 diff docs to have the same version? if we get unlucky and those 2 docs, with identical versions, at the same position in diff TransactionLog files then isn't a sanity check of the doc ID in {{applyPartialUpdates}} too late? ... if {{applyPartialUpdates}} tries again it's just going to keep getting the same (wrong) document. It seems like this method actaully needs to know the ID it's looking for, and "skip" any entries thta don't match, checking the next (older) {{TransactionLog}} [I think this is not a likely scenario, since in a given replica, a version should be unique to a document (I think we have bigger problems if this assumption isn't true).] * {{lookupPartialUpdates}} ** what is the purpose/intent of this method? ... it seems to be unused. [FIXED: Removed] * {{doReplay}} ** switch statement ordering... [FIXED: I'll add it to my knowledge!] *** in theory, switch statement cases should be ordered from most likeley to least likely (it's a microoptimization, but in heavily executed loops it might be worth it) *** so i wouldn't inject UPDATE_INPLACE at the begining of the switch -- it should definitely come after ADD, probably best to put it at the end of the list ** why is {{entry.get(2)}} commented out? and why does it say "idBytes" ? ... isn't slot #2 the prevPointer? copy paste confusion from "ADD" ? [FIXED: True, it was copy-paste confusion from ADD. Removed the commented out line.] *** if slot#2 really isn't needed in this code, get rid of the missleading comment about idBytes and replace it with an explicit comment that prevVersion isn't needed for reply. [FIXED: I have removed the spurious commented out lines, refactored that part into a updateCommandFromTlog() method. Does it address your concern here?] {panel} {panel:title=PeerSync} * ditto comments about switch statement ordering from above comments about {{UpdateLog}} [FIXED] * a lot of code here looks duplicated straight from {{UpdateLog.doReplay}} ** I realize that's true for the other case values as well, but bad code shouldn't be emulated ** lets refactor this duplicated code into a new {{public static AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} method in {{UpdateLog}} and re-use it here. [FIXED] * {{log.info}} looks wrong here ... especially inside of an {{if (debug)}} block ... pretty sure this should be {{log.debug}} like the other case blocks [FIXED] {panel} {panel:title=DirectUpdateHandler2} * I don't really understand why we need any schema/DocValue checks here? [This was unnecessary and I've removed it. I have done a somewhat related refactoring to the AddUpdateCommand.getLuceneDocument(boolean isInPlace) to now only generate a Lucene document that has docValues. This was needed because the lucene document that was originally being returned had copy fields targets of id field, default fields, multiple Field per field (due to FieldType.createFields()) etc., which are not needed for in-place updates.] * If {{cmd.isInPlaceUpdate}} is true, then doesn't that mean the update is by definition an in place update that only contains values for DV fields? ... wasn't it the responsibility of the upstream code that set that value to true to ensure that? [True, it was. However, copy field targets of id field, default fields etc. were added in this doNormalAdd() method itself due to cmd.getLuceneDocument(); I have overloaded that method to tackle the needs of in-place updates and filter out such unnecessary fields being added to the lucene doc] ** if not, why can't it be? (ie: why can't we move the schema checks upstream, and out of the sync block here? * if that's not what {{cmd.isInPlaceUpdate}} means, then why isn't there any error handling here in the event that non-DV field/values are found in the luceneDocument? ... doesn't that mean we need to fall back to the original {{writer.updateDocument}} call? * If it is neccessary to do SchemaField validation here for some reason, then shouldn't it be the exact same validation done in {{AtomicUpdateDocumentMerger.isSupportedForInPlaceUpdate}} ? [We should do all schema validation there only, I have removed everything from here, except for some filtering logic at cmd.getLuceneDocument()] {panel} {panel:title=AtomicUpdateDocumentMerger} * {{isSupportedForInPlaceUpdate}} ** shouldn't this method either be static or final? the rules don't change if someone subclasses AtomicUpdateDocumentMerger do they? ** why isn't {{TrieDateField}} also valid? ... could this just be checking for {{instanceof TrieField}} ? *** particularly suspicious since {{doInPlaceUpdateMerge}} does in fact check {{instanceof TrieField}} ** if the intent is truely to only support "numerics" then, why not {{instanceof NumericValueFieldType}} ? ** shouldn't we also check "not-indexed" and "not-stored" here as well? * {{doInPlaceUpdateMerge}} ** why does this method have 3 args? can't all of the neccessary info be deterined from the {{AddUpdateCommand}} ? ** this method seems like another good candidate for some explicit unit testing... *** build up an index & tlog with some explicitly crated non trivial docs/updates, then call this method with a variety of inputs and assert the expected modifications to the {{AddUpdateCommand}} (or assert no modifications if they aren't viable in place update candaites *** then hard commit everything in the tlog and assert that all the same calls return the exact same output/modifications. ** we should probably continue to assume the common case is _not_ to need (in-place) updates ... just regular adds. in which case anything we can do to short circut out faster -- before any checks that require stuff like {{SchemaField}} -- wouldn't reordering the checks in the loop to something like this be equivilent to what we have currently but faster in the common case? ...{code} for (SolrInputField f : sdoc) { final String fname = f.getName(); if (idField.getName().equals(fname)) { continue; } if (! f.getValue() instanceof Map) { // doc contains field value that is not an update, therefore definitely not an inplace update return false; } if (!isSupportedForInPlaceUpdate(schema.getField(fname))) { // doc contains update to a field that does not support in place updates return false; } } {code} *** Even if i've overloked something and that code isn't better, i think in general the "is this sdoc a candidate for in place updating" logic should be refactored into a public static helper method that has some unit tests. ** this method calls {{RealTimeGetComponent.getInputDocumentFromTlog}} but doesn't check for the special {{DELETED}} return value... *** definitely smells like a bug ... there are many assumptions made about {{uncommittedDoc}} as long as it's non-null *** based on this, now i really want to see more testing of in place updates mixed with document deletions: **** some explicit single node testing of "add docX, delete docX, do a 'set':'42' on docX" **** introduce some randomized deleteById calls into the randomized single/multi node tests ** this method calls {{RealTimeGetComponent.getVersionFromTlog}} which has docs that say "If null is returned, it could still be in the latest index." *** i don't see any code accounting for that possibility, cmd.prevVersion is just blindly assigned the null in that case ... which could lead to an NPE since it's declared as {{public long prevVersion}} *** the fact that this hasn't caused an NPE in testing is a red flag that there is a code path not being tested here ... but at a glance i can't really tell what it is? ** In general, I guess I don't really understand why this method is jumping through as many hoops as it is with the RTG code? *** it seems to duplicate a lot of functionality already in {{RealTimeGetComponent.getInputDocument}} ... why not just use that method? *** if the concern is avoiding the {{searcher.doc(docid)}} call to get all stored fields, perhaps {{RealTimeGetComponent.getInputDocument}} could be refactored to make it easier to re-use most of the functionality here? ... not sure what would make the most sense off the top of my head. *** at a minimum it seems like using {{SolrIndexSearcher.decorateDocValueFields(...)}} would make more sense then doing it ourselves as we loop over the fields -- we can even pass in the explicit list of field names we know we care about based on the SolrInputDocument (or even just the field names we know use "inc" if that's all we really need) **** (Or is there something i'm missing about why using {{decorateDocValueFields}} would be a mistake here?) ** in the switch statement, shouldn't we be using the {{doSet}} and {{doInc}} methods to actaully cary out the operations? *** that would simplify the "inc" case a lot ** the {{default}} on the switch statement looks sketchy to me ... i understand that only "inc" and "set" are supported, but why does this method only {{warn}} if it sees something else? shouldn't this be a hard failure? *** for that matter: shouldn't the {{instanceof Map}} check when looping over the fields at the begining of the method short circut out if the Map contains an operation that isn't one of the supported "in place update" operations? *** in fact: if we pre-checked the Maps only contained "set" and "inc", and used something like {{decorateDocValueFields}} (or did the equivilent ourselves in a smaller loop) then couldn't we also simplify this method a lot by just delegating to the existing {{merge(SolrInputDocument,SolrInputDocument)}} method? ** these assumptions seem sketchy, if that's the only reason for these "else" blocks then let's include some {{asert fieldName.equals(...)}} calls to prove it...{code} } else { // for version field, which is a docvalue but there's no set/inc operation ... } } else { // for id field ... {code} *** in particluar i'm curious about the {{VERSION_FIELD}}... **** this method is only called from one place -- {{DistributedUpdateProcessor.getUpdatedDocument}} -- and in the existing code of that method, when a {{SolrInputDocument}} is fetched from {{RealTimeGetComponent}}, the {{VERSION_FIELD}} is explicitly removed from it before using it & returning. **** should this method also be explicitly removing the {{VERSION_FIELD}} field? and/or should the caller ({{getUpdatedDocument}}) be removing it consistently before returning? {panel} {panel:title=RealTimeGetComponent} * {{process}} ** I like this {{SearcherInfo}} refactoring, but a few suggestions: *** it should be promoted into a (private) static (inner) class ... no need for a new class instance every time {{RealTimeGetComponent.process}} is called. *** add a one arg constructor and pass the SolrCore to that. *** javadocs, javadocs, javadocs .... note that it's not thread safe *** let's make {{searcherHolder}} and {{searcher}} private, and replace direct {{searcher}} access with:{code} public SolrIndexSearcher getSearcher() { assert null != searcher : "init not called!"; return searcher; } {code} ** in the switch statement, it seems like there is a lot of code duplicated between the {{ADD}} and {{UPDATE_INPLACE}} cases *** why not consolidate those cases into one block of code using (a modified) {{resolveFullDocument}} which can start with a call to {{toSolrDoc(...)}} and then return immediately if the entry is {{UpdateLog.ADD}} ? * {{resolveFullDocument}} ** see comments above about modifying this method to call {{toSolrDocument}} itself rather then expecting as input, and return early if the {{entry}} is an {{UpdateLog.ADD}} ** let's put an {{assert 0==lastPrevPointer}} in this else block in case someone improves/breaks {{ulog.applyPartialUpdates}} to return {{-2}} in the future...{code} } else { // i.e. lastPrevPointer==0 {code} ** since {{ulog.applyPartialUpdates(...)}} is a No-Op when {{prevPointer == -1}}, can't we remove the redundent calls to {{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} before and after calling {{ulog.applyPartialUpdates(...)}} ... ie:{code} long prevPointer = (long) logEntry.get(2); long prevVersion = (long) logEntry.get(3); // this is a No-Op if prevPointer is already negative, otherwise... // get the last full document from ulog prevPointer = ulog.applyPartialUpdates(idBytes, prevPointer, prevVersion, partialDoc); if (-1 == prevPointer) { ... } else if (0 < prevPointer) { ... } else { assert 0 == prevPointer; ... } {code} *** If there is some reason i'm not seeing why it's important to call {{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} before calling {{ulog.applyPartialUpdates}}, then perhaps we should at least refactor the "if mergedDoc == null, return reopenRealtimeSearcherAndGet" logic into {{mergePartialDocWithFullDocFromIndex}} since that's the only way it's ever used. * {{mergePartialDocWithFullDocFromIndex}} ** since this is a private method, what's the expected usage of {{docidFromIndex}} ? ... it's never used, so can we refactor it away? ** see previous comment about refactoring this method to automatically return {{reopenRealtimeSearcherAndGet(...)}} when it would otherwise return null * {{reopenRealtimeSearcherAndGet}} ** javadocs, javadocs, javadocs ** since this method is only used in conjunction with {{mergePartialDocWithFullDocFromIndex}}, if the code is refactored so that {{mergePartialDocWithFullDocFromIndex}} calls this method directly (see suggestion above), we could make a small micro-optimization by changing the method sig to take in a {{Term}} to (re)use rather then passing in {{idBytes}} and calling {{core.getLatestSchema().getUniqueKeyField()}} twice. ** re: the {{INVALID_STATE}} ... is that really a fatal error, or should this method be accounting for the possibility of a doc that has been completley deleted (or was never in the index) in a diff way? * {{getInputDocumentFromTlog}} ** lets put an explicit comment here noting that we are intentionally falling through to the {{Updatelog.ADD}} case * {{getVersionFromTlog}} ** based on it's usage, should this method be changed to return {{-1}} instead of null? ... not clear to me from the given caller usage ... if so should be be declared to return {{long}} instead of {{Long}} ? {panel} {panel:title=DistributedUpdateProcessor} * Similar question from AddUpdateCommand: do we really need 2 distinct params here, or would it be cleaner / less error-prone to have a single {{distrib.inplace.prevversion}} which indicates we're doing an inplace update if it's a positive # ? * {{versionAdd}} ** "Something weird has happened" ... perhaps {{waitForDependentUpdates}} should return the {{lastFoundVersion}} so we can use it in this error msg? ... "Dependent version not found after waiting: ..." ** "TODO: Should we call waitForDependentUpdates() again?" ... i don't see how that would help? if we get to this point it definitely seems like a "WTF?" fail fast situation. ** the way the existing code in this method has been refactored into an "else" block (see comment: {{// non inplace update, i.e. full document update}}) makes sense, but the way the {{Long lastVersion}} declaration was refactored _out_ of that block to reuse with the "if isInPlaceUpdate" side of things is a bit confusing and doesn't seem to actaully simplify anything... *** It's not used after the if/else block, so there's no reason to declare it before the "if" statement *** by definition it must be null in the "else" case, so {{if (lastVersion == null)}} will also be true in that code path *** it seems simpiler to just let both the "if" and "else" branches declare/define their own "Long lastVersion" and not risk any confusion about why that variable needs to be "shared" * {{waitForDependentUpdates}} ** javadocs, javadocs, javadocs ** {{fetchFromLeader}} is always true? why not eliminate arg? ** I'm not a concurrency expert, but is this control flow with the sync/wait actaully safe? ... my understanding was the conditional check you're waiting on should always be a loop inside the sync block? ** even if there are no spurious returns from wait, logging at "info" level every 100ms is excessive ... logging that often at trace seems excessive. *** why don't we log an info once at the start of method (since our of order updates should be rare) and once at debug anytime lastFoundVersion changes? (diff log message if/when lastFoundVersion is == or > prev) *** the "Waiting attempt" counter "i" doesn't seem like useful info to log given how {{wait(...)}} works ** "...Dropping current update." - that log msg seems missleading, this method doesn't do anything to drop the current update, it just assumes the current update will be droped later ** i don't really see the point of the {{boolean foundDependentUpdate}} variable... why not change the only place where it's set to {{true}} to return immediately? ** {{fetchMissingUpdateFromLeader}} can return null, but that possibility isn't handled here. ** {{if (uc instanceof AddUpdateCommand)}} ... what if it's not? *** currently it's just silently ignored *** is this a viable scenerio that needs accounted for, or an exceptional scenerio that should have error checking? *** looks like maybe it's just a confusing way to do a null check? ** {{((System.nanoTime()-startTime)/1000000 )}} ... that's a missleading thing to include in the Exception *** we didn't wait that long, we waited at most 5 seconds -- who knows how long we spent calling {{fetchMissingUpdateFromLeader}} & executing it. * {{fetchMissingUpdateFromLeader}} ** javadocs, javadocs, javadocs ** should we really be constructing a new HttpSolrClient on the fly like this? ** is this on the fly SolrClient + GenericSolrRequest going to work if/when the security auth/acl features are in use in the solr cluster? ** this seems to assume the node that forwarded us the _current_ request (via {{get(DISTRIB_FROM)}} is still the leader -- but what if there was a leader election? *** if the leader failed, and a new one was elected, isn't that a pretty viable/likeley reason why {{waitForDependentUpdates}} timed-out and needed to call {{fetchMissingUpdateFromLeader}} in the first place? ** {{e.printStackTrace();}} ... huge red flag that serious error handling is missing here. ** This method seems like it expects the possibility that {{missingUpdates}} will contain more then one entry, and if it does contain more then one entry it will convert/copy *all* of them into the {{updates}} list -- but then it will completley ignore all but the first one. *** if we don't expect more then 1, why not assert that? *** if we expect more then one, and they are all important, why don't we return {{List<UpdateCommand>}} ? *** if we expect more then one, but only care about one in particularly -- why loop over all of them? **** how do we know for sure which one in the list is the one we care about? ** ditto comments about {{PeerSync}} & {{UpdateLog}} and creating a {{public static AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} somwhere that can be reused here as well ** switch statement should have some sort of {{default}} case ... even if i'ts just to throw an error because anything but an {{ADD}} or {{UPDATE_INPLACE}} is impossible ** need to future proof this code against the posibility of other stuff down the road {panel} And here are some general overall comments / impressions I got while reviewing the code and then edited up once i was all finished... {panel} * Given that this patch requires some non-trivial changes to the types of records that go in the update log, and requires corisponding changes to {{PeerSync}}, it seems like there should definitely be some very explicit testing of log reply and peer sync ** ie: {{TestReplay}} and {{PeerSyncTest}} should be updated to include a variety of scenerios involving in-place updates * after seeing how complex & hairy some of the new code has to be around the diff handling of "in-place atomic updates", vs existing "atomic updates" (that aren't in place) It seems like we should definitely have more test code that mixes and matches diff types of "updates" ** static, non randomized, examples of explicit tests we should definitely have... *** a doc gets a sequence of atomic updates each containing multiple "in place" inc/set ops on diff fields *** a doc gets a sequence of atomic updates each containing multiple inc/set ops, where a single update may have a mix of "in place" vs "not in place" eligable ops (due to diff fields having diff docvalue/stored settings) ** our randomized (cloud and non-cloud) testing of in-place updates should also include updates to the canidate docs that may ocasionally not be viable "in-place" updates (because they involved updates to additional non-dv fields) ** in all of these tests we should be checking that both the RTG and regualr search results make sense * we also need a lot more testing of various {{deleteById}} and {{deleteByQuery}} commands mixed with in-place atomic updates ** both deleteByQuerys against the DV fields used in the in-place atomic updates as well as DBQs against other fields in the documents ** test the results of (uncommited) RTG as well as searches when these deletes are intermixed with in-place atomic updates ** test the results of peer sync and reply when deletes are mixed with in-place atomic updates. ** test that we correctly get {{409}} error codes when trying to do in-place updates w/optimistic concurrency after a delete (and vice versa: DBQ/dID afte in-place update) * all logging needs heavily audited ** there's a lot of logging happening at the {{info}} and {{debug}} level that should probably be much lower. ** likewise there may be a few existing {{info}} or {{debug}} msgs that might be good candidates for {{warn}} or {{error}} level msgs. * uniqueKey and Exception/log msgs ** there is a lot of log msgs or Exception msgs that cite a version# when reporting a problem, but don't include the uniqueKey of the document involved ** these messages aren't going to be remotely useful to end users w/o also including the (human readable) uniqueKey of the document involved. * it feels like we now have a really large number of methods involved in the merging/combining/converting of documents to apply atomic updates ("in place" or otherwise) ... either for use in RTG, or for use when writing updates to disk, or from reading from the tlog, etc... ** the ones that jump out at me w/o digging very hard...{noformat} RealTimeGetComponent.resolveFullDocument RealTimeGetComponent.toSolrDoc RealTimeGetComponent.mergePartialDocWithFullDocFromIndex UpdateLog.applyPartialUpdates UpdateLog.applyOlderUpdate AtomicUpdateDocumentMerger.merge AtomicUpdateDocumentMerger.doInPlaceUpdateMerge {noformat} ** i can't help but wonder if there is room for consolidation? ** in many cases these "merge" methods actually delegate to other "merge" methods, before/after applying additional logic -- in which case at a minimum using {{@link}} or {{@see}} tags in the javadocs to make this (intentional) relationship/similarity more obvious would be helpful. ** in cases where methods do _not_ delegate to eachother, or have any relationship w/eachother, having {{@link}} mentions of eachother in the javadocs to compare/constrast *why* they are different would be equally helpful. ** and who knows -- perhaps in the process of writing these docs we'll find good oportunities to refactor/consolidate {panel} > Support updates of numeric DocValues > ------------------------------------ > > Key: SOLR-5944 > URL: https://issues.apache.org/jira/browse/SOLR-5944 > Project: Solr > Issue Type: New Feature > Reporter: Ishan Chattopadhyaya > Assignee: Shalin Shekhar Mangar > Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, > SOLR-5944.patch, SOLR-5944.patch, > TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, > TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, > TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, > hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, > hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, > hoss.D768DD9443A98DC.pass.txt > > > LUCENE-5189 introduced support for updates to numeric docvalues. It would be > really nice to have Solr support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org