[jira] [Comment Edited] (SOLR-5944) Support updates of numeric DocValues

Ishan Chattopadhyaya (JIRA) Mon, 04 Jul 2016 17:48:26 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361862#comment-15361862
 ]


Ishan Chattopadhyaya edited comment on SOLR-5944 at 7/5/16 12:48 AM:
---------------------------------------------------------------------

New patch fixing all nocommits. Still a few additional tests, which Hoss 
mentioned, are TODO. Here's a stab at replying to Hoss' comments (Maybe I'll 
keep updating this comment itself as and when I fix some of the TODO items 
here):

 {panel:title=JettySolrRunner}
* javadocs, javadocs, javadocs {color:green}[FIXED]{color}
{panel}

{panel:title=XMLLoader + JavabinLoader}
* why is this param checks logic duplicated in these classes? {color:red}[Not 
sure what you mean here, I just set the prevVersion to the cmd here now]{color}
* why not put this in DUP (which already has access to the request params) when 
it's doing it's "FROMLEADER" logic? {color:red}[Since commitWithin and 
overwrite was being set here, I thought this is an appropriate place to set the 
prevVersion to the cmd]{color}
{panel}

{panel:title=AddUpdateCommand}
* these variables (like all variables) should have javadocs explaining what 
they are and what they mean {color:green}[FIXED]{color}
** people skimming a class shouldn't have to grep the code for a variable name 
to understand it's purpose
* having 2 variables here seems like it might be error prone?  what does it 
mean if {{prevVersion < 0 && isInPlaceUpdate == true}} ? or {{0 < prevVersion 
&& isInPlaceUpdate == false}} ? {color:green}[FIXED: Now just have one 
variable]{color}
** would it make more sense to use a single {{long prevVersion}} variable and 
have a {{public boolean isInPlaceUpdate()}} that simply does {{return (0 < 
prevVersion); }} ? {color:green}[FIXED]{color}
{panel}

{panel:title=TransactionLog}
* javadocs for both the new {{write}} method and the existig {{write}} method  
{color:green}[FIXED]{color}
** explain what "prevPointer" means and note in the 2 arg method what the 
effective default "prevPoint" is.
* we should really have some "int" constants for refering to the List indexes 
involved in these records, so instead of code like {{entry.get(3)}} sprinkled 
in various classes like UpdateLog and PeerSync it can be smething more readable 
like {{entry.get(PREV_VERSION_IDX)}}  {color:red}[TODO]{color}
{panel}


{panel:title=UpdateLog}
* javadocs for both the new {{LogPtr}} constructure and the existing 
constructor {color:green}[FIXED]{color}
** explain what "prevPointer" means and note in the 2 arg constructure what the 
effective default "prevPoint" is.  {color:green}[FIXED]{color}
* {{add(AddUpdateCommand, boolean)}}
** this new code for doing lookups in {{map}}, {{prevMap}} and {{preMap2}} 
seems weird to me (but admitedly i'm not really an expert on UpdateLog in 
general and how these maps are used
** what primarily concerns me is what the expected behavior is if the "id" 
isn't found in any of these maps -- it looks like prevPointer defaults to "-1" 
regardless of whether this is an inplace update ... is that intentional? ... is 
it possible there are older records we will miss and need to flag that?  [Yes, 
this was intentional, and I think it doesn't make any difference. If an "id" 
isn't found in any of these maps, it would mean that the previous update was 
committed and should be looked up in the index. ]
** ie: do we need to worry about distinguising here between "not an in place 
update, therefore prePointer=-1" vs "is an in place update, but we can't find 
the prevPointer" ?? [I think we don't need to worry. Upon receiving a 
prevPointer=-1 by whoever reads this LogPtr, it should be clear why it was -1: 
if the command's {{flags|UpdateLog.UPDATE_INPLACE}} is set, then this command 
is an in-place update whose previous update is in the index and not in the 
tlog; if that flag is not set, it is not an in-place update at all, and don't 
bother about the prevPointer value at all (which is -1 as a dummy value).]
** assuming this code is correct, it might be a little easier to read if it 
were refactored into something like:{code}
// nocommit: jdocs
private synchronized long getPrevPointerForUpdate(AddUpdateCommand cmd) {
  // note: sync required to ensure maps aren't changed out form under us
  if (cmd.isInPlaceUpdate) {
    BytesRef indexedId = cmd.getIndexedId();
    for (Map<BytesRef,TransactionLog> currentMap : Arrays.asList(map, prevMap, 
prevMap2)) {
      LogPtr prevEntry = currentMap.get(indexedId);
      if (null != prevEntry) {
        return prevEntry.pointer;
      }
    }
  }
  return -1; // default when not inplace, or if we can't find a previous entry
}
{code} [FIXED: Refactored into something similar to above]
* {{applyPartialUpdates}}
** it seems like this method would be a really good candidate for some direct 
unit testing? [TODO]
*** ie: construct a synthetic UpdateLog, and confirm applyPartialUpdates does 
the right thing
** the sync block in this method, and how the resulting {{lookupLogs}} list is 
used subsequently, doesn't seem safe to me -- particularly the way 
{{getEntryFromTLog}} calls incref/decref on each TransactionLog as it loops 
over that list...
*** what prevents some other thread from decref'ing one of these TransactionLog 
objects (and possibly auto-closing it) in between the sync block and the incref 
in getEntryFromTLog?
**** (most existing usages of TransactionLog.incref() seem to be in blocks that 
sync on the UpdateLog -- and the ones that aren't in sync blocks look sketchy 
to me as well)
*** in general i'm wondering if {{lookupLogs}} should be created outside of the 
while loop, so that there is a consistent set of "logs" for the duration of the 
method call ... what happens right now if some other thread changes 
tlog/prevMapLog/prevMapLog2 in between iterations of the while loop?
** shouldn't we make some sanity check assertions about the results of 
getEntryFromTLog? -- there's an INVALID_STATE if it's not an ADD or a list of 5 
elements, but what about actually asserting that it's either an ADD or an 
UPDATE_INPLACE? ... {color:green}[FIXED]{color} what about asserting the doc's 
uniqueKey matches? [We could do that, but I think it is not necessary]
*** (because unless i'm missing something, it's possible for 2 docs to have the 
same version, so if there is a glitch in the pointer we can't just trust the 
version check can we?) [I think we can trust a document to be of the same id if 
the version matches. It is possible for 2 docs to have same version, but then 
they would have to be from two different shards altogether. Since all of this 
processing is happening within a particular replica (which obviously belongs to 
only one shard), I think we can get away safely without asserting the id and 
just relying on the version.]
** {{partialUpdateEntry}} seems like a missleading variable name ... can't it 
be *either* a full document, or partial update (not in place), or an 
UPDATE_INPLACE partial update? [FIXED: calling it entry now]
** if there is only 1 way to {{break}} out of this while loop, then the method 
would probably be easier to make sense of if the {{applyOlderUpdate}} and 
{{return 0}} calls replaced the {{break}} statement {color:green}[FIXED]{color}
** semi-related: {{while (true)}} is generally a red flag: it seems like might 
be better if it was refactored inot a {{while (0 <= prevPointer)}} loop?
* {{getEntryFromTLog}} {color:green}[FIXED]{color}
** I don't really see the point of using {{get\(i\)}} over and over .. why not 
a simple {{for (TransactionLog log : lookupLogs)}} ? {color:green}[FIXED]{color}
** why is the Exception & Error handling here just a debug log? shouldn't that 
be a pretty hard fail? [This shouldn't be a hard fail. This is basically a seek 
operation on a transaction log at the specified position. If the seek results 
in an exception due to unable to deserialize the tlog entry, we can ignore it, 
since it just means we were looking up the wrong tlog. I have added a comment 
to this effect in the catch block.]
** as mentioned above re: {{applyPartialUpdates}}, isn't is possible for 2 diff 
docs to have the same version? if we get unlucky and those 2 docs, with 
identical versions, at the same position in diff TransactionLog files then 
isn't a sanity check of the doc ID in {{applyPartialUpdates}} too late? ... if 
{{applyPartialUpdates}} tries again it's just going to keep getting the same 
(wrong) document.  It seems like this method actaully needs to know the ID it's 
looking for, and "skip" any entries thta don't match, checking the next (older) 
{{TransactionLog}} [I think this is not a likely scenario, since in a given 
replica, a version should be unique to a document (I think we have bigger 
problems if this assumption isn't true).]
* {{lookupPartialUpdates}}
** what is the purpose/intent of this method? ... it seems to be unused. 
[FIXED: Removed]
* {{doReplay}}
** switch statement ordering...  [FIXED: I'll add it to my knowledge!]
*** in theory, switch statement cases should be ordered from most likeley to 
least likely (it's a microoptimization, but in heavily executed loops it might 
be worth it)
*** so i wouldn't inject UPDATE_INPLACE at the begining of the switch -- it 
should definitely come after ADD, probably best to put it at the end of the list
** why is {{entry.get(2)}} commented out? and why does it say "idBytes" ? ... 
isn't slot #2 the prevPointer?  copy paste confusion from "ADD" ?  [FIXED: 
True, it was copy-paste confusion from ADD. Removed the commented out line.]
*** if slot#2 really isn't needed in this code, get rid of the missleading 
comment about idBytes and replace it with an explicit comment that prevVersion 
isn't needed for reply. [FIXED: I have removed the spurious commented out 
lines, refactored that part into a updateCommandFromTlog() method. Does it 
address your concern here?]
{panel}

{panel:title=PeerSync}
* ditto comments about switch statement ordering from above comments about 
{{UpdateLog}} {color:green}[FIXED]{color}
* a lot of code here looks duplicated straight from {{UpdateLog.doReplay}}
** I realize that's true for the other case values as well, but bad code 
shouldn't be emulated
** lets refactor this duplicated code into a new {{public static 
AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} method in 
{{UpdateLog}} and re-use it here. {color:green}[FIXED]{color}
* {{log.info}} looks wrong here ... especially inside of an {{if (debug)}} 
block ... pretty sure this should be {{log.debug}} like the other case blocks 
{color:green}[FIXED]{color}
{panel}

{panel:title=DirectUpdateHandler2}
* I don't really understand why we need any schema/DocValue checks here? [This 
was unnecessary and I've removed it. I have done a somewhat related refactoring 
to the AddUpdateCommand.getLuceneDocument(boolean isInPlace) to now only 
generate a Lucene document that has docValues. This was needed because the 
lucene document that was originally being returned had copy fields targets of 
id field, default fields, multiple Field per field (due to 
FieldType.createFields()) etc., which are not needed for in-place updates.]
* If {{cmd.isInPlaceUpdate}} is true, then doesn't that mean the update is by 
definition an in place update that only contains values for DV fields? ... 
wasn't it the responsibility of the upstream code that set that value to true 
to ensure that? [True, it was. However, copy field targets of id field, default 
fields etc. were added in this doNormalAdd() method itself due to 
cmd.getLuceneDocument(); I have overloaded that method to tackle the needs of 
in-place updates and filter out such unnecessary fields being added to the 
lucene doc]
** if not, why can't it be? (ie: why can't we move the schema checks upstream, 
and out of the sync block here?
* if that's not what {{cmd.isInPlaceUpdate}} means, then why isn't there any 
error handling here in the event that non-DV field/values are found in the 
luceneDocument? ... doesn't that mean we need to fall back to the original 
{{writer.updateDocument}} call?
* If it is neccessary to do SchemaField validation here for some reason, then 
shouldn't it be the exact same validation done in 
{{AtomicUpdateDocumentMerger.isSupportedForInPlaceUpdate}} ? [We should do all 
schema validation there only, I have removed everything from here, except for 
some filtering logic at cmd.getLuceneDocument()]
{panel}

{panel:title=AtomicUpdateDocumentMerger}
* {{isSupportedForInPlaceUpdate}}
** shouldn't this method either be static or final?  the rules don't change if 
someone subclasses AtomicUpdateDocumentMerger do they?
** why isn't {{TrieDateField}} also valid? ... could this just be checking for 
{{instanceof TrieField}} ?
*** particularly suspicious since {{doInPlaceUpdateMerge}} does in fact check 
{{instanceof TrieField}}
** if the intent is truely to only support "numerics" then, why not 
{{instanceof NumericValueFieldType}} ?
** shouldn't we also check "not-indexed" and "not-stored" here as well?
* {{doInPlaceUpdateMerge}}
** why does this method have 3 args?  can't all of the neccessary info be 
deterined from the {{AddUpdateCommand}} ?
** this method seems like another good candidate for some explicit unit 
testing...
*** build up an index & tlog with some explicitly crated non trivial 
docs/updates, then call this method with a variety of inputs and assert the 
expected modifications to the {{AddUpdateCommand}} (or assert no modifications 
if they aren't viable in place update candaites
*** then hard commit everything in the tlog and assert that all the same calls 
return the exact same output/modifications.
** we should probably continue to assume the common case is _not_ to need 
(in-place) updates ... just regular adds.  in which case anything we can do to 
short circut out faster -- before any checks that require stuff like 
{{SchemaField}} -- wouldn't reordering the checks in the loop to something like 
this be equivilent to what we have currently but faster in the common case? 
...{code}
for (SolrInputField f : sdoc) {
  final String fname = f.getName();
  if (idField.getName().equals(fname)) {
    continue;
  }
  if (! f.getValue() instanceof Map) {
    // doc contains field value that is not an update, therefore definitely not 
an inplace update
    return false;
  }
  if (!isSupportedForInPlaceUpdate(schema.getField(fname))) {
    // doc contains update to a field that does not support in place updates
    return false;
  }
}
{code}
*** Even if i've overloked something and that code isn't better, i think in 
general the "is this sdoc a candidate for in place updating" logic should be 
refactored into a public static helper method that has some unit tests.
** this method calls {{RealTimeGetComponent.getInputDocumentFromTlog}} but 
doesn't check for the special {{DELETED}} return value...
*** definitely smells like a bug ... there are many assumptions made about 
{{uncommittedDoc}} as long as it's non-null
*** based on this, now i really want to see more testing of in place updates 
mixed with document deletions:
**** some explicit single node testing of "add docX, delete docX, do a 
'set':'42' on docX"
**** introduce some randomized deleteById calls into the randomized 
single/multi node tests
** this method calls {{RealTimeGetComponent.getVersionFromTlog}} which has docs 
that say "If null is returned, it could still be in the latest index."
*** i don't see any code accounting for that possibility, cmd.prevVersion is 
just blindly assigned the null in that case ... which could lead to an NPE 
since it's declared as {{public long prevVersion}}
*** the fact that this hasn't caused an NPE in testing is a red flag that there 
is a code path not being tested here ... but at a glance i can't really tell 
what it is?
** In general, I guess I don't really understand why this method is jumping 
through as many hoops as it is with the RTG code?
*** it seems to duplicate a lot of functionality already in 
{{RealTimeGetComponent.getInputDocument}} ... why not just use that method?
*** if the concern is avoiding the {{searcher.doc(docid)}} call to get all 
stored fields, perhaps {{RealTimeGetComponent.getInputDocument}} could be 
refactored to make it easier to re-use most of the functionality here? ... not 
sure what would make the most sense off the top of my head.
*** at a minimum it seems like using 
{{SolrIndexSearcher.decorateDocValueFields(...)}} would make more sense then 
doing it ourselves as we loop over the fields -- we can even pass in the 
explicit list of field names we know we care about based on the 
SolrInputDocument (or even just the field names we know use "inc" if that's all 
we really need)
**** (Or is there something i'm missing about why using 
{{decorateDocValueFields}} would be a mistake here?)
** in the switch statement, shouldn't we be using the {{doSet}} and {{doInc}} 
methods to actaully cary out the operations?
*** that would simplify the "inc" case a lot
** the {{default}} on the switch statement looks sketchy to me ... i understand 
that only "inc" and "set" are supported, but why does this method only {{warn}} 
if it sees something else? shouldn't this be a hard failure?
*** for that matter: shouldn't the {{instanceof Map}} check when looping over 
the fields at the begining of the method short circut out if the Map contains 
an operation that isn't one of the supported "in place update" operations?
*** in fact: if we pre-checked the Maps only contained "set" and "inc", and 
used something like {{decorateDocValueFields}} (or did the equivilent ourselves 
in a smaller loop) then couldn't we also simplify this method a lot by just 
delegating to the existing {{merge(SolrInputDocument,SolrInputDocument)}} 
method?
** these assumptions seem sketchy, if that's the only reason for these "else" 
blocks then let's include some {{asert fieldName.equals(...)}} calls to prove 
it...{code}
          } else { // for version field, which is a docvalue but there's no 
set/inc operation
            ...
          }
        } else { // for id field
          ...
{code}
*** in particluar i'm curious about the {{VERSION_FIELD}}...
**** this method is only called from one place -- 
{{DistributedUpdateProcessor.getUpdatedDocument}} -- and in the existing code 
of that method, when a {{SolrInputDocument}} is fetched from 
{{RealTimeGetComponent}}, the {{VERSION_FIELD}} is explicitly removed from it 
before using it & returning.
**** should this method also be explicitly removing the {{VERSION_FIELD}} 
field?  and/or should the caller ({{getUpdatedDocument}}) be removing it 
consistently before returning?
{panel}


{panel:title=RealTimeGetComponent}
* {{process}}
** I like this {{SearcherInfo}} refactoring, but a few suggestions:
*** it should be promoted into a (private) static (inner) class ... no need for 
a new class instance every time {{RealTimeGetComponent.process}} is called.
*** add a one arg constructor and pass the SolrCore to that.
*** javadocs, javadocs, javadocs .... note that it's not thread safe
*** let's make {{searcherHolder}} and {{searcher}} private, and replace direct 
{{searcher}} access with:{code}
public SolrIndexSearcher getSearcher() {
  assert null != searcher : "init not called!";
  return searcher;
}
{code}
** in the switch statement, it seems like there is a lot of code duplicated 
between the {{ADD}} and {{UPDATE_INPLACE}} cases
*** why not consolidate those cases into one block of code using (a modified) 
{{resolveFullDocument}} which can start with a call to {{toSolrDoc(...)}} and 
then return immediately if the entry is {{UpdateLog.ADD}} ?
* {{resolveFullDocument}}
** see comments above about modifying this method to call {{toSolrDocument}} 
itself rather then expecting as input, and return early if the {{entry}} is an 
{{UpdateLog.ADD}}
** let's put an {{assert 0==lastPrevPointer}} in this else block in case 
someone improves/breaks {{ulog.applyPartialUpdates}} to return {{-2}} in the 
future...{code}
    } else { // i.e. lastPrevPointer==0
{code}
** since {{ulog.applyPartialUpdates(...)}} is a No-Op when {{prevPointer == 
-1}}, can't we remove the redundent calls to 
{{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} 
before and after calling {{ulog.applyPartialUpdates(...)}} ... ie:{code}
    long prevPointer = (long) logEntry.get(2);
    long prevVersion = (long) logEntry.get(3);
    // this is a No-Op if prevPointer is already negative, otherwise...
    // get the last full document from ulog
    prevPointer = ulog.applyPartialUpdates(idBytes, prevPointer, prevVersion, 
partialDoc);
    if (-1 == prevPointer) {
      ...
    } else if (0 < prevPointer) {
      ...
    } else {
      assert 0 == prevPointer;
      ...
    }
{code}
*** If there is some reason i'm not seeing why it's important to call 
{{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} 
before calling {{ulog.applyPartialUpdates}}, then perhaps we should at least 
refactor the "if mergedDoc == null, return reopenRealtimeSearcherAndGet" logic 
into {{mergePartialDocWithFullDocFromIndex}} since that's the only way it's 
ever used.
* {{mergePartialDocWithFullDocFromIndex}}
** since this is a private method, what's the expected usage of 
{{docidFromIndex}} ? ... it's never used, so can we refactor it away?
** see previous comment about refactoring this method to automatically return 
{{reopenRealtimeSearcherAndGet(...)}} when it would otherwise return null
* {{reopenRealtimeSearcherAndGet}}
** javadocs, javadocs, javadocs
** since this method is only used in conjunction with 
{{mergePartialDocWithFullDocFromIndex}}, if the code is refactored so that 
{{mergePartialDocWithFullDocFromIndex}} calls this method directly (see 
suggestion above), we could make a small micro-optimization by changing the 
method sig to take in a {{Term}} to (re)use rather then passing in {{idBytes}} 
and calling {{core.getLatestSchema().getUniqueKeyField()}} twice.
** re: the {{INVALID_STATE}} ... is that really a fatal error, or should this 
method be accounting for the possibility of a doc that has been completley 
deleted (or was never in the index) in a diff way?
* {{getInputDocumentFromTlog}}
** lets put an explicit comment here noting that we are intentionally falling 
through to the {{Updatelog.ADD}} case
* {{getVersionFromTlog}}
** based on it's usage, should this method be changed to return {{-1}} instead 
of null?  ... not clear to me from the given caller usage ... if so should be 
be declared to return {{long}} instead of {{Long}} ?
{panel}


{panel:title=DistributedUpdateProcessor}
* Similar question from AddUpdateCommand: do we really need 2 distinct params 
here, or would it be cleaner / less error-prone to have a single 
{{distrib.inplace.prevversion}} which indicates we're doing an inplace update 
if it's a positive # ?
* {{versionAdd}}
** "Something weird has happened" ... perhaps {{waitForDependentUpdates}} 
should return the {{lastFoundVersion}} so we can use it in this error msg? ... 
"Dependent version not found after waiting: ..."
** "TODO: Should we call waitForDependentUpdates() again?" ... i don't see how 
that would help?  if we get to this point it definitely seems like a "WTF?" 
fail fast situation.
** the way the existing code in this method has been refactored into an "else" 
block (see comment: {{// non inplace update, i.e. full document update}}) makes 
sense, but the way the {{Long lastVersion}} declaration was refactored _out_ of 
that block to reuse with the "if isInPlaceUpdate" side of things is a bit 
confusing and doesn't seem to actaully simplify anything...
*** It's not used after the if/else block, so there's no reason to declare it 
before the "if" statement
*** by definition it must be null in the "else" case, so {{if (lastVersion == 
null)}} will also be true in that code path
*** it seems simpiler to just let both the "if" and "else" branches 
declare/define their own "Long lastVersion" and not risk any confusion about 
why that variable needs to be "shared"
* {{waitForDependentUpdates}}
** javadocs, javadocs, javadocs
** {{fetchFromLeader}} is always true? why not eliminate arg?
** I'm not a concurrency expert, but is this control flow with the sync/wait 
actaully safe? ... my understanding was the conditional check you're waiting on 
should always be a loop inside the sync block?
** even if there are no spurious returns from wait, logging at "info" level 
every 100ms is excessive ... logging that often at trace seems excessive.
*** why don't we log an info once at the start of method (since our of order 
updates should be rare) and once at debug anytime lastFoundVersion changes? 
(diff log message if/when lastFoundVersion is == or > prev)
*** the "Waiting attempt" counter "i" doesn't seem like useful info to log 
given how {{wait(...)}} works
** "...Dropping current update." - that log msg seems missleading, this method 
doesn't do anything to drop the current update, it just assumes the current 
update will be droped later
** i don't really see the point of the {{boolean foundDependentUpdate}} 
variable... why not change the only place where it's set to {{true}} to return 
immediately?
** {{fetchMissingUpdateFromLeader}} can return null, but that possibility isn't 
handled here.
** {{if (uc instanceof AddUpdateCommand)}} ... what if it's not?
*** currently it's just silently ignored
*** is this a viable scenerio that needs accounted for, or an exceptional 
scenerio that should have error checking?
*** looks like maybe it's just a confusing way to do a null check?
** {{((System.nanoTime()-startTime)/1000000 )}} ... that's a missleading thing 
to include in the Exception
*** we didn't wait that long, we waited at most 5 seconds -- who knows how long 
we spent calling {{fetchMissingUpdateFromLeader}} & executing it.
* {{fetchMissingUpdateFromLeader}}
** javadocs, javadocs, javadocs
** should we really be constructing a new HttpSolrClient on the fly like this?
** is this on the fly SolrClient + GenericSolrRequest going to work if/when the 
security auth/acl features are in use in the solr cluster?
** this seems to assume the node that forwarded us the _current_ request (via 
{{get(DISTRIB_FROM)}} is still the leader -- but what if there was a leader 
election?
*** if the leader failed, and a new one was elected, isn't that a pretty 
viable/likeley reason why {{waitForDependentUpdates}} timed-out and needed to 
call {{fetchMissingUpdateFromLeader}} in the first place?
** {{e.printStackTrace();}} ... huge red flag that serious error handling is 
missing here.
** This method seems like it expects the possibility that {{missingUpdates}} 
will contain more then one entry, and if it does contain more then one entry it 
will convert/copy *all* of them into the {{updates}} list -- but then it will 
completley ignore all but the first one.
*** if we don't expect more then 1, why not assert that?
*** if we expect more then one, and they are all important, why don't we return 
{{List<UpdateCommand>}} ?
*** if we expect more then one, but only care about one in particularly -- why 
loop over all of them?
**** how do we know for sure which one in the list is the one we care about?
** ditto comments about {{PeerSync}} & {{UpdateLog}} and creating a {{public 
static AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} somwhere that 
can be reused here as well
** switch statement should have some sort of {{default}} case ... even if i'ts 
just to throw an error because anything but an {{ADD}} or {{UPDATE_INPLACE}} is 
impossible
** need to future proof this code against the posibility of other stuff down 
the road
{panel}

And here are some general overall comments / impressions I got while reviewing 
the code and then edited up once i was all finished...

{panel}
* Given that this patch requires some non-trivial changes to the types of 
records that go in the update log, and requires corisponding changes to 
{{PeerSync}}, it seems like there should definitely be some very explicit 
testing of log reply and peer sync
** ie: {{TestReplay}} and {{PeerSyncTest}} should be updated to include a 
variety of scenerios involving in-place updates
* after seeing how complex & hairy some of the new code has to be around the 
diff handling of "in-place atomic updates", vs existing "atomic updates" (that 
aren't in place) It seems like we should definitely have more test code that 
mixes and matches diff types of "updates"
** static, non randomized, examples of explicit tests we should definitely 
have...
*** a doc gets a sequence of atomic updates each containing multiple "in place" 
inc/set ops on diff fields
*** a doc gets a sequence of atomic updates each containing multiple inc/set 
ops, where a single update may have a mix of "in place" vs "not in place" 
eligable ops (due to diff fields having diff docvalue/stored settings)
** our randomized (cloud and non-cloud) testing of in-place updates should also 
include updates to the canidate docs that may ocasionally not be viable 
"in-place" updates (because they involved updates to additional non-dv fields)
** in all of these tests we should be checking that both the RTG and regualr 
search results make sense
* we also need a lot more testing of various {{deleteById}} and 
{{deleteByQuery}} commands mixed with in-place atomic updates
** both deleteByQuerys against the DV fields used in the in-place atomic 
updates as well as DBQs against other fields in the documents
** test the results of (uncommited) RTG as well as searches when these deletes 
are intermixed with in-place atomic updates
** test the results of peer sync and reply when deletes are mixed with in-place 
atomic updates.
** test that we correctly get {{409}} error codes when trying to do in-place 
updates w/optimistic concurrency after a delete (and vice versa: DBQ/dID afte 
in-place update)
* all logging needs heavily audited
** there's a lot of logging happening at the {{info}} and {{debug}} level that 
should probably be much lower.
** likewise there may be a few existing {{info}} or {{debug}} msgs that might 
be good candidates for {{warn}} or {{error}} level msgs.
* uniqueKey and Exception/log msgs
** there is a lot of log msgs or Exception msgs that cite a version# when 
reporting a problem, but don't include the uniqueKey of the document involved
** these messages aren't going to be remotely useful to end users w/o also 
including the (human readable) uniqueKey of the document involved.
* it feels like we now have a really large number of methods involved in the 
merging/combining/converting of documents to apply atomic updates ("in place" 
or otherwise) ... either for use in RTG, or for use when writing updates to 
disk, or from reading from the tlog, etc...
** the ones that jump out at me w/o digging very hard...{noformat}
RealTimeGetComponent.resolveFullDocument
RealTimeGetComponent.toSolrDoc
RealTimeGetComponent.mergePartialDocWithFullDocFromIndex
UpdateLog.applyPartialUpdates
UpdateLog.applyOlderUpdate
AtomicUpdateDocumentMerger.merge
AtomicUpdateDocumentMerger.doInPlaceUpdateMerge
{noformat}
** i can't help but wonder if there is room for consolidation?
** in many cases these "merge" methods actually delegate to other "merge" 
methods, before/after applying additional logic -- in which case at a minimum 
using {{@link}} or {{@see}} tags in the javadocs to make this (intentional) 
relationship/similarity more obvious would be helpful.
** in cases where methods do _not_ delegate to eachother, or have any 
relationship w/eachother, having {{@link}} mentions of eachother in the 
javadocs to compare/constrast *why* they are different would be equally helpful.
** and who knows -- perhaps in the process of writing these docs we'll find 
good oportunities to refactor/consolidate
{panel}


was (Author: ichattopadhyaya):
New patch fixing all nocommits. Still a few additional tests, which Hoss 
mentioned, are TODO. Here's a stab at replying to Hoss' comments (Maybe I'll 
keep updating this comment itself as and when I fix some of the TODO items 
here):

 {panel:title=JettySolrRunner}
* javadocs, javadocs, javadocs [FIXED]
{panel}

{panel:title=XMLLoader + JavabinLoader}
* why is this param checks logic duplicated in these classes? [Not sure what 
you mean here, I just set the prevVersion to the cmd here now]
* why not put this in DUP (which already has access to the request params) when 
it's doing it's "FROMLEADER" logic? [Since commitWithin and overwrite was being 
set here, I thought this is an appropriate place to set the prevVersion to the 
cmd]
{panel}

{panel:title=AddUpdateCommand}
* these variables (like all variables) should have javadocs explaining what 
they are and what they mean [FIXED]
** people skimming a class shouldn't have to grep the code for a variable name 
to understand it's purpose
* having 2 variables here seems like it might be error prone?  what does it 
mean if {{prevVersion < 0 && isInPlaceUpdate == true}} ? or {{0 < prevVersion 
&& isInPlaceUpdate == false}} ? [FIXED: Now just have one variable]
** would it make more sense to use a single {{long prevVersion}} variable and 
have a {{public boolean isInPlaceUpdate()}} that simply does {{return (0 < 
prevVersion); }} ? [FIXED]
{panel}

{panel:title=TransactionLog}
* javadocs for both the new {{write}} method and the existig {{write}} method  
[FIXED]
** explain what "prevPointer" means and note in the 2 arg method what the 
effective default "prevPoint" is.
* we should really have some "int" constants for refering to the List indexes 
involved in these records, so instead of code like {{entry.get(3)}} sprinkled 
in various classes like UpdateLog and PeerSync it can be smething more readable 
like {{entry.get(PREV_VERSION_IDX)}}  [TODO]
{panel}


{panel:title=UpdateLog}
* javadocs for both the new {{LogPtr}} constructure and the existing 
constructor [FIXED]
** explain what "prevPointer" means and note in the 2 arg constructure what the 
effective default "prevPoint" is.  [FIXED]
* {{add(AddUpdateCommand, boolean)}}
** this new code for doing lookups in {{map}}, {{prevMap}} and {{preMap2}} 
seems weird to me (but admitedly i'm not really an expert on UpdateLog in 
general and how these maps are used
** what primarily concerns me is what the expected behavior is if the "id" 
isn't found in any of these maps -- it looks like prevPointer defaults to "-1" 
regardless of whether this is an inplace update ... is that intentional? ... is 
it possible there are older records we will miss and need to flag that?  [Yes, 
this was intentional, and I think it doesn't make any difference. If an "id" 
isn't found in any of these maps, it would mean that the previous update was 
committed and should be looked up in the index. ]
** ie: do we need to worry about distinguising here between "not an in place 
update, therefore prePointer=-1" vs "is an in place update, but we can't find 
the prevPointer" ?? [I think we don't need to worry. Upon receiving a 
prevPointer=-1 by whoever reads this LogPtr, it should be clear why it was -1: 
if the command's {{flags|UpdateLog.UPDATE_INPLACE}} is set, then this command 
is an in-place update whose previous update is in the index and not in the 
tlog; if that flag is not set, it is not an in-place update at all, and don't 
bother about the prevPointer value at all (which is -1 as a dummy value).]
** assuming this code is correct, it might be a little easier to read if it 
were refactored into something like:{code}
// nocommit: jdocs
private synchronized long getPrevPointerForUpdate(AddUpdateCommand cmd) {
  // note: sync required to ensure maps aren't changed out form under us
  if (cmd.isInPlaceUpdate) {
    BytesRef indexedId = cmd.getIndexedId();
    for (Map<BytesRef,TransactionLog> currentMap : Arrays.asList(map, prevMap, 
prevMap2)) {
      LogPtr prevEntry = currentMap.get(indexedId);
      if (null != prevEntry) {
        return prevEntry.pointer;
      }
    }
  }
  return -1; // default when not inplace, or if we can't find a previous entry
}
{code} [FIXED: Refactored into something similar to above]
* {{applyPartialUpdates}}
** it seems like this method would be a really good candidate for some direct 
unit testing? [TODO]
*** ie: construct a synthetic UpdateLog, and confirm applyPartialUpdates does 
the right thing
** the sync block in this method, and how the resulting {{lookupLogs}} list is 
used subsequently, doesn't seem safe to me -- particularly the way 
{{getEntryFromTLog}} calls incref/decref on each TransactionLog as it loops 
over that list...
*** what prevents some other thread from decref'ing one of these TransactionLog 
objects (and possibly auto-closing it) in between the sync block and the incref 
in getEntryFromTLog?
**** (most existing usages of TransactionLog.incref() seem to be in blocks that 
sync on the UpdateLog -- and the ones that aren't in sync blocks look sketchy 
to me as well)
*** in general i'm wondering if {{lookupLogs}} should be created outside of the 
while loop, so that there is a consistent set of "logs" for the duration of the 
method call ... what happens right now if some other thread changes 
tlog/prevMapLog/prevMapLog2 in between iterations of the while loop?
** shouldn't we make some sanity check assertions about the results of 
getEntryFromTLog? -- there's an INVALID_STATE if it's not an ADD or a list of 5 
elements, but what about actually asserting that it's either an ADD or an 
UPDATE_INPLACE? ... [FIXED] what about asserting the doc's uniqueKey matches? 
[We could do that, but I think it is not necessary]
*** (because unless i'm missing something, it's possible for 2 docs to have the 
same version, so if there is a glitch in the pointer we can't just trust the 
version check can we?) [I think we can trust a document to be of the same id if 
the version matches. It is possible for 2 docs to have same version, but then 
they would have to be from two different shards altogether. Since all of this 
processing is happening within a particular replica (which obviously belongs to 
only one shard), I think we can get away safely without asserting the id and 
just relying on the version.]
** {{partialUpdateEntry}} seems like a missleading variable name ... can't it 
be *either* a full document, or partial update (not in place), or an 
UPDATE_INPLACE partial update? [FIXED: calling it entry now]
** if there is only 1 way to {{break}} out of this while loop, then the method 
would probably be easier to make sense of if the {{applyOlderUpdate}} and 
{{return 0}} calls replaced the {{break}} statement [FIXED]
** semi-related: {{while (true)}} is generally a red flag: it seems like might 
be better if it was refactored inot a {{while (0 <= prevPointer)}} loop?
* {{getEntryFromTLog}} [FIXED]
** I don't really see the point of using {{get\(i\)}} over and over .. why not 
a simple {{for (TransactionLog log : lookupLogs)}} ? [FIXED]
** why is the Exception & Error handling here just a debug log? shouldn't that 
be a pretty hard fail? [This shouldn't be a hard fail. This is basically a seek 
operation on a transaction log at the specified position. If the seek results 
in an exception due to unable to deserialize the tlog entry, we can ignore it, 
since it just means we were looking up the wrong tlog. I have added a comment 
to this effect in the catch block.]
** as mentioned above re: {{applyPartialUpdates}}, isn't is possible for 2 diff 
docs to have the same version? if we get unlucky and those 2 docs, with 
identical versions, at the same position in diff TransactionLog files then 
isn't a sanity check of the doc ID in {{applyPartialUpdates}} too late? ... if 
{{applyPartialUpdates}} tries again it's just going to keep getting the same 
(wrong) document.  It seems like this method actaully needs to know the ID it's 
looking for, and "skip" any entries thta don't match, checking the next (older) 
{{TransactionLog}} [I think this is not a likely scenario, since in a given 
replica, a version should be unique to a document (I think we have bigger 
problems if this assumption isn't true).]
* {{lookupPartialUpdates}}
** what is the purpose/intent of this method? ... it seems to be unused. 
[FIXED: Removed]
* {{doReplay}}
** switch statement ordering...  [FIXED: I'll add it to my knowledge!]
*** in theory, switch statement cases should be ordered from most likeley to 
least likely (it's a microoptimization, but in heavily executed loops it might 
be worth it)
*** so i wouldn't inject UPDATE_INPLACE at the begining of the switch -- it 
should definitely come after ADD, probably best to put it at the end of the list
** why is {{entry.get(2)}} commented out? and why does it say "idBytes" ? ... 
isn't slot #2 the prevPointer?  copy paste confusion from "ADD" ?  [FIXED: 
True, it was copy-paste confusion from ADD. Removed the commented out line.]
*** if slot#2 really isn't needed in this code, get rid of the missleading 
comment about idBytes and replace it with an explicit comment that prevVersion 
isn't needed for reply. [FIXED: I have removed the spurious commented out 
lines, refactored that part into a updateCommandFromTlog() method. Does it 
address your concern here?]
{panel}

{panel:title=PeerSync}
* ditto comments about switch statement ordering from above comments about 
{{UpdateLog}} [FIXED]
* a lot of code here looks duplicated straight from {{UpdateLog.doReplay}}
** I realize that's true for the other case values as well, but bad code 
shouldn't be emulated
** lets refactor this duplicated code into a new {{public static 
AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} method in 
{{UpdateLog}} and re-use it here. [FIXED]
* {{log.info}} looks wrong here ... especially inside of an {{if (debug)}} 
block ... pretty sure this should be {{log.debug}} like the other case blocks 
[FIXED]
{panel}

{panel:title=DirectUpdateHandler2}
* I don't really understand why we need any schema/DocValue checks here? [This 
was unnecessary and I've removed it. I have done a somewhat related refactoring 
to the AddUpdateCommand.getLuceneDocument(boolean isInPlace) to now only 
generate a Lucene document that has docValues. This was needed because the 
lucene document that was originally being returned had copy fields targets of 
id field, default fields, multiple Field per field (due to 
FieldType.createFields()) etc., which are not needed for in-place updates.]
* If {{cmd.isInPlaceUpdate}} is true, then doesn't that mean the update is by 
definition an in place update that only contains values for DV fields? ... 
wasn't it the responsibility of the upstream code that set that value to true 
to ensure that? [True, it was. However, copy field targets of id field, default 
fields etc. were added in this doNormalAdd() method itself due to 
cmd.getLuceneDocument(); I have overloaded that method to tackle the needs of 
in-place updates and filter out such unnecessary fields being added to the 
lucene doc]
** if not, why can't it be? (ie: why can't we move the schema checks upstream, 
and out of the sync block here?
* if that's not what {{cmd.isInPlaceUpdate}} means, then why isn't there any 
error handling here in the event that non-DV field/values are found in the 
luceneDocument? ... doesn't that mean we need to fall back to the original 
{{writer.updateDocument}} call?
* If it is neccessary to do SchemaField validation here for some reason, then 
shouldn't it be the exact same validation done in 
{{AtomicUpdateDocumentMerger.isSupportedForInPlaceUpdate}} ? [We should do all 
schema validation there only, I have removed everything from here, except for 
some filtering logic at cmd.getLuceneDocument()]
{panel}

{panel:title=AtomicUpdateDocumentMerger}
* {{isSupportedForInPlaceUpdate}}
** shouldn't this method either be static or final?  the rules don't change if 
someone subclasses AtomicUpdateDocumentMerger do they?
** why isn't {{TrieDateField}} also valid? ... could this just be checking for 
{{instanceof TrieField}} ?
*** particularly suspicious since {{doInPlaceUpdateMerge}} does in fact check 
{{instanceof TrieField}}
** if the intent is truely to only support "numerics" then, why not 
{{instanceof NumericValueFieldType}} ?
** shouldn't we also check "not-indexed" and "not-stored" here as well?
* {{doInPlaceUpdateMerge}}
** why does this method have 3 args?  can't all of the neccessary info be 
deterined from the {{AddUpdateCommand}} ?
** this method seems like another good candidate for some explicit unit 
testing...
*** build up an index & tlog with some explicitly crated non trivial 
docs/updates, then call this method with a variety of inputs and assert the 
expected modifications to the {{AddUpdateCommand}} (or assert no modifications 
if they aren't viable in place update candaites
*** then hard commit everything in the tlog and assert that all the same calls 
return the exact same output/modifications.
** we should probably continue to assume the common case is _not_ to need 
(in-place) updates ... just regular adds.  in which case anything we can do to 
short circut out faster -- before any checks that require stuff like 
{{SchemaField}} -- wouldn't reordering the checks in the loop to something like 
this be equivilent to what we have currently but faster in the common case? 
...{code}
for (SolrInputField f : sdoc) {
  final String fname = f.getName();
  if (idField.getName().equals(fname)) {
    continue;
  }
  if (! f.getValue() instanceof Map) {
    // doc contains field value that is not an update, therefore definitely not 
an inplace update
    return false;
  }
  if (!isSupportedForInPlaceUpdate(schema.getField(fname))) {
    // doc contains update to a field that does not support in place updates
    return false;
  }
}
{code}
*** Even if i've overloked something and that code isn't better, i think in 
general the "is this sdoc a candidate for in place updating" logic should be 
refactored into a public static helper method that has some unit tests.
** this method calls {{RealTimeGetComponent.getInputDocumentFromTlog}} but 
doesn't check for the special {{DELETED}} return value...
*** definitely smells like a bug ... there are many assumptions made about 
{{uncommittedDoc}} as long as it's non-null
*** based on this, now i really want to see more testing of in place updates 
mixed with document deletions:
**** some explicit single node testing of "add docX, delete docX, do a 
'set':'42' on docX"
**** introduce some randomized deleteById calls into the randomized 
single/multi node tests
** this method calls {{RealTimeGetComponent.getVersionFromTlog}} which has docs 
that say "If null is returned, it could still be in the latest index."
*** i don't see any code accounting for that possibility, cmd.prevVersion is 
just blindly assigned the null in that case ... which could lead to an NPE 
since it's declared as {{public long prevVersion}}
*** the fact that this hasn't caused an NPE in testing is a red flag that there 
is a code path not being tested here ... but at a glance i can't really tell 
what it is?
** In general, I guess I don't really understand why this method is jumping 
through as many hoops as it is with the RTG code?
*** it seems to duplicate a lot of functionality already in 
{{RealTimeGetComponent.getInputDocument}} ... why not just use that method?
*** if the concern is avoiding the {{searcher.doc(docid)}} call to get all 
stored fields, perhaps {{RealTimeGetComponent.getInputDocument}} could be 
refactored to make it easier to re-use most of the functionality here? ... not 
sure what would make the most sense off the top of my head.
*** at a minimum it seems like using 
{{SolrIndexSearcher.decorateDocValueFields(...)}} would make more sense then 
doing it ourselves as we loop over the fields -- we can even pass in the 
explicit list of field names we know we care about based on the 
SolrInputDocument (or even just the field names we know use "inc" if that's all 
we really need)
**** (Or is there something i'm missing about why using 
{{decorateDocValueFields}} would be a mistake here?)
** in the switch statement, shouldn't we be using the {{doSet}} and {{doInc}} 
methods to actaully cary out the operations?
*** that would simplify the "inc" case a lot
** the {{default}} on the switch statement looks sketchy to me ... i understand 
that only "inc" and "set" are supported, but why does this method only {{warn}} 
if it sees something else? shouldn't this be a hard failure?
*** for that matter: shouldn't the {{instanceof Map}} check when looping over 
the fields at the begining of the method short circut out if the Map contains 
an operation that isn't one of the supported "in place update" operations?
*** in fact: if we pre-checked the Maps only contained "set" and "inc", and 
used something like {{decorateDocValueFields}} (or did the equivilent ourselves 
in a smaller loop) then couldn't we also simplify this method a lot by just 
delegating to the existing {{merge(SolrInputDocument,SolrInputDocument)}} 
method?
** these assumptions seem sketchy, if that's the only reason for these "else" 
blocks then let's include some {{asert fieldName.equals(...)}} calls to prove 
it...{code}
          } else { // for version field, which is a docvalue but there's no 
set/inc operation
            ...
          }
        } else { // for id field
          ...
{code}
*** in particluar i'm curious about the {{VERSION_FIELD}}...
**** this method is only called from one place -- 
{{DistributedUpdateProcessor.getUpdatedDocument}} -- and in the existing code 
of that method, when a {{SolrInputDocument}} is fetched from 
{{RealTimeGetComponent}}, the {{VERSION_FIELD}} is explicitly removed from it 
before using it & returning.
**** should this method also be explicitly removing the {{VERSION_FIELD}} 
field?  and/or should the caller ({{getUpdatedDocument}}) be removing it 
consistently before returning?
{panel}


{panel:title=RealTimeGetComponent}
* {{process}}
** I like this {{SearcherInfo}} refactoring, but a few suggestions:
*** it should be promoted into a (private) static (inner) class ... no need for 
a new class instance every time {{RealTimeGetComponent.process}} is called.
*** add a one arg constructor and pass the SolrCore to that.
*** javadocs, javadocs, javadocs .... note that it's not thread safe
*** let's make {{searcherHolder}} and {{searcher}} private, and replace direct 
{{searcher}} access with:{code}
public SolrIndexSearcher getSearcher() {
  assert null != searcher : "init not called!";
  return searcher;
}
{code}
** in the switch statement, it seems like there is a lot of code duplicated 
between the {{ADD}} and {{UPDATE_INPLACE}} cases
*** why not consolidate those cases into one block of code using (a modified) 
{{resolveFullDocument}} which can start with a call to {{toSolrDoc(...)}} and 
then return immediately if the entry is {{UpdateLog.ADD}} ?
* {{resolveFullDocument}}
** see comments above about modifying this method to call {{toSolrDocument}} 
itself rather then expecting as input, and return early if the {{entry}} is an 
{{UpdateLog.ADD}}
** let's put an {{assert 0==lastPrevPointer}} in this else block in case 
someone improves/breaks {{ulog.applyPartialUpdates}} to return {{-2}} in the 
future...{code}
    } else { // i.e. lastPrevPointer==0
{code}
** since {{ulog.applyPartialUpdates(...)}} is a No-Op when {{prevPointer == 
-1}}, can't we remove the redundent calls to 
{{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} 
before and after calling {{ulog.applyPartialUpdates(...)}} ... ie:{code}
    long prevPointer = (long) logEntry.get(2);
    long prevVersion = (long) logEntry.get(3);
    // this is a No-Op if prevPointer is already negative, otherwise...
    // get the last full document from ulog
    prevPointer = ulog.applyPartialUpdates(idBytes, prevPointer, prevVersion, 
partialDoc);
    if (-1 == prevPointer) {
      ...
    } else if (0 < prevPointer) {
      ...
    } else {
      assert 0 == prevPointer;
      ...
    }
{code}
*** If there is some reason i'm not seeing why it's important to call 
{{mergePartialDocWithFullDocFromIndex}} & {{reopenRealtimeSearcherAndGet}} 
before calling {{ulog.applyPartialUpdates}}, then perhaps we should at least 
refactor the "if mergedDoc == null, return reopenRealtimeSearcherAndGet" logic 
into {{mergePartialDocWithFullDocFromIndex}} since that's the only way it's 
ever used.
* {{mergePartialDocWithFullDocFromIndex}}
** since this is a private method, what's the expected usage of 
{{docidFromIndex}} ? ... it's never used, so can we refactor it away?
** see previous comment about refactoring this method to automatically return 
{{reopenRealtimeSearcherAndGet(...)}} when it would otherwise return null
* {{reopenRealtimeSearcherAndGet}}
** javadocs, javadocs, javadocs
** since this method is only used in conjunction with 
{{mergePartialDocWithFullDocFromIndex}}, if the code is refactored so that 
{{mergePartialDocWithFullDocFromIndex}} calls this method directly (see 
suggestion above), we could make a small micro-optimization by changing the 
method sig to take in a {{Term}} to (re)use rather then passing in {{idBytes}} 
and calling {{core.getLatestSchema().getUniqueKeyField()}} twice.
** re: the {{INVALID_STATE}} ... is that really a fatal error, or should this 
method be accounting for the possibility of a doc that has been completley 
deleted (or was never in the index) in a diff way?
* {{getInputDocumentFromTlog}}
** lets put an explicit comment here noting that we are intentionally falling 
through to the {{Updatelog.ADD}} case
* {{getVersionFromTlog}}
** based on it's usage, should this method be changed to return {{-1}} instead 
of null?  ... not clear to me from the given caller usage ... if so should be 
be declared to return {{long}} instead of {{Long}} ?
{panel}


{panel:title=DistributedUpdateProcessor}
* Similar question from AddUpdateCommand: do we really need 2 distinct params 
here, or would it be cleaner / less error-prone to have a single 
{{distrib.inplace.prevversion}} which indicates we're doing an inplace update 
if it's a positive # ?
* {{versionAdd}}
** "Something weird has happened" ... perhaps {{waitForDependentUpdates}} 
should return the {{lastFoundVersion}} so we can use it in this error msg? ... 
"Dependent version not found after waiting: ..."
** "TODO: Should we call waitForDependentUpdates() again?" ... i don't see how 
that would help?  if we get to this point it definitely seems like a "WTF?" 
fail fast situation.
** the way the existing code in this method has been refactored into an "else" 
block (see comment: {{// non inplace update, i.e. full document update}}) makes 
sense, but the way the {{Long lastVersion}} declaration was refactored _out_ of 
that block to reuse with the "if isInPlaceUpdate" side of things is a bit 
confusing and doesn't seem to actaully simplify anything...
*** It's not used after the if/else block, so there's no reason to declare it 
before the "if" statement
*** by definition it must be null in the "else" case, so {{if (lastVersion == 
null)}} will also be true in that code path
*** it seems simpiler to just let both the "if" and "else" branches 
declare/define their own "Long lastVersion" and not risk any confusion about 
why that variable needs to be "shared"
* {{waitForDependentUpdates}}
** javadocs, javadocs, javadocs
** {{fetchFromLeader}} is always true? why not eliminate arg?
** I'm not a concurrency expert, but is this control flow with the sync/wait 
actaully safe? ... my understanding was the conditional check you're waiting on 
should always be a loop inside the sync block?
** even if there are no spurious returns from wait, logging at "info" level 
every 100ms is excessive ... logging that often at trace seems excessive.
*** why don't we log an info once at the start of method (since our of order 
updates should be rare) and once at debug anytime lastFoundVersion changes? 
(diff log message if/when lastFoundVersion is == or > prev)
*** the "Waiting attempt" counter "i" doesn't seem like useful info to log 
given how {{wait(...)}} works
** "...Dropping current update." - that log msg seems missleading, this method 
doesn't do anything to drop the current update, it just assumes the current 
update will be droped later
** i don't really see the point of the {{boolean foundDependentUpdate}} 
variable... why not change the only place where it's set to {{true}} to return 
immediately?
** {{fetchMissingUpdateFromLeader}} can return null, but that possibility isn't 
handled here.
** {{if (uc instanceof AddUpdateCommand)}} ... what if it's not?
*** currently it's just silently ignored
*** is this a viable scenerio that needs accounted for, or an exceptional 
scenerio that should have error checking?
*** looks like maybe it's just a confusing way to do a null check?
** {{((System.nanoTime()-startTime)/1000000 )}} ... that's a missleading thing 
to include in the Exception
*** we didn't wait that long, we waited at most 5 seconds -- who knows how long 
we spent calling {{fetchMissingUpdateFromLeader}} & executing it.
* {{fetchMissingUpdateFromLeader}}
** javadocs, javadocs, javadocs
** should we really be constructing a new HttpSolrClient on the fly like this?
** is this on the fly SolrClient + GenericSolrRequest going to work if/when the 
security auth/acl features are in use in the solr cluster?
** this seems to assume the node that forwarded us the _current_ request (via 
{{get(DISTRIB_FROM)}} is still the leader -- but what if there was a leader 
election?
*** if the leader failed, and a new one was elected, isn't that a pretty 
viable/likeley reason why {{waitForDependentUpdates}} timed-out and needed to 
call {{fetchMissingUpdateFromLeader}} in the first place?
** {{e.printStackTrace();}} ... huge red flag that serious error handling is 
missing here.
** This method seems like it expects the possibility that {{missingUpdates}} 
will contain more then one entry, and if it does contain more then one entry it 
will convert/copy *all* of them into the {{updates}} list -- but then it will 
completley ignore all but the first one.
*** if we don't expect more then 1, why not assert that?
*** if we expect more then one, and they are all important, why don't we return 
{{List<UpdateCommand>}} ?
*** if we expect more then one, but only care about one in particularly -- why 
loop over all of them?
**** how do we know for sure which one in the list is the one we care about?
** ditto comments about {{PeerSync}} & {{UpdateLog}} and creating a {{public 
static AddUpdateCommand updateCommandFromTlog(List tlogEntry)}} somwhere that 
can be reused here as well
** switch statement should have some sort of {{default}} case ... even if i'ts 
just to throw an error because anything but an {{ADD}} or {{UPDATE_INPLACE}} is 
impossible
** need to future proof this code against the posibility of other stuff down 
the road
{panel}

And here are some general overall comments / impressions I got while reviewing 
the code and then edited up once i was all finished...

{panel}
* Given that this patch requires some non-trivial changes to the types of 
records that go in the update log, and requires corisponding changes to 
{{PeerSync}}, it seems like there should definitely be some very explicit 
testing of log reply and peer sync
** ie: {{TestReplay}} and {{PeerSyncTest}} should be updated to include a 
variety of scenerios involving in-place updates
* after seeing how complex & hairy some of the new code has to be around the 
diff handling of "in-place atomic updates", vs existing "atomic updates" (that 
aren't in place) It seems like we should definitely have more test code that 
mixes and matches diff types of "updates"
** static, non randomized, examples of explicit tests we should definitely 
have...
*** a doc gets a sequence of atomic updates each containing multiple "in place" 
inc/set ops on diff fields
*** a doc gets a sequence of atomic updates each containing multiple inc/set 
ops, where a single update may have a mix of "in place" vs "not in place" 
eligable ops (due to diff fields having diff docvalue/stored settings)
** our randomized (cloud and non-cloud) testing of in-place updates should also 
include updates to the canidate docs that may ocasionally not be viable 
"in-place" updates (because they involved updates to additional non-dv fields)
** in all of these tests we should be checking that both the RTG and regualr 
search results make sense
* we also need a lot more testing of various {{deleteById}} and 
{{deleteByQuery}} commands mixed with in-place atomic updates
** both deleteByQuerys against the DV fields used in the in-place atomic 
updates as well as DBQs against other fields in the documents
** test the results of (uncommited) RTG as well as searches when these deletes 
are intermixed with in-place atomic updates
** test the results of peer sync and reply when deletes are mixed with in-place 
atomic updates.
** test that we correctly get {{409}} error codes when trying to do in-place 
updates w/optimistic concurrency after a delete (and vice versa: DBQ/dID afte 
in-place update)
* all logging needs heavily audited
** there's a lot of logging happening at the {{info}} and {{debug}} level that 
should probably be much lower.
** likewise there may be a few existing {{info}} or {{debug}} msgs that might 
be good candidates for {{warn}} or {{error}} level msgs.
* uniqueKey and Exception/log msgs
** there is a lot of log msgs or Exception msgs that cite a version# when 
reporting a problem, but don't include the uniqueKey of the document involved
** these messages aren't going to be remotely useful to end users w/o also 
including the (human readable) uniqueKey of the document involved.
* it feels like we now have a really large number of methods involved in the 
merging/combining/converting of documents to apply atomic updates ("in place" 
or otherwise) ... either for use in RTG, or for use when writing updates to 
disk, or from reading from the tlog, etc...
** the ones that jump out at me w/o digging very hard...{noformat}
RealTimeGetComponent.resolveFullDocument
RealTimeGetComponent.toSolrDoc
RealTimeGetComponent.mergePartialDocWithFullDocFromIndex
UpdateLog.applyPartialUpdates
UpdateLog.applyOlderUpdate
AtomicUpdateDocumentMerger.merge
AtomicUpdateDocumentMerger.doInPlaceUpdateMerge
{noformat}
** i can't help but wonder if there is room for consolidation?
** in many cases these "merge" methods actually delegate to other "merge" 
methods, before/after applying additional logic -- in which case at a minimum 
using {{@link}} or {{@see}} tags in the javadocs to make this (intentional) 
relationship/similarity more obvious would be helpful.
** in cases where methods do _not_ delegate to eachother, or have any 
relationship w/eachother, having {{@link}} mentions of eachother in the 
javadocs to compare/constrast *why* they are different would be equally helpful.
** and who knows -- perhaps in the process of writing these docs we'll find 
good oportunities to refactor/consolidate
{panel}

> Support updates of numeric DocValues
> ------------------------------------
>
>                 Key: SOLR-5944
>                 URL: https://issues.apache.org/jira/browse/SOLR-5944
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Shalin Shekhar Mangar
>         Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, 
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, 
> hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, 
> hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, 
> hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
> really nice to have Solr support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-5944) Support updates of numeric DocValues

Reply via email to