I was initially thinking about the case where the splits change between job setup and Map execution, but on further thought I think I went down the wrong path. Tablet splitting should not affect the overall range of keys for the MapReduce job. If a tablet splits after the job computes its input splits, but before the Map is run, then that Map will just scan multiple tablets.
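
To illustrate the reasoning, here is a toy Python model. It is not the Accumulo API: the helper names, row IDs, and split points are all made up, and tablets are reduced to (start, end] row ranges. It assumes the job fixes its scan ranges from the tablet boundaries at setup time, as described in this thread, and then a tablet splits before the Maps run.

```python
# Toy model only: tablets and scan ranges are (start, end] pairs of row IDs,
# with None meaning "unbounded". All names and data here are hypothetical.

def ranges_from_splits(split_points):
    """Turn sorted split points into contiguous (start, end] ranges."""
    bounds = [None] + sorted(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))

def contains(rng, row):
    """True if row falls inside the (start, end] range."""
    start, end = rng
    return (start is None or row > start) and (end is None or row <= end)

def overlaps(a, b):
    """True if two (start, end] ranges share at least one possible row."""
    a_start, a_end = a
    b_start, b_end = b
    return ((a_end is None or b_start is None or b_start < a_end) and
            (b_end is None or a_start is None or a_start < b_end))

rows = ["a", "c", "f", "k", "p", "t"]

# Ranges computed at job setup, from the tablet boundaries at that time.
job_ranges = ranges_from_splits(["g", "n"])

# A tablet splits at "d" after setup but before the Maps run.
tablets_at_map_time = ranges_from_splits(["d", "g", "n"])

# Each Map scans its original range; the union still covers every row.
rows_read = [row for rng in job_ranges for row in rows if contains(rng, row)]

# Number of tablets each Map's range now touches.
tablets_per_range = [sum(overlaps(rng, t) for t in tablets_at_map_time)
                     for rng in job_ranges]
```

In this model the first range ends up spanning two tablets while the set of rows read is unchanged, which is the point above: the split changes how many tablets a Map touches, not which keys it covers.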

On Tue, Apr 19, 2022 at 5:33 AM Christopher <[email protected]> wrote:

> Isolation should only give you consistency within a row, to ensure you're
> not scanning over partial changes from a mutation that is currently being
> written to a row. It shouldn't have anything to do with compactions or
> missing data that has already been written before the MapReduce scan has
> started.
>
> Splits shouldn't cause you to miss data either. It's been awhile since I
> looked, but I believe the MapReduce APIs simply break up a table into
> separate ranges to scan based on current tablet boundaries. If there are
> splits, then all that means is that some of the ranges will span across
> more than one tablet, but that's fine... a scan is a scan... scans don't
> need to be limited to a single tablet.
>
> Compactions could cause missed data if they transform the data in some way,
> but otherwise, I wouldn't expect them to.
>
> Are you seeing any error messages anywhere?
>
> On Mon, Apr 18, 2022, 15:23 Vincent Russell <[email protected]>
> wrote:
>
> > Hi Dave,
> >
> > Yes we are using the new MapReduce API, but we are not setting any
> > settings for isolated scan so we are using whatever the default is.
> >
> > Thanks,
> > Vincent
> >
> > On Mon, Apr 18, 2022 at 3:12 PM Dave Marion <[email protected]> wrote:
> >
> > > Major compactions should not move rows to new tablets, but a tablet
> > > split could. Are you using the new MapReduce API introduced in 2.0?
> > > Are you setting it to use an isolated scan?
> > >
> > > On Mon, Apr 18, 2022 at 3:01 PM Vincent Russell <[email protected]>
> > > wrote:
> > >
> > > > Hello All,
> > > >
> > > > Could major compactions that occur while a map reduce job is running
> > > > cause the map reduce job to miss records because rows have been moved
> > > > to a different tablet?
> > > >
> > > > How does this work?
> > > >
> > > > I'm using accumulo 2.0.1
> > > >
> > > > Thank you,
> > > > Vincent
