I was initially thinking about the case where the tablet splits change
between job setup and Map execution, but on reflection I think I went down
the wrong path. Tablet splitting should not affect the overall range of
keys for the MR job. If a tablet splits after the job computes its input
splits, but before the Map is run, then that Map will just scan multiple
tablets.
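
A toy sketch of that reasoning, in plain Python rather than against the
Accumulo API: it assumes the job plans one scan range per tablet, that ranges
exclude their start row and include their end row, and that the row IDs and
split points shown are made up for illustration. The helper names
(ranges_from_splits, in_range, tablets_touched) are hypothetical.

```python
# Toy model only: plain Python, not the Accumulo client API.
# A range is (start, end], with None meaning "unbounded" on that side.

def ranges_from_splits(split_points):
    """Return contiguous (start, end] ranges covering the whole key space."""
    bounds = [None] + sorted(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))

def in_range(key, rng):
    """True if key falls inside the (start, end] range."""
    start, end = rng
    return (start is None or key > start) and (end is None or key <= end)

def tablets_touched(rng, split_points):
    """Count the tablets, under the given split points, that a range overlaps."""
    start, end = rng
    count = 0
    for t_start, t_end in ranges_from_splits(split_points):
        # (start, end] and (t_start, t_end] overlap iff each begins before the other ends.
        begins_before_tablet_ends = start is None or t_end is None or start < t_end
        tablet_begins_before_end = t_start is None or end is None or t_start < end
        if begins_before_tablet_ends and tablet_begins_before_end:
            count += 1
    return count

rows = ["a", "c", "f", "k", "p", "t", "x"]        # hypothetical row IDs
splits_at_setup = ["g", "n"]                       # tablet boundaries when the job is planned
job_ranges = ranges_from_splits(splits_at_setup)   # one range per Map task

splits_at_map_time = ["d", "g", "n"]               # a tablet split at "d" afterwards

# Every row still belongs to exactly one job range, so no Map task misses it.
coverage = [sum(in_range(r, rng) for rng in job_ranges) for r in rows]

# The first range now spans two tablets; the others still span one each.
touched = [tablets_touched(rng, splits_at_map_time) for rng in job_ranges]
```

In this model the split only changes how many tablets a given Map has to
read from; the set of keys each Map is responsible for stays the same.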

On Tue, Apr 19, 2022 at 5:33 AM Christopher <[email protected]> wrote:

> Isolation should only give you consistency within a row, to ensure you're
> not scanning over partial changes from a mutation that is currently being
> written to a row. It shouldn't have anything to do with compactions or
> missing data that has already been written before the MapReduce scan has
> started.
>
> Splits shouldn't cause you to miss data either. It's been a while since I
> looked, but I believe the MapReduce APIs simply break up a table into
> separate ranges to scan based on current tablet boundaries. If there are
> splits, then all that means is that some of the ranges will span across
> more than one tablet, but that's fine... a scan is a scan... scans don't
> need to be limited to a single tablet.
>
> Compactions could cause missed data if they transform the data in some way,
> but otherwise, I wouldn't expect them to.
>
> Are you seeing any error messages anywhere?
>
> On Mon, Apr 18, 2022, 15:23 Vincent Russell <[email protected]>
> wrote:
>
> > Hi Dave,
> >
> > Yes we are using the new MapReduce API, but we are not setting any
> > settings for isolated scan so we are using whatever the default is.
> >
> > Thanks,
> > Vincent
> >
> > On Mon, Apr 18, 2022 at 3:12 PM Dave Marion <[email protected]> wrote:
> >
> > > Major compactions should not move rows to new tablets, but a tablet
> > > split could. Are you using the new MapReduce API introduced in 2.0?
> > > Are you setting it to use an isolated scan?
> > >
> > > On Mon, Apr 18, 2022 at 3:01 PM Vincent Russell <[email protected]>
> > > wrote:
> > >
> > > > Hello All,
> > > >
> > > > Could major compactions that occur while a map reduce job is running
> > > > cause the map reduce job to miss records because rows have been moved
> > > > to a different tablet?
> > > >
> > > > How does this work?
> > > >
> > > > I'm using Accumulo 2.0.1
> > > >
> > > > Thank you,
> > > > Vincent
> > > >
> > >
> >
>
