Hi Andrew,

Thank you for sharing your detailed views.

Originally my idea was to attempt a complete fix for this issue, building on
the existing implementation, after working through this phase.
But as we discussed before, LRUCache cannot be rolled back, so this may
come at the cost of performance (we cannot roll back to arbitrary positions),
and the process would be painful (we have a lot of "seek to somewhere" code,
and all of it would need to be handled).
So, I agree with your point:

> We read from the WAL stream into the buffer. If the reader is reset, the
> buffer is reset.

I think that would be perfect: if we complete such an implementation, this
problem will be fundamentally solved.
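A minimal sketch of that buffered approach may help. This is a toy Python model, not the HBase API; `WalEntryBuffer`, `ShortRead`, and the 4-byte length prefix are all invented for illustration. The point is only the two stages: read exact bytes first, decompress/deserialize only once the entry is known to be complete.

```python
class ShortRead(Exception):
    """Raised when the stream does not yet contain a complete WALedit."""

class WalEntryBuffer:
    def __init__(self, stream):
        self.stream = stream      # underlying WAL byte stream
        self.buf = bytearray()    # staging buffer for one WALedit

    def read_entry(self):
        # Stage 1: read a length prefix, then exactly that many bytes.
        # Nothing is decompressed here, so a short read can simply be
        # retried later without touching any dictionary state.
        header = self.stream.read(4)
        if len(header) < 4:
            raise ShortRead()
        length = int.from_bytes(header, "big")
        self.buf = bytearray(self.stream.read(length))
        if len(self.buf) < length:
            self.buf.clear()      # reader reset => buffer reset
            raise ShortRead()
        # Stage 2: only a complete serialization is handed on, so the
        # codec (and its compression dictionary) never sees partial input.
        return bytes(self.buf)
```

On a short read the caller simply rewinds and retries; because no decompression happened, there is no dictionary state to roll back.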


Andrew Purtell <apurt...@apache.org> wrote on Thu, Mar 17, 2022 at 01:21:

> I wish email had an edit button. Let me rephrase
>
> I think what we need is an intermediate buffer that the reader fills with
> the WALedit contents. We read from the WAL stream into the buffer. If the
> reader is reset, the buffer is reset. Once the reader fully reads in a
> WALedit, it would pass the complete WALedit serialization on. Because the
> WALedit can be known to be complete, there will be no rollback issues with
> compression.
>
> Earlier I said "WAL codec" but that would not be correct, because the codec
> is where we would attempt to decompress. We need a stage of WAL reading
> that comes before any deserialization. That stage knows somehow how many
> bytes to read in from the stream. That stage reads in exactly that number
> of bytes, rewinding etc as necessary until successful. Once that stage is
> finished, then the WALedit can be deserialized.
>
> Hope that is better.
>
> On Wed, Mar 16, 2022 at 10:16 AM Andrew Purtell <apurt...@apache.org>
> wrote:
>
> > When I did the WAL value compression work (HBASE-25869) I tested the
> > resulting code with an integration scenario with replication active. I
> > initially had an issue with unrecoverable errors during rewind, but it
> > was due to an error in my implementation and was corrected (HBASE-25994).
> > After that the integration tests would pass.
> >
> > However overall I would say we lack test coverage in this area. We should
> > have a test scenario that is better at introducing short reads.
> >
> > I would agree there is an overall weakness with the WAL compression
> > feature design. 唐天航 expressed it well:
> >
> > > Because theoretically we can roll back the position to any position
> > > at any time, but the LRUCache in the corresponding LRUDictionary
> > > would also need to be rolled back, otherwise the read and write
> > > path behavior may become inconsistent. But LRUCache can't roll
> > > back at all...
> >
> > With WAL value compression I address this scenario by resetting the
> > compressor for each value. It affects the potential compression ratio a
> > lot, though, and therefore is really only useful when dealing with large
> > cells that are like documents, or with zstandard compression using a
> > pre-trained dictionary.
> >
> > WAL compression would not be very effective if we reset the dictionary
> > for each cell.
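The dictionary-rollback hazard discussed above can be shown with a toy model. This is plain Python and purely illustrative (an insertion-ordered list standing in for HBase's LRUDictionary, which it does not reproduce): writer and reader each mutate their own copy of the dictionary as entries flow through, so re-reading a compressed entry after a seek-back desynchronizes them.

```python
def compress(dictionary, value):
    # Emit a back-reference if the value was seen before, else emit the
    # raw value and remember it (this mutation is what makes rollback
    # unsafe: the dictionary changes as a side effect of reading).
    if value in dictionary:
        return ("ref", dictionary.index(value))
    dictionary.append(value)
    return ("raw", value)

def decompress(dictionary, token):
    kind, payload = token
    if kind == "ref":
        return dictionary[payload]
    dictionary.append(payload)   # reader mutates its copy the same way
    return payload
```

If the reader processes every token exactly once, both sides stay in lockstep; if a seek-back causes one token to be decompressed twice, the reader's dictionary indices shift and later back-references resolve to the wrong values.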
> >
> > I think what we need is an intermediate buffer that the reader fills with
> > the WALedit contents. The WAL codec reads from the WAL stream into the
> > buffer. If the reader is reset, the buffer is reset. Once the WAL codec
> > fully reads in a WALedit, it would pass the complete WALedit
> > serialization on. Because the WALedit can be known to be complete, there
> > will be no rollback issues with compression.
> >
> > I believe people have been trying to avoid the extra buffering of WALedit
> > contents because of the inefficiency of the extra data copy, but it seems
> > to be a correctness problem not to do it...
> >
> >
> >
> > On Wed, Mar 16, 2022 at 7:40 AM Xiaolin Ha <summer.he...@gmail.com>
> > wrote:
> >
> >> We have also used replication + WAL compression in our production
> >> clusters. It really seems unstable when replicating compressed cells.
> >> I'm digging into the log queue backlog issues, and pointed out one in
> >> issue HBASE-26843, which might be related to HBASE-26849.
> >> We can discuss more details if you are interested in this problem.
> >> Thanks.
> >>
> >> Regards
> >>
> >> 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Wed, Mar 16, 2022 at 18:36:
> >>
> >> > +1 on updating the doc first. You can file an issue for the
> >> > documentation change, and let's also send a NOTICE email to both the
> >> > dev and user lists to warn our users about this.
> >> >
> >> > 唐天航 <tangtianhang...@gmail.com> wrote on Wed, Mar 16, 2022 at 18:08:
> >> >
> >> > > If we only reset the position to the head, yes, we can fix it.
> >> > > In fact, 26849 fixes the problem in this scenario.
> >> > > But unfortunately, we have some other scenarios where we roll back
> >> > > the position to some intermediate position, such as
> >> > > ProtobufLogReader.java#L381
> >> > > <https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java#L381>
> >> > > I think we cannot roll back the LRUCache either...
> >> > > While my cluster works fine after 26849, the fix is still
> >> > > theoretically incomplete.
> >> > >
> >> > > 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Wed, Mar 16, 2022 at 17:59:
> >> > >
> >> > > > The old WAL compression implementation is buggy when used
> >> > > > together with replication, that's true...
> >> > > >
> >> > > > But in general I think it is fixable. The dict is per file IIRC,
> >> > > > so I think clearing the LRUCache when resetting to the head of the
> >> > > > file can fix the problem?
> >> > > >
> >> > > > Maybe we need to do some refactoring...
> >> > > >
> >> > > > Thanks.
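The "clear the dictionary when resetting to the head of the file" idea can be sketched in similarly toy terms (illustrative Python only, not HBase code): a reader that always restarts from offset 0 with an empty dictionary deterministically rebuilds the same state the writer had, so replay stays consistent.

```python
def replay_from_head(tokens):
    # Fresh dictionary on every reset-to-head; this mirrors clearing the
    # LRUCache before re-reading the file from offset 0. Tokens are
    # ("raw", value) or ("ref", index) pairs as emitted by the writer.
    dictionary, values = [], []
    for kind, payload in tokens:
        if kind == "ref":
            values.append(dictionary[payload])
        else:
            dictionary.append(payload)
            values.append(payload)
    return values
```

Replaying the same token stream from the head any number of times yields identical values, which is why reset-to-head is fixable while reset-to-an-intermediate-position is not: only a full replay reconstructs the dictionary.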
> >> > > >
> >> > > > 唐天航 <tangtianhang...@gmail.com> wrote on Wed, Mar 16, 2022 at 16:20:
> >> > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > > > I have created an issue, HBASE-26849
> >> > > > > <https://issues.apache.org/jira/browse/HBASE-26849>, about an NPE
> >> > > > > caused by WAL Compression and Replication.
> >> > > > >
> >> > > > > For this problem, I tried reopening the WAL reader when we reset
> >> > > > > the position to 0, and it looks like it works well. But it
> >> > > > > doesn't fundamentally solve the problem.
> >> > > > >
> >> > > > > Since the WAL Compression feature was added, Replication has
> >> > > > > introduced a lot of new code, and there are many places that
> >> > > > > reset the HLog position, such as seekOnFs to originalPosition. I
> >> > > > > guess none of this code considers compatibility with WAL
> >> > > > > Compression. Theoretically we can roll back the position to any
> >> > > > > position at any time, but the LRUCache in the corresponding
> >> > > > > LRUDictionary would also need to be rolled back, otherwise the
> >> > > > > read and write path behavior may become inconsistent. But
> >> > > > > LRUCache can't roll back at all...
> >> > > > >
> >> > > > > So my thought is: open another issue and add a note to the doc
> >> > > > > that WAL Compression and Replication are not compatible.
> >> > > > >
> >> > > > > What do you think?
> >> > > > >
> >> > > > > Thank you. Regards
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Unrest, ignorance distilled, nihilistic imbeciles -
> >     It's what we’ve earned
> > Welcome, apocalypse, what’s taken you so long?
> > Bring us the fitting end that we’ve been counting on
> >    - A23, Welcome, Apocalypse
> >
>
>
> --
> Best regards,
> Andrew
>
>
