Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

2019-08-13 Thread Jörn Franke
For me the issue occurred only in standalone mode. With the ensemble I simply
cleared the data directory and the node received the ZooKeeper data from the
quorum.
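
A minimal sketch of that step in Python, assuming the default on-disk layout:
a version-2 subdirectory of dataDir holding snapshot.<zxid> and log.<zxid>
files, with no separate dataLogDir configured. The function names and the
backup directory are hypothetical and not part of ZooKeeper. It checks for
the "log entries but no snapshot" state discussed in this thread and moves
the files aside instead of deleting them. The server has to be stopped
first, and moving the files away only makes sense on a node that can resync
from a quorum.

```python
import shutil
from pathlib import Path


def inspect_data_dir(data_dir):
    """Return (snapshots, txn_logs) found under <data_dir>/version-2.

    Assumes the default layout; a separate dataLogDir is not handled.
    """
    version_dir = Path(data_dir) / "version-2"
    if not version_dir.is_dir():
        return [], []
    entries = sorted(version_dir.iterdir())
    snapshots = [p for p in entries if p.name.startswith("snapshot.")]
    txn_logs = [p for p in entries if p.name.startswith("log.")]
    return snapshots, txn_logs


def has_logs_without_snapshot(data_dir):
    """True if there are transaction logs but no snapshot file at all."""
    snapshots, txn_logs = inspect_data_dir(data_dir)
    return bool(txn_logs) and not snapshots


def move_aside(data_dir, backup_dir):
    """Move snapshot and log files into backup_dir and return their names.

    Files directly in data_dir (such as myid) are left untouched.
    """
    snapshots, txn_logs = inspect_data_dir(data_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in snapshots + txn_logs:
        shutil.move(str(path), str(backup / path.name))
        moved.append(path.name)
    return moved
```

Moving rather than deleting keeps a way back to 3.4.14 if the resync does
not work out.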

> On 13.08.2019 at 15:42, Koen De Groote wrote:
> 
> I would also like to know if this is possible.
> 
> From going over the github page, it seems there is a JMX method to force
> the creation of a snapshot. Yet the docker image is configured as such that
> a port will never be assigned to the JMX process.
> 
> Is there any way to bypass this?

Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

2019-08-13 Thread Enrico Olivelli
On Tue, Aug 13, 2019, 15:43 Koen De Groote wrote:

> I would also like to know if this is possible.
>
> From going over the github page, it seems there is a JMX method to force
> the creation of a snapshot. Yet the docker image is configured as such that
> a port will never be assigned to the JMX process.
>

Can't you modify your Docker image in order to expose the JMX API? I am not
a Docker expert, but it should be possible.

Enrico


> Is there any way to bypass this?

Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

2019-08-13 Thread Koen De Groote
I would also like to know if this is possible.

From going over the GitHub page, it seems there is a JMX method to force
the creation of a snapshot. Yet the Docker image is configured in such a way
that a port will never be assigned to the JMX process.

Is there any way to bypass this?

On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke wrote:

> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
> check; I think the snapshot count is set to 1 in the cfg.
>
> > On 30.07.2019 at 08:06, Enrico Olivelli wrote:
> >
> > On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfra...@gmail.com> wrote:
> >
> >> ok, then let me verify tomorrow if a snapshot file is indeed there. If
> >> it is missing then I wonder why it was missing. There was no crash or
> >> whatever and 3.4.14 works without issue, but of course it could have
> >> loaded them from the log files. However, then I wonder why it does not
> >> create one.
> >
> > I remember now that some other user, I think Sijie, reported a similar
> > problem some months ago: it is not possible to upgrade from 3.4 to 3.5
> > if no snapshot is present.
> > IIRC the fix was to force the creation of at least one snapshot file and
> > then upgrade.
> >
> > Enrico
> >
> >> On Mon, Jul 29, 2019 at 11:45 PM Michael Han wrote:
> >>
> >>>>> I just wonder why it does not find a valid snapshot.
> >>>
> >>> If there are local snapshot files and the files are valid, then it's a
> >>> bug that the server fails to load them.
> >>>
> >>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
> >>>
> >>> Not that I am aware of. There are some format changes (added compression
> >>> support) in the master branch, but that's not shipped with 3.5.5.
> >>>
> >>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke wrote:
> >>>
> >>>> ok, then it affects basically all standalone nodes? This is fine, even
> >>>> though it means some extra work (for uncritical lab environments).
> >>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> >>>> behind it). The logs are fine (it works in 3.4.14 without issues, even
> >>>> after downgrading back). There is no issue with disk space and there
> >>>> are no 0 byte files. I just wonder why it does not find a valid
> >>>> snapshot. Is it because the format changed in 3.5.5 compared to 3.4.14?
> >>>>
> >>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han wrote:
> >>>>
> >>>>>>> java.io.IOException: No snapshot found, but there are log entries.
> >>>>>>> Something is broken!
> >>>>>
> >>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want
> >>>>> to end up with a potentially inconsistent state across the ensemble
> >>>>> when recovering from an empty snapshot.
> >>>>>
> >>>>> To continue the upgrade, just delete all txn log files and let the
> >>>>> node sync the snapshot from the quorum.
> >>>>>
> >>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolive...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> On Mon, Jul 29, 2019, 22:32 Jörn Franke wrote:
> >>>>>>
> >>>>>>> It also seems that 3.5.5 does not attempt to read all of the
> >>>>>>> logfiles (I still have to confirm), but the two it reads exist, it
> >>>>>>> has access to them, and they are much larger than 0 bytes.
> >>>>>>
> >>>>>> We should have the stack trace of the EOFException.
> >>>>>>
> >>>>>> Does anyone on this list have a better idea?
> >>>>>>
> >>>>>> Enrico
> >>>>>>
> >>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfra...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> (of course I do not run them at the same time)
> >>>>>>>>
> >>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfra...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> thank you for the quick reply. They read from the same disk
> >>>>>>>>> paths and have the same access rights (in fact the RHEL service
> >>>>>>>>> executes them as the same specific user).
> >>>>>>>>>
> >>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli
> >>>>>>>>> <eolive...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> On Mon, Jul 29, 2019, 21:50 Jörn Franke wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14
> >>>>>>>>>>> (used for Solr) to 3.5.5 and encountered an issue. It is
> >>>>>>>>>>> ZooKeeper in standalone mode (other environments have a
> >>>>>>>>>>> proper ensemble). I increased jute.maxbuffer beyond the
> >>>>>>>>>>> default (but not excessively) - this was working perfectly
> >>>>>>>>>>> fine in 3.4.14.
> >>>>>>>>>>>
> >>>>>>>>>>> Basically I reuse for the migration the same config files,
> >>>>>>>>>>> except that I whitelist some commands (later I am also
> >>>>>>>>>>> interested in adding SSL).
> >>>>>>>>>>>